Please see my answers inline ...

On Mon, Apr 4, 2011 at 8:45 PM, Stack <[email protected]> wrote:
> On Mon, Apr 4, 2011 at 2:30 AM, Bogdan Ghidireac <[email protected]> wrote:
>> Is it possible to add a timeout and then force a System.exit() ?
>>
>
> Yes. Of course. Sounds bad. How you think this scenario came about?

My M/R job reads from a table and creates a lot of data that is inserted into a second table. Because this new table is empty and I did not split the keys in advance, the region server hosting the first region is hit really hard (60-100K ops/sec). The OOM exception happens during this time, on only one or maybe two servers. The exception triggers a server shutdown... Once the initial region splits and the traffic is distributed, the problem no longer occurs.

> Is the zk ensemble up and running still?

The ZK ensemble is running fine. I have 3 ZK servers running ZK 3.3.2.

> Whats the last thing in this regionserver log?

This is the RS log: http://pastebin.com/Cvx8zS54

> Anything in the .out file?

This is the System.out/err output: http://pastebin.com/gNNVUzvZ

> I've not seen this
> before but, hey, the world is a wide and wonderful place. We could
> run the zk close inside a thread and interrupt if it goes on too long
> (Let me ask the zk boys if they've seen this before too).
>

I am subscribed to the ZK list too and I have seen your email. I am using ZK 3.3.2 ...

> St.Ack
>

Thank you,
Bogdan
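
P.S. To make the timeout idea concrete, here is a rough sketch of running a close call in a separate thread, interrupting it if it takes too long, and then falling back to a forced exit. This is not the HBase or ZooKeeper code; the class name BoundedClose, the method closeWithTimeout and the onTimeout callback are all hypothetical names chosen for illustration, and the hanging close is simulated with a sleep.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HBase or ZooKeeper.
final class BoundedClose {

    /**
     * Runs closeAction on a daemon thread and waits at most timeoutMillis.
     * Returns true if closeAction finished (normally or by throwing) in time.
     * On timeout, interrupts the worker, runs onTimeout and returns false.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMillis, Runnable onTimeout) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(closeAction);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // delivers an interrupt to the worker thread
            onTimeout.run();     // e.g. () -> System.exit(1) in a real server
            return false;
        } catch (ExecutionException e) {
            // The close threw, but it did return, so nothing is hanging.
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a close() call that never returns on its own.
        Runnable hangingClose = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // interrupted by the timeout handling above
            }
        };
        boolean closed = closeWithTimeout(hangingClose, 500,
                () -> System.err.println("close timed out, would force exit here"));
        System.out.println("closed cleanly: " + closed);
    }
}
```

Passing the exit step in as a callback keeps the sketch testable. In a real server the callback could call System.exit(1), or Runtime.getRuntime().halt(1) if the shutdown hooks themselves might hang.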
