A couple of things you can try:

1. Increase the heap size for the tasks.

2. Since your OOMs happen at random points, set -XX:+HeapDumpOnOutOfMemoryError in your child JVM parameters. From the heap dump analysis you can at least tell why the heap is growing: is it a leak, or do your mappers and reducers simply need a larger heap?
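Tips 1 and 2 can both ride on the child-task JVM options. A minimal sketch for mapred-site.xml, assuming a 0.20-era cluster where mapred.child.java.opts controls the per-task JVM; the 768m figure and the /mnt/dumps path are illustrative placeholders, not recommendations:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- raise the per-task heap from 400m and capture a dump on OOM -->
  <value>-Xmx768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/dumps</value>
</property>
```

The dump path must exist on each task node and have room for a heap-sized file per failure.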
3. Another possibility is poor JVM GC tuning. Sometimes the default collector can't keep up with the garbage being created; that calls for some GC tuning.

-Bharath

From: common-user@hadoop.apache.org
To: cascading-u...@googlegroups.com; core-u...@hadoop.apache.org
Sent: Sunday, September 26, 2010 12:55:15 AM
Subject: java.lang.OutOfMemoryError: GC overhead limit exceeded

Greetings,

I'm running into a brain-numbing problem on Elastic MapReduce. I'm running a decent-size job (22,000 mappers, a ton of GZipped input blocks, ~1 TB of data) on 40 c1.xlarge nodes (7 GB RAM, ~8 "cores"). I get failures at random: sometimes at the end of my 6-step process, sometimes in the first reducer phase, sometimes in the mappers. It seems to fail in multiple areas, though mostly in the reducers. Any ideas?

Here are the settings I've changed:

-Xmx400m
6 max mappers
1 max reducer
1 GB swap partition
mapred.job.reuse.jvm.num.tasks=50
mapred.reduce.parallel.copies=3

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.CharBuffer.wrap(CharBuffer.java:350)
    at java.nio.CharBuffer.wrap(CharBuffer.java:373)
    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
    at java.lang.StringCoding.decode(StringCoding.java:173)
    at java.lang.String.<init>(String.java:443)
    at java.lang.String.<init>(String.java:515)
    at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
    at cascading.tuple.TupleInputStream.readString(TupleInputStream.java:144)
    at cascading.tuple.TupleInputStream.readType(TupleInputStream.java:154)
    at cascading.tuple.TupleInputStream.getNextElement(TupleInputStream.java:101)
    at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:75)
    at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33)
    at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
    at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
    at cascading.tuple.hadoop.DeserializerComparator.compareTuples(DeserializerComparator.java:142)
    at cascading.tuple.hadoop.GroupingSortingComparator.compare(GroupingSortingComparator.java:55)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2645)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2586)

--
Bradford Stephens
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528
http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.
http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
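The trace itself points at string deserialization inside the in-memory merge: every key comparison walks through WritableUtils.readString and allocates a fresh byte[] and String. A minimal self-contained sketch (plain Java with hypothetical names, not Hadoop's or Cascading's actual classes) of that allocation pattern, which is exactly the kind of short-lived garbage that can trip the GC overhead limit when a 400m heap is reused across 50 tasks:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class MergeChurnSketch {
    // Stand-in for a readString-style deserializer: length-prefixed UTF-8 bytes.
    static String readString(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] buf = new byte[len];                      // fresh byte[] per record
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);  // fresh String (and char[]) per record
    }

    static byte[] encode(String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeInt(b.length);
        bos.write(b);
        return bos.toByteArray();
    }

    // A comparator in this style allocates two Strings on every call; the merge
    // heap calls it for millions of record pairs, so almost all allocation is
    // instantly-dead garbage -- the collector can end up spending nearly all
    // its time reclaiming it, which is what "GC overhead limit exceeded" means.
    static int compareKeys(byte[] a, byte[] b) throws IOException {
        String ka = readString(new DataInputStream(new ByteArrayInputStream(a)));
        String kb = readString(new DataInputStream(new ByteArrayInputStream(b)));
        return ka.compareTo(kb);
    }

    public static void main(String[] args) throws IOException {
        byte[] a = encode("apple");
        byte[] b = encode("banana");
        System.out.println(compareKeys(a, b) < 0); // prints true
    }
}
```

Raw-byte comparators (or simply a bigger heap per task) avoid paying this per-comparison decode cost.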