Hi,

"[..]if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line."
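For reference, the flag is passed on the java command line like any other -XX option; a minimal sketch (the jar and class names here are placeholders, not from the thread):

```shell
# Hypothetical invocation; MyJob.jar / com.example.MyJob are placeholder names.
# -XX:-UseGCOverheadLimit turns the 98%-time / 2%-recovered check off;
# -Xmx1024m raises the heap ceiling so the collector has room to make progress.
java -Xmx1024m -XX:-UseGCOverheadLimit -cp MyJob.jar com.example.MyJob
```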
This is what often happens in MapReduce operations when you process a lot of data. I recommend trying:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
</property>

Also, from my personal experience, when processing a lot of data it is often much cheaper to kill the JVM than to wait for GC. For that reason, if you have a lot of BIG tasks rather than tons of small tasks, do not reuse the JVM: killing the JVM and starting it again is often much cheaper than trying to GC 1 GB of RAM (I don't know why, it just turned out that way in my tests).

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
</property>

Regards,
Vitaliy S

On Sun, Sep 26, 2010 at 11:55 AM, Bradford Stephens
<bradfordsteph...@gmail.com> wrote:
> Greetings,
>
> I'm running into a brain-numbing problem on Elastic MapReduce. I'm
> running a decent-size task (22,000 mappers, a ton of GZipped input
> blocks, ~1TB of data) on 40 c1.xlarge nodes (7 GB RAM, ~8 "cores").
>
> I get failures randomly --- sometimes at the end of my 6-step process,
> sometimes at the first reducer phase, sometimes in the mapper. It
> seems to fail in multiple areas. Mostly in the reducers. Any ideas?
>
> Here's the settings I've changed:
> -Xmx400m
> 6 max mappers
> 1 max reducer
> 1GB swap partition
> mapred.job.reuse.jvm.num.tasks=50
> mapred.reduce.parallel.copies=3
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.nio.CharBuffer.wrap(CharBuffer.java:350)
>     at java.nio.CharBuffer.wrap(CharBuffer.java:373)
>     at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
>     at java.lang.StringCoding.decode(StringCoding.java:173)
>     at java.lang.String.<init>(String.java:443)
>     at java.lang.String.<init>(String.java:515)
>     at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
>     at cascading.tuple.TupleInputStream.readString(TupleInputStream.java:144)
>     at cascading.tuple.TupleInputStream.readType(TupleInputStream.java:154)
>     at cascading.tuple.TupleInputStream.getNextElement(TupleInputStream.java:101)
>     at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:75)
>     at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33)
>     at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
>     at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
>     at cascading.tuple.hadoop.DeserializerComparator.compareTuples(DeserializerComparator.java:142)
>     at cascading.tuple.hadoop.GroupingSortingComparator.compare(GroupingSortingComparator.java:55)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>     at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
>     at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>     at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2645)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2586)
>
> --
> Bradford Stephens,
> Founder, Drawn to Scale
> drawntoscalehq.com
> 727.697.7528
>
> http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
> solution. Process, store, query, search, and serve all your data.
>
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
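Pulling the two suggestions above together, the corresponding fragment of mapred-site.xml (or the equivalent EMR job configuration) would look roughly like this; the values are the ones suggested in this thread, not tuned defaults:

```xml
<!-- Raise the per-task heap and disable the GC overhead check -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
</property>
<!-- Run each JVM for a single task, so memory is reclaimed by
     process exit rather than by a long full-GC cycle -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
</property>
```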