In our case there were three things that helped:
- reduce mapred.reduce.parallel.copies (and incur some execution time penalty on large clusters)
- increase the number of reducers (and just have the larger number of output files)
- decrease the amount of map output (may or may not be possible in your case)

(A configuration sketch collecting these knobs appears at the end of this thread.)

Alex K

On Fri, May 7, 2010 at 8:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:

You need to lower mapred.job.shuffle.input.buffer.percent to 20% or 25%. I didn't have time recently to find the root cause in 0.20.2.

I was told that shuffle has been rewritten in 0.21. You may give it a try.

On Fri, May 7, 2010 at 8:08 PM, Bo Shi <b...@visiblemeasures.com> wrote:

Hey Ted, any further insights on this? We're encountering a similar issue (on CD2). I'll be applying MAPREDUCE-1182 to see if that resolves our case, but it sounds like that JIRA didn't completely eliminate the problem for some folks.

On Wed, Mar 10, 2010 at 11:54 PM, Ted Yu <yuzhih...@gmail.com> wrote:

I pressed the send key a bit early.

I will have to dig a bit deeper. Hopefully someone can find the reader.close() call, after which I will look for another possible root cause :-)

On Wed, Mar 10, 2010 at 7:48 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Thanks to Andy for the log he provided.

You can see from the log below that size increased steadily from 341535057 to 408181692, approaching maxSize. Then OOME:

    2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=start requestedSize=3893000 size=341535057 numPendingRequests=0 maxSize=417601952
    2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=end requestedSize=3893000 size=345428057 numPendingRequests=0 maxSize=417601952
    ...
    2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=end requestedSize=635753 size=408181692 numPendingRequests=0 maxSize=417601952
    2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003101826_0001_r_000004_0: Failed fetch #1 from attempt_201003101826_0001_m_000875_0
    2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003101826_0001_r_000004_0 adding host hd17.dfs.returnpath.net to penalty box, next contact in 4 seconds
    2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003101826_0001_r_000004_0: Got 1 map-outputs from previous failures
    2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003101826_0001_r_000004_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)

Looking at the calls to unreserve() in ReduceTask, two were for IOException and the other was for a sanity check (line 1557), meaning they wouldn't be called in the normal execution path.
I see one call in the IFile.InMemoryReader close() method:

    // Inform the RamManager
    ramManager.unreserve(bufferSize);

And InMemoryReader is used in createInMemorySegments():

    Reader<K, V> reader =
        new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
                                 mo.data, 0, mo.data.length);

But I don't see reader.close() in the ReduceTask file.

On Wed, Mar 10, 2010 at 3:34 PM, Chris Douglas <chri...@yahoo-inc.com> wrote:

I don't think this OOM is a framework bug per se, and given the rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning the 0.20 shuffle semantics is likely not worthwhile (though data informing improvements to trunk would be excellent). Most likely (and tautologically), ReduceTask simply requires more memory than is available, and the job failure can be avoided by either 0) increasing the heap size or 1) lowering mapred.job.shuffle.input.buffer.percent. Most of the tasks we run have a heap of 1GB. For a reduce fetching >200k map outputs, that's a reasonable, even stingy, amount of space. -C

On Mar 10, 2010, at 5:26 AM, Ted Yu wrote:

I verified that size and maxSize are long. This means MR-1182 didn't resolve Andy's issue.

According to Andy, at the beginning of the job there are 209,754 pending map tasks and 32 pending reduce tasks.

My guess is that GC wasn't reclaiming memory fast enough, leading to OOME because of the large number of in-memory shuffle candidates.

My suggestion for Andy would be to:
1. add -verbose:gc as a JVM parameter
2. modify reserve() slightly to calculate the maximum outstanding numPendingRequests and print that maximum

Based on the output from the above two items, we can discuss a solution. My intuition is to place an upper bound on numPendingRequests beyond which canFitInMemory() returns false.

My two cents.

On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas <chri...@yahoo-inc.com> wrote:

That section of code is unmodified in MR-1182. See the patches/svn log. -C

Sent from my iPhone

On Mar 9, 2010, at 7:44 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

I just downloaded the hadoop-0.20.2 tarball from the cloudera mirror.

This is what I see in ReduceTask (line 999):

    public synchronized boolean reserve(int requestedSize, InputStream in)
        throws InterruptedException {
      // Wait till the request can be fulfilled...
      while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182.

That's why I suggested to Andy that he manually apply MR-1182.

Cheers

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:

Thanks Christopher.

The heap size for reduce tasks is configured to be 640M (mapred.child.java.opts set to -Xmx640m).

Andy
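For reference, the maxSize=417601952 in the reserve log lines above is consistent with the in-memory shuffle budget being a fraction of the reduce JVM's heap: in 0.20 it appears to be derived from Runtime.getRuntime().maxMemory() times mapred.job.shuffle.input.buffer.percent (0.70 by default). A back-of-the-envelope sketch of that arithmetic, assuming that formula and assuming maxMemory() reports roughly 596 MB under -Xmx640m (a back-calculation from the log, not a measured value):

    // Sketch only; assumes maxSize = maxMemory() * mapred.job.shuffle.input.buffer.percent.
    public class ShuffleBufferEstimate {
      public static void main(String[] args) {
        // Hypothetical figure for what Runtime.getRuntime().maxMemory() might report
        // under -Xmx640m (typically somewhat less than 640 MB = 671088640 bytes).
        long reportedMaxHeap = 596574217L;
        double bufferPercent = 0.70;  // mapred.job.shuffle.input.buffer.percent default
        long maxSize = (long) (reportedMaxHeap * bufferPercent);
        // Prints roughly 417601951, in line with maxSize=417601952 in the log above.
        System.out.println("In-memory shuffle budget: " + maxSize + " bytes");
      }
    }

If that reading is right, roughly 417 MB of a 640 MB reduce heap is earmarked for in-memory shuffle data, leaving comparatively little headroom for everything else in the task.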
-----Original Message-----
From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2.

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

Andy:

You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:

Thanks Ted. My understanding is that MAPREDUCE-1182 is included in the 0.20.2 release. We upgraded our cluster to 0.20.2 this weekend and re-ran the same job scenarios. Running with mapred.reduce.parallel.copies set to 1, we continue to have the same Java heap space error.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, March 09, 2010 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

This issue has been resolved in http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch M1182-1v20.patch
<http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch>

On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:

Thanks Ted. Very helpful. You are correct that I misunderstood the code at ReduceTask.java:1535. I missed the fact that it's in an IOException catch block. My mistake. That's what I get for being in a rush.

For what it's worth, I did re-run the job with mapred.reduce.parallel.copies set with values from 5 all the way down to 1.
All failed with the same error:

    Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

So from that it does seem like something else might be going on, yes? I need to do some more research.

I appreciate your insights.

Andy

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

My observation is based on this call chain: MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling ramManager.canFitInMemory(decompressedLength).

Basically, ramManager.canFitInMemory() makes its decision without considering the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of the total heap may be used in shuffling if default parameters were used. Of course, you should check the value of mapred.reduce.parallel.copies to see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

About the ramManager.unreserve() call: ReduceTask.java from hadoop 0.20.2 only has 2731 lines, so I have to guess the location of the code snippet you provided. I found this around line 1535:

        } catch (IOException ioe) {
          LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(), ioe);

          // Inform the ram-manager
          ramManager.closeInMemoryFile(mapOutputLength);
          ramManager.unreserve(mapOutputLength);

          // Discard the map-output
          try {
            mapOutput.discard();
          } catch (IOException ignored) {
            LOG.info("Failed to discard map-output from " + mapOutputLoc.getTaskAttemptId(), ignored);
          }

Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we can improve it.
First, I assume IOException shouldn't happen that often. Second, mapOutput.discard() just sets:

    data = null;

for the in-memory case. Even if we call mapOutput.discard() before ramManager.unreserve(), we don't know when GC would kick in and make more memory available. Of course, given the large number of map outputs in your system, it became more likely that the root cause from my reasoning made OOME happen sooner.

Thanks

On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:

Ted,

I'm trying to follow the logic in your mail and I'm not sure I'm following. If you wouldn't mind helping me understand, I would appreciate it.

Looking at the code, maxSingleShuffleLimit is only used in determining if the copy _can_ fit into memory:

    boolean canFitInMemory(long requestedSize) {
      return (requestedSize < Integer.MAX_VALUE &&
              requestedSize < maxSingleShuffleLimit);
    }

It also looks like RamManager.reserve should wait until memory is available, so it shouldn't exceed the memory limit for that reason.

What does seem a little strange to me is the following (ReduceTask.java starting at 2730):

        // Inform the ram-manager
        ramManager.closeInMemoryFile(mapOutputLength);
        ramManager.unreserve(mapOutputLength);

        // Discard the map-output
        try {
          mapOutput.discard();
        } catch (IOException ignored) {
          LOG.info("Failed to discard map-output from " + mapOutputLoc.getTaskAttemptId(), ignored);
        }
        mapOutput = null;

So to me that looks like the ramManager unreserves the memory before the mapOutput is discarded. Shouldn't the mapOutput be discarded _before_ the ramManager unreserves the memory? If the memory is unreserved before the actual underlying data references are removed, then it seems like another thread can try to allocate memory (ReduceTask.java:2730) before the previous memory is disposed (mapOutput.discard()). Not sure that makes sense.
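To make the ordering concern concrete, here is a minimal sketch of the two variants. It is not the actual ReduceTask code: RamAccounting and MapOutputBuffer are hypothetical stand-ins for ShuffleRamManager and MapOutput.

    import java.io.IOException;

    interface RamAccounting { void unreserve(int len); }
    interface MapOutputBuffer { void discard() throws IOException; }

    class CleanupOrdering {
      // Order as quoted above: the accounting is released while the data is still referenced,
      // so another copier can reserve against the budget before the old buffer is collectible.
      static void currentOrder(RamAccounting ram, MapOutputBuffer out, int len) throws IOException {
        ram.unreserve(len);
        out.discard();
      }

      // Reordering Andy suggests: drop the data reference first, then release the accounting.
      static void reordered(RamAccounting ram, MapOutputBuffer out, int len) throws IOException {
        out.discard();
        ram.unreserve(len);
      }
    }

As Ted notes earlier in the thread, even with the second ordering the bytes are only reclaimed once GC runs, so the change would narrow the window rather than close it.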
One thing to note is that the particular job that is failing does have a good number (200k+) of map outputs. The large number of small map outputs may be why we are triggering a problem.

Thanks again for your thoughts.

Andy

-----Original Message-----
From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
Sent: Sunday, March 07, 2010 1:21 PM
To: common-user@hadoop.apache.org
Cc: Andy Sautins; Ted Yu
Subject: Re: Shuffle In Memory OutOfMemoryError

Ted,

Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have some time to write a patch later this week.

Jacob Rideout

On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:

I think there is a mismatch (in ReduceTask.java) between:

    this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);

and:

    maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);

where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f, because

    copiers = new ArrayList<MapOutputCopier>(numCopiers);

so the total memory allocated for in-memory shuffle is 1.25 * maxSize.

A JIRA should be filed to correlate the constant 5 above and MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.

Cheers
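Put in numbers, the mismatch Ted describes works out as follows. This is only a sketch of the bound he is pointing at, using the defaults quoted in this thread; the real accounting in ShuffleRamManager may behave differently.

    // Worst-case arithmetic for the defaults quoted in this thread.
    public class ShuffleOvercommitSketch {
      public static void main(String[] args) {
        double inputBufferPercent = 0.70;    // mapred.job.shuffle.input.buffer.percent default
        double singleSegmentFraction = 0.25; // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION
        int parallelCopies = 5;              // mapred.reduce.parallel.copies default

        double heap = 1.0;  // normalize the reduce task heap to 1.0
        double maxSize = heap * inputBufferPercent;                                   // 0.7 of the heap
        double worstCaseInFlight = parallelCopies * singleSegmentFraction * maxSize;  // 1.25 * maxSize

        System.out.println("maxSize as a fraction of heap:   " + maxSize);            // 0.7
        System.out.println("worst-case in-flight map output: " + worstCaseInFlight);  // ~0.875 of the heap
      }
    }

On that reading, nearly seven eighths of the reduce task's heap could be tied up by in-flight map outputs, which matches Ted's earlier "1.25 * 0.7 of total heap" estimate.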
On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <apa...@jacobrideout.net> wrote:

Hi all,

We are seeing the following error in our reducers of a particular job:

    Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

After enough reducers fail, the entire job fails. This error occurs regardless of whether mapred.compress.map.output is true. We were able to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent to 20%. Shouldn't the framework, via ShuffleRamManager.canFitInMemory and ShuffleRamManager.reserve, correctly detect the memory available for allocation? I would think that with poor configuration settings (and default settings in particular) the job may not be as efficient, but wouldn't die.
Here is some more context in the logs, I have attached the full reducer log here: http://gist.github.com/323746

    2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 4191933 bytes (435311 raw bytes) into RAM from attempt_201003060739_0002_m_000061_0
    2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003060739_0002_r_000000_0: Failed fetch #1 from attempt_201003060739_0002_m_000202_0
    2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0 adding host hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
    2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from previous failures
    2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003060739_0002_r_000000_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

We tried this both in 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182 would address the issue in 0.20.2, but it did not. Does anyone have any comments or suggestions? Is this a bug I should file a JIRA for?

Jacob Rideout
Return Path
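Pulling the thread's workarounds together: lowering mapred.job.shuffle.input.buffer.percent, reducing mapred.reduce.parallel.copies, raising the reduce-side heap, and adding reducers are all plain job configuration. A sketch of how they might be applied to a 0.20 JobConf follows; the specific values are illustrative only, taken from the suggestions above rather than from tested recommendations.

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuningSketch {
      // Example values only, echoing the suggestions in this thread.
      public static void applyWorkarounds(JobConf conf) {
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.20f); // Ted/Jacob: 20-25% instead of the 0.70 default
        conf.setInt("mapred.reduce.parallel.copies", 1);                 // Alex/Andy: fewer concurrent fetches
        conf.setNumReduceTasks(64);                                      // Alex: more reducers, smaller shuffle per reduce (64 is arbitrary)
        conf.set("mapred.child.java.opts", "-Xmx1024m -verbose:gc");     // Chris: a 1 GB heap; Ted: GC logging
      }
    }

In this thread, dropping the buffer percent to 20% was what actually avoided the OOME; which combination makes sense will depend on the job.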