I just downloaded the hadoop-0.20.2 tarball from a Cloudera mirror. This is
what I see in ReduceTask (line 999):

    public synchronized boolean reserve(int requestedSize, InputStream in)
        throws InterruptedException {
      // Wait till the request can be fulfilled...
      while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182. That's why I suggested to Andy that he
manually apply MR-1182.

Cheers

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
>
> Thanks Christopher.
>
> The heap size for reduce tasks is configured to be 640M
> ( mapred.child.java.opts set to -Xmx640m ).
>
> Andy
>
> -----Original Message-----
> From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
> Sent: Tuesday, March 09, 2010 5:19 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle In Memory OutOfMemoryError
>
> No, MR-1182 is included in 0.20.2.
>
> What heap size have you set for your reduce tasks? -C
>
> Sent from my iPhone
>
> On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
> > Andy:
> > You need to manually apply the patch.
> >
> > Cheers
> >
> > On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
> >
> >> Thanks Ted. My understanding is that MAPREDUCE-1182 is included in the
> >> 0.20.2 release. We upgraded our cluster to 0.20.2 this weekend and
> >> re-ran the same job scenarios. Running with
> >> mapred.reduce.parallel.copies set to 1, we continue to have the same
> >> Java heap space error.
> >>
> >> -----Original Message-----
> >> From: Ted Yu [mailto:yuzhih...@gmail.com]
> >> Sent: Tuesday, March 09, 2010 12:56 PM
> >> To: common-user@hadoop.apache.org
> >> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>
> >> This issue has been resolved in
> >> http://issues.apache.org/jira/browse/MAPREDUCE-1182
> >>
> >> Please apply the patch M1182-1v20.patch:
> >> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
> >>
> >> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
> >>>
> >>> Thanks Ted. Very helpful.
> >>> You are correct that I misunderstood the code at ReduceTask.java:1535.
> >>> I missed the fact that it's in an IOException catch block. My mistake.
> >>> That's what I get for being in a rush.
> >>>
> >>> For what it's worth, I did re-run the job with
> >>> mapred.reduce.parallel.copies set with values from 5 all the way down
> >>> to 1. All failed with the same error:
> >>>
> >>> Error: java.lang.OutOfMemoryError: Java heap space
> >>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>
> >>> So from that it does seem like something else might be going on, yes?
> >>> I need to do some more research.
> >>>
> >>> I appreciate your insights.
> >>>
> >>> Andy
> >>>
> >>> -----Original Message-----
> >>> From: Ted Yu [mailto:yuzhih...@gmail.com]
> >>> Sent: Sunday, March 07, 2010 3:38 PM
> >>> To: common-user@hadoop.apache.org
> >>> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>>
> >>> My observation is based on this call chain:
> >>> MapOutputCopier.run() calling copyOutput() calling getMapOutput()
> >>> calling ramManager.canFitInMemory(decompressedLength)
> >>>
> >>> Basically ramManager.canFitInMemory() makes its decision without
> >>> considering the number of MapOutputCopiers that are running. Thus
> >>> 1.25 * 0.7 of the total heap may be used in shuffling if the default
> >>> parameters are used.
> >>> Of course, you should check the value of mapred.reduce.parallel.copies
> >>> to see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
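For concreteness, the arithmetic behind Ted's reasoning can be sketched as a
small worked example, using Andy's 640 MB reduce heap and the defaults
discussed in this thread. The class and method names here are ours, purely
for illustration; only the parameter names and constants come from the thread.

```java
// Back-of-the-envelope estimate of worst-case in-memory shuffle usage,
// per Ted's reasoning: each copier is only checked against the per-copy
// limit, so N copiers can together reserve N * 0.25 * maxSize.
public class ShuffleMath {
    public static long worstCaseBytes(long heapBytes,
                                      double inputBufferPercent,    // mapred.job.shuffle.input.buffer.percent (default 0.70)
                                      double singleShuffleFraction, // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION (0.25)
                                      int parallelCopies) {         // mapred.reduce.parallel.copies (default 5)
        long maxSize = (long) (heapBytes * inputBufferPercent);
        long maxSingleShuffleLimit = (long) (maxSize * singleShuffleFraction);
        // canFitInMemory() checks each copy individually against
        // maxSingleShuffleLimit, so all copiers together may hold up to:
        return parallelCopies * maxSingleShuffleLimit;
    }

    public static void main(String[] args) {
        long heap = 640L * 1024 * 1024;  // Andy's -Xmx640m
        long worst = worstCaseBytes(heap, 0.70, 0.25, 5);
        // 5 * 0.25 = 1.25 times maxSize, i.e. roughly 87.5% of the heap
        System.out.println(worst);
    }
}
```

With these defaults the figure comes out to roughly 560 MB of a 640 MB heap,
which leaves little headroom for the rest of the reduce task.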
> >>>
> >>> About the ramManager.unreserve() call: ReduceTask.java from hadoop
> >>> 0.20.2 only has 2731 lines, so I have to guess the location of the
> >>> code snippet you provided.
> >>> I found this around line 1535:
> >>>       } catch (IOException ioe) {
> >>>         LOG.info("Failed to shuffle from " +
> >>>                  mapOutputLoc.getTaskAttemptId(), ioe);
> >>>
> >>>         // Inform the ram-manager
> >>>         ramManager.closeInMemoryFile(mapOutputLength);
> >>>         ramManager.unreserve(mapOutputLength);
> >>>
> >>>         // Discard the map-output
> >>>         try {
> >>>           mapOutput.discard();
> >>>         } catch (IOException ignored) {
> >>>           LOG.info("Failed to discard map-output from " +
> >>>                    mapOutputLoc.getTaskAttemptId(), ignored);
> >>>         }
> >>> Please confirm the line number.
> >>>
> >>> If we're looking at the same code, I am afraid I don't see how we can
> >>> improve it. First, I assume IOException shouldn't happen that often.
> >>> Second, mapOutput.discard() just sets:
> >>>       data = null;
> >>> for the in-memory case. Even if we call mapOutput.discard() before
> >>> ramManager.unreserve(), we don't know when GC would kick in and make
> >>> more memory available.
> >>> Of course, given the large number of map outputs in your system, it
> >>> became more likely that the root cause from my reasoning made the
> >>> OOME happen sooner.
> >>>
> >>> Thanks
> >>>
> >>> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
> >>>>
> >>>> Ted,
> >>>>
> >>>> I'm trying to follow the logic in your mail and I'm not sure I'm
> >>>> following. If you wouldn't mind helping me understand, I would
> >>>> appreciate it.
> >>>>
> >>>> Looking at the code, maxSingleShuffleLimit is only used in
> >>>> determining if the copy _can_ fit into memory:
> >>>>
> >>>>       boolean canFitInMemory(long requestedSize) {
> >>>>         return (requestedSize < Integer.MAX_VALUE &&
> >>>>                 requestedSize < maxSingleShuffleLimit);
> >>>>       }
> >>>>
> >>>> It also looks like RamManager.reserve should wait until memory is
> >>>> available, so it shouldn't hit a memory limit for that reason.
> >>>>
> >>>> What does seem a little strange to me is the following
> >>>> ( ReduceTask.java starting at 2730 ):
> >>>>
> >>>>       // Inform the ram-manager
> >>>>       ramManager.closeInMemoryFile(mapOutputLength);
> >>>>       ramManager.unreserve(mapOutputLength);
> >>>>
> >>>>       // Discard the map-output
> >>>>       try {
> >>>>         mapOutput.discard();
> >>>>       } catch (IOException ignored) {
> >>>>         LOG.info("Failed to discard map-output from " +
> >>>>                  mapOutputLoc.getTaskAttemptId(), ignored);
> >>>>       }
> >>>>       mapOutput = null;
> >>>>
> >>>> So to me that looks like the ramManager unreserves the memory before
> >>>> the mapOutput is discarded. Shouldn't the mapOutput be discarded
> >>>> _before_ the ramManager unreserves the memory? If the memory is
> >>>> unreserved before the actual underlying data references are removed,
> >>>> then it seems like another thread can try to allocate memory
> >>>> ( ReduceTask.java:2730 ) before the previous memory is disposed of
> >>>> ( mapOutput.discard() ).
> >>>>
> >>>> Not sure that makes sense. One thing to note is that the particular
> >>>> job that is failing does have a good number ( 200k+ ) of map
> >>>> outputs. The large number of small map outputs may be why we are
> >>>> triggering a problem.
> >>>>
> >>>> Thanks again for your thoughts.
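The reordering Andy proposes would look roughly like the runnable toy below.
All names here are stand-ins, not Hadoop's; and, as Ted notes above, discard()
only drops the reference, so the JVM still frees the bytes only when GC runs,
meaning the reorder narrows the race window rather than closing it.

```java
// Toy model of the error path with Andy's proposed ordering: null out
// the buffer reference first, then release the reservation, so by the
// time another copier is allowed to reserve, the old output is at least
// unreachable (though not necessarily collected yet).
public class DiscardOrder {
    static long reserved = 0;  // stand-in for the ram-manager's bookkeeping

    static class MapOutput {
        byte[] data;
        MapOutput(int n) { data = new byte[n]; }
        void discard() { data = null; }  // the in-memory case does the same
    }

    static boolean errorPath(int len) {
        reserved = len;                  // as if reserve(len) had succeeded
        MapOutput out = new MapOutput(len);

        out.discard();                   // proposed: drop the reference first...
        reserved -= len;                 // ...then unreserve the memory

        return out.data == null && reserved == 0;
    }

    public static void main(String[] args) {
        System.out.println(errorPath(4 * 1024 * 1024));
    }
}
```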
> >>>>
> >>>> Andy
> >>>>
> >>>> -----Original Message-----
> >>>> From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
> >>>> Sent: Sunday, March 07, 2010 1:21 PM
> >>>> To: common-user@hadoop.apache.org
> >>>> Cc: Andy Sautins; Ted Yu
> >>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>>>
> >>>> Ted,
> >>>>
> >>>> Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have
> >>>> some time to write a patch later this week.
> >>>>
> >>>> Jacob Rideout
> >>>>
> >>>> On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>> I think there is a mismatch (in ReduceTask.java) between:
> >>>>>       this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
> >>>>> and:
> >>>>>       maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
> >>>>> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f,
> >>>>>
> >>>>> because
> >>>>>       copiers = new ArrayList<MapOutputCopier>(numCopiers);
> >>>>> so the total memory allocated for in-mem shuffle is 1.25 * maxSize.
> >>>>>
> >>>>> A JIRA should be filed to correlate the constant 5 above and
> >>>>> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
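The reserve()/unreserve() protocol quoted at the top of the thread can be
sketched as a toy class to see how the per-copy check and the global wait
loop interact. This is a simplified illustration with our own names, not the
actual Hadoop class: note that numCopiers never appears in either check,
which is the mismatch Ted describes.

```java
// Toy model of the shuffle ram-manager: reserve() blocks while the
// global budget is exhausted, but canFitInMemory() only looks at one
// copy at a time against the 0.25 * maxSize per-copy cap.
public class SimpleRamManager {
    private final long maxSize;               // global in-memory shuffle budget
    private final long maxSingleShuffleLimit; // per-copy cap: 0.25 * maxSize
    private long size = 0;                    // bytes currently reserved

    public SimpleRamManager(long maxSize) {
        this.maxSize = maxSize;
        this.maxSingleShuffleLimit = (long) (maxSize * 0.25f);
    }

    public boolean canFitInMemory(long requestedSize) {
        return requestedSize < Integer.MAX_VALUE
            && requestedSize < maxSingleShuffleLimit;
    }

    public synchronized void reserve(long requestedSize) {
        // Wait till the request can be fulfilled (simplified: the real
        // code throws InterruptedException instead of swallowing it)
        while (size + requestedSize > maxSize) {
            try {
                wait();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        size += requestedSize;
    }

    public synchronized void unreserve(long requestedSize) {
        size -= requestedSize;
        notifyAll();  // wake copiers blocked in reserve()
    }

    public synchronized long reserved() { return size; }
}
```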
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <apa...@jacobrideout.net> wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> We are seeing the following error in our reducers of a particular job:
> >>>>>>
> >>>>>> Error: java.lang.OutOfMemoryError: Java heap space
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>>>>
> >>>>>> After enough reducers fail, the entire job fails. This error occurs
> >>>>>> regardless of whether mapred.compress.map.output is true. We were
> >>>>>> able to avoid the issue by reducing
> >>>>>> mapred.job.shuffle.input.buffer.percent to 20%. Shouldn't the
> >>>>>> framework, via ShuffleRamManager.canFitInMemory and
> >>>>>> ShuffleRamManager.reserve, correctly detect the memory available
> >>>>>> for allocation? I would think that with poor configuration settings
> >>>>>> (and default settings in particular) the job may not be as
> >>>>>> efficient, but it wouldn't die.
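The workaround Jacob describes, written out as it would plausibly appear in
mapred-site.xml (or a per-job configuration); 0.20 is the value he reports
using, down from the 0.70 default:

```xml
<!-- Shrink the fraction of reduce-task heap used to buffer map outputs
     during the shuffle. Workaround from this thread, not a fix. -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.20</value>
</property>
```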
> >>>>>>
> >>>>>> Here is some more context from the logs; I have attached the full
> >>>>>> reducer log here: http://gist.github.com/323746
> >>>>>>
> >>>>>> 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
> >>>>>> Shuffling 4191933 bytes (435311 raw bytes) into RAM from
> >>>>>> attempt_201003060739_0002_m_000061_0
> >>>>>> 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask:
> >>>>>> Task attempt_201003060739_0002_r_000000_0: Failed fetch #1 from
> >>>>>> attempt_201003060739_0002_m_000202_0
> >>>>>> 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
> >>>>>> attempt_201003060739_0002_r_000000_0 adding host
> >>>>>> hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
> >>>>>> 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
> >>>>>> attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from
> >>>>>> previous failures
> >>>>>> 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
> >>>>>> attempt_201003060739_0002_r_000000_0 : Map output copy failure :
> >>>>>> java.lang.OutOfMemoryError: Java heap space
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>>>>
> >>>>>> We tried this on both 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182
> >>>>>> would address the issue in 0.20.2, but it did not. Does anyone have
> >>>>>> any comments or suggestions? Is this a bug I should file a JIRA for?
> >>>>>>
> >>>>>> Jacob Rideout
> >>>>>> Return Path