In our case there were three things that helped (a rough sketch of the first
two follows the list):

   - reduce mapred.reduce.parallel.copies (and incur some execution time
   penalty on large clusters)
   - increase the # of reducers (and just have the larger # of output files)
   - decrease the amount of map output (may or may not be possible in your
   case)
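
For reference, here is a minimal sketch of how the first two knobs might be
set on the old JobConf API (the property name is the one discussed in this
thread; MyJob and the values are placeholders):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);
    // Fewer parallel fetch threads per reducer lowers peak shuffle memory,
    // at some cost in copy time on large clusters.
    conf.setInt("mapred.reduce.parallel.copies", 1);
    // More reducers means each one shuffles a smaller share of the map
    // output, at the cost of more output files.
    conf.setNumReduceTasks(64); // illustrative value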


Alex K


On Fri, May 7, 2010 at 8:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> You need to lower mapred.job.shuffle.input.buffer.percent to 20% or 25%.
> I haven't had time recently to find the root cause in 0.20.2.
>
> I was told that the shuffle has been rewritten in 0.21.
> You may give it a try.
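>
> For example (a hedged sketch -- this just lowers the fraction of the reduce
> task's heap that the shuffle may use for in-memory map outputs; the 0.20
> default is 0.70):
>
>     conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.25f);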
>
> On Fri, May 7, 2010 at 8:08 PM, Bo Shi <b...@visiblemeasures.com> wrote:
>
> > Hey Ted, any further insights on this?  We're encountering a similar
> > issue (on CD2).  I'll be applying MAPREDUCE-1182 to see if that
> > resolves our case but it sounds like that JIRA didn't completely
> > eliminate the problem for some folks.
> >
> > On Wed, Mar 10, 2010 at 11:54 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > I pressed the send key a bit early.
> > >
> > > I will have to dig a bit deeper.
> > > Hopefully someone can find the reader.close() call, after which I will
> > > look for another possible root cause :-)
> > >
> > >
> > > On Wed, Mar 10, 2010 at 7:48 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> Thanks to Andy for the log he provided.
> > >>
> > >> You can see from the log below that size increased steadily from
> > >> 341535057 to 408181692, approaching maxSize. Then OOME:
> > >>
> > >>
> > >> 2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
> > >> pos=start requestedSize=3893000 size=341535057 numPendingRequests=0
> > >> maxSize=417601952
> > >> 2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
> > >> pos=end requestedSize=3893000 size=345428057 numPendingRequests=0
> > >> maxSize=417601952
> > >> ...
> > >> 2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
> > >> pos=end requestedSize=635753 size=408181692 numPendingRequests=0
> > >> maxSize=417601952
> > >> 2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task
> > >> attempt_201003101826_0001_r_000004_0: Failed fetch #1 from
> > >> attempt_201003101826_0001_m_000875_0
> > >>
> > >> 2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask:
> > >> attempt_201003101826_0001_r_000004_0 adding host hd17.dfs.returnpath.net
> > >> to penalty box, next contact in 4 seconds
> > >> 2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask:
> > >> attempt_201003101826_0001_r_000004_0: Got 1 map-outputs from previous
> > >> failures
> > >> 2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner:
> > >> attempt_201003101826_0001_r_000004_0 : Map output copy failure :
> > >> java.lang.OutOfMemoryError: Java heap space
> > >>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
> > >>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
> > >>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
> > >>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)
> > >>
> > >> Looking at the calls to unreserve() in ReduceTask, two were for an
> > >> IOException and the other was for a sanity check (line 1557), meaning
> > >> they wouldn't be called in the normal execution path.
> > >>
> > >> I see one call in IFile.InMemoryReader close() method:
> > >>       // Inform the RamManager
> > >>       ramManager.unreserve(bufferSize);
> > >>
> > >> And InMemoryReader is used in createInMemorySegments():
> > >>           Reader<K, V> reader =
> > >>             new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
> > >>                                      mo.data, 0, mo.data.length);
> > >>
> > >> But I don't see a reader.close() call in the ReduceTask file.
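> > >>
> > >> If that reading is right, a hypothetical fix (a sketch only, not verified
> > >> against the 0.20 source) would be to close each InMemoryReader once its
> > >> segment has been consumed, so its buffer is returned to the RamManager:
> > >>
> > >>       Reader<K, V> reader =
> > >>         new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
> > >>                                  mo.data, 0, mo.data.length);
> > >>       try {
> > >>         // ... merge/consume the in-memory segment ...
> > >>       } finally {
> > >>         reader.close(); // would call ramManager.unreserve(bufferSize)
> > >>       }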
> > >>
> > >
> > >
> > >>> On Wed, Mar 10, 2010 at 3:34 PM, Chris Douglas
> > >>> <chri...@yahoo-inc.com> wrote:
> > >>
> > >>> I don't think this OOM is a framework bug per se, and given the
> > >>> rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning
> > >>> the 0.20 shuffle semantics is likely not worthwhile (though data
> > >>> informing improvements to trunk would be excellent). Most likely (and
> > >>> tautologically), ReduceTask simply requires more memory than is
> > >>> available, and the job failure can be avoided by either 0) increasing
> > >>> the heap size or 1) lowering mapred.job.shuffle.input.buffer.percent.
> > >>> Most of the tasks we run have a heap of 1GB. For a reduce fetching
> > >>> more than 200k map outputs, that's a reasonable, even stingy amount of
> > >>> space. -C
> > >>>
> > >>>
> > >>> On Mar 10, 2010, at 5:26 AM, Ted Yu wrote:
> > >>>
> > >>>> I verified that size and maxSize are long. This means MR-1182 didn't
> > >>>> resolve Andy's issue.
> > >>>>
> > >>>> According to Andy:
> > >>>> At the beginning of the job there are 209,754 pending map tasks and 32
> > >>>> pending reduce tasks.
> > >>>>
> > >>>> My guess is that GC wasn't reclaiming memory fast enough, leading to
> > >>>> OOME because of the large number of in-memory shuffle candidates.
> > >>>>
> > >>>> My suggestion for Andy would be to:
> > >>>> 1. add -verbose:gc as a JVM parameter
> > >>>> 2. modify reserve() slightly to calculate the maximum outstanding
> > >>>> numPendingRequests and print the maximum.
> > >>>>
> > >>>> Based on the output from the above two items, we can discuss a solution.
> > >>>> My intuition is to place an upper bound on numPendingRequests beyond
> > >>>> which canFitInMemory() returns false.
> > >>>>
> > >>>> My two cents.
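> > >>>>
> > >>>> As a concrete (purely hypothetical) sketch of both suggestions -- the
> > >>>> cap below is made up for illustration:
> > >>>>
> > >>>>   // 1. in mapred.child.java.opts, alongside the existing heap setting:
> > >>>>   //    -Xmx640m -verbose:gc
> > >>>>
> > >>>>   // 2. in ShuffleRamManager, bound the outstanding reservations:
> > >>>>   boolean canFitInMemory(long requestedSize) {
> > >>>>     return (requestedSize < Integer.MAX_VALUE &&
> > >>>>             requestedSize < maxSingleShuffleLimit &&
> > >>>>             numPendingRequests < MAX_PENDING_REQUESTS); // hypothetical cap
> > >>>>   }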
> > >>>>
> > >>>> On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas
> > >>>> <chri...@yahoo-inc.com> wrote:
> > >>>>
> > >>>>> That section of code is unmodified in MR-1182. See the patches/svn
> > >>>>> log. -C
> > >>>>>
> > >>>>> Sent from my iPhone
> > >>>>>
> > >>>>>
> > >>>>> On Mar 9, 2010, at 7:44 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> > >>>>>
> > >>>>>> I just downloaded hadoop-0.20.2 tar ball from cloudera mirror.
> > >>>>>> This is what I see in ReduceTask (line 999):
> > >>>>>>
> > >>>>>>   public synchronized boolean reserve(int requestedSize, InputStream in)
> > >>>>>>     throws InterruptedException {
> > >>>>>>     // Wait till the request can be fulfilled...
> > >>>>>>     while ((size + requestedSize) > maxSize) {
> > >>>>>>
> > >>>>>> I don't see the fix from MR-1182.
> > >>>>>>
> > >>>>>> That's why I suggested to Andy that he manually apply MR-1182.
> > >>>>>>
> > >>>>>> Cheers
> > >>>>>>
> > >>>>>> On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins
> > >>>>>> <andy.saut...@returnpath.net> wrote:
> > >>>>>>>
> > >>>>>>> Thanks Christopher.
> > >>>>>>>
> > >>>>>>> The heap size for reduce tasks is configured to be 640M (
> > >>>>>>> mapred.child.java.opts set to -Xmx640m ).
> > >>>>>>>
> > >>>>>>> Andy
> > >>>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
> > >>>>>>> Sent: Tuesday, March 09, 2010 5:19 PM
> > >>>>>>> To: common-user@hadoop.apache.org
> > >>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> > >>>>>>>
> > >>>>>>> No, MR-1182 is included in 0.20.2
> > >>>>>>>
> > >>>>>>> What heap size have you set for your reduce tasks? -C
> > >>>>>>>
> > >>>>>>> Sent from my iPhone
> > >>>>>>>
> > >>>>>>> On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Andy:
> > >>>>>>>>
> > >>>>>>>> You need to manually apply the patch.
> > >>>>>>>>
> > >>>>>>>> Cheers
> > >>>>>>>>
> > >>>>>>>> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins
> > >>>>>>>> <andy.saut...@returnpath.net> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
> > >>>>>>>>> in the 0.20.2 release.  We upgraded our cluster to 0.20.2 this
> > >>>>>>>>> weekend and re-ran the same job scenarios.  Even running with
> > >>>>>>>>> mapred.reduce.parallel.copies set to 1, we continue to get the
> > >>>>>>>>> same Java heap space error.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>> From: Ted Yu [mailto:yuzhih...@gmail.com]
> > >>>>>>>>> Sent: Tuesday, March 09, 2010 12:56 PM
> > >>>>>>>>> To: common-user@hadoop.apache.org
> > >>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> > >>>>>>>>>
> > >>>>>>>>> This issue has been resolved in
> > >>>>>>>>> http://issues.apache.org/jira/browse/MAPREDUCE-1182
> > >>>>>>>>>
> > >>>>>>>>> Please apply the patch M1182-1v20.patch:
> > >>>>>>>>> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins
> > >>>>>>>>> <andy.saut...@returnpath.net> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks Ted.  Very helpful.  You are correct that I misunderstood
> > >>>>>>>>>> the code at ReduceTask.java:1535.  I missed the fact that it's in
> > >>>>>>>>>> an IOException catch block.  My mistake.  That's what I get for
> > >>>>>>>>>> being in a rush.
> > >>>>>>>>>>
> > >>>>>>>>>> For what it's worth, I did re-run the job with
> > >>>>>>>>>> mapred.reduce.parallel.copies set to values from 5 all the way
> > >>>>>>>>>> down to 1.  All failed with the same error:
> > >>>>>>>>>>
> > >>>>>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
> > >>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> > >>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> > >>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> > >>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> So from that it does seem like something else might be going on,
> > >>>>>>>>>> yes?  I need to do some more research.
> > >>>>>>>>>>
> > >>>>>>>>>> I appreciate your insights.
> > >>>>>>>>>>
> > >>>>>>>>>> Andy
> > >>>>>>>>>>
> > >>>>>>>>>> -----Original Message-----
> > >>>>>>>>>> From: Ted Yu [mailto:yuzhih...@gmail.com]
> > >>>>>>>>>> Sent: Sunday, March 07, 2010 3:38 PM
> > >>>>>>>>>> To: common-user@hadoop.apache.org
> > >>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> > >>>>>>>>>>
> > >>>>>>>>>> My observation is based on this call chain:
> > >>>>>>>>>> MapOutputCopier.run() calling copyOutput() calling getMapOutput()
> > >>>>>>>>>> calling ramManager.canFitInMemory(decompressedLength)
> > >>>>>>>>>>
> > >>>>>>>>>> Basically ramManager.canFitInMemory() makes its decision without
> > >>>>>>>>>> considering the number of MapOutputCopiers that are running.
> > >>>>>>>>>> Thus 1.25 * 0.7 of the total heap may be used in shuffling if the
> > >>>>>>>>>> default parameters were used.
> > >>>>>>>>>> Of course, you should check the value of
> > >>>>>>>>>> mapred.reduce.parallel.copies to see if it is 5.  If it is 4 or
> > >>>>>>>>>> lower, my reasoning wouldn't apply.
> > >>>>>>>>>>
> > >>>>>>>>>> About the ramManager.unreserve() call: ReduceTask.java from
> > >>>>>>>>>> hadoop 0.20.2 only has 2731 lines, so I have to guess the
> > >>>>>>>>>> location of the code snippet you provided.
> > >>>>>>>>>> I found this around line 1535:
> > >>>>>>>>>>   } catch (IOException ioe) {
> > >>>>>>>>>>     LOG.info("Failed to shuffle from " +
> > >>>>>>>>>> mapOutputLoc.getTaskAttemptId(),
> > >>>>>>>>>>              ioe);
> > >>>>>>>>>>
> > >>>>>>>>>>     // Inform the ram-manager
> > >>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
> > >>>>>>>>>>     ramManager.unreserve(mapOutputLength);
> > >>>>>>>>>>
> > >>>>>>>>>>     // Discard the map-output
> > >>>>>>>>>>     try {
> > >>>>>>>>>>       mapOutput.discard();
> > >>>>>>>>>>     } catch (IOException ignored) {
> > >>>>>>>>>>       LOG.info("Failed to discard map-output from " +
> > >>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
> > >>>>>>>>>>     }
> > >>>>>>>>>> Please confirm the line number.
> > >>>>>>>>>>
> > >>>>>>>>>> If we're looking at the same code, I am afraid I don't see how
> > >>>>>>>>>> we can improve it.  First, I assume IOException shouldn't happen
> > >>>>>>>>>> that often.  Second, mapOutput.discard() just sets:
> > >>>>>>>>>>     data = null;
> > >>>>>>>>>> for the in-memory case.  Even if we call mapOutput.discard()
> > >>>>>>>>>> before ramManager.unreserve(), we don't know when GC would kick
> > >>>>>>>>>> in and make more memory available.
> > >>>>>>>>>> Of course, given the large number of map outputs in your system,
> > >>>>>>>>>> it became more likely that the root cause from my reasoning made
> > >>>>>>>>>> OOME happen sooner.
> > >>>>>>>>>
> > >>>>>>>>>> Thanks
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins
> > >>>>>>>>>> <andy.saut...@returnpath.net> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Ted,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'm trying to follow the logic in your mail and I'm not sure I'm
> > >>>>>>>>>>> following.  If you wouldn't mind helping me understand, I would
> > >>>>>>>>>>> appreciate it.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Looking at the code, maxSingleShuffleLimit is only used in
> > >>>>>>>>>>> determining if the copy _can_ fit into memory:
> > >>>>>>>>>>>
> > >>>>>>>>>>> boolean canFitInMemory(long requestedSize) {
> > >>>>>>>>>>>   return (requestedSize < Integer.MAX_VALUE &&
> > >>>>>>>>>>>           requestedSize < maxSingleShuffleLimit);
> > >>>>>>>>>>> }
> > >>>>>>>>>>>
> > >>>>>>>>>>> It also looks like RamManager.reserve should wait until memory
> > >>>>>>>>>>> is available, so it shouldn't hit a memory limit for that reason.
> > >>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> What does seem a little strange to me is the following
> > >>>>>>>>>>> ( ReduceTask.java starting at 2730 ):
> > >>>>>>>>>>>
> > >>>>>>>>>>>     // Inform the ram-manager
> > >>>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
> > >>>>>>>>>>>     ramManager.unreserve(mapOutputLength);
> > >>>>>>>>>>>
> > >>>>>>>>>>>     // Discard the map-output
> > >>>>>>>>>>>     try {
> > >>>>>>>>>>>       mapOutput.discard();
> > >>>>>>>>>>>     } catch (IOException ignored) {
> > >>>>>>>>>>>       LOG.info("Failed to discard map-output from " +
> > >>>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
> > >>>>>>>>>>>     }
> > >>>>>>>>>>>     mapOutput = null;
> > >>>>>>>>>>>
> > >>>>>>>>>>> So to me that looks like the ramManager unreserves the memory
> > >>>>>>>>>>> before the mapOutput is discarded.  Shouldn't the mapOutput be
> > >>>>>>>>>>> discarded _before_ the ramManager unreserves the memory?  If
> > >>>>>>>>>>> the memory is unreserved before the actual underlying data
> > >>>>>>>>>>> references are removed, then it seems like another thread can
> > >>>>>>>>>>> try to allocate memory ( ReduceTask.java:2730 ) before the
> > >>>>>>>>>>> previous memory is disposed ( mapOutput.discard() ).
> > >>>>>>>>>>>
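> > >>>>>>>>>>> In other words, a hypothetical reordering of the snippet above,
> > >>>>>>>>>>> just to illustrate the question:
> > >>>>>>>>>>>
> > >>>>>>>>>>>     // Discard the map-output first...
> > >>>>>>>>>>>     try {
> > >>>>>>>>>>>       mapOutput.discard();
> > >>>>>>>>>>>     } catch (IOException ignored) {
> > >>>>>>>>>>>       LOG.info("Failed to discard map-output from " +
> > >>>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
> > >>>>>>>>>>>     }
> > >>>>>>>>>>>     mapOutput = null;
> > >>>>>>>>>>>
> > >>>>>>>>>>>     // ...and only then inform the ram-manager
> > >>>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
> > >>>>>>>>>>>     ramManager.unreserve(mapOutputLength);
> > >>>>>>>>>>>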
> > >>>>>>>>>>> Not sure that makes sense.  One thing to note is that the
> > >>>>>>>>>>> particular job that is failing does have a good number ( 200k+ )
> > >>>>>>>>>>> of map outputs.  The large number of small map outputs may be
> > >>>>>>>>>>> why we are triggering a problem.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks again for your thoughts.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Andy
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>> From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
> > >>>>>>>>>>> Sent: Sunday, March 07, 2010 1:21 PM
> > >>>>>>>>>>> To: common-user@hadoop.apache.org
> > >>>>>>>>>>> Cc: Andy Sautins; Ted Yu
> > >>>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> > >>>>>>>>>>>
> > >>>>>>>>>>> Ted,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thank you. I filed MAPREDUCE-1571 to cover this issue. I might
> > >>>>>>>>>>> have some time to write a patch later this week.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Jacob Rideout
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yuzhih...@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I think there is mismatch (in ReduceTask.java) between:
> > >>>>>>>>>>>>   this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
> > >>>>>>>>>>>> and:
> > >>>>>>>>>>>>   maxSingleShuffleLimit = (long)(maxSize *
> > >>>>>>>>>>>>     MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
> > >>>>>>>>>>>> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f,
> > >>>>>>>>>>>> because
> > >>>>>>>>>>>>   copiers = new ArrayList<MapOutputCopier>(numCopiers);
> > >>>>>>>>>>>> so the total memory allocated for in-mem shuffle is 1.25 * maxSize.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> A JIRA should be filed to correlate the constant 5 above and
> > >>>>>>>>>>>> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
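> > >>>>>>>>>>>>
> > >>>>>>>>>>>> To make that concrete (back-of-the-envelope arithmetic only):
> > >>>>>>>>>>>> with a 640M reduce-task heap (the size reported elsewhere in
> > >>>>>>>>>>>> this thread) and the default
> > >>>>>>>>>>>> mapred.job.shuffle.input.buffer.percent of 0.70, maxSize is
> > >>>>>>>>>>>> roughly 448M; 5 copiers each allowed up to 0.25 * maxSize
> > >>>>>>>>>>>> (roughly 112M) gives 5 * 112M = 560M, that is 1.25 * maxSize,
> > >>>>>>>>>>>> out of a 640M heap.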
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout
> > >>>>>>>>>>>> <apa...@jacobrideout.net> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> We are seeing the following error in our reducers of a
> > >>>>>>>>>>>>> particular job:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> After enough reducers fail, the entire job fails. This error
> > >>>>>>>>>>>>> occurs regardless of whether mapred.compress.map.output is
> > >>>>>>>>>>>>> true. We were able to avoid the issue by reducing
> > >>>>>>>>>>>>> mapred.job.shuffle.input.buffer.percent to 20%. Shouldn't the
> > >>>>>>>>>>>>> framework, via ShuffleRamManager.canFitInMemory and
> > >>>>>>>>>>>>> ShuffleRamManager.reserve, correctly detect the memory
> > >>>>>>>>>>>>> available for allocation? I would think that with poor
> > >>>>>>>>>>>>> configuration settings (and default settings in particular)
> > >>>>>>>>>>>>> the job may not be as efficient, but wouldn't die.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here is some more context in the logs; I have attached the
> > >>>>>>>>>>>>> full reducer log here: http://gist.github.com/323746
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
> > >>>>>>>>>>>>> Shuffling 4191933 bytes (435311 raw bytes) into RAM from
> > >>>>>>>>>>>>> attempt_201003060739_0002_m_000061_0
> > >>>>>>>>>>>>> 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask:
> > >>>>>>>>>>>>> Task attempt_201003060739_0002_r_000000_0: Failed fetch #1 from
> > >>>>>>>>>>>>> attempt_201003060739_0002_m_000202_0
> > >>>>>>>>>>>>> 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
> > >>>>>>>>>>>>> attempt_201003060739_0002_r_000000_0 adding host
> > >>>>>>>>>>>>> hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
> > >>>>>>>>>>>>> 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
> > >>>>>>>>>>>>> attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from
> > >>>>>>>>>>>>> previous failures
> > >>>>>>>>>>>>> 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
> > >>>>>>>>>>>>> attempt_201003060739_0002_r_000000_0 : Map output copy failure :
> > >>>>>>>>>>>>> java.lang.OutOfMemoryError: Java heap space
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> > >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> We tried this both in 0.20.1 and 0.20.2. We had hoped
> > >>>>>>>>>>>>> MAPREDUCE-1182 would address the issue in 0.20.2, but it did
> > >>>>>>>>>>>>> not. Does anyone have any comments or suggestions? Is this a
> > >>>>>>>>>>>>> bug I should file a JIRA for?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Jacob Rideout
> > >>>>>>>>>>>>> Return Path
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>
> > >>
> > >
> >
>
