Re: Shuffle In Memory OutOfMemoryError

2010-05-08 Thread Alex Kozlov

Re: Shuffle In Memory OutOfMemoryError

2010-05-07 Thread Bo Shi
 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Sunday, March 07, 2010 3:38 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 My observation is based on this call chain:
 MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
 ramManager.canFitInMemory(decompressedLength)

 Basically ramManager.canFitInMemory() makes its decision without considering
 the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total
 heap may be used in shuffling if default parameters were used.
 Of course, you should check the value for mapred.reduce.parallel.copies to
 see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

 About the ramManager.unreserve() call: ReduceTask.java from hadoop 0.20.2
 only has 2731 lines, so I have to guess the location of the code snippet you
 provided. I found this around line 1535:
    } catch (IOException ioe) {
      LOG.info("Failed to shuffle from " +
               mapOutputLoc.getTaskAttemptId(), ioe);

      // Inform the ram-manager
      ramManager.closeInMemoryFile(mapOutputLength);
      ramManager.unreserve(mapOutputLength);

      // Discard the map-output
      try {
        mapOutput.discard();
      } catch (IOException ignored) {
        LOG.info("Failed to discard map-output from " +
                 mapOutputLoc.getTaskAttemptId(), ignored);
      }
 Please confirm the line number.

 If we're looking at the same code, I am afraid I don't see how we can
 improve it. First, I assume IOException shouldn't happen that often.
 Second, mapOutput.discard() just sets:
     data = null;
 for the in-memory case. Even if we call mapOutput.discard() before
 ramManager.unreserve(), we don't know when GC would kick in and make more
 memory available.
 Of course, given the large number of map outputs in your system, it became
 more likely that the root cause from my reasoning made OOME happen sooner.


 Thanks


 On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins andy.saut...@returnpath.net wrote:

  Ted,

    I'm trying to follow the logic in your mail and I'm not sure I'm
  following.  If you wouldn't mind helping me understand, I would appreciate it.

    Looking at the code, maxSingleShuffleLimit is only used in determining if
  the copy _can_ fit into memory
Re: Shuffle In Memory OutOfMemoryError

2010-05-07 Thread Ted Yu
 
 

Re: Shuffle In Memory OutOfMemoryError

2010-03-10 Thread Ted Yu
I verified that size and maxSize are long. This means MR-1182 didn't resolve
Andy's issue.

According to Andy:
At the beginning of the job there are 209,754 pending map tasks and 32
pending reduce tasks

My guess is that GC wasn't reclaiming memory fast enough, leading to OOME
because of the large number of in-memory shuffle candidates.

My suggestion for Andy would be to:
1. add -verbose:gc as a JVM parameter
2. modify reserve() slightly to calculate the maximum outstanding
numPendingRequests and print the maximum.

Based on the output from the above two items, we can discuss a solution.
My intuition is to place an upper bound on numPendingRequests beyond which
canFitInMemory() returns false.

My two cents.
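A minimal sketch of item 2, as a simplified stand-in for the shuffle RamManager (names and the pending-request bound are illustrative; this is not a patch against ReduceTask):

```java
// Simplified stand-in for the shuffle RamManager, instrumented as suggested:
// count callers inside reserve(), remember the peak, and let
// canFitInMemory() refuse once too many requests are outstanding.
// All names and the maxPending bound are illustrative.
public class InstrumentedRamManager {
    private final long maxSize;           // in-memory shuffle budget (bytes)
    private long size = 0;                // bytes currently reserved
    private int numPendingRequests = 0;   // callers currently in reserve()
    private int maxPendingRequests = 0;   // peak value, for the diagnosis log
    private final int maxPending;         // proposed upper bound

    public InstrumentedRamManager(long maxSize, int maxPending) {
        this.maxSize = maxSize;
        this.maxPending = maxPending;
    }

    // The intuition: beyond maxPending outstanding requests, report the
    // output as not fitting in memory so it is shuffled to disk instead.
    public synchronized boolean canFitInMemory(long requestedSize) {
        return numPendingRequests < maxPending
            && requestedSize < (long) (maxSize * 0.25); // single-shuffle limit
    }

    public synchronized void reserve(long requestedSize) throws InterruptedException {
        numPendingRequests++;
        if (numPendingRequests > maxPendingRequests) {
            maxPendingRequests = numPendingRequests;
            System.err.println("reserve: new max numPendingRequests="
                               + maxPendingRequests);
        }
        try {
            while (size + requestedSize > maxSize) {
                wait();  // block until enough space is unreserved
            }
            size += requestedSize;
        } finally {
            numPendingRequests--;
        }
    }

    public synchronized void unreserve(long requestedSize) {
        size -= requestedSize;
        notifyAll();
    }

    public synchronized int getMaxPendingRequests() { return maxPendingRequests; }
}
```

Running a job with this instrumentation would print the peak numPendingRequests, which is the number the two diagnostic items above are after.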

On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas
chri...@yahoo-inc.com wrote:

 That section of code is unmodified in MR-1182. See the patches/svn log. -C

 Sent from my iPhone


 On Mar 9, 2010, at 7:44 PM, Ted Yu yuzhih...@gmail.com wrote:

  I just downloaded hadoop-0.20.2 tar ball from cloudera mirror.
 This is what I see in ReduceTask (line 999):
  public synchronized boolean reserve(int requestedSize, InputStream in)
      throws InterruptedException {
    // Wait till the request can be fulfilled...
    while ((size + requestedSize) > maxSize) {

 I don't see the fix from MR-1182.

 That's why I suggested to Andy that he manually apply MR-1182.

 Cheers

 On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins andy.saut...@returnpath.net
 wrote:


 Thanks Christopher.

 The heap size for reduce tasks is configured to be 640M (
 mapred.child.java.opts set to -Xmx640m ).

 Andy

 -Original Message-
 From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
 Sent: Tuesday, March 09, 2010 5:19 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 No, MR-1182 is included in 0.20.2

 What heap size have you set for your reduce tasks? -C

 Sent from my iPhone

 On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:

  Andy:
 You need to manually apply the patch.

 Cheers

 On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins 

 andy.saut...@returnpath.net

 wrote:



  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
 the same job scenarios.  Running with mapred.reduce.parallel.copies set to
 1, we continue to have the same Java heap space error.



 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Tuesday, March 09, 2010 12:56 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 This issue has been resolved in
 http://issues.apache.org/jira/browse/MAPREDUCE-1182

 Please apply the patch
 M1182-1v20.patch


 http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch



 On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins 

 andy.saut...@returnpath.net

 wrote:



  Thanks Ted.  Very helpful.  You are correct that I misunderstood the
 code at ReduceTask.java:1535.  I missed the fact that it's in an
 IOException catch block.  My mistake.  That's what I get for being in a rush.

 For what it's worth I did re-run the job with
 mapred.reduce.parallel.copies set with values from 5 all the way down to 1.

 All failed with the same error:

 Error: java.lang.OutOfMemoryError: Java heap space
         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)



 So from that it does seem like something else might be going on, yes?  I
 need to do some more research.

 I appreciate your insights.

 Andy


Re: Shuffle In Memory OutOfMemoryError

2010-03-10 Thread Chris Douglas
I don't think this OOM is a framework bug per se, and given the
rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning
the 0.20 shuffle semantics is likely not worthwhile (though data
informing improvements to trunk would be excellent). Most likely (and
tautologically), ReduceTask simply requires more memory than is
available, and the job failure can be avoided by either 0) increasing
the heap size or 1) lowering mapred.shuffle.input.buffer.percent. Most
of the tasks we run have a heap of 1GB. For a reduce fetching 200k map
outputs, that's a reasonable, even stingy amount of space. -C
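To make the two remedies concrete, a back-of-the-envelope helper (illustrative only; the exact shuffle buffer property name and its default should be checked against the release in use):

```java
// Rough in-memory shuffle budget: heap size times the shuffle input buffer
// percent. Both remedies above change this product: a larger heap at the
// same percent, or the same heap at a lower percent. Figures illustrative.
public class ShuffleBudget {
    public static long budgetBytes(long heapBytes, double inputBufferPercent) {
        return (long) (heapBytes * inputBufferPercent);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Andy's reducers: 640 MB heap at the default buffer percent
        System.out.println(budgetBytes(640 * mb, 0.70) / mb + " MB");
        // Remedy 0: raise the heap to 1 GB at the same percent
        System.out.println(budgetBytes(1024 * mb, 0.70) / mb + " MB");
        // Remedy 1: keep 640 MB but lower the percent
        System.out.println(budgetBytes(640 * mb, 0.50) / mb + " MB");
    }
}
```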



Re: Shuffle In Memory OutOfMemoryError

2010-03-10 Thread Ted Yu
Thanks to Andy for the log he provided.

You can see from the log below that size increased steadily from 341535057
to 408181692, approaching maxSize. Then OOME:

2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
pos=start requestedSize=3893000 size=341535057 numPendingRequests=0
maxSize=417601952
2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
pos=end requestedSize=3893000 size=345428057 numPendingRequests=0
maxSize=417601952
...
2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
pos=end requestedSize=635753 size=408181692 numPendingRequests=0
maxSize=417601952
2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task
attempt_201003101826_0001_r_04_0: Failed fetch #1 from
attempt_201003101826_0001_m_000875_0

2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_201003101826_0001_r_04_0 adding host hd17.dfs.returnpath.net to
penalty box, next contact in 4 seconds
2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201003101826_0001_r_04_0: Got 1 map-outputs from previous
failures
2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner:
attempt_201003101826_0001_r_04_0 : Map output copy failure :
java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)

Looking at the calls to unreserve() in ReduceTask, two are in IOException
handlers and the other is under a sanity check (line 1557), meaning none of
them would be reached on the normal execution path.

I see one call in IFile.InMemoryReader:
      // Inform the RamManager
      ramManager.unreserve(bufferSize);

And InMemoryReader is used in:
      Reader<K, V> reader =
        new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
                                 mo.data, 0, mo.data.length);
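The pattern traced above — reserve() in the copier, with unreserve() reached only from the IOException handlers or much later from InMemoryReader — can be contrasted with a defensive try/finally sketch in which every exit path either hands the reservation off or releases it. This is illustrative code with hypothetical names, not Hadoop's actual ReduceTask logic:

```java
import java.io.IOException;

// Illustrative sketch: make every exit path of an in-memory shuffle either
// hand the reservation off to the reader or release it, so reserved bytes
// cannot creep toward maxSize the way the log above shows.
public class ReservationSketch {
    interface RamManager {
        void reserve(int bytes) throws InterruptedException;
        void unreserve(int bytes);
    }

    static byte[] shuffleInMemory(RamManager ram, int len, boolean simulateIoError)
            throws InterruptedException, IOException {
        ram.reserve(len);
        boolean handedOff = false;
        try {
            if (simulateIoError) {
                throw new IOException("simulated fetch failure");
            }
            byte[] data = new byte[len];  // stands in for the fetched map output
            handedOff = true;             // the in-memory reader now owns it
            return data;
        } finally {
            if (!handedOff) {
                ram.unreserve(len);       // released on every error path
            }
        }
    }
}
```

With this shape, a counting RamManager stays balanced after a failed fetch, whereas in the traced code the balance depends on the IOException handler and the reader both doing their part.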



Re: Shuffle In Memory OutOfMemoryError

2010-03-10 Thread Ted Yu
 Message-
 From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
 Sent: Tuesday, March 09, 2010 5:19 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 No, MR-1182 is included in 0.20.2

 What heap size have you set for your reduce tasks? -C

 Sent from my iPhone

 On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:

 Andy:

 You need to manually apply the patch.

 Cheers

 On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins 

  andy.saut...@returnpath.net

  wrote:




  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
 in the
 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and
 re-ran
 the same job scenarios.  Running with mapred.reduce.parallel.copies
 set to 1
 and continue to have the same Java heap space error.



 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Tuesday, March 09, 2010 12:56 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 This issue has been resolved in
 http://issues.apache.org/jira/browse/MAPREDUCE-1182

 Please apply the patch
 M1182-1v20.patch



 http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch



  On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins 

  andy.saut...@returnpath.net


  wrote:




  Thanks Ted.  Very helpful.  You are correct that I misunderstood
 the

  code

  at ReduceTask.java:1535.  I missed the fact that it's in a
 IOException

  catch

  block.  My mistake.  That's what I get for being in a rush.

 For what it's worth I did re-run the job with
 mapred.reduce.parallel.copies set with values from 5 all the way
 down to

  1.

  All failed with the same error:

 Error: java.lang.OutOfMemoryError: Java heap space
   at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier

 $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)

at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier

 $MapOutputCopier.getMapOutput(ReduceTask.java:1408)

at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier

 $MapOutputCopier.copyOutput(ReduceTask.java:1261)

at


 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run

 (ReduceTask.java:1195)



 So from that it does seem like something else might be going on,
 yes?

  I

  need to do some more research.

 I appreciate your insights.

 Andy

 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Sunday, March 07, 2010 3:38 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 My observation is based on this call chain:
 MapOutputCopier.run() calling copyOutput() calling getMapOutput()
 calling
 ramManager.canFitInMemory(decompressedLength)

 Basically ramManager.canFitInMemory() makes its decision without
 considering
 the
 number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of
 total

  heap

  may be used in shuffling if default parameters were used.
 Of course, you should check the value for
 mapred.reduce.parallel.copies

  to

  see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
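To put numbers on the reasoning above, here is a back-of-the-envelope sketch. It assumes the defaults cited in this thread (0.70 for mapred.job.shuffle.input.buffer.percent, 0.25 for MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION, 5 parallel copies) and the 640m reduce heap reported elsewhere in the thread; the class and method names are mine, not Hadoop's.

```java
// Back-of-the-envelope check of the shuffle memory bound discussed
// above. Hypothetical names; the constants are the defaults cited
// in this thread, not values read from Hadoop itself.
public class ShuffleMemoryMath {
    static final double BUFFER_PERCENT = 0.70;          // mapred.job.shuffle.input.buffer.percent
    static final double SINGLE_SHUFFLE_FRACTION = 0.25; // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION
    static final int PARALLEL_COPIES = 5;               // mapred.reduce.parallel.copies

    // Worst-case bytes in flight if each copier independently
    // passes canFitInMemory() against the per-segment limit.
    static long worstCaseInFlight(long heapBytes) {
        long maxSize = (long) (heapBytes * BUFFER_PERCENT);
        long perCopyLimit = (long) (maxSize * SINGLE_SHUFFLE_FRACTION);
        return PARALLEL_COPIES * perCopyLimit; // 5 * 0.25 = 1.25 * maxSize
    }

    public static void main(String[] args) {
        long heap = 640L * 1024 * 1024; // -Xmx640m, as configured in this thread
        // 1.25 * 0.70 = 0.875 of the heap can be shuffle data alone,
        // leaving little headroom for the rest of the reduce task.
        System.out.println(worstCaseInFlight(heap));
    }
}
```

With a 640m heap this works out to 0.875 of the heap potentially held by shuffle buffers alone, which matches the intuition above about why the OOME appears.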

 About ramManager.unreserve() call, ReduceTask.java from hadoop
 0.20.2

  only

  has 2731 lines. So I have to guess the location of the code
 snippet you
 provided.
 I found this around line 1535:
   } catch (IOException ioe) {
 LOG.info("Failed to shuffle from " +
 mapOutputLoc.getTaskAttemptId(),
  ioe);

 // Inform the ram-manager
 ramManager.closeInMemoryFile(mapOutputLength);
 ramManager.unreserve(mapOutputLength);

 // Discard the map-output
 try {
   mapOutput.discard();
 } catch (IOException ignored) {
   LOG.info("Failed to discard map-output from " +
mapOutputLoc.getTaskAttemptId(), ignored);
 }
 Please confirm the line number.

 If we're looking at the same code, I am afraid I don't see how we
 can
 improve it. First, I assume IOException shouldn't happen that
 often.
 Second,
 mapOutput.discard() just sets:
 data = null;
 for in memory case. Even if we call mapOutput.discard() before
 ramManager.unreserve(), we don't know when GC would kick in and
 make more
 memory available.
 Of course, given the large number of map outputs in your system, it

  became

  more likely that the root cause from my reasoning made OOME happen

  sooner.


 Thanks


  On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins 


  andy.saut...@returnpath.net

  wrote:




  Ted,

 I'm trying to follow the logic in your mail and I'm not sure I'm
 following.  If you wouldn't mind helping me understand I would
 appreciate

  it.


 Looking at the code maxSingleShuffleLimit is only used in
 determining

  if

  the copy _can_ fit into memory:

 boolean canFitInMemory(long requestedSize) {
   return (requestedSize < Integer.MAX_VALUE &&
   requestedSize < maxSingleShuffleLimit);
  }

 It also looks like the RamManager.reserve should wait until
 memory

  is


  available so it shouldn't hit a memory limit for that reason.


 What does seem

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread baleksan
Sent via BlackBerry from T-Mobile

-Original Message-
From: Ted Yu yuzhih...@gmail.com
Date: Tue, 9 Mar 2010 14:33:28 
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins andy.saut...@returnpath.netwrote:


   Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
 the same job scenarios.  Running with mapred.reduce.parallel.copies set to 1
 and we continue to have the same Java heap space error.



 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Tuesday, March 09, 2010 12:56 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 This issue has been resolved in
 http://issues.apache.org/jira/browse/MAPREDUCE-1182

 Please apply the patch
 M1182-1v20.patch
 http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch

 On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins andy.saut...@returnpath.net
 wrote:

 
   Thanks Ted.  Very helpful.  You are correct that I misunderstood the
 code
  at ReduceTask.java:1535.  I missed the fact that it's in a IOException
 catch
  block.  My mistake.  That's what I get for being in a rush.
 
   For what it's worth I did re-run the job with
  mapred.reduce.parallel.copies set with values from 5 all the way down to
 1.
   All failed with the same error:
 
  Error: java.lang.OutOfMemoryError: Java heap space
 at
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
 at
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
 at
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
 at
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
 
 
So from that it does seem like something else might be going on, yes?
  I
  need to do some more research.
 
   I appreciate your insights.
 
   Andy
 
  -Original Message-
  From: Ted Yu [mailto:yuzhih...@gmail.com]
  Sent: Sunday, March 07, 2010 3:38 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Shuffle In Memory OutOfMemoryError
 
  My observation is based on this call chain:
  MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
  ramManager.canFitInMemory(decompressedLength)
 
  Basically ramManager.canFitInMemory() makes its decision without considering
  the
  number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total
 heap
  may be used in shuffling if default parameters were used.
  Of course, you should check the value for mapred.reduce.parallel.copies
 to
  see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
 
  About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2
 only
  has 2731 lines. So I have to guess the location of the code snippet you
  provided.
  I found this around line 1535:
 } catch (IOException ioe) {
   LOG.info("Failed to shuffle from " +
  mapOutputLoc.getTaskAttemptId(),
ioe);
 
   // Inform the ram-manager
   ramManager.closeInMemoryFile(mapOutputLength);
   ramManager.unreserve(mapOutputLength);
 
   // Discard the map-output
   try {
 mapOutput.discard();
   } catch (IOException ignored) {
 LOG.info("Failed to discard map-output from " +
  mapOutputLoc.getTaskAttemptId(), ignored);
   }
  Please confirm the line number.
 
  If we're looking at the same code, I am afraid I don't see how we can
  improve it. First, I assume IOException shouldn't happen that often.
  Second,
  mapOutput.discard() just sets:
   data = null;
  for in memory case. Even if we call mapOutput.discard() before
  ramManager.unreserve(), we don't know when GC would kick in and make more
  memory available.
  Of course, given the large number of map outputs in your system, it
 became
  more likely that the root cause from my reasoning made OOME happen
 sooner.
 
  Thanks
 
  
  On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins 
 andy.saut...@returnpath.net
  wrote:
 
  
 Ted,
  
 I'm trying to follow the logic in your mail and I'm not sure I'm
  following.  If you wouldn't mind helping me understand I would appreciate
  it.
  
 Looking at the code maxSingleShuffleLimit is only used in determining
  if
   the copy _can_ fit into memory:
  
   boolean canFitInMemory(long requestedSize) {
  return (requestedSize < Integer.MAX_VALUE &&
  requestedSize < maxSingleShuffleLimit);
}
  
  It also looks like the RamManager.reserve should wait until memory
 is
   available so it should hit a memory

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Christopher Douglas

No, MR-1182 is included in 0.20.2

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:


Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins andy.saut...@returnpath.net 
wrote:




 Thanks Ted.  My understanding is that MAPREDUCE-1182 is included  
in the
0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and  
re-ran
the same job scenarios.  Running with mapred.reduce.parallel.copies  
set to 1

and we continue to have the same Java heap space error.



-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, March 09, 2010 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

This issue has been resolved in
http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch
M1182-1v20.patch
http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch 



On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins andy.saut...@returnpath.net

wrote:




Thanks Ted.  Very helpful.  You are correct that I misunderstood the

code
at ReduceTask.java:1535.  I missed the fact that it's in a  
IOException

catch

block.  My mistake.  That's what I get for being in a rush.

For what it's worth I did re-run the job with
mapred.reduce.parallel.copies set with values from 5 all the way  
down to

1.

All failed with the same error:

Error: java.lang.OutOfMemoryError: Java heap space
  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)

  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
$MapOutputCopier.getMapOutput(ReduceTask.java:1408)

  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
$MapOutputCopier.copyOutput(ReduceTask.java:1261)

  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run 
(ReduceTask.java:1195)



 So from that it does seem like something else might be going on,  
yes?

I

need to do some more research.

I appreciate your insights.

Andy

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

My observation is based on this call chain:
MapOutputCopier.run() calling copyOutput() calling getMapOutput()  
calling

ramManager.canFitInMemory(decompressedLength)

Basically ramManager.canFitInMemory() makes its decision without  
considering

the
number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of  
total

heap

may be used in shuffling if default parameters were used.
Of course, you should check the value for  
mapred.reduce.parallel.copies

to

see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

About ramManager.unreserve() call, ReduceTask.java from hadoop  
0.20.2

only
has 2731 lines. So I have to guess the location of the code  
snippet you

provided.
I found this around line 1535:
  } catch (IOException ioe) {
LOG.info("Failed to shuffle from " +
mapOutputLoc.getTaskAttemptId(),
 ioe);

// Inform the ram-manager
ramManager.closeInMemoryFile(mapOutputLength);
ramManager.unreserve(mapOutputLength);

// Discard the map-output
try {
  mapOutput.discard();
} catch (IOException ignored) {
  LOG.info("Failed to discard map-output from " +
   mapOutputLoc.getTaskAttemptId(), ignored);
}
Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we  
can

improve it. First, I assume IOException shouldn't happen that often.
Second,
mapOutput.discard() just sets:
data = null;
for in memory case. Even if we call mapOutput.discard() before
ramManager.unreserve(), we don't know when GC would kick in and  
make more

memory available.
Of course, given the large number of map outputs in your system, it

became

more likely that the root cause from my reasoning made OOME happen

sooner.


Thanks




On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins 

andy.saut...@returnpath.net

wrote:




 Ted,

 I'm trying to follow the logic in your mail and I'm not sure I'm
following.  If you wouldn't mind helping me understand I would  
appreciate

it.


 Looking at the code maxSingleShuffleLimit is only used in  
determining

if

the copy _can_ fit into memory:

   boolean canFitInMemory(long requestedSize) {
  return (requestedSize < Integer.MAX_VALUE &&
  requestedSize < maxSingleShuffleLimit);
}

  It also looks like the RamManager.reserve should wait until  
memory

is

available so it shouldn't hit a memory limit for that reason.

  What does seem a little strange to me is the following (

ReduceTask.java

starting at 2730 ):

// Inform the ram-manager
ramManager.closeInMemoryFile(mapOutputLength);
ramManager.unreserve(mapOutputLength);

// Discard the map

RE: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Andy Sautins

  Thanks Christopher.  

  The heap size for reduce tasks is configured to be 640M ( 
mapred.child.java.opts set to -Xmx640m ).

  Andy

-Original Message-
From: Christopher Douglas [mailto:chri...@yahoo-inc.com] 
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:

 Andy:
 You need to manually apply the patch.

 Cheers

 On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins andy.saut...@returnpath.net 
 wrote:


  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included  
 in the
 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and  
 re-ran
 the same job scenarios.  Running with mapred.reduce.parallel.copies  
 set to 1
 and we continue to have the same Java heap space error.



 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Tuesday, March 09, 2010 12:56 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 This issue has been resolved in
 http://issues.apache.org/jira/browse/MAPREDUCE-1182

 Please apply the patch
 M1182-1v20.patch
 http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch 
 

 On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins andy.saut...@returnpath.net
 wrote:


 Thanks Ted.  Very helpful.  You are correct that I misunderstood the
 code
 at ReduceTask.java:1535.  I missed the fact that it's in a  
 IOException
 catch
 block.  My mistake.  That's what I get for being in a rush.

 For what it's worth I did re-run the job with
 mapred.reduce.parallel.copies set with values from 5 all the way  
 down to
 1.
 All failed with the same error:

 Error: java.lang.OutOfMemoryError: Java heap space
   at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
 $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
   at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
 $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
   at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
 $MapOutputCopier.copyOutput(ReduceTask.java:1261)
   at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run 
 (ReduceTask.java:1195)


  So from that it does seem like something else might be going on,  
 yes?
 I
 need to do some more research.

 I appreciate your insights.

 Andy

 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Sunday, March 07, 2010 3:38 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 My observation is based on this call chain:
 MapOutputCopier.run() calling copyOutput() calling getMapOutput()  
 calling
 ramManager.canFitInMemory(decompressedLength)

 Basically ramManager.canFitInMemory() makes its decision without  
 considering
 the
 number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of  
 total
 heap
 may be used in shuffling if default parameters were used.
 Of course, you should check the value for  
 mapred.reduce.parallel.copies
 to
 see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

 About ramManager.unreserve() call, ReduceTask.java from hadoop  
 0.20.2
 only
 has 2731 lines. So I have to guess the location of the code  
 snippet you
 provided.
 I found this around line 1535:
   } catch (IOException ioe) {
 LOG.info("Failed to shuffle from " +
 mapOutputLoc.getTaskAttemptId(),
  ioe);

 // Inform the ram-manager
 ramManager.closeInMemoryFile(mapOutputLength);
 ramManager.unreserve(mapOutputLength);

 // Discard the map-output
 try {
   mapOutput.discard();
 } catch (IOException ignored) {
   LOG.info("Failed to discard map-output from " +
mapOutputLoc.getTaskAttemptId(), ignored);
 }
 Please confirm the line number.

 If we're looking at the same code, I am afraid I don't see how we  
 can
 improve it. First, I assume IOException shouldn't happen that often.
 Second,
 mapOutput.discard() just sets:
 data = null;
 for in memory case. Even if we call mapOutput.discard() before
 ramManager.unreserve(), we don't know when GC would kick in and  
 make more
 memory available.
 Of course, given the large number of map outputs in your system, it
 became
 more likely that the root cause from my reasoning made OOME happen
 sooner.

 Thanks


 On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins 
 andy.saut...@returnpath.net
 wrote:


  Ted,

  I'm trying to follow the logic in your mail and I'm not sure I'm
 following.  If you wouldn't mind helping me understand I would  
 appreciate
 it.

  Looking at the code maxSingleShuffleLimit is only used in  
 determining
 if
 the copy _can_ fit into memory:

boolean canFitInMemory(long requestedSize) {
   return (requestedSize < Integer.MAX_VALUE &&
   requestedSize

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Ted Yu
I just downloaded hadoop-0.20.2 tar ball from cloudera mirror.
This is what I see in ReduceTask (line 999):
  public synchronized boolean reserve(int requestedSize, InputStream in)

  throws InterruptedException {
// Wait till the request can be fulfilled...
 while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182.

That's why I suggested to Andy that he manually apply MR-1182.

Cheers
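The reserve() loop quoted above is a standard guarded wait on a shared byte counter. Here is a minimal, self-contained sketch of the same pattern, using a hypothetical RamBudget class rather than Hadoop's actual ShuffleRamManager:

```java
// Guarded-wait reservation in the style of the reserve() snippet
// above. Illustrative only; hypothetical class, not Hadoop code.
public class RamBudget {
    private final long maxSize;
    private long size; // bytes currently reserved

    public RamBudget(long maxSize) {
        this.maxSize = maxSize;
    }

    public synchronized void reserve(long requested) throws InterruptedException {
        // Wait till the request can be fulfilled, as reserve() does above.
        // A while loop (not an if) also guards against spurious wakeups.
        while (size + requested > maxSize) {
            wait();
        }
        size += requested;
    }

    public synchronized void unreserve(long released) {
        size -= released;
        notifyAll(); // wake any copier blocked in reserve()
    }

    public synchronized long reserved() {
        return size;
    }
}
```

A copier thread would call reserve() before pulling a map output into memory and unreserve() once the segment is merged or discarded; the blocking in reserve() is why the thread expects the budget, not the heap, to be the limiting factor.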

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins andy.saut...@returnpath.netwrote:


  Thanks Christopher.

  The heap size for reduce tasks is configured to be 640M (
 mapred.child.java.opts set to -Xmx640m ).

  Andy

 -Original Message-
 From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
 Sent: Tuesday, March 09, 2010 5:19 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 No, MR-1182 is included in 0.20.2

 What heap size have you set for your reduce tasks? -C

 Sent from my iPhone

 On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:

  Andy:
  You need to manually apply the patch.
 
  Cheers
 
  On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins 
 andy.saut...@returnpath.net
  wrote:
 
 
   Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
  in the
  0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and
  re-ran
  the same job scenarios.  Running with mapred.reduce.parallel.copies
  set to 1
  and we continue to have the same Java heap space error.
 
 
 
  -Original Message-
  From: Ted Yu [mailto:yuzhih...@gmail.com]
  Sent: Tuesday, March 09, 2010 12:56 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Shuffle In Memory OutOfMemoryError
 
  This issue has been resolved in
  http://issues.apache.org/jira/browse/MAPREDUCE-1182
 
  Please apply the patch
  M1182-1v20.patch
 
 http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
  
 
  On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins 
 andy.saut...@returnpath.net
  wrote:
 
 
  Thanks Ted.  Very helpful.  You are correct that I misunderstood the
  code
  at ReduceTask.java:1535.  I missed the fact that it's in a
  IOException
  catch
  block.  My mistake.  That's what I get for being in a rush.
 
  For what it's worth I did re-run the job with
  mapred.reduce.parallel.copies set with values from 5 all the way
  down to
  1.
  All failed with the same error:
 
  Error: java.lang.OutOfMemoryError: Java heap space
at
 
  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
  $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at
 
  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
  $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at
 
  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
  $MapOutputCopier.copyOutput(ReduceTask.java:1261)
at
 
  org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run
  (ReduceTask.java:1195)
 
 
   So from that it does seem like something else might be going on,
  yes?
  I
  need to do some more research.
 
  I appreciate your insights.
 
  Andy
 
  -Original Message-
  From: Ted Yu [mailto:yuzhih...@gmail.com]
  Sent: Sunday, March 07, 2010 3:38 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Shuffle In Memory OutOfMemoryError
 
  My observation is based on this call chain:
  MapOutputCopier.run() calling copyOutput() calling getMapOutput()
  calling
  ramManager.canFitInMemory(decompressedLength)
 
  Basically ramManager.canFitInMemory() makes its decision without
  considering
  the
  number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of
  total
  heap
  may be used in shuffling if default parameters were used.
  Of course, you should check the value for
  mapred.reduce.parallel.copies
  to
  see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
 
  About ramManager.unreserve() call, ReduceTask.java from hadoop
  0.20.2
  only
  has 2731 lines. So I have to guess the location of the code
  snippet you
  provided.
  I found this around line 1535:
} catch (IOException ioe) {
  LOG.info("Failed to shuffle from " +
  mapOutputLoc.getTaskAttemptId(),
   ioe);
 
  // Inform the ram-manager
  ramManager.closeInMemoryFile(mapOutputLength);
  ramManager.unreserve(mapOutputLength);
 
  // Discard the map-output
  try {
mapOutput.discard();
  } catch (IOException ignored) {
LOG.info("Failed to discard map-output from " +
 mapOutputLoc.getTaskAttemptId(), ignored);
  }
  Please confirm the line number.
 
  If we're looking at the same code, I am afraid I don't see how we
  can
  improve it. First, I assume IOException shouldn't happen that often.
  Second,
  mapOutput.discard() just sets:
  data = null;
  for in memory case. Even if we call mapOutput.discard() before
  ramManager.unreserve(), we don't know when GC would kick in and
  make more
  memory available.
  Of course, given the large

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Christopher Douglas
That section of code is unmodified in MR-1182. See the patches/svn  
log. -C


Sent from my iPhone

On Mar 9, 2010, at 7:44 PM, Ted Yu yuzhih...@gmail.com wrote:


I just downloaded hadoop-0.20.2 tar ball from cloudera mirror.
This is what I see in ReduceTask (line 999):
 public synchronized boolean reserve(int requestedSize,  
InputStream in)


 throws InterruptedException {
   // Wait till the request can be fulfilled...
    while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182.

That's why I suggested to Andy that he manually apply MR-1182.

Cheers

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins andy.saut...@returnpath.net 
wrote:




Thanks Christopher.

The heap size for reduce tasks is configured to be 640M (
mapred.child.java.opts set to -Xmx640m ).

Andy

-Original Message-
From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:


Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins 

andy.saut...@returnpath.net

wrote:




Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
in the
0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and
re-ran
the same job scenarios.  Running with mapred.reduce.parallel.copies
set to 1
and we continue to have the same Java heap space error.



-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, March 09, 2010 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

This issue has been resolved in
http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch
M1182-1v20.patch


http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch




On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins 

andy.saut...@returnpath.net

wrote:




Thanks Ted.  Very helpful.  You are correct that I misunderstood  
the

code

at ReduceTask.java:1535.  I missed the fact that it's in a
IOException

catch

block.  My mistake.  That's what I get for being in a rush.

For what it's worth I did re-run the job with
mapred.reduce.parallel.copies set with values from 5 all the way
down to

1.

All failed with the same error:

Error: java.lang.OutOfMemoryError: Java heap space
 at


org.apache.hadoop.mapred.ReduceTask$ReduceCopier
$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)

 at


org.apache.hadoop.mapred.ReduceTask$ReduceCopier
$MapOutputCopier.getMapOutput(ReduceTask.java:1408)

 at


org.apache.hadoop.mapred.ReduceTask$ReduceCopier
$MapOutputCopier.copyOutput(ReduceTask.java:1261)

 at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
$MapOutputCopier.run

(ReduceTask.java:1195)



So from that it does seem like something else might be going on,
yes?

I

need to do some more research.

I appreciate your insights.

Andy

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

My observation is based on this call chain:
MapOutputCopier.run() calling copyOutput() calling getMapOutput()
calling
ramManager.canFitInMemory(decompressedLength)

Basically ramManager.canFitInMemory() makes its decision without
considering
the
number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of
total

heap

may be used in shuffling if default parameters were used.
Of course, you should check the value for
mapred.reduce.parallel.copies

to

see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

About ramManager.unreserve() call, ReduceTask.java from hadoop
0.20.2

only

has 2731 lines. So I have to guess the location of the code
snippet you
provided.
I found this around line 1535:
 } catch (IOException ioe) {
   LOG.info("Failed to shuffle from " +
mapOutputLoc.getTaskAttemptId(),
ioe);

   // Inform the ram-manager
   ramManager.closeInMemoryFile(mapOutputLength);
   ramManager.unreserve(mapOutputLength);

   // Discard the map-output
   try {
 mapOutput.discard();
   } catch (IOException ignored) {
 LOG.info("Failed to discard map-output from " +
  mapOutputLoc.getTaskAttemptId(), ignored);
   }
Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we
can
improve it. First, I assume IOException shouldn't happen that  
often.

Second,
mapOutput.discard() just sets:
   data = null;
for in memory case. Even if we call mapOutput.discard() before
ramManager.unreserve(), we don't know when GC would kick in and
make more
memory available.
Of course, given the large number of map outputs in your system,  
it

became

more likely

Re: Shuffle In Memory OutOfMemoryError

2010-03-07 Thread Jacob R Rideout
Ted,

Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have
some time to write a patch later this week.

Jacob Rideout

On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu yuzhih...@gmail.com wrote:
 I think there is mismatch (in ReduceTask.java) between:
      this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
 and:
        maxSingleShuffleLimit = (long)(maxSize *
 MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
 where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f

 because
      copiers = new ArrayList<MapOutputCopier>(numCopiers);
 so the total memory allocated for in-mem shuffle is 1.25 * maxSize

 A JIRA should be filed to correlate the constant 5 above and
 MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.

 Cheers

 On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout 
 apa...@jacobrideout.netwrote:

 Hi all,

 We are seeing the following error in our reducers of a particular job:

 Error: java.lang.OutOfMemoryError: Java heap space
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


 After enough reducers fail the entire job fails. This error occurs
 regardless of whether mapred.compress.map.output is true. We were able
 to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent
 to 20%. Shouldn't the framework via ShuffleRamManager.canFitInMemory
 and ShuffleRamManager.reserve correctly detect the memory
 available for allocation? I would think that with poor configuration
 settings (and default settings in particular) the job may not be as
 efficient, but wouldn't die.
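For reference, the workaround described above can be expressed as a configuration fragment. This is a hedged sketch: the property name and the roughly 20% value come from this message, and the 0.70 default from Ted's analysis elsewhere in the thread.

```xml
<!-- mapred-site.xml (or per-job configuration): shrink the fraction
     of the reduce heap used for in-memory shuffle. Default is 0.70;
     0.20 is the value reported in this thread to avoid the OOME. -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.20</value>
</property>
```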

 Here is some more context in the logs, I have attached the full
 reducer log here: http://gist.github.com/323746


 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
 Shuffling 4191933 bytes (435311 raw bytes) into RAM from
 attempt_201003060739_0002_m_61_0
 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task
 attempt_201003060739_0002_r_00_0: Failed fetch #1 from
 attempt_201003060739_0002_m_000202_0
 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
 attempt_201003060739_0002_r_00_0 adding host
 hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
 attempt_201003060739_0002_r_00_0: Got 1 map-outputs from previous
 failures
 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
 attempt_201003060739_0002_r_00_0 : Map output copy failure :
 java.lang.OutOfMemoryError: Java heap space
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


 We tried this both in 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182
 would address the issue in 0.20.2, but it did not. Does anyone have
 any comments or suggestions? Is this a bug I should file a JIRA for?

 Jacob Rideout
 Return Path




RE: Shuffle In Memory OutOfMemoryError

2010-03-07 Thread Andy Sautins

   Ted,

   I'm trying to follow the logic in your mail and I'm not sure I'm following.  
If you wouldn't mind helping me understand I would appreciate it.  

   Looking at the code maxSingleShuffleLimit is only used in determining if the 
copy _can_ fit into memory:

 boolean canFitInMemory(long requestedSize) {
return (requestedSize < Integer.MAX_VALUE &&
requestedSize < maxSingleShuffleLimit);
  }

It also looks like the RamManager.reserve should wait until memory is 
available so it shouldn't hit a memory limit for that reason. 

What does seem a little strange to me is the following ( ReduceTask.java 
starting at 2730 ):

  // Inform the ram-manager
  ramManager.closeInMemoryFile(mapOutputLength);
  ramManager.unreserve(mapOutputLength);

  // Discard the map-output
  try {
mapOutput.discard();
  } catch (IOException ignored) {
LOG.info("Failed to discard map-output from " +
 mapOutputLoc.getTaskAttemptId(), ignored);
  }
  mapOutput = null;

   So to me that looks like the ramManager unreserves the memory before the 
mapOutput is discarded.  Shouldn't the mapOutput be discarded _before_ the 
ramManager unreserves the memory?  If the memory is unreserved before the 
actual underlying data references are removed then it seems like another thread 
can try to allocate memory ( ReduceTask.java:2730 ) before the previous memory 
is disposed ( mapOutput.discard() ).  
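The reordering suggested here can be sketched as follows. This is only an illustrative sketch with minimal stand-in types (the real MapOutput, RamManager, and logging live in ReduceTask.java), and whether it helps in practice depends on when the collector actually reclaims the discarded buffer:

```java
import java.io.IOException;

// Illustrative sketch of discarding the map output *before* releasing
// the reservation, so another copier cannot be admitted while the old
// buffer is still strongly reachable. Types are minimal stand-ins for
// the real Hadoop classes, not the actual implementation.
public class DiscardBeforeUnreserve {

    static class MapOutput {
        byte[] data = new byte[1024];
        void discard() throws IOException { data = null; }
    }

    static class RamManager {
        long size;
        synchronized void closeInMemoryFile(long len) { /* bookkeeping elided */ }
        synchronized void unreserve(long len) {
            size -= len;
            notifyAll(); // wake copiers blocked in reserve()
        }
    }

    static void release(MapOutput mapOutput, RamManager ramManager,
                        long mapOutputLength) {
        // 1. Drop the reference to the buffered data first...
        try {
            mapOutput.discard();
        } catch (IOException ignored) {
            // the original code logs and continues here
        }

        // 2. ...and only then tell the ram-manager the memory is free.
        ramManager.closeInMemoryFile(mapOutputLength);
        ramManager.unreserve(mapOutputLength);
    }
}
```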

   Not sure that makes sense.  One thing to note is that the particular job 
that is failing does have a good number ( 200k+ ) of map outputs.  The large 
number of small map outputs may be why we are triggering a problem.

   Thanks again for your thoughts.

   Andy


-Original Message-
From: Jacob R Rideout [mailto:apa...@jacobrideout.net] 
Sent: Sunday, March 07, 2010 1:21 PM
To: common-user@hadoop.apache.org
Cc: Andy Sautins; Ted Yu
Subject: Re: Shuffle In Memory OutOfMemoryError

Ted,

Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have
some time to write a patch later this week.

Jacob Rideout

On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu yuzhih...@gmail.com wrote:
 I think there is a mismatch (in ReduceTask.java) between:
      this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
 and:
        maxSingleShuffleLimit = (long)(maxSize *
 MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
 where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f

 because
      copiers = new ArrayList<MapOutputCopier>(numCopiers);
 so the total memory allocated for in-mem shuffle is 1.25 * maxSize

 A JIRA should be filed to correlate the constant 5 above and
 MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.

 Cheers

 On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout 
 apa...@jacobrideout.netwrote:

 Hi all,

 We are seeing the following error in our reducers of a particular job:

 Error: java.lang.OutOfMemoryError: Java heap space
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


 After enough reducers fail the entire job fails. This error occurs
 regardless of whether mapred.compress.map.output is true. We were able
 to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent
 to 20%. Shouldn't the framework via ShuffleRamManager.canFitInMemory
 and ShuffleRamManager.reserve correctly detect the memory
 available for allocation? I would think that with poor configuration
 settings (and default settings in particular) the job may not be as
 efficient, but wouldn't die.

 Here is some more context in the logs, I have attached the full
 reducer log here: http://gist.github.com/323746


 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
 Shuffling 4191933 bytes (435311 raw bytes) into RAM from
 attempt_201003060739_0002_m_61_0
 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task
 attempt_201003060739_0002_r_00_0: Failed fetch #1 from
 attempt_201003060739_0002_m_000202_0
 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
 attempt_201003060739_0002_r_00_0 adding host
 hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
 attempt_201003060739_0002_r_00_0: Got 1 map-outputs from previous
 failures
 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
 attempt_201003060739_0002_r_00_0 : Map output copy failure :
 java.lang.OutOfMemoryError: Java heap space
        at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508

Re: Shuffle In Memory OutOfMemoryError

2010-03-07 Thread Ted Yu
My observation is based on this call chain:
MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
ramManager.canFitInMemory(decompressedLength)

Basically ramManager.canFitInMemory() makes its decision without considering the
number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap
may be used in shuffling if default parameters were used.
Of course, you should check the value for mapred.reduce.parallel.copies to
see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
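Ted's arithmetic can be checked directly: with the 0.20.x defaults (input buffer percent 0.70, MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION 0.25f, five parallel copies), the copiers can together hold 1.25 * maxSize, i.e. 87.5% of the reduce-task heap. A quick sketch of the numbers, using the 640m heap reported earlier in this thread:

```java
// Worst-case in-memory shuffle footprint with the 0.20.x defaults.
public class ShuffleMath {
    public static void main(String[] args) {
        long heap = 640L * 1024 * 1024;       // mapred.child.java.opts -Xmx640m
        double inputBufferPercent = 0.70;     // mapred.job.shuffle.input.buffer.percent
        double singleSegmentFraction = 0.25;  // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION
        int parallelCopies = 5;               // mapred.reduce.parallel.copies default

        long maxSize = (long) (heap * inputBufferPercent);
        long maxSingleShuffleLimit = (long) (maxSize * singleSegmentFraction);

        // canFitInMemory() admits each segment individually, so five copiers
        // can each hold a segment just under the single-shuffle limit:
        long worstCase = parallelCopies * maxSingleShuffleLimit;

        System.out.println("maxSize               = " + maxSize);
        System.out.println("maxSingleShuffleLimit = " + maxSingleShuffleLimit);
        System.out.println("worst case in flight  = " + worstCase
            + " (" + (double) worstCase / heap + " of heap)");
    }
}
```

With the remaining ~12.5% of the heap needed for everything else in the reduce task, this leaves very little headroom, which is consistent with the OOMEs seen in this thread.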

About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2 only
has 2731 lines. So I have to guess the location of the code snippet you
provided.
I found this around line 1535:
} catch (IOException ioe) {
  LOG.info("Failed to shuffle from " +
mapOutputLoc.getTaskAttemptId(),
   ioe);

  // Inform the ram-manager
  ramManager.closeInMemoryFile(mapOutputLength);
  ramManager.unreserve(mapOutputLength);

  // Discard the map-output
  try {
mapOutput.discard();
  } catch (IOException ignored) {
LOG.info("Failed to discard map-output from " +
 mapOutputLoc.getTaskAttemptId(), ignored);
  }
Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we can
improve it. First, I assume IOException shouldn't happen that often. Second,
mapOutput.discard() just sets:
  data = null;
for in memory case. Even if we call mapOutput.discard() before
ramManager.unreserve(), we don't know when GC would kick in and make more
memory available.
Of course, given the large number of map outputs in your system, it became
more likely that the root cause I described made the OOME happen sooner.

Thanks


On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins andy.saut...@returnpath.netwrote:


   Ted,

   I'm trying to follow the logic in your mail and I'm not sure I'm
 following.  If you wouldn't mind helping me understand, I would appreciate it.

   Looking at the code maxSingleShuffleLimit is only used in determining if
 the copy _can_ fit into memory:

 boolean canFitInMemory(long requestedSize) {
 return (requestedSize < Integer.MAX_VALUE &&
 requestedSize < maxSingleShuffleLimit);
  }

It also looks like the RamManager.reserve should wait until memory is
 available, so it shouldn't hit a memory limit for that reason.

What does seem a little strange to me is the following ( ReduceTask.java
 starting at 2730 ):

  // Inform the ram-manager
  ramManager.closeInMemoryFile(mapOutputLength);
  ramManager.unreserve(mapOutputLength);

  // Discard the map-output
  try {
mapOutput.discard();
  } catch (IOException ignored) {
LOG.info("Failed to discard map-output from " +
 mapOutputLoc.getTaskAttemptId(), ignored);
  }
  mapOutput = null;

   So to me that looks like the ramManager unreserves the memory before the
 mapOutput is discarded.  Shouldn't the mapOutput be discarded _before_ the
 ramManager unreserves the memory?  If the memory is unreserved before the
 actual underlying data references are removed then it seems like another
 thread can try to allocate memory ( ReduceTask.java:2730 ) before the
 previous memory is disposed ( mapOutput.discard() ).

   Not sure that makes sense.  One thing to note is that the particular job
 that is failing does have a good number ( 200k+ ) of map outputs.  The large
 number of small map outputs may be why we are triggering a problem.

   Thanks again for your thoughts.

   Andy


 -Original Message-
 From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
 Sent: Sunday, March 07, 2010 1:21 PM
 To: common-user@hadoop.apache.org
 Cc: Andy Sautins; Ted Yu
 Subject: Re: Shuffle In Memory OutOfMemoryError

 Ted,

 Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have
 some time to write a patch later this week.

 Jacob Rideout

 On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu yuzhih...@gmail.com wrote:
  I think there is a mismatch (in ReduceTask.java) between:
   this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
  and:
 maxSingleShuffleLimit = (long)(maxSize *
  MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
  where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f
 
  because
   copiers = new ArrayList<MapOutputCopier>(numCopiers);
  so the total memory allocated for in-mem shuffle is 1.25 * maxSize
 
  A JIRA should be filed to correlate the constant 5 above and
  MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
 
  Cheers
 
  On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout apa...@jacobrideout.net
 wrote:
 
  Hi all,
 
  We are seeing the following error in our reducers of a particular job:
 
  Error: java.lang.OutOfMemoryError: Java heap space
 at
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508

RE: Shuffle In Memory OutOfMemoryError

2010-03-07 Thread Andy Sautins

  Thanks Ted.  Very helpful.  You are correct that I misunderstood the code at 
ReduceTask.java:1535.  I missed the fact that it's in an IOException catch 
block.  My mistake.  That's what I get for being in a rush.

  For what it's worth I did re-run the job with mapred.reduce.parallel.copies 
set with values from 5 all the way down to 1.  All failed with the same error:

Error: java.lang.OutOfMemoryError: Java heap space
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

  
  So from that it does seem like something else might be going on, yes?  I need 
to do some more research.  

  I appreciate your insights.

  Andy   

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

My observation is based on this call chain:
MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
ramManager.canFitInMemory(decompressedLength)

Basically ramManager.canFitInMemory() makes its decision without considering the
number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap
may be used in shuffling if default parameters were used.
Of course, you should check the value for mapred.reduce.parallel.copies to
see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2 only
has 2731 lines. So I have to guess the location of the code snippet you
provided.
I found this around line 1535:
} catch (IOException ioe) {
  LOG.info("Failed to shuffle from " +
mapOutputLoc.getTaskAttemptId(),
   ioe);

  // Inform the ram-manager
  ramManager.closeInMemoryFile(mapOutputLength);
  ramManager.unreserve(mapOutputLength);

  // Discard the map-output
  try {
mapOutput.discard();
  } catch (IOException ignored) {
LOG.info("Failed to discard map-output from " +
 mapOutputLoc.getTaskAttemptId(), ignored);
  }
Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we can
improve it. First, I assume IOException shouldn't happen that often. Second,
mapOutput.discard() just sets:
  data = null;
for in memory case. Even if we call mapOutput.discard() before
ramManager.unreserve(), we don't know when GC would kick in and make more
memory available.
Of course, given the large number of map outputs in your system, it became
more likely that the root cause I described made the OOME happen sooner.

Thanks


On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins andy.saut...@returnpath.netwrote:


   Ted,

   I'm trying to follow the logic in your mail and I'm not sure I'm
 following.  If you wouldn't mind helping me understand, I would appreciate it.

   Looking at the code maxSingleShuffleLimit is only used in determining if
 the copy _can_ fit into memory:

 boolean canFitInMemory(long requestedSize) {
 return (requestedSize < Integer.MAX_VALUE &&
 requestedSize < maxSingleShuffleLimit);
  }

It also looks like the RamManager.reserve should wait until memory is
 available, so it shouldn't hit a memory limit for that reason.

What does seem a little strange to me is the following ( ReduceTask.java
 starting at 2730 ):

  // Inform the ram-manager
  ramManager.closeInMemoryFile(mapOutputLength);
  ramManager.unreserve(mapOutputLength);

  // Discard the map-output
  try {
mapOutput.discard();
  } catch (IOException ignored) {
 LOG.info("Failed to discard map-output from " +
 mapOutputLoc.getTaskAttemptId(), ignored);
  }
  mapOutput = null;

   So to me that looks like the ramManager unreserves the memory before the
 mapOutput is discarded.  Shouldn't the mapOutput be discarded _before_ the
 ramManager unreserves the memory?  If the memory is unreserved before the
 actual underlying data references are removed then it seems like another
 thread can try to allocate memory ( ReduceTask.java:2730 ) before the
 previous memory is disposed ( mapOutput.discard() ).

   Not sure that makes sense.  One thing to note is that the particular job
 that is failing does have a good number ( 200k+ ) of map outputs.  The large
 number of small map outputs may be why we are triggering a problem.

   Thanks again for your thoughts.

   Andy


 -Original Message-
 From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
 Sent: Sunday, March 07, 2010

Re: Shuffle In Memory OutOfMemoryError

2010-03-07 Thread Ted Yu
Lowering mapred.job.shuffle.input.buffer.percent would be the option to
choose.

Maybe GC wasn't releasing memory fast enough for in memory shuffling.
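For reference, the workaround that avoided the OOME earlier in this thread is a one-property change. In mapred-site.xml (or the per-job configuration) it would look like:

```xml
<!-- Lower the fraction of reduce-task heap used to buffer shuffled
     map outputs from the 0.70 default to 0.20. -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.20</value>
</property>
```

The tradeoff is that more map outputs spill to disk during the shuffle, so the job may run slower but should no longer exhaust the heap.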

On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins andy.saut...@returnpath.netwrote:


  Thanks Ted.  Very helpful.  You are correct that I misunderstood the code
 at ReduceTask.java:1535.  I missed the fact that it's in an IOException catch
 block.  My mistake.  That's what I get for being in a rush.

  For what it's worth I did re-run the job with
 mapred.reduce.parallel.copies set with values from 5 all the way down to 1.
  All failed with the same error:

 Error: java.lang.OutOfMemoryError: Java heap space
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


   So from that it does seem like something else might be going on, yes?  I
 need to do some more research.

  I appreciate your insights.

  Andy

 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Sunday, March 07, 2010 3:38 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Shuffle In Memory OutOfMemoryError

 My observation is based on this call chain:
 MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
 ramManager.canFitInMemory(decompressedLength)

 Basically ramManager.canFitInMemory() makes its decision without considering
 the
 number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap
 may be used in shuffling if default parameters were used.
 Of course, you should check the value for mapred.reduce.parallel.copies to
 see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

 About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2 only
 has 2731 lines. So I have to guess the location of the code snippet you
 provided.
 I found this around line 1535:
} catch (IOException ioe) {
  LOG.info("Failed to shuffle from " +
 mapOutputLoc.getTaskAttemptId(),
   ioe);

  // Inform the ram-manager
  ramManager.closeInMemoryFile(mapOutputLength);
  ramManager.unreserve(mapOutputLength);

  // Discard the map-output
  try {
mapOutput.discard();
  } catch (IOException ignored) {
 LOG.info("Failed to discard map-output from " +
 mapOutputLoc.getTaskAttemptId(), ignored);
  }
 Please confirm the line number.

 If we're looking at the same code, I am afraid I don't see how we can
 improve it. First, I assume IOException shouldn't happen that often.
 Second,
 mapOutput.discard() just sets:
  data = null;
 for in memory case. Even if we call mapOutput.discard() before
 ramManager.unreserve(), we don't know when GC would kick in and make more
 memory available.
 Of course, given the large number of map outputs in your system, it became
 more likely that the root cause I described made the OOME happen sooner.

 Thanks

 
 On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins andy.saut...@returnpath.net
 wrote:

 
Ted,
 
I'm trying to follow the logic in your mail and I'm not sure I'm
  following.  If you wouldn't mind helping me understand, I would appreciate
 it.
 
Looking at the code maxSingleShuffleLimit is only used in determining
 if
  the copy _can_ fit into memory:
 
  boolean canFitInMemory(long requestedSize) {
  return (requestedSize < Integer.MAX_VALUE &&
  requestedSize < maxSingleShuffleLimit);
   }
 
  It also looks like the RamManager.reserve should wait until memory is
   available, so it shouldn't hit a memory limit for that reason.
 
 What does seem a little strange to me is the following (
 ReduceTask.java
  starting at 2730 ):
 
   // Inform the ram-manager
   ramManager.closeInMemoryFile(mapOutputLength);
   ramManager.unreserve(mapOutputLength);
 
   // Discard the map-output
   try {
 mapOutput.discard();
   } catch (IOException ignored) {
  LOG.info("Failed to discard map-output from " +
  mapOutputLoc.getTaskAttemptId(), ignored);
   }
   mapOutput = null;
 
So to me that looks like the ramManager unreserves the memory before
 the
  mapOutput is discarded.  Shouldn't the mapOutput be discarded _before_
 the
  ramManager unreserves the memory?  If the memory is unreserved before the
  actual underlying data references are removed then it seems like another
  thread can try to allocate memory ( ReduceTask.java:2730 ) before the
  previous memory is disposed ( mapOutput.discard() ).
 
Not sure that makes sense.  One thing to note is that the particular
 job

Shuffle In Memory OutOfMemoryError

2010-03-06 Thread Jacob R Rideout
Hi all,

We are seeing the following error in our reducers of a particular job:

Error: java.lang.OutOfMemoryError: Java heap space
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


After enough reducers fail the entire job fails. This error occurs
regardless of whether mapred.compress.map.output is true. We were able
to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent
to 20%. Shouldn't the framework via ShuffleRamManager.canFitInMemory
and ShuffleRamManager.reserve correctly detect the memory
available for allocation? I would think that with poor configuration
settings (and default settings in particular) the job may not be as
efficient, but wouldn't die.

Here is some more context in the logs, I have attached the full
reducer log here: http://gist.github.com/323746


2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 4191933 bytes (435311 raw bytes) into RAM from
attempt_201003060739_0002_m_61_0
2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task
attempt_201003060739_0002_r_00_0: Failed fetch #1 from
attempt_201003060739_0002_m_000202_0
2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_201003060739_0002_r_00_0 adding host
hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201003060739_0002_r_00_0: Got 1 map-outputs from previous
failures
2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
attempt_201003060739_0002_r_00_0 : Map output copy failure :
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


We tried this both in 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182
would address the issue in 0.20.2, but it did not. Does anyone have
any comments or suggestions? Is this a bug I should file a JIRA for?

Jacob Rideout
Return Path


Re: Shuffle In Memory OutOfMemoryError

2010-03-06 Thread Ted Yu
I think there is a mismatch (in ReduceTask.java) between:
  this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
and:
maxSingleShuffleLimit = (long)(maxSize *
MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f

because
  copiers = new ArrayList<MapOutputCopier>(numCopiers);
so the total memory allocated for in-mem shuffle is 1.25 * maxSize

A JIRA should be filed to correlate the constant 5 above and
MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.

Cheers

On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout apa...@jacobrideout.netwrote:

 Hi all,

 We are seeing the following error in our reducers of a particular job:

 Error: java.lang.OutOfMemoryError: Java heap space
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


 After enough reducers fail the entire job fails. This error occurs
 regardless of whether mapred.compress.map.output is true. We were able
 to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent
 to 20%. Shouldn't the framework via ShuffleRamManager.canFitInMemory
 and ShuffleRamManager.reserve correctly detect the memory
 available for allocation? I would think that with poor configuration
 settings (and default settings in particular) the job may not be as
 efficient, but wouldn't die.

 Here is some more context in the logs, I have attached the full
 reducer log here: http://gist.github.com/323746


 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
 Shuffling 4191933 bytes (435311 raw bytes) into RAM from
 attempt_201003060739_0002_m_61_0
 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task
 attempt_201003060739_0002_r_00_0: Failed fetch #1 from
 attempt_201003060739_0002_m_000202_0
 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
 attempt_201003060739_0002_r_00_0 adding host
 hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
 attempt_201003060739_0002_r_00_0: Got 1 map-outputs from previous
 failures
 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
 attempt_201003060739_0002_r_00_0 : Map output copy failure :
 java.lang.OutOfMemoryError: Java heap space
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


 We tried this both in 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182
 would address the issue in 0.20.2, but it did not. Does anyone have
 any comments or suggestions? Is this a bug I should file a JIRA for?

 Jacob Rideout
 Return Path