Re: Shuffle In Memory OutOfMemoryError
On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas chri...@yahoo-inc.com wrote:

That section of code is unmodified in MR-1182. See the patches/svn log. -C

Sent from my iPhone

On Mar 9, 2010, at 7:44 PM, Ted Yu yuzhih...@gmail.com wrote:

I just downloaded the hadoop-0.20.2 tar ball from the Cloudera mirror. This is what I see in ReduceTask (line 999):

    public synchronized boolean reserve(int requestedSize, InputStream in)
        throws InterruptedException {
      // Wait till the request can be fulfilled...
      while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182. That's why I suggested to Andy that he manually apply MR-1182.

Cheers

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins andy.saut...@returnpath.net wrote:

Thanks Christopher. The heap size for reduce tasks is configured to be 640M (mapred.child.java.opts set to -Xmx640m).

Andy

-----Original Message-----
From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2. What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:

Andy: You need to manually apply the patch. Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins andy.saut...@returnpath.net wrote:

Thanks Ted. My understanding is that MAPREDUCE-1182 is included in the 0.20.2 release. We upgraded our cluster to 0.20.2 this weekend and re-ran the same job scenarios. Running with mapred.reduce.parallel.copies set to 1, we continue to have the same Java heap space error.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, March 09, 2010 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

This issue has been resolved in http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch M1182-1v20.patch
http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch

On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins andy.saut...@returnpath.net wrote:

Thanks Ted. Very helpful. You are correct that I misunderstood the code at ReduceTask.java:1535. I missed the fact that it's in an IOException catch block. My mistake. That's what I get for being in a rush.

For what it's worth, I did re-run the job with mapred.reduce.parallel.copies set with values from 5 all the way down to 1. All failed with the same error:

    Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

So from that it does seem like something else might be going on, yes? I need to do some more research. I appreciate your insights.

Andy

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

My observation is based on this call chain: MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling ramManager.canFitInMemory(decompressedLength).

Basically ramManager.canFitInMemory() makes its decision without considering the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap may be used in shuffling if the default parameters are in effect.

Of course, you should check the value of mapred.reduce.parallel.copies to see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

About the ramManager.unreserve() call: ReduceTask.java from hadoop 0.20.2 only has 2731 lines, so I have to guess the location of the code snippet you provided.
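Ted's 1.25 * 0.7 figure follows from the stock 0.20 defaults: the in-memory shuffle buffer is mapred.shuffle.input.buffer.percent (0.70) of the reduce heap, a single shuffled map output may occupy at most 25% of that buffer, and mapred.reduce.parallel.copies (5) copiers can each hold one such output at once. A back-of-the-envelope sketch of that arithmetic; the class and method names here are ours for illustration, not Hadoop's:

```java
// Back-of-the-envelope model of the 0.20 shuffle memory budget.
// Constants mirror the defaults discussed in the thread.
public class ShuffleMemoryMath {
    static final long HEAP = 640L * 1024 * 1024; // -Xmx640m, as in Andy's config
    static final int PARALLEL_COPIES = 5;        // mapred.reduce.parallel.copies default

    // In-memory shuffle buffer: mapred.shuffle.input.buffer.percent = 0.70 of the heap.
    // (Hadoop multiplies by a float; integer math keeps this sketch exact.)
    static long maxSize() { return HEAP * 7 / 10; }

    // A single map output may occupy at most 25% of that buffer.
    static long maxSingleShuffleLimit() { return maxSize() / 4; }

    // Ted's worst case: each of the 5 copiers holds one output that individually
    // passed canFitInMemory(): 5 * 0.25 = 1.25 times maxSize, i.e.
    // 1.25 * 0.7 = 87.5% of the heap in shuffle data alone.
    static long worstCaseBytes() { return PARALLEL_COPIES * maxSingleShuffleLimit(); }

    public static void main(String[] args) {
        System.out.println("maxSize=" + maxSize()
            + " singleLimit=" + maxSingleShuffleLimit()
            + " worstCase=" + worstCaseBytes());
    }
}
```

With a 640M heap this gives a 469,762,048-byte buffer and a worst case of 587,202,560 bytes of shuffle data, before counting the JVM's other needs.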
Re: Shuffle In Memory OutOfMemoryError
I found this around line 1535:

    } catch (IOException ioe) {
      LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(), ioe);
      // Inform the ram-manager
      ramManager.closeInMemoryFile(mapOutputLength);
      ramManager.unreserve(mapOutputLength);
      // Discard the map-output
      try {
        mapOutput.discard();
      } catch (IOException ignored) {
        LOG.info("Failed to discard map-output from " + mapOutputLoc.getTaskAttemptId(), ignored);
      }

Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we can improve it. First, I assume IOException shouldn't happen that often. Second, mapOutput.discard() just sets data = null for the in-memory case. Even if we call mapOutput.discard() before ramManager.unreserve(), we don't know when GC would kick in and make more memory available.

Of course, given the large number of map outputs in your system, it became more likely that the root cause from my reasoning made the OOME happen sooner.

Thanks
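Ted's point about discard() and GC can be seen in a toy model: unreserve() only adjusts the ram manager's accounting counter, while the actual byte[] stays reachable until discard() drops the reference, and is physically reclaimed only at some later GC. A simplified sketch with hypothetical stand-in classes, not the Hadoop ones:

```java
public class DiscardOrderingDemo {
    // Minimal stand-in for the shuffle ram manager's accounting.
    static class ToyRamManager {
        long size;                       // bytes the manager *thinks* are in use
        void reserve(long n)   { size += n; }
        void unreserve(long n) { size -= n; }
    }

    // Minimal stand-in for an in-memory map output.
    static class ToyMapOutput {
        byte[] data;
        ToyMapOutput(int n) { data = new byte[n]; }
        void discard() { data = null; }  // only now can GC reclaim the buffer
    }

    public static void main(String[] args) {
        ToyRamManager ram = new ToyRamManager();
        ToyMapOutput mo = new ToyMapOutput(1 << 20);
        ram.reserve(mo.data.length);

        // The 0.20 failure path unreserves *before* discarding:
        ram.unreserve(1 << 20);
        boolean accountedFree = (ram.size == 0);     // bookkeeping says "free"
        boolean stillReachable = (mo.data != null);  // but the megabyte is still live

        System.out.println(accountedFree + " " + stillReachable); // prints: true true
        mo.discard();  // reference dropped; memory returns only after a later GC
    }
}
```

The gap between `accountedFree` and `stillReachable` is exactly the window Ted describes: new reservations can be admitted against memory that the collector has not yet returned.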
Re: Shuffle In Memory OutOfMemoryError
I verified that size and maxSize are long. This means MR-1182 didn't resolve Andy's issue.

According to Andy: "At the beginning of the job there are 209,754 pending map tasks and 32 pending reduce tasks."

My guess is that GC wasn't reclaiming memory fast enough, leading to OOME because of the large number of in-memory shuffle candidates. My suggestion for Andy would be to:

1. add -verbose:gc as a JVM parameter
2. modify reserve() slightly to calculate the maximum outstanding numPendingRequests, and print the maximum

Based on the output from the above two items, we can discuss a solution. My intuition is to place an upper bound on numPendingRequests beyond which canFitInMemory() returns false.

My two cents.
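Ted's item 2 might look something like the following hypothetical sketch of a 0.20-style ram manager with the extra counter. The numPendingRequests/maxPendingRequests fields and the getter are the additions; the wait/notify protocol is simplified relative to the real ShuffleRamManager:

```java
public class InstrumentedRamManager {
    private long size = 0;
    private final long maxSize;
    private int numPendingRequests = 0;
    private int maxPendingRequests = 0;  // the statistic Ted wants printed

    public InstrumentedRamManager(long maxSize) { this.maxSize = maxSize; }

    public synchronized void reserve(long requestedSize) throws InterruptedException {
        // Count requests that have entered reserve() and not yet been granted.
        numPendingRequests++;
        maxPendingRequests = Math.max(maxPendingRequests, numPendingRequests);
        try {
            // Wait till the request can be fulfilled (as in 0.20's reserve()).
            while (size + requestedSize > maxSize) {
                wait();
            }
            size += requestedSize;
        } finally {
            numPendingRequests--;
        }
    }

    public synchronized void unreserve(long requestedSize) {
        size -= requestedSize;
        notifyAll();  // wake copiers blocked in reserve()
    }

    public synchronized int getMaxPendingRequests() { return maxPendingRequests; }
}
```

Logging getMaxPendingRequests() at task end (or on each reserve, as in the log Andy later produced) would show how many copiers are contending for the buffer at peak.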
Re: Shuffle In Memory OutOfMemoryError
I don't think this OOM is a framework bug per se, and given the rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning the 0.20 shuffle semantics is likely not worthwhile (though data informing improvements to trunk would be excellent).

Most likely (and tautologically), ReduceTask simply requires more memory than is available, and the job failure can be avoided by either 0) increasing the heap size or 1) lowering mapred.shuffle.input.buffer.percent. Most of the tasks we run have a heap of 1GB. For a reduce fetching 200k map outputs, that's a reasonable, even stingy amount of space. -C
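Chris's two remedies correspond to two job settings. For illustration only (the values below are examples to tune, not recommendations from the thread), in mapred-site.xml or the per-job configuration:

```xml
<!-- Remedy 0: give each reduce task a larger heap -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

<!-- Remedy 1: devote a smaller fraction of that heap to the in-memory shuffle -->
<property>
  <name>mapred.shuffle.input.buffer.percent</name>
  <value>0.50</value>
</property>
```

Either change widens the gap between the shuffle buffer ceiling and the total heap, leaving more room for everything else the JVM allocates during the copy phase.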
Re: Shuffle In Memory OutOfMemoryError
Thanks to Andy for the log he provided. You can see from the log below that size increased steadily from 341535057 to 408181692, approaching maxSize. Then OOME:

2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=start requestedSize=3893000 size=341535057 numPendingRequests=0 maxSize=417601952
2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=end requestedSize=3893000 size=345428057 numPendingRequests=0 maxSize=417601952
...
2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=end requestedSize=635753 size=408181692 numPendingRequests=0 maxSize=417601952
2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003101826_0001_r_04_0: Failed fetch #1 from attempt_201003101826_0001_m_000875_0
2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003101826_0001_r_04_0 adding host hd17.dfs.returnpath.net to penalty box, next contact in 4 seconds
2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003101826_0001_r_04_0: Got 1 map-outputs from previous failures
2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003101826_0001_r_04_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)

Looking at the calls to unreserve() in ReduceTask, two were for IOException and the other was for a sanity check (line 1557), meaning they wouldn't be reached in the normal execution path. I see one call in IFile.InMemoryReader:

    // Inform the RamManager
    ramManager.unreserve(bufferSize);

And InMemoryReader is used in:

    Reader<K, V> reader = new InMemoryReader<K, V>(ramManager, mo.mapAttemptId, mo.data, 0, mo.data.length);
Re: Shuffle In Memory OutOfMemoryError
On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins andy.saut...@returnpath.net wrote:

Ted, I'm trying to follow the logic in your mail and I'm not sure I'm following. If you wouldn't mind helping me understand, I would appreciate it. Looking at the code, maxSingleShuffleLimit is only used in determining if the copy _can_ fit into memory:

    boolean canFitInMemory(long requestedSize) {
      return (requestedSize < Integer.MAX_VALUE &&
              requestedSize < maxSingleShuffleLimit);
    }

It also looks like RamManager.reserve should wait until memory is available, so it shouldn't hit a memory limit for that reason. What does seem
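Andy's reading of canFitInMemory() is the crux of Ted's earlier observation: the check looks only at the single request, so each parallel copier can pass it independently even when their combined reservations blow past the budget. A toy illustration with hypothetical constants, not Hadoop's code:

```java
public class CanFitDemo {
    static final long MAX_SIZE = 100;                 // toy shuffle buffer
    static final long MAX_SINGLE_SHUFFLE_LIMIT = 25;  // toy 25% single-output cap

    // Mirrors the shape of 0.20's check: per-request only,
    // with no view of what the other copiers already hold.
    static boolean canFitInMemory(long requestedSize) {
        return requestedSize < Integer.MAX_VALUE
            && requestedSize < MAX_SINGLE_SHUFFLE_LIMIT;
    }

    public static void main(String[] args) {
        int copiers = 5;
        long perOutput = 24;  // each output individually passes the check
        long admitted = 0;
        for (int i = 0; i < copiers; i++) {
            if (canFitInMemory(perOutput)) {
                admitted += perOutput;  // all five copiers get a "yes"
            }
        }
        // 5 * 24 = 120 > 100: collectively over budget even though
        // every individual check succeeded.
        System.out.println(admitted > MAX_SIZE); // prints: true
    }
}
```

In the real code reserve() does serialize the actual allocations, as Andy notes; the point of the sketch is only that the admission check itself carries no information about concurrent copiers.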
Re: Shuffle In Memory OutOfMemoryError
No, MR-1182 is included in 0.20.2. What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:

Andy: You need to manually apply the patch. Cheers
RE: Shuffle In Memory OutOfMemoryError
Thanks Christopher. The heap size for reduce tasks is configured to be 640M (mapred.child.java.opts set to -Xmx640m).

Andy

-----Original Message-----
From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2. What heap size have you set for your reduce tasks? -C
Re: Shuffle In Memory OutOfMemoryError
I just downloaded the hadoop-0.20.2 tarball from the Cloudera mirror. This is what I see in ReduceTask (line 999):

public synchronized boolean reserve(int requestedSize, InputStream in)
    throws InterruptedException {
  // Wait till the request can be fulfilled...
  while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182. That's why I suggested to Andy that he manually apply MR-1182.

Cheers

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins andy.saut...@returnpath.net wrote:

Thanks Christopher. The heap size for reduce tasks is configured to be 640M (mapred.child.java.opts set to -Xmx640m).
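[Editorial note: the reserve() loop quoted above behaves roughly like the standalone sketch below. This is a simplified, hypothetical rendering of the 0.20.x ShuffleRamManager idea, not the actual Hadoop class; the class and field names are ours. The key point for the thread: the wait loop bounds only the total that has been *reserved*, while canFitInMemory() looks at one request at a time.]

```java
// Simplified, hypothetical rendering of the 0.20.x ShuffleRamManager idea;
// names are ours, not Hadoop's.
public class MiniRamManager {
    private final long maxSize;                 // total in-memory shuffle budget
    private final long maxSingleShuffleLimit;   // per-request cap
    private long size = 0;                      // bytes currently reserved

    public MiniRamManager(long maxSize, double segmentFraction) {
        this.maxSize = maxSize;
        this.maxSingleShuffleLimit = (long) (maxSize * segmentFraction);
    }

    // Checked per request: ignores how much the other copier threads
    // already have in flight.
    public synchronized boolean canFitInMemory(long requestedSize) {
        return requestedSize < Integer.MAX_VALUE
            && requestedSize < maxSingleShuffleLimit;
    }

    // Blocks until the reserved total would stay within maxSize.
    public synchronized void reserve(long requestedSize)
            throws InterruptedException {
        while (size + requestedSize > maxSize) {
            wait();
        }
        size += requestedSize;
    }

    public synchronized void unreserve(long requestedSize) {
        size -= requestedSize;
        notifyAll();    // wake any copier blocked in reserve()
    }

    public static void main(String[] args) throws InterruptedException {
        MiniRamManager m = new MiniRamManager(400, 0.25);
        System.out.println(m.canFitInMemory(99));   // under the per-request cap
        System.out.println(m.canFitInMemory(150));  // over the per-request cap
        m.reserve(300);
        m.unreserve(300);
    }
}
```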
Re: Shuffle In Memory OutOfMemoryError
That section of code is unmodified in MR-1182. See the patches/svn log. -C

Sent from my iPhone

On Mar 9, 2010, at 7:44 PM, Ted Yu yuzhih...@gmail.com wrote:

I just downloaded the hadoop-0.20.2 tarball from the Cloudera mirror. This is what I see in ReduceTask (line 999):

public synchronized boolean reserve(int requestedSize, InputStream in)
    throws InterruptedException {
  // Wait till the request can be fulfilled...
  while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182. That's why I suggested to Andy that he manually apply MR-1182.
Re: Shuffle In Memory OutOfMemoryError
Ted, Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have some time to write a patch later this week.

Jacob Rideout

On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu yuzhih...@gmail.com wrote:

I think there is a mismatch (in ReduceTask.java) between:

this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);

and:

maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);

where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f. Because

copiers = new ArrayList<MapOutputCopier>(numCopiers);

the total memory allocated for in-mem shuffle is 1.25 * maxSize. A JIRA should be filed to correlate the constant 5 above and MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.

Cheers

On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout apa...@jacobrideout.net wrote:

Hi all, We are seeing the following error in the reducers of a particular job:

Error: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

After enough reducers fail, the entire job fails. This error occurs regardless of whether mapred.compress.map.output is true. We were able to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent to 20%. Shouldn't the framework, via ShuffleRamManager.canFitInMemory and ShuffleRamManager.reserve, correctly detect the memory available for allocation? I would think that with poor configuration settings (and default settings in particular) the job may not be as efficient, but wouldn't die.

Here is some more context from the logs; I have attached the full reducer log here: http://gist.github.com/323746

2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 4191933 bytes (435311 raw bytes) into RAM from attempt_201003060739_0002_m_61_0
2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003060739_0002_r_00_0: Failed fetch #1 from attempt_201003060739_0002_m_000202_0
2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_00_0 adding host hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_00_0: Got 1 map-outputs from previous failures
2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003060739_0002_r_00_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

We tried this both in 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182 would address the issue in 0.20.2, but it did not. Does anyone have any comments or suggestions? Is this a bug I should file a JIRA for?

Jacob Rideout
Return Path
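[Editorial note: Ted's 1.25 * maxSize figure can be checked with a little arithmetic. The sketch below is illustrative only (the class and method names are ours, not Hadoop's); it assumes the 0.20.x defaults of mapred.job.shuffle.input.buffer.percent = 0.70, MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION = 0.25, and 5 parallel copiers, together with Andy's -Xmx640m heap.]

```java
// Illustrative arithmetic only; names are hypothetical, not Hadoop API.
public class ShuffleMemoryEstimate {
    static long worstCaseBytes(long heapBytes, double bufferPercent,
                               double segmentFraction, int numCopiers) {
        long maxSize = (long) (heapBytes * bufferPercent);               // shuffle budget
        long maxSingleShuffleLimit = (long) (maxSize * segmentFraction); // per-request cap
        // canFitInMemory() checks each request against maxSingleShuffleLimit
        // alone, so numCopiers threads can each be granted one full segment:
        return numCopiers * maxSingleShuffleLimit;
    }

    public static void main(String[] args) {
        long heap = 640L * 1024 * 1024;   // -Xmx640m, as in Andy's config
        long worst = worstCaseBytes(heap, 0.70, 0.25, 5);
        // With these defaults this is 5 * 0.25 = 1.25 * maxSize,
        // i.e. 1.25 * 0.70 = 0.875 of the whole heap.
        System.out.println(worst + " bytes = " + (100.0 * worst / heap) + "% of heap");
    }
}
```

With 87.5% of the heap potentially committed to in-flight shuffle segments, little is left for everything else in the reduce task, which is consistent with the OOME the thread is chasing.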
RE: Shuffle In Memory OutOfMemoryError
Ted, I'm trying to follow the logic in your mail and I'm not sure I'm following. If you wouldn't mind helping me understand, I would appreciate it. Looking at the code, maxSingleShuffleLimit is only used in determining if the copy _can_ fit into memory:

boolean canFitInMemory(long requestedSize) {
  return (requestedSize < Integer.MAX_VALUE &&
          requestedSize < maxSingleShuffleLimit);
}

It also looks like RamManager.reserve should wait until memory is available, so it shouldn't hit a memory limit for that reason. What does seem a little strange to me is the following (ReduceTask.java starting at 2730):

// Inform the ram-manager
ramManager.closeInMemoryFile(mapOutputLength);
ramManager.unreserve(mapOutputLength);

// Discard the map-output
try {
  mapOutput.discard();
} catch (IOException ignored) {
  LOG.info("Failed to discard map-output from " +
           mapOutputLoc.getTaskAttemptId(), ignored);
}
mapOutput = null;

So to me that looks like the ramManager unreserves the memory before the mapOutput is discarded. Shouldn't the mapOutput be discarded _before_ the ramManager unreserves the memory? If the memory is unreserved before the actual underlying data references are removed, then it seems like another thread can try to allocate memory (ReduceTask.java:2730) before the previous memory is disposed (mapOutput.discard()). Not sure that makes sense. One thing to note is that the particular job that is failing does have a good number (200k+) of map outputs. The large number of small map outputs may be why we are triggering a problem. Thanks again for your thoughts.

Andy

-----Original Message-----
From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
Sent: Sunday, March 07, 2010 1:21 PM
To: common-user@hadoop.apache.org
Cc: Andy Sautins; Ted Yu
Subject: Re: Shuffle In Memory OutOfMemoryError

Ted, Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have some time to write a patch later this week. Jacob Rideout
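[Editorial note: Andy's concern can be made concrete with a toy accounting model. The names and the model are ours, not Hadoop's, and it optimistically assumes a dropped reference is reclaimable immediately — exactly the assumption Ted questions in his reply, since actual reclamation is up to the GC.]

```java
// Toy model of Andy's point: if the ram-manager's budget is released
// before the old buffer reference is dropped, two buffers' worth of data
// can be live at once. Names and model are ours, not Hadoop's; it also
// assumes a dropped reference is immediately reclaimable, which the GC
// does not guarantee.
public class DiscardOrdering {
    static long peakLiveBytes(boolean discardFirst, long len) {
        long live = len;            // old map output held in memory
        long peak = live;
        if (discardFirst) {
            live -= len;            // drop reference before releasing budget
        }
        live += len;                // another copier reserves and allocates
        peak = Math.max(peak, live);
        if (!discardFirst) {
            live -= len;            // reference dropped only afterwards
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakLiveBytes(false, 100)); // unreserve-first order
        System.out.println(peakLiveBytes(true, 100));  // discard-first order
    }
}
```

In this model the unreserve-first order peaks at twice the segment size, while discard-first stays within one segment — which is the reordering Andy is asking about.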
Re: Shuffle In Memory OutOfMemoryError
My observation is based on this call chain: MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling ramManager.canFitInMemory(decompressedLength) Basically ramManager.canFitInMemory() makes decision without considering the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap may be used in shuffling if default parameters were used. Of course, you should check the value for mapred.reduce.parallel.copies to see if it is 5. If it is 4 or lower, my reasoning wouldn't apply. About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2 only has 2731 lines. So I have to guess the location of the code snippet you provided. I found this around line 1535: } catch (IOException ioe) { LOG.info(Failed to shuffle from + mapOutputLoc.getTaskAttemptId(), ioe); // Inform the ram-manager ramManager.closeInMemoryFile(mapOutputLength); ramManager.unreserve(mapOutputLength); // Discard the map-output try { mapOutput.discard(); } catch (IOException ignored) { LOG.info(Failed to discard map-output from + mapOutputLoc.getTaskAttemptId(), ignored); } Please confirm the line number. If we're looking at the same code, I am afraid I don't see how we can improve it. First, I assume IOException shouldn't happen that often. Second, mapOutput.discard() just sets: data = null; for in memory case. Even if we call mapOutput.discard() before ramManager.unreserve(), we don't know when GC would kick in and make more memory available. Of course, given the large number of map outputs in your system, it became more likely that the root cause from my reasoning made OOME happen sooner. Thanks On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins andy.saut...@returnpath.netwrote: Ted, I'm trying to follow the logic in your mail and I'm not sure I'm following. If you would mind helping me understand I would appreciate it. 
Looking at the code, maxSingleShuffleLimit is only used in determining if the copy _can_ fit into memory:

    boolean canFitInMemory(long requestedSize) {
      return (requestedSize < Integer.MAX_VALUE &&
              requestedSize < maxSingleShuffleLimit);
    }

It also looks like RamManager.reserve should wait until memory is available, so it shouldn't hit a memory limit for that reason.

What does seem a little strange to me is the following (ReduceTask.java starting at 2730):

      // Inform the ram-manager
      ramManager.closeInMemoryFile(mapOutputLength);
      ramManager.unreserve(mapOutputLength);

      // Discard the map-output
      try {
        mapOutput.discard();
      } catch (IOException ignored) {
        LOG.info("Failed to discard map-output from " + mapOutputLoc.getTaskAttemptId(), ignored);
      }
      mapOutput = null;

So to me that looks like the ramManager unreserves the memory before the mapOutput is discarded. Shouldn't the mapOutput be discarded _before_ the ramManager unreserves the memory? If the memory is unreserved before the actual underlying data references are removed, then it seems like another thread could try to allocate memory (ReduceTask.java:2730) before the previous memory is disposed of (mapOutput.discard()). Not sure that makes sense.

One thing to note is that the particular job that is failing does have a good number (200k+) of map outputs. The large number of small map outputs may be why we are triggering the problem.

Thanks again for your thoughts.

Andy

-----Original Message-----
From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
Sent: Sunday, March 07, 2010 1:21 PM
To: common-user@hadoop.apache.org
Cc: Andy Sautins; Ted Yu
Subject: Re: Shuffle In Memory OutOfMemoryError

Ted,

Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have some time to write a patch later this week.
Jacob Rideout

On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu yuzhih...@gmail.com wrote:
[...]
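Ted's arithmetic above (1.25 * 0.7 of the heap under default settings) can be sketched as a quick back-of-the-envelope check. The class and method names below are mine for illustration, not Hadoop's; the constants are the 0.20.x defaults cited in this thread.

```java
// Worst case if each of N copiers holds one in-flight segment at the
// per-segment cap, and canFitInMemory() judges each request in isolation.
// Assumed 0.20.x defaults: mapred.job.shuffle.input.buffer.percent = 0.70,
// mapred.reduce.parallel.copies = 5, MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION = 0.25.
public class ShuffleWorstCase {

    // Fraction of the reducer heap that in-flight shuffle segments can occupy.
    static double worstCaseHeapFraction(int parallelCopies,
                                        double inputBufferPercent,
                                        double singleSegmentFraction) {
        return parallelCopies * singleSegmentFraction * inputBufferPercent;
    }

    public static void main(String[] args) {
        // 5 * 0.25 = 1.25 of maxSize, i.e. 1.25 * 0.70 = 0.875 of the heap
        double f = worstCaseHeapFraction(5, 0.70, 0.25);
        System.out.printf("worst case: %.3f of reducer heap%n", f);
    }
}
```

Note that with 4 copiers the same product is exactly 0.70, i.e. the nominal buffer budget, which matches Ted's remark that the reasoning only bites at 5 or more parallel copies.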
RE: Shuffle In Memory OutOfMemoryError
Thanks Ted. Very helpful. You are correct that I misunderstood the code at ReduceTask.java:1535. I missed the fact that it's in an IOException catch block. My mistake. That's what I get for being in a rush.

For what it's worth, I did re-run the job with mapred.reduce.parallel.copies set with values from 5 all the way down to 1. All failed with the same error:

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

So from that it does seem like something else might be going on, yes? I need to do some more research.

I appreciate your insights.

Andy

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError
[...]
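Earlier in the thread Andy asked whether mapOutput.discard() should precede ramManager.unreserve(). As Ted notes, GC timing means reordering alone wouldn't guarantee the bytes are actually reclaimed; still, the accounting window itself can be shown with a toy model. This is my own illustrative code, not Hadoop's: a peer copier reserves against books that were credited before the old segment became unreachable.

```java
// Toy model (not Hadoop code) of unreserve-before-discard: the books are
// credited while the old byte[] is still reachable, so a peer copier can
// reserve against memory the JVM has not actually recovered yet.
public class UnreserveBeforeDiscard {

    static class ToyRamManager {
        final long maxSize;
        long accounted;

        ToyRamManager(long maxSize) { this.maxSize = maxSize; }

        boolean tryReserve(long n) {            // peer consults only the books
            if (accounted + n > maxSize) return false;
            accounted += n;
            return true;
        }

        void unreserve(long n) { accounted -= n; }
    }

    // Peak "live" bytes: the peer's new reservation plus the still-referenced
    // segment that has been unreserved but not yet discarded.
    static long peakLiveBytes() {
        ToyRamManager ram = new ToyRamManager(100);
        byte[] mapOutput = new byte[80];        // copier A's in-memory segment
        ram.tryReserve(mapOutput.length);

        ram.unreserve(mapOutput.length);        // books freed first...
        boolean peerGotIt = ram.tryReserve(80); // ...so copier B reserves 80 more
        long live = (peerGotIt ? 80 : 0) + mapOutput.length; // A's data still held

        mapOutput = null;                       // discard happens only here
        return live;                            // 160 live vs maxSize = 100
    }

    public static void main(String[] args) {
        System.out.println("peak live bytes: " + peakLiveBytes());
    }
}
```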
Re: Shuffle In Memory OutOfMemoryError
Lowering mapred.job.shuffle.input.buffer.percent would be the option to choose. Maybe GC wasn't releasing memory fast enough for in-memory shuffling.

On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins andy.saut...@returnpath.net wrote:
[...]
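For reference, the two knobs exercised in this thread can be set in mapred-site.xml or per job. A sketch of the workaround Jacob reported (buffer percent down to 20%) and the copier-count sweep Andy ran, using the 0.20.x property names as quoted in the thread:

```xml
<!-- mapred-site.xml or per-job configuration; 0.20.x property names -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.20</value>   <!-- default 0.70; the workaround reported above -->
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>1</value>      <!-- default 5; tried from 5 down to 1 above -->
</property>
```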
Shuffle In Memory OutOfMemoryError
Hi all,

We are seeing the following error in our reducers of a particular job:

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

After enough reducers fail, the entire job fails. This error occurs regardless of whether mapred.compress.map.output is true. We were able to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent to 20%.

Shouldn't the framework, via ShuffleRamManager.canFitInMemory and ShuffleRamManager.reserve, correctly detect the memory available for allocation? I would think that with poor configuration settings (and default settings in particular) the job may not be as efficient, but wouldn't die.
Here is some more context in the logs; I have attached the full reducer log here: http://gist.github.com/323746

2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 4191933 bytes (435311 raw bytes) into RAM from attempt_201003060739_0002_m_61_0
2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003060739_0002_r_00_0: Failed fetch #1 from attempt_201003060739_0002_m_000202_0
2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_00_0 adding host hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_00_0: Got 1 map-outputs from previous failures
2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003060739_0002_r_00_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

We tried this on both 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182 would address the issue in 0.20.2, but it did not. Does anyone have any comments or suggestions? Is this a bug I should file a JIRA for?

Jacob Rideout
Return Path
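To make the canFitInMemory()/reserve() split concrete, here is a simplified, self-contained paraphrase of the contract discussed in this thread. This is my own toy class with details omitted or assumed, not the Hadoop source: per-request admission is checked only against the single-segment cap, while reserve() blocks on the aggregate budget.

```java
// Toy paraphrase of the 0.20.x ShuffleRamManager behavior under discussion.
public class ToyShuffleRamManager {
    private final long maxSize;                 // total in-memory shuffle budget
    private final long maxSingleShuffleLimit;   // per-segment admission cap
    private long size = 0;                      // currently reserved bytes

    public ToyShuffleRamManager(long maxSize, long maxSingleShuffleLimit) {
        this.maxSize = maxSize;
        this.maxSingleShuffleLimit = maxSingleShuffleLimit;
    }

    // Decides in isolation: no view of how many copiers are already in flight.
    public boolean canFitInMemory(long requestedSize) {
        return requestedSize < Integer.MAX_VALUE
            && requestedSize < maxSingleShuffleLimit;
    }

    // Blocks until the request fits under the aggregate budget.
    public synchronized void reserve(long requestedSize) {
        while (size + requestedSize > maxSize) {
            try {
                wait();                         // woken by unreserve()
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        size += requestedSize;
    }

    public synchronized void unreserve(long requestedSize) {
        size -= requestedSize;
        notifyAll();
    }

    public synchronized long reserved() { return size; }
}
```

The point of the sketch: canFitInMemory() happily admits any segment under the cap, so nothing in the admission path accounts for how many other copiers already hold admitted-but-unmerged segments.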
Re: Shuffle In Memory OutOfMemoryError
I think there is a mismatch (in ReduceTask.java) between:

    this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);

and:

    maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);

where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f, because

    copiers = new ArrayList<MapOutputCopier>(numCopiers);

so the total memory allocated for the in-mem shuffle is 1.25 * maxSize.

A JIRA should be filed to correlate the constant 5 above and MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.

Cheers

On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout apa...@jacobrideout.net wrote:
[...]