Ted, I'm trying to follow the logic in your mail and I'm not sure I understand it. If you wouldn't mind helping me understand, I would appreciate it.
Looking at the code, maxSingleShuffleLimit is only used in determining whether the copy _can_ fit into memory:

    boolean canFitInMemory(long requestedSize) {
      return (requestedSize < Integer.MAX_VALUE &&
              requestedSize < maxSingleShuffleLimit);
    }

It also looks like RamManager.reserve should wait until memory is available, so it should hit a memory limit for that reason. What does seem a little strange to me is the following (ReduceTask.java, starting at line 2730):

    // Inform the ram-manager
    ramManager.closeInMemoryFile(mapOutputLength);
    ramManager.unreserve(mapOutputLength);

    // Discard the map-output
    try {
      mapOutput.discard();
    } catch (IOException ignored) {
      LOG.info("Failed to discard map-output from " +
               mapOutputLoc.getTaskAttemptId(), ignored);
    }
    mapOutput = null;

So to me it looks like the ramManager unreserves the memory before the mapOutput is discarded. Shouldn't the mapOutput be discarded _before_ the ramManager unreserves the memory? If the memory is unreserved before the underlying data references are removed, then it seems like another thread could try to allocate memory (ReduceTask.java:2730) before the previous memory is actually disposed of by mapOutput.discard(). Not sure that makes sense.

One thing to note is that the particular job that is failing does have a large number (200k+) of map outputs. The large number of small map outputs may be why we are triggering the problem.

Thanks again for your thoughts.

Andy

-----Original Message-----
From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
Sent: Sunday, March 07, 2010 1:21 PM
To: common-user@hadoop.apache.org
Cc: Andy Sautins; Ted Yu
Subject: Re: Shuffle In Memory OutOfMemoryError

Ted,

Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have some time to write a patch later this week.
Jacob Rideout

On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> I think there is a mismatch (in ReduceTask.java) between:
>
>   this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
>
> and:
>
>   maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
>
> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f, because
>
>   copiers = new ArrayList<MapOutputCopier>(numCopiers);
>
> so the total memory allocated for the in-memory shuffle is 1.25 * maxSize.
>
> A JIRA should be filed to correlate the constant 5 above with
> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
>
> Cheers
>
> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <apa...@jacobrideout.net> wrote:
>> Hi all,
>>
>> We are seeing the following error in the reducers of a particular job:
>>
>>   Error: java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>
>> After enough reducers fail, the entire job fails. This error occurs
>> regardless of whether mapred.compress.map.output is true. We were able
>> to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent
>> to 20%. Shouldn't the framework, via ShuffleRamManager.canFitInMemory
>> and ShuffleRamManager.reserve, correctly detect the memory available
>> for allocation? I would think that with poor configuration settings
>> (and default settings in particular) the job might not be as
>> efficient, but it shouldn't die.
>>
>> Here is some more context from the logs; I have attached the full
>> reducer log here: http://gist.github.com/323746
>>
>>   2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 4191933 bytes (435311 raw bytes) into RAM from attempt_201003060739_0002_m_000061_0
>>   2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003060739_0002_r_000000_0: Failed fetch #1 from attempt_201003060739_0002_m_000202_0
>>   2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0 adding host hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
>>   2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from previous failures
>>   2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003060739_0002_r_000000_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>
>> We tried this in both 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182
>> would address the issue in 0.20.2, but it did not. Does anyone have
>> any comments or suggestions? Is this a bug I should file a JIRA for?
>>
>> Jacob Rideout
>> Return Path
>
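[Editor's note] The unreserve-before-discard interleaving Andy describes earlier in the thread can be sketched deterministically. The class and method names below are hypothetical stand-ins, not the real Hadoop RamManager API; the sketch just tracks how many "live" buffer bytes exist on the heap while two copiers, A and B, share a 100-byte shuffle budget:

```java
// Single-threaded simulation of two copiers sharing one shuffle budget.
// "live" counts bytes actually referenced on the heap; the RamManager's
// accounting is represented only by where reserve/unreserve fall in the
// sequence of events.
public class ShuffleOrdering {
    static final long BUDGET = 100;

    /** Peak live bytes when A unreserves before discarding its buffer. */
    static long peakUnreserveFirst() {
        long live = 0, peak = 0;
        live += 60;                       // A: reserve(60) and buffer map output
        peak = Math.max(peak, live);
        // A: ramManager.unreserve(60) -- the accounting now says memory is
        // free, so B's reserve(60) can succeed immediately...
        live += 60;                       // B: buffer its map output
        peak = Math.max(peak, live);      // ...while A's buffer is still live
        live -= 60;                       // A: mapOutput.discard() runs last
        live -= 60;                       // B: finishes normally
        return peak;                      // 120 bytes: over the 100-byte budget
    }

    /** Peak live bytes when A discards before unreserving. */
    static long peakDiscardFirst() {
        long live = 0, peak = 0;
        live += 60;                       // A: reserve(60) and buffer
        peak = Math.max(peak, live);
        live -= 60;                       // A: mapOutput.discard() comes first
        // A: unreserve(60) -- only now can B's reserve(60) proceed
        live += 60;                       // B: buffer
        peak = Math.max(peak, live);
        live -= 60;
        return peak;                      // 60 bytes: within budget
    }

    public static void main(String[] args) {
        System.out.println(peakUnreserveFirst() + " vs " + peakDiscardFirst());
    }
}
```

In the first ordering the heap briefly holds both buffers even though the accounting never exceeds the budget, which is consistent with the OOM appearing only under many concurrent small fetches.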
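[Editor's note] The arithmetic behind Ted's 1.25 * maxSize observation is worth spelling out: with the defaults of 5 parallel copies and MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION = 0.25f, the worst case has every copier holding a maximal in-memory segment at once. The 100 MB figure below is only an example budget, not a Hadoop default:

```java
// Worst-case in-memory shuffle usage when all copiers hold a maximal
// segment simultaneously: numCopiers * segmentFraction * maxSize.
public class ShuffleBudget {
    static long worstCaseBytes(int numCopiers, float segmentFraction, long maxSize) {
        return (long) (numCopiers * segmentFraction * maxSize);
    }

    public static void main(String[] args) {
        long maxSize = 100L * 1024 * 1024;               // example budget: 100 MB
        long worst = worstCaseBytes(5, 0.25f, maxSize);  // 5 * 0.25 = 1.25x budget
        System.out.println(worst + " of " + maxSize);    // 131072000 of 104857600
    }
}
```

That is, the copiers can collectively commit 25% more than the budget that canFitInMemory checks against, which matches Ted's suggestion to correlate the two constants in a JIRA.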
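[Editor's note] For reference, the workaround Jacob mentions (reducing mapred.job.shuffle.input.buffer.percent to 20%) would look like the following in mapred-site.xml. To my knowledge the default in 0.20 is 0.70, so this trades shuffle buffer size for heap headroom:

```xml
<!-- Fraction of reduce-task heap used to buffer map outputs in memory
     during the shuffle. Lowered from the 0.70 default to avoid the OOM
     described in this thread. -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.20</value>
</property>
```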