[ https://issues.apache.org/jira/browse/MAPREDUCE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465215#comment-13465215 ]
Sandy Ryza commented on MAPREDUCE-4655:
---------------------------------------

Looked at a heap dump, and it appears that the problem was caused by Avro holding on to a reference after it was done with it. Filed AVRO-1175.

> MergeManager.reserve can OutOfMemoryError if more than 10% of max memory is
> used on non-MapOutputs
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4655
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4655
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.1-alpha
>            Reporter: Sandy Ryza
>
> The MergeManager does a memory check, using a limit that defaults to 90% of
> Runtime.getRuntime().maxMemory(). Allocations that would bring the total
> memory allocated by the MergeManager over this limit are asked to wait until
> memory frees up. Disk is used for single allocations that would be over 25%
> of the memory limit.
>
> If some other part of the reducer were to be using more than 10% of the
> memory, the current check wouldn't stop an OutOfMemoryError.
>
> Before creating an in-memory MapOutput, a check can be done using
> Runtime.getRuntime().freeMemory(), waiting until memory is freed up if it
> fails.
> 12/08/17 10:36:29 INFO mapreduce.Job: Task Id : attempt_1342723342632_0010_r_000005_0, Status : FAILED
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#6
> 	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
> 	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
> 	at org.apache.hadoop.mapreduce.task.reduce.MapOutput.<init>(MapOutput.java:97)
> 	at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:286)
> 	at org.apache.hadoop.mapreduce.task.reduce.MergeManager.reserve(MergeManager.java:276)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:327)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:273)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:153)
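The reserve logic described in the issue, and the proposed freeMemory() guard, can be sketched roughly as follows. This is a simplified illustration, not the actual MergeManager code: the class, field names, and the Decision enum are hypothetical, and the JVM free-memory value is passed in as a parameter so the behavior is deterministic.

```java
// Hypothetical sketch of MergeManager.reserve's decision logic (MAPREDUCE-4655).
// Not the real Hadoop implementation; names and structure are illustrative.
public class ReserveSketch {
    enum Decision { DISK, WAIT, MEMORY }

    final long memoryLimit;           // defaults to 90% of Runtime.maxMemory()
    final long maxSingleShuffleLimit; // 25% of memoryLimit; larger goes to disk
    long usedMemory = 0;              // only tracks MergeManager's own allocations

    ReserveSketch(long maxMemory) {
        this.memoryLimit = (long) (maxMemory * 0.90);
        this.maxSingleShuffleLimit = (long) (memoryLimit * 0.25);
    }

    // The current check: blind to heap used elsewhere in the reducer,
    // which is how the OutOfMemoryError in the stack trace above can occur.
    Decision reserve(long requestedSize) {
        if (requestedSize > maxSingleShuffleLimit) {
            return Decision.DISK;  // too large for a single in-memory MapOutput
        }
        if (usedMemory + requestedSize > memoryLimit) {
            return Decision.WAIT;  // wait until in-memory outputs are merged
        }
        usedMemory += requestedSize;
        return Decision.MEMORY;
    }

    // The proposed additional guard: also consult the JVM's actual free memory
    // (in real code, Runtime.getRuntime().freeMemory()) before allocating, so
    // memory held by other parts of the reducer also forces a wait.
    Decision reserveWithFreeMemoryCheck(long requestedSize, long jvmFreeMemory) {
        if (requestedSize > maxSingleShuffleLimit) {
            return Decision.DISK;
        }
        if (usedMemory + requestedSize > memoryLimit
                || requestedSize > jvmFreeMemory) {
            return Decision.WAIT;
        }
        usedMemory += requestedSize;
        return Decision.MEMORY;
    }

    public static void main(String[] args) {
        ReserveSketch m = new ReserveSketch(1000);
        System.out.println(m.reserve(300)); // DISK: over 25% of the 900 limit
        System.out.println(m.reserve(200)); // MEMORY: within both limits
        // Old check still admits this even if the rest of the JVM heap is
        // nearly exhausted; the new check would make the same request wait:
        System.out.println(m.reserveWithFreeMemoryCheck(200, 100)); // WAIT
    }
}
```

With a 1000-byte heap, the limit is 900 and the single-allocation cap is 225, so a 300-byte output spills to disk, a 200-byte one is kept in memory, and under the proposed guard the same 200-byte request waits when only 100 bytes of heap are actually free.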