[ https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040517#comment-16040517 ]
Adam Sotona commented on MAPREDUCE-6447:
----------------------------------------

In relation to this issue, I have manually performed a kind of configuration "bracketing": I altered the job configuration of a still-running application before another attempt was launched. Doing this by hand is somewhat complicated, because you have to modify the job configuration both in the application's HDFS staging folder and on each data node in the local appcache/<app>/filecache.

However, what if there were a mechanism to prepare such alternate configuration(s) in advance, to be used for the second and subsequent attempts after a task fails? For example, in the job configuration I would set:

mapreduce.reduce.shuffle.input.buffer.percent=0.70
mapreduce.reduce.shuffle.memory.limit.percent=0.4
mapreduce.reduce.shuffle.parallelcopies=30

attempt-2.mapreduce.reduce.shuffle.input.buffer.percent=0.70
attempt-2.mapreduce.reduce.shuffle.memory.limit.percent=0.2
attempt-2.mapreduce.reduce.shuffle.parallelcopies=4

attempt-3.mapreduce.reduce.shuffle.input.buffer.percent=0.20
attempt-3.mapreduce.reduce.shuffle.memory.limit.percent=0.2
attempt-3.mapreduce.reduce.shuffle.parallelcopies=2

Such a mechanism should not be hard to implement, and it would allow fail-over configurations to be prepared in advance instead of long, repeated bug hunting.
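To make the proposed lookup concrete, here is a minimal sketch in plain Java of how a task attempt could resolve its effective settings from such attempt-prefixed keys. It is hypothetical throughout: `AttemptConfigResolver` and `resolve` are illustrative names, not Hadoop APIs; it uses a plain `Map` instead of `org.apache.hadoop.conf.Configuration`; it assumes 1-based attempt numbers as in the example above (Hadoop's own attempt IDs start at 0); and it applies only the overrides for the exact attempt, so attempt-3 does not inherit attempt-2 values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical resolver for the proposed "attempt-<N>.<property>" override scheme.
public class AttemptConfigResolver {

    private static final String PREFIX = "attempt-";

    /**
     * Returns the effective configuration for a 1-based attempt number (assumption).
     * Base properties are copied first; "attempt-<N>." entries for this attempt then
     * override them. Override entries for other attempts are left out of the result.
     */
    public static Map<String, String> resolve(Map<String, String> jobConf, int attempt) {
        Map<String, String> effective = new TreeMap<>();
        Map<String, String> overrides = new HashMap<>();
        String myPrefix = PREFIX + attempt + ".";
        for (Map.Entry<String, String> e : jobConf.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(myPrefix)) {
                // Strip "attempt-<N>." to get the real property name.
                overrides.put(key.substring(myPrefix.length()), e.getValue());
            } else if (!isAttemptOverride(key)) {
                // Ordinary job property: part of the base configuration.
                effective.put(key, e.getValue());
            }
        }
        // Per-attempt values win over base values.
        effective.putAll(overrides);
        return effective;
    }

    // True for keys shaped like "attempt-<digits>.<property>".
    private static boolean isAttemptOverride(String key) {
        if (!key.startsWith(PREFIX)) {
            return false;
        }
        int dot = key.indexOf('.', PREFIX.length());
        if (dot <= PREFIX.length()) {
            return false;
        }
        for (int i = PREFIX.length(); i < dot; i++) {
            if (!Character.isDigit(key.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.reduce.shuffle.input.buffer.percent", "0.70");
        conf.put("mapreduce.reduce.shuffle.memory.limit.percent", "0.4");
        conf.put("mapreduce.reduce.shuffle.parallelcopies", "30");
        conf.put("attempt-2.mapreduce.reduce.shuffle.input.buffer.percent", "0.70");
        conf.put("attempt-2.mapreduce.reduce.shuffle.memory.limit.percent", "0.2");
        conf.put("attempt-2.mapreduce.reduce.shuffle.parallelcopies", "4");
        conf.put("attempt-3.mapreduce.reduce.shuffle.input.buffer.percent", "0.20");
        conf.put("attempt-3.mapreduce.reduce.shuffle.memory.limit.percent", "0.2");
        conf.put("attempt-3.mapreduce.reduce.shuffle.parallelcopies", "2");
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": " + resolve(conf, attempt));
        }
    }
}
```

With the example settings, attempt 1 keeps the base values, attempt 2 gets memory.limit.percent=0.2 and parallelcopies=4, and attempt 3 uses input.buffer.percent=0.20, memory.limit.percent=0.2 and parallelcopies=2. A cascading variant (attempt 3 falling back to attempt-2 values) would be an equally reasonable design; the non-cascading rule keeps each fail-over profile explicit.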
Thanks

> reduce shuffle throws "java.lang.OutOfMemoryError: Java heap space"
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6447
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1
>            Reporter: shuzhangyao
>            Assignee: shuzhangyao
>            Priority: Minor
>
> 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
>     at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)