[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040517#comment-16040517
 ] 

Adam Sotona commented on MAPREDUCE-6447:
----------------------------------------

With relation to this issue I have manually performed a kind of configuration 
"bracketing", where I manually altered the job configuration of a still running 
application before another attempt.
To do it manually is a bit complicated as you have to modify the job 
configuration in HDFS stagging folder of the application as well as on each 
data node in the local appcache/<app>/filecache.

However what if there would be a mechanism to prepare such alternate 
configuration(s), which will be used for second and following attempts after a 
task fail?

For example in job configuration I would set:
mapreduce.reduce.shuffle.input.buffer.percent=0.70
mapreduce.reduce.shuffle.memory.limit.percent=0.4
mapreduce.reduce.shuffle.parallelcopies=30
attempt-2.mapreduce.reduce.shuffle.input.buffer.percent=0.70
attempt-2.mapreduce.reduce.shuffle.memory.limit.percent=0.2
attempt-2.mapreduce.reduce.shuffle.parallelcopies=4
attempt-3.mapreduce.reduce.shuffle.input.buffer.percent=0.20
attempt-3.mapreduce.reduce.shuffle.memory.limit.percent=0.2
attempt-3.mapreduce.reduce.shuffle.parallelcopies=2

Such mechanism should not be hard to implement and it would allow to have 
alternate fail-over configurations prepared in advance instead of long-term 
repeated bug hunting.

Thanks

> reduce shuffle throws "java.lang.OutOfMemoryError: Java heap space"
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6447
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1
>            Reporter: shuzhangyao
>            Assignee: shuzhangyao
>            Priority: Minor
>
> 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#10
>       at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>       at 
> org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
>       at 
> org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

Reply via email to