[ 
https://issues.apache.org/jira/browse/MAPREDUCE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748989#action_12748989
 ] 

Steve Loughran commented on MAPREDUCE-935:
------------------------------------------

Looking closer, the client backoff thread is not related to the problem, but 
the penalty box is possibly causing needless suffering. I suspect, though, that 
the local task tracker has stopped listening for RPC calls, and that this is 
the root cause of the trouble. 

> There's little to be gained by putting a host into the penalty box at reduce 
> time, if it's the only host you have
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-935
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-935
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Steve Loughran
>
> Exponential backoff may be good for dealing with troublesome hosts, but not 
> if you only have one host in the entire system. From the log of 
> {{TestNodeRefresh}}, which for some reason is blocking in the reduce phase, I 
> can see that it doesn't take much for the backoff to escalate so rapidly that 
> the reducer ends up waiting for longer than the test:
> {code}
> 2009-08-28 21:39:16,788 WARN  mapred.ReduceTask 
> (ReduceTask.java:fetchOutputs(2192)) - 
> attempt_20090828213826033_0001_r_000000_0 adding host localhost to penalty 
> box, next contact in 150 seconds
> {code}
> The result of this backoff process is that the reduce process appears to 
> hang and gets killed from above. 
> Note that this isn't the root cause of the problem, but it certainly 
> amplifies things. 
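
To make the idea concrete, here is a rough sketch of a penalty box whose 
backoff is capped when there is only one host to fetch from. This is not the 
ReduceTask code: the class name, the method names and all three constants are 
made up for illustration, and the real backoff formula may well differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the actual ReduceTask implementation. */
final class PenaltyBoxSketch {
    // All three values are assumptions chosen for the example.
    static final long BASE_DELAY_SECONDS = 10;
    static final long MAX_DELAY_SECONDS = 300;
    static final long SINGLE_HOST_CAP_SECONDS = 10;

    private final Map<String, Integer> failures = new HashMap<>();
    private final int knownHosts;

    PenaltyBoxSketch(int knownHosts) {
        this.knownHosts = knownHosts;
    }

    /** Records a failed fetch and returns the seconds until the next contact. */
    long recordFailure(String host) {
        int count = failures.merge(host, 1, Integer::sum);
        return delaySeconds(count, knownHosts);
    }

    /** Doubles the delay per failure, up to a cap; tighter cap for a lone host. */
    static long delaySeconds(int failureCount, int knownHosts) {
        // Clamp the exponent so the shift can never overflow a long.
        int exponent = Math.min(Math.max(failureCount - 1, 0), 20);
        long delay = Math.min(BASE_DELAY_SECONDS << exponent, MAX_DELAY_SECONDS);
        if (knownHosts <= 1) {
            // Nowhere else to fetch from, so a long penalty only stalls the reducer.
            delay = Math.min(delay, SINGLE_HOST_CAP_SECONDS);
        }
        return delay;
    }

    public static void main(String[] args) {
        PenaltyBoxSketch single = new PenaltyBoxSketch(1);
        PenaltyBoxSketch cluster = new PenaltyBoxSketch(4);
        for (int i = 0; i < 5; i++) {
            System.out.println("single host: " + single.recordFailure("localhost")
                    + "s, cluster: " + cluster.recordFailure("localhost") + "s");
        }
    }
}
```

With more than one host the delay still doubles on every failure up to the 
maximum; with a single host it stays at the short cap, so the reducer keeps 
retrying instead of sitting out a penalty longer than the test. Whether a cap 
like this is the right fix, as opposed to skipping the penalty box altogether, 
is an open question.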

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
