[ https://issues.apache.org/jira/browse/HADOOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544601 ]
Srikanth Kakani commented on HADOOP-2220:
-----------------------------------------
The map-side counterpart of this problem is HADOOP-2247; the same formula
mentioned there should work in this case as well.
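
For illustration only: the actual formula is whatever HADOOP-2247 settles on, and it is not reproduced here. The sketch below only shows the general idea of scaling the unique-fetch-failure limit with the number of map tasks instead of hard-coding it to 5; the class name, the 5% fraction, and the floor of 5 are assumptions invented for this example.

{code:java}
// Illustrative sketch only; this is NOT the HADOOP-2247 formula.
// Assumption: the number of distinct map outputs a reducer may fail to fetch
// before giving up should grow with the number of map tasks, instead of being
// the hard-coded MAX_FAILED_UNIQUE_FETCHES = 5.
public class FetchFailureThreshold {

    // Floor kept for small jobs (matches the current hard-coded constant).
    private static final int MIN_FAILED_UNIQUE_FETCHES = 5;

    // Hypothetical fraction of mappers a reducer may fail to fetch from
    // before it declares itself failed; 5% is an assumed value.
    private static final double FAILURE_FRACTION = 0.05;

    /**
     * Maximum number of distinct map outputs a reduce task may repeatedly
     * fail to fetch before the task is marked as failed.
     */
    public static int maxFailedUniqueFetches(int numMapTasks) {
        return Math.max(MIN_FAILED_UNIQUE_FETCHES,
                        (int) Math.ceil(numMapTasks * FAILURE_FRACTION));
    }

    public static void main(String[] args) {
        // With 10,000 mappers the reducer would tolerate 500 unique fetch
        // failures instead of 5, so transient resource contention is far less
        // likely to kill an otherwise healthy reduce task.
        System.out.println(maxFailedUniqueFetches(100));    // 5
        System.out.println(maxFailedUniqueFetches(10000));  // 500
    }
}
{code}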
> Reduce tasks fail too easily because of repeated fetch failures
> ---------------------------------------------------------------
>
> Key: HADOOP-2220
> URL: https://issues.apache.org/jira/browse/HADOOP-2220
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Christian Kunz
>
> Currently, a reduce task fails once it accumulates more than
> MAX_FAILED_UNIQUE_FETCHES (hard-coded to 5) failures to fetch output from
> distinct mappers (I believe this was introduced in HADOOP-1158).
> This causes problems for longer-running jobs with a large number of mappers
> executing in multiple waves:
> Otherwise healthy reduce tasks fail because of too many fetch failures caused
> by resource contention, and their replacements have to re-fetch all data from
> the mappers that already completed successfully, introducing a lot of
> additional I/O overhead. Moreover, the job fails outright when the same
> reducer exhausts its maximum number of attempts.
> The limit should be a function of the number of mappers and/or the number of
> waves of mappers, and it should be more conservative (e.g. there is no need to
> let reducers fail when speculative execution is enabled and there are enough
> slots to start speculatively executed reducers). We might also consider not
> counting such a restart against the number of attempts.