[ https://issues.apache.org/jira/browse/HADOOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved HADOOP-2220.
-----------------------------------

    Resolution: Fixed

Fixed as a part of HADOOP-2247.

> Reduce tasks fail too easily because of repeated fetch failures
> ---------------------------------------------------------------
>
>                 Key: HADOOP-2220
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2220
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.15.2
>
>
> Currently, reduce tasks that fail to fetch output from more than 
> MAX_FAILED_UNIQUE_FETCHES (= 5, hard-coded) different mappers will fail 
> (I believe this was introduced in HADOOP-1158).
> This causes problems for longer-running jobs with a large number of 
> mappers executing in multiple waves:
> Otherwise healthy reduce tasks fail because of too many fetch failures 
> caused by resource contention, and their replacement reduce tasks have to 
> re-fetch all data from the mappers that already completed successfully, 
> introducing a lot of additional IO overhead. Also, the job itself fails 
> when the same reducer exhausts its maximum number of attempts.
> The limit should be a function of the number of mappers and/or waves of 
> mappers, and should be more conservative (e.g., there is no need to let 
> a reducer fail when speculative execution is enabled and there are enough 
> slots to start speculatively executed reducers). Also, we might consider 
> not counting such a restart against the number of attempts.
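
For illustration only, a minimal sketch of the kind of scaling the reporter 
asks for: deriving the unique-fetch-failure limit from the number of map 
tasks instead of hard-coding it at 5. The class name FetchFailureLimit, the 
constants FAILURE_FRACTION and MIN_FAILED_UNIQUE_FETCHES, the 5% fraction, 
and the method maxFailedUniqueFetches are all hypothetical; this is not the 
actual change made in HADOOP-2247.

    // Illustrative sketch only -- not the actual fix from HADOOP-2247.
    // Derives the allowed number of unique fetch failures from the map
    // count rather than using the hard-coded MAX_FAILED_UNIQUE_FETCHES = 5.
    public class FetchFailureLimit {

      // Hypothetical fraction of mappers a reducer may fail to fetch from.
      private static final double FAILURE_FRACTION = 0.05;

      // Never allow fewer unique failures than the old fixed limit.
      private static final int MIN_FAILED_UNIQUE_FETCHES = 5;

      // Maximum number of distinct map outputs a reduce task may fail to
      // fetch before it is declared failed.
      public static int maxFailedUniqueFetches(int numMaps) {
        return Math.max(MIN_FAILED_UNIQUE_FETCHES,
                        (int) (numMaps * FAILURE_FRACTION));
      }

      public static void main(String[] args) {
        // With 2000 mappers this sketch tolerates 100 unique fetch
        // failures instead of the fixed 5.
        System.out.println(maxFailedUniqueFetches(2000));
      }
    }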

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
