[ 
https://issues.apache.org/jira/browse/HADOOP-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501791
 ] 

Devaraj Das commented on HADOOP-1158:
-------------------------------------

Given that losing reduces is generally detrimental, I'd propose a minor 
variant to the logic behind killing reduces. The reduce should kill itself 
only when it fails to fetch the map output even from the new location, i.e., 
each of the 5 unique faulty fetches should have been retried at least once 
(so that we don't kill a reduce too early).
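
To make the rule above concrete, here is a minimal sketch of the reduce-side 
check. Everything in it is illustrative: the class FetchFailureTracker, its 
method names, and the reading that a map counts toward the limit only after 
its fetch has failed at least twice (the first attempt plus one retrial) are 
assumptions, not existing Hadoop code. The threshold of 5 is taken from the 
comment above and passed in as a parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an existing Hadoop class.
final class FetchFailureTracker {
    // Number of distinct maps with retried-and-failed fetches that
    // triggers the self-kill (5 in the comment above).
    private final int maxUniqueFailedMaps;

    // Map id -> number of failed fetch attempts for that map's output.
    private final Map<String, Integer> failuresPerMap = new HashMap<>();

    FetchFailureTracker(int maxUniqueFailedMaps) {
        this.maxUniqueFailedMaps = maxUniqueFailedMaps;
    }

    // Record one failed attempt to fetch the output of the given map.
    void recordFailure(String mapId) {
        failuresPerMap.merge(mapId, 1, Integer::sum);
    }

    // A successful fetch clears the failure history for that map.
    void recordSuccess(String mapId) {
        failuresPerMap.remove(mapId);
    }

    // True only when enough distinct maps have failed at least twice,
    // i.e., each of them has had at least one retrial.
    boolean shouldKillSelf() {
        int retriedAndFailed = 0;
        for (int attempts : failuresPerMap.values()) {
            if (attempts >= 2) {
                retriedAndFailed++;
            }
        }
        return retriedAndFailed >= maxUniqueFailedMaps;
    }
}
```

With a limit of 5, five first-time failures alone would not kill the reduce; 
it would give up only after all five of those fetches had also failed on a 
retrial.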

Also, does it make sense to have the logic behind killing/reexecuting reduces 
in the JobTracker? Two reasons:
1) the JobTracker knows exactly how many times a reduce complained, and about 
which maps it complained, etc. 
2) consistent behavior: the JobTracker already handles the reexecution of maps, 
so it might as well handle the reexecution of reduces.
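
As a rough sketch of what centralizing this in the JobTracker could look like, 
the class below keeps both views of the complaints: which reduces complained 
about a given map, and which maps a given reduce complained about. The class 
FetchComplaintLedger, the Action enum, and both thresholds are hypothetical 
names and values chosen for illustration; this is not the JobTracker API, and 
the decision policy (map first, then reduce) is only one possible choice.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for illustration only; not the real JobTracker API.
final class FetchComplaintLedger {
    enum Action { NONE, REEXECUTE_MAP, REEXECUTE_REDUCE }

    // Assumed policy: a map is suspect once this many distinct reduces
    // have complained about it.
    private final int mapThreshold;
    // Assumed policy: a reduce is suspect once it has complained about
    // this many distinct maps.
    private final int reduceThreshold;

    private final Map<String, Set<String>> reducesByMap = new HashMap<>();
    private final Map<String, Set<String>> mapsByReduce = new HashMap<>();

    FetchComplaintLedger(int mapThreshold, int reduceThreshold) {
        this.mapThreshold = mapThreshold;
        this.reduceThreshold = reduceThreshold;
    }

    // Record that reduceId failed to fetch the output of mapId and
    // return what the ledger would suggest doing about it.
    Action recordComplaint(String reduceId, String mapId) {
        Set<String> complainers =
            reducesByMap.computeIfAbsent(mapId, k -> new HashSet<>());
        complainers.add(reduceId);

        Set<String> failedMaps =
            mapsByReduce.computeIfAbsent(reduceId, k -> new HashSet<>());
        failedMaps.add(mapId);

        // Many reduces failing on the same map points at the map side.
        if (complainers.size() >= mapThreshold) {
            return Action.REEXECUTE_MAP;
        }
        // One reduce failing on many maps points at the reduce side.
        if (failedMaps.size() >= reduceThreshold) {
            return Action.REEXECUTE_REDUCE;
        }
        return Action.NONE;
    }
}
```

Because both maps live in one place, the same bookkeeping could drive the 
reexecution of maps and of reduces, which is the consistency argument in 
point 2.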

> JobTracker should collect statistics of failed map output fetches, and take 
> decisions to reexecute map tasks and/or restart the (possibly faulty) Jetty 
> server on the TaskTracker
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1158
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>
> The JobTracker should keep track (with feedback from Reducers) of how many 
> times a fetch for a particular map output failed. If this exceeds a certain 
> threshold, then that map should be declared as lost, and should be reexecuted 
> elsewhere. Based on the number of such complaints from Reducers, the 
> JobTracker can blacklist the TaskTracker. This will make the framework 
> reliable - it will take care of (faulty) TaskTrackers that sometimes, or 
> always, fail to serve up map outputs (for which exceptions are not properly 
> raised/handled, e.g., if the exception/problem happens in the Jetty 
> server).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
