[jira] [Resolved] (MAPREDUCE-562) A single slow (but not dead) map TaskTracker impedes MapReduce progress

Allen Wittenauer (JIRA) Tue, 22 Jul 2014 13:53:28 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Allen Wittenauer resolved MAPREDUCE-562.
----------------------------------------

    Resolution: Incomplete

This is still an interesting issue, but at this point I feel the need to close 
this one, mainly because the problem needs to be generalized for YARN and made 
much less MR-specific.


> A single slow (but not dead) map TaskTracker impedes MapReduce progress
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-562
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-562
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>
> We see cases where there may be a large number of mapper nodes running many 
> tasks (e.g., a thousand). The reducers will pull 980 of the map task 
> intermediate files down, but will be unable to retrieve the final 
> intermediate shards from the last node. The TaskTracker on that node returns 
> data to reducers either slowly or not at all, but its heartbeat messages make 
> it back to the JobTracker -- so the JobTracker doesn't mark the tasks as 
> failed. Manually stopping the offending TaskTracker migrates the tasks to 
> other nodes, where the shuffle finishes very quickly. Left on its own, the 
> job can take hours to unjam itself.
> We need a mechanism for reducers to provide feedback to the JobTracker that 
> one of the mapper nodes should be regarded as lost.
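
As a rough illustration of the feedback mechanism requested above, here is a 
minimal sketch in Python. It is not Hadoop code: the class name, method names, 
host name, and the threshold of three distinct reducers are all assumptions made 
up for the example. The idea is only that the tracker counts how many different 
reducers have failed to fetch from a given map host and regards the host as lost 
once enough of them agree.

```python
from collections import defaultdict


class FetchFailureTracker:
    """Hypothetical tracker-side bookkeeping; not an actual Hadoop API."""

    def __init__(self, min_reporters=3):
        # Assumed threshold: distinct reducers that must complain about a host.
        self.min_reporters = min_reporters
        # Map host -> set of reducer ids that currently cannot fetch from it.
        self._reporters = defaultdict(set)
        self.lost_hosts = set()

    def report_fetch_failure(self, reducer_id, map_host):
        """Record one complaint; return True if the host is now regarded as lost."""
        self._reporters[map_host].add(reducer_id)
        if len(self._reporters[map_host]) >= self.min_reporters:
            self.lost_hosts.add(map_host)
        return map_host in self.lost_hosts

    def report_fetch_success(self, reducer_id, map_host):
        """A successful fetch withdraws that reducer's complaint."""
        self._reporters[map_host].discard(reducer_id)


tracker = FetchFailureTracker(min_reporters=3)
tracker.report_fetch_failure("r1", "node-17")
tracker.report_fetch_failure("r1", "node-17")  # repeats from one reducer count once
tracker.report_fetch_failure("r2", "node-17")
lost = tracker.report_fetch_failure("r3", "node-17")  # third distinct reducer
```

Counting distinct reducers rather than raw reports keeps a single misbehaving 
reducer from getting a healthy node marked as lost. A real design would also 
need timeouts and a way to re-run the affected map tasks elsewhere.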



--
This message was sent by Atlassian JIRA
(v6.2#6252)
