[ https://issues.apache.org/jira/browse/MAPREDUCE-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved MAPREDUCE-562.
----------------------------------------
    Resolution: Incomplete

This is still an interesting issue, but at this point, I feel the need to close this one. The big reason being that this problem needs to be generalized for YARN and made much less MR specific.

> A single slow (but not dead) map TaskTracker impedes MapReduce progress
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-562
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-562
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>
> We see cases where there may be a large number of mapper nodes running many
> tasks (e.g., a thousand). The reducers will pull 980 of the map task
> intermediate files down, but will be unable to retrieve the final
> intermediate shards from the last node. The TaskTracker on that node returns
> data to reducers either slowly or not at all, but its heartbeat messages make
> it back to the JobTracker -- so the JobTracker doesn't mark the tasks as
> failed. Manually stopping the offending TaskTracker works to migrate the
> tasks to other nodes, where the shuffling process finishes very quickly. Left
> on its own, it can take hours to unjam itself otherwise.
> We need a mechanism for reducers to provide feedback to the JobTracker that
> one of the mapper nodes should be regarded as lost.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
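
For illustration only, the feedback mechanism requested in the issue description might look roughly like the sketch below. Nothing here is Hadoop code: the class FetchFailureTracker, its methods, and the reducerThreshold parameter are all hypothetical names chosen for this example. The assumption is that each reducer reports a mapper node it cannot fetch from, and the node is regarded as lost once enough distinct reducers have complained, even though its heartbeats still arrive.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not a Hadoop class. Tracks which reducers have
 * reported that they cannot fetch map output from a given mapper node.
 */
class FetchFailureTracker {
    // Assumed tunable: number of distinct complaining reducers required
    // before a node is regarded as lost.
    private final int reducerThreshold;

    // Mapper node name -> IDs of reducers that reported a fetch failure.
    private final Map<String, Set<String>> reportersByNode = new HashMap<>();

    FetchFailureTracker(int reducerThreshold) {
        if (reducerThreshold < 1) {
            throw new IllegalArgumentException("reducerThreshold must be >= 1");
        }
        this.reducerThreshold = reducerThreshold;
    }

    /**
     * Records one report. Repeated reports from the same reducer count once,
     * so a single impatient reducer cannot condemn a node on its own.
     * Returns true once the node should be regarded as lost.
     */
    synchronized boolean reportFetchFailure(String mapperNode, String reducerId) {
        Set<String> reporters =
            reportersByNode.computeIfAbsent(mapperNode, k -> new HashSet<>());
        reporters.add(reducerId);
        return reporters.size() >= reducerThreshold;
    }

    /** True if enough distinct reducers have reported this node. */
    synchronized boolean isRegardedLost(String mapperNode) {
        Set<String> reporters = reportersByNode.get(mapperNode);
        return reporters != null && reporters.size() >= reducerThreshold;
    }
}
```

Counting distinct reducers rather than raw reports is one possible design choice; a real scheduler would also need to decide how to expire old reports and how to re-run the affected map tasks, which this sketch leaves out.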