[ https://issues.apache.org/jira/browse/HADOOP-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-1183:
--------------------------------
Attachment: 1183.patch
Retries of map output fetches can overwrite new events received from the JT
for the same maps. Let's assume that a tasktracker is lost while we are in the
process of fetching map outputs from it. There is a timing issue between when a
map output fetch completes with a failure and when a new event for the same map
task is obtained. If the new event arrives before the failure is processed, and
the fetch corresponding to the new event has not been scheduled by then, the new
event is lost (overwritten by the retry of the old failed fetch).
The attached patch should handle this issue: FAILED events are now handled
explicitly. Please review it (while I am testing it on a big cluster).
> MapTask completion not recorded properly at the Reducer's end
> -------------------------------------------------------------
>
> Key: HADOOP-1183
> URL: https://issues.apache.org/jira/browse/HADOOP-1183
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Reporter: Devaraj Das
> Assigned To: Devaraj Das
> Priority: Critical
> Attachments: 1183.patch
>
>
> A couple of reducers were continuously trying to fetch map outputs from a
> lost tasktracker. Although the tasks running on that lost TT successfully
> reexecuted elsewhere, the Reducers' tasktrackers didn't correctly note those
> events.