[ https://issues.apache.org/jira/browse/HADOOP-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1324:
----------------------------------
Attachment: HADOOP-1324_20070507_1.patch
Simple fix:
On receipt of an FSError from the child VM, the task-tracker now just kills the
task instead of reinitializing itself. The idea is that after a sufficient
number of task failures on the same tracker, it gets blacklisted for the job,
no new tasks are assigned to it, and things should swim along...
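
To illustrate the intended behaviour, here is a minimal sketch. It is not the code in the attached patch: the class TrackerSketch, its methods, and the threshold of 4 failures are all hypothetical, and the real per-job blacklisting decision is made outside the task-tracker. The sketch folds both sides into one class only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model; not the actual Hadoop TaskTracker code. */
class TrackerSketch {
    // Assumed threshold: task failures on this tracker before a job blacklists it.
    static final int MAX_TASK_FAILURES_PER_TRACKER = 4;

    // taskId -> jobId for every task currently running on this tracker.
    private final Map<String, String> runningTasks = new HashMap<>();
    // jobId -> number of that job's tasks that have failed on this tracker.
    private final Map<String, Integer> failuresByJob = new HashMap<>();

    /** Starts a task, unless the tracker is already blacklisted for its job. */
    void launch(String jobId, String taskId) {
        if (isBlacklistedFor(jobId)) {
            throw new IllegalStateException("tracker is blacklisted for " + jobId);
        }
        runningTasks.put(taskId, jobId);
    }

    /**
     * Handles an FSError reported by one task: only that task is killed and
     * counted as a failure. The tracker is not reinitialized, so the state of
     * all other running tasks is left untouched.
     */
    void onFsError(String taskId) {
        String jobId = runningTasks.remove(taskId);
        if (jobId == null) {
            return; // unknown or already finished task; nothing to do
        }
        failuresByJob.merge(jobId, 1, Integer::sum);
    }

    /** True once enough of the job's tasks have failed on this tracker. */
    boolean isBlacklistedFor(String jobId) {
        return failuresByJob.getOrDefault(jobId, 0) >= MAX_TASK_FAILURES_PER_TRACKER;
    }

    boolean isRunning(String taskId) {
        return runningTasks.containsKey(taskId);
    }
}
```

In this model, an FSError in a task of one job removes only that task. Tasks of other jobs keep running, and the tracker stops accepting tasks for the affected job only after the assumed failure threshold is reached.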
> FSError encountered by one running task should not be fatal to other tasks on
> that node
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-1324
> URL: https://issues.apache.org/jira/browse/HADOOP-1324
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
> Assigned To: Arun C Murthy
> Attachments: HADOOP-1324_20070507_1.patch
>
>
> Currently, if one task encounters an FSError, it reports that to the
> TaskTracker, and the TaskTracker reinitializes itself and effectively loses
> the state of all the other running tasks too. This can probably be improved,
> especially after the fix for HADOOP-1252. The TaskTracker should probably
> avoid reinitializing itself and instead get blacklisted for that job. Other
> tasks should be allowed to continue as long as they can (complete
> successfully, or fail because of disk problems or for other reasons).