[ https://issues.apache.org/jira/browse/HADOOP-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1324:
----------------------------------

    Attachment: HADOOP-1324_20070507_1.patch

Simple fix:
On receipt of an FSError from the child VM, the task-tracker now just kills the 
offending task instead of reinitializing itself. The idea is that after enough 
task failures on the same tracker it will get blacklisted for the job, no new 
tasks will be assigned to it, and things should carry on smoothly.
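
To illustrate the behaviour described above, here is a minimal sketch in Java. It is not the code in the attached patch: the class FsErrorSketch, the Tracker type, the method names and the failure threshold are all hypothetical stand-ins, chosen only to show the idea of killing the one reporting task and letting accumulated failures blacklist the tracker for that job.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the Hadoop source or the patch.
class FsErrorSketch {

    /** Stand-in for a task-tracker hosting several running tasks. */
    static class Tracker {
        private final Set<String> running = new HashSet<>();
        private final Map<String, Integer> failuresPerJob = new HashMap<>();

        void launch(String jobId, String taskId) {
            running.add(jobId + "/" + taskId);
        }

        /** Kill only the task that reported the FSError; every other task keeps running. */
        void onFsError(String jobId, String taskId) {
            if (running.remove(jobId + "/" + taskId)) {
                failuresPerJob.merge(jobId, 1, Integer::sum);
            }
        }

        boolean isRunning(String jobId, String taskId) {
            return running.contains(jobId + "/" + taskId);
        }

        int failures(String jobId) {
            return failuresPerJob.getOrDefault(jobId, 0);
        }
    }

    /** Assumed per-job rule: stop assigning once the tracker has failed too many tasks of the job. */
    static boolean mayAssign(Tracker tracker, String jobId, int maxFailuresPerTracker) {
        return tracker.failures(jobId) < maxFailuresPerTracker;
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.launch("job_1", "task_a");
        tracker.launch("job_1", "task_b");

        tracker.onFsError("job_1", "task_a"); // task_a is killed, task_b is untouched

        System.out.println("task_b still running: " + tracker.isRunning("job_1", "task_b"));
        System.out.println("tracker usable for job_1: " + mayAssign(tracker, "job_1", 2));
    }
}
```

The point of the design is that the failure count is kept per job, so a tracker with a bad disk stops receiving tasks from the affected job without discarding the state of tasks it is already running.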

> FSError encountered by one running task should not be fatal to other tasks on 
> that node
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1324
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>         Assigned To: Arun C Murthy
>         Attachments: HADOOP-1324_20070507_1.patch
>
>
> Currently, if one task encounters an FSError, it reports that to the 
> TaskTracker, and the TaskTracker reinitializes itself, effectively losing the 
> state of all the other running tasks too. This can probably be improved, 
> especially after the fix for HADOOP-1252. The TaskTracker should probably 
> avoid reinitializing itself and instead get blacklisted for that job. Other 
> tasks should be allowed to continue as long as they can (until they complete 
> successfully or fail, whether due to disk problems or otherwise).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
