[jira] Commented: (HADOOP-400) the job tracker re-runs failed tasks on the same node

Dick King (JIRA) Fri, 28 Jul 2006 14:16:41 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-400?page=comments#action_12424170 ]
Dick King commented on HADOOP-400:
----------------------------------


I do see one possible hole.  If a machine loses its TaskTracker, it gets a new 
one, so a task that failed on the old one could land on the same machine again.  
Can we arrange for the new TaskTracker to inherit the task failures from its 
predecessor?  That would be a bit hard, but for this to work at all the tasks 
already have to know which TaskTrackers they've flunked on.  All the new 
TaskTracker has to know is who its predecessors are, so that it can refuse 
tasks that have flunked on its TaskTracker site [usually, its machine].

-dk
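
The predecessor idea in the comment above could look roughly like the sketch below. This is only an illustration: TrackerLineage, shouldRefuse, and the tracker ids are hypothetical names, not existing Hadoop classes or methods, and how a restarted TaskTracker would learn its predecessors' ids is left open, as it is in the comment.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not part of the Hadoop code base. */
class TrackerLineage {
    // Ids of this TaskTracker and of every earlier TaskTracker on the same site.
    private final Set<String> selfAndPredecessors = new HashSet<>();

    TrackerLineage(String ownId, Set<String> predecessorIds) {
        selfAndPredecessors.add(ownId);
        selfAndPredecessors.addAll(predecessorIds);
    }

    /**
     * Returns true if the task has already failed on this tracker or on any
     * of its predecessors, in which case the tracker should refuse it.
     */
    boolean shouldRefuse(Set<String> trackersTaskFailedOn) {
        return !Collections.disjoint(selfAndPredecessors, trackersTaskFailedOn);
    }
}
```

The task only carries the set of tracker ids it has failed on; the lineage lives with the tracker, which matches the split of knowledge described in the comment.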


> the job tracker re-runs failed tasks on the same node
> -----------------------------------------------------
>
>                 Key: HADOOP-400
>                 URL: http://issues.apache.org/jira/browse/HADOOP-400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.4.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>
> The job tracker tries not to re-run a task on a node where it has previously 
> failed, but it doesn't strictly prevent it.
> I propose to change the rule so that when pollForNewTask is called by a 
> TaskTracker, the JobTracker will assign it a task that has already failed on 
> that TaskTracker only if the task has failed on every TaskTracker in the 
> cluster. Thus, for "normal" clusters with more than 4 TaskTrackers, a task is 
> guaranteed to run on 4 different TaskTrackers. For small clusters, 
> it will run on every TaskTracker in the cluster at least once.
> Does that sound reasonable to everyone?
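
A minimal sketch of the proposed assignment rule, for illustration only. FailedTaskPolicy, recordFailure, and mayAssign are hypothetical names and do not come from the JobTracker source; the sketch assumes the JobTracker can see the set of TaskTrackers currently in the cluster, and it does not model any limit on the number of attempts per task.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed rule; not the actual JobTracker code. */
class FailedTaskPolicy {
    // Task id -> names of the TaskTrackers on which that task has failed.
    private final Map<String, Set<String>> failures = new HashMap<>();

    void recordFailure(String taskId, String tracker) {
        failures.computeIfAbsent(taskId, k -> new HashSet<>()).add(tracker);
    }

    /**
     * Returns true if the task may be handed to the polling tracker:
     * either it has never failed there, or it has already failed on
     * every tracker in the cluster.
     */
    boolean mayAssign(String taskId, String tracker, Set<String> clusterTrackers) {
        Set<String> failedOn =
            failures.getOrDefault(taskId, Collections.<String>emptySet());
        if (!failedOn.contains(tracker)) {
            return true; // no earlier failure on this tracker
        }
        // Failed here before: allow a retry only once it has failed everywhere.
        return failedOn.containsAll(clusterTrackers);
    }
}
```

With two trackers tt1 and tt2, a task that has failed only on tt1 would be refused for tt1 and allowed for tt2; once it has also failed on tt2, it becomes assignable to either again, which is the small-cluster case in the description.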

