[ http://issues.apache.org/jira/browse/HADOOP-400?page=all ]
Owen O'Malley updated HADOOP-400:
---------------------------------
Status: Patch Available (was: Open)
Fix Version/s: 0.6.0
Attachment: task-schedule.patch
This patch does:
1. It limits each TaskTracker to running
min(tasksPerTracker, ceil(tasksLeftToRun/numTaskTrackers))
tasks. This prevents the problem we saw where the last 2 reduces
were scheduled on the same node rather than on different idle ones.
2. It refactors obtainNewMapTask and obtainNewReduceTask to call a common
utility function. It also replaces the two parallel loops with one.
3. It only allows a task that has failed on a given TaskTracker to run
there again once the rest of the cluster has been exhausted.
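The cap in item 1 can be sketched as follows. This is an illustrative standalone version, not the actual JobTracker code; the method and class names are hypothetical.

```java
// Sketch of the per-tracker task cap from the patch description:
// min(tasksPerTracker, ceil(tasksLeftToRun / numTaskTrackers)).
// Names are illustrative, not actual JobTracker fields.
public class TaskCap {
    // Cap each TaskTracker so that the remaining tasks spread across
    // idle trackers instead of piling up on one node.
    static int maxTasksFor(int tasksPerTracker, int tasksLeftToRun,
                           int numTaskTrackers) {
        // integer ceiling of tasksLeftToRun / numTaskTrackers
        int ceil = (tasksLeftToRun + numTaskTrackers - 1) / numTaskTrackers;
        return Math.min(tasksPerTracker, ceil);
    }

    public static void main(String[] args) {
        // 2 reduces left on a 10-tracker cluster: each tracker gets at
        // most 1, so the last two reduces land on different nodes.
        System.out.println(maxTasksFor(2, 2, 10)); // prints 1
    }
}
```

With this cap in place, a tracker that already holds one of the last two reduces is over its limit of 1 and will not be handed the other.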
> the job tracker re-runs failed tasks on the same node
> -----------------------------------------------------
>
> Key: HADOOP-400
> URL: http://issues.apache.org/jira/browse/HADOOP-400
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.4.0
> Reporter: Owen O'Malley
> Assigned To: Owen O'Malley
> Fix For: 0.6.0
>
> Attachments: task-schedule.patch
>
>
> The job tracker tries not to run tasks that have previously failed on a node
> on that node again, but it doesn't strictly prevent it.
> I propose to change the rule so that when pollForNewTask is called by a
> TaskTracker, the JobTracker will assign it a task that has previously failed
> on that TaskTracker only if the task has already failed on the entire cluster.
> Thus, on "normal" clusters with more than 4 TaskTrackers, a task is
> guaranteed to have its attempts run on 4 different TaskTrackers. On small
> clusters, it will run on every TaskTracker in the cluster at least once.
> Does that sound reasonable to everyone?
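The proposed eligibility rule can be sketched as a small predicate. This is an assumption-laden illustration, not the patch itself; the class, method, and tracker names are hypothetical.

```java
import java.util.Set;

// Sketch of the proposed rule: a task that already failed on this
// TaskTracker is only handed back to it once every TaskTracker in the
// cluster has seen it fail. Names are hypothetical.
public class FailedTaskRule {
    static boolean runnableOn(String tracker, Set<String> trackersFailedOn,
                              int numTrackersInCluster) {
        if (!trackersFailedOn.contains(tracker)) {
            return true; // never failed here: fine to schedule
        }
        // Failed here before: only retry once the whole cluster is exhausted.
        return trackersFailedOn.size() >= numTrackersInCluster;
    }

    public static void main(String[] args) {
        // Task failed on tt1; a 4-tracker cluster still has fresh nodes.
        System.out.println(runnableOn("tt1", Set.of("tt1"), 4)); // prints false
        System.out.println(runnableOn("tt2", Set.of("tt1"), 4)); // prints true
    }
}
```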
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira