[ http://issues.apache.org/jira/browse/HADOOP-142?page=all ]
     
Doug Cutting resolved HADOOP-142:
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Owen!

> failed tasks should be rescheduled on different hosts after other jobs
> ----------------------------------------------------------------------
>
>          Key: HADOOP-142
>          URL: http://issues.apache.org/jira/browse/HADOOP-142
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.2
>  Attachments: no-repeat-failures.patch
>
> Currently when tasks fail, they are usually rerun immediately on the same 
> host. This causes problems in a couple of ways. 
>   1. The task is more likely to fail again when rerun on the same host. 
>   2. If there is cleanup code (such as clearing pendingCreates), it does not 
> always run immediately, leading to cascading failures.
> For a first pass, I propose that when a task fails, we start the scan for new 
> tasks to launch at the following task of the same type (within that job). So 
> if maps[99] fails, when we are looking to assign new map tasks from this job, 
> we scan in the order maps[100]...maps[N], maps[0]...maps[99].
> A more involved change would avoid running tasks on nodes where they have 
> failed before. This is a little tricky, because you don't want to prevent 
> re-execution of tasks on single-node clusters, and the job tracker needs to 
> schedule one task tracker at a time.
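> As an illustration only, a minimal Java sketch of the two ideas above (the 
> wrap-around scan and per-host failure tracking) might look like the 
> following; the names (runnable, failedHosts, findTaskToLaunch) are invented 
> for this sketch and are not the actual JobInProgress/JobTracker API:
>
>   import java.util.HashSet;
>   import java.util.Set;
>
>   public class TaskScanSketch {
>
>     /** Index of the next task to launch on 'host', or -1 if none remain. */
>     static int findTaskToLaunch(boolean[] runnable,
>                                 Set<String>[] failedHosts,
>                                 int lastFailedIndex,
>                                 String host) {
>       int n = runnable.length;
>       int fallback = -1;
>       // Start just after the most recent failure and wrap around, so the
>       // failed task is reconsidered last rather than rerun immediately.
>       for (int offset = 1; offset <= n; offset++) {
>         int candidate = (lastFailedIndex + offset) % n;
>         if (!runnable[candidate]) {
>           continue;
>         }
>         if (!failedHosts[candidate].contains(host)) {
>           return candidate;       // has not failed on this host before
>         }
>         if (fallback == -1) {
>           fallback = candidate;   // remember it as a last resort
>         }
>       }
>       return fallback;  // on a 1-node cluster this still allows re-execution
>     }
>
>     @SuppressWarnings("unchecked")
>     public static void main(String[] args) {
>       int n = 200;
>       boolean[] runnable = new boolean[n];
>       Set<String>[] failedHosts = new Set[n];
>       for (int i = 0; i < n; i++) {
>         failedHosts[i] = new HashSet<String>();
>       }
>       runnable[99] = true;            // the task that just failed...
>       failedHosts[99].add("nodeA");   // ...on host nodeA
>       runnable[150] = true;           // another pending task
>       // nodeA is offered task 150 first; task 99 only as a fallback.
>       System.out.println(findTaskToLaunch(runnable, failedHosts, 99, "nodeA"));
>     }
>   }
>
> Falling back to a host where the task has already failed, instead of never 
> assigning it there, is what keeps a single-node cluster from getting stuck.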

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly, contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
