[ 
https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HADOOP-4305:
--------------------------------------------

    Attachment: patch-4305-1.txt

Here is a patch with proposed fix.
The patch does the following:
*  Adds the blacklisted trackers of the job to the potentially faulty list in 
JobTracker.finalizeJob()
*  A tracker is moved from the potentially faulty list to the blacklisted 
trackers (across jobs) iff
   ** its #blacklists exceeds mapred.max.tracker.blacklists (default value is 4),
   ** its #blacklists is 50% above the average #blacklists over the active and 
potentially faulty trackers, and
   ** less than 50% of the cluster is blacklisted so far
* Restarting the tracker makes it an active tracker again
* After a day, the tracker is given a chance to run tasks again
* Adds #blacklisted_trackers to ClusterStatus
* Updates web UI to show the blacklisted trackers.
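The decision logic above can be sketched roughly as follows. This is an illustrative sketch only, not code from patch-4305-1.txt; the class and method names (BlacklistSketch, shouldBlacklist) and the parameter names are hypothetical:

```java
// Sketch of the cross-job blacklisting heuristic described above.
// All identifiers here are illustrative, not the actual names in the patch.
public class BlacklistSketch {
    // Default for mapred.max.tracker.blacklists
    static final int MAX_TRACKER_BLACKLISTS = 4;

    /**
     * Decide whether a potentially faulty tracker should be blacklisted
     * across jobs.
     *
     * @param blacklists       #blacklists of this tracker
     * @param avgBlacklists    average #blacklists over the active and
     *                         potentially faulty trackers
     * @param blacklistedCount trackers already blacklisted across jobs
     * @param clusterSize      total number of trackers in the cluster
     */
    static boolean shouldBlacklist(int blacklists, double avgBlacklists,
                                   int blacklistedCount, int clusterSize) {
        return blacklists > MAX_TRACKER_BLACKLISTS
            && blacklists > 1.5 * avgBlacklists        // 50% above the average
            && blacklistedCount < 0.5 * clusterSize;   // <50% of cluster blacklisted
    }
}
```

All three conditions must hold; the last one guards against blacklisting most of the cluster when a workload-wide problem (rather than a faulty node) is inflating failure counts.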


> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>         Attachments: patch-4305-1.txt
>
>
> When running a batch of jobs it often happens that the same tasktrackers are 
> blacklisted again and again. This can slow job execution considerably, in 
> particular when tasks fail because of timeouts.
> It would make sense to no longer assign any tasks to such tasktrackers and to 
> declare them dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
