[
https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amareshwari Sriramadasu updated HADOOP-4305:
--------------------------------------------
Attachment: patch-4305-1.txt
Here is a patch with the proposed fix.
The patch does the following:
* Adds the blacklisted trackers of the job to the potentially faulty list, in
JobTracker.finalizeJob()
* The tracker is moved from the potentially faulty list to the blacklisted
trackers (across jobs) iff
** #blacklists exceeds mapred.max.tracker.blacklists (default value is 4),
** #blacklists is 50% above the average #blacklists over the active and
potentially faulty trackers, and
** less than 50% of the cluster is blacklisted so far
* Restarting the tracker makes it an active tracker
* After a day, the tracker is given a chance again to run tasks
* Adds #blacklisted_trackers to ClusterStatus
* Updates web UI to show the blacklisted trackers.
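The three conditions above can be sketched as a single predicate. This is a minimal illustrative sketch, not the actual patch code; the class and method names (BlacklistHeuristic, shouldBlacklist) and the parameter names are hypothetical, though mapred.max.tracker.blacklists and its default of 4 come from the description above.

```java
// Hypothetical sketch of the cross-job blacklisting heuristic described
// above. Names are illustrative; only the thresholds (the
// mapred.max.tracker.blacklists default of 4, the 50% margins) come
// from the patch description.
public class BlacklistHeuristic {
  // Default of mapred.max.tracker.blacklists per the description
  static final int MAX_BLACKLISTS = 4;

  /**
   * @param faultCount       #blacklists recorded for this tracker
   * @param avgFaults        average #blacklists over the active and
   *                         potentially faulty trackers
   * @param blacklistedNodes trackers already blacklisted across jobs
   * @param clusterSize      total trackers in the cluster
   */
  static boolean shouldBlacklist(int faultCount, double avgFaults,
                                 int blacklistedNodes, int clusterSize) {
    return faultCount > MAX_BLACKLISTS
        && faultCount > avgFaults * 1.5          // 50% above the average
        && blacklistedNodes < clusterSize * 0.5; // <50% of cluster blacklisted
  }
}
```

All three conditions must hold, so a tracker that is merely unlucky on one job (low #blacklists, or close to the cluster average) stays active, and the heuristic can never take down half the cluster.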
> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
> Key: HADOOP-4305
> URL: https://issues.apache.org/jira/browse/HADOOP-4305
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Christian Kunz
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.20.0
>
> Attachments: patch-4305-1.txt
>
>
> When running a batch of jobs it often happens that the same tasktrackers are
> blacklisted again and again. This can slow job execution considerably, in
> particular, when tasks fail because of timeout.
> It would make sense to no longer assign any tasks to such tasktrackers and to
> declare them dead.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.