[https://issues.apache.org/jira/browse/HADOOP-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701438#action_12701438]
Amar Kamat commented on HADOOP-5643:
------------------------------------
I think calling this blacklisting will lead to more confusion. As Owen
suggested, we can call it *decommissioning/recommissioning* of trackers, which
would mean that, irrespective of the tracker's state, the jobtracker is asked
to decommission (rerun + ignore) or recommission (add back) it. So the commands
would be
_bin/hadoop jobtracker -decommission tracker1,tracker2...._ and _bin/hadoop
jobtracker -recommission tracker1,tracker2...._.
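A minimal sketch of how the proposed arguments might be parsed, assuming tracker names are passed as a single comma-separated argument. `DecommissionCommand` is a hypothetical helper for illustration only, not an existing Hadoop class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the proposed "-decommission"/"-recommission" options.
// Not part of Hadoop; names and usage string are illustrative assumptions.
final class DecommissionCommand {
    enum Action { DECOMMISSION, RECOMMISSION }

    final Action action;
    final List<String> trackers;

    private DecommissionCommand(Action action, List<String> trackers) {
        this.action = action;
        this.trackers = trackers;
    }

    // Parses e.g. {"-decommission", "tracker1,tracker2"}.
    static DecommissionCommand parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "usage: jobtracker -decommission|-recommission tracker1,tracker2,...");
        }
        Action action;
        switch (args[0]) {
            case "-decommission": action = Action.DECOMMISSION; break;
            case "-recommission": action = Action.RECOMMISSION; break;
            default: throw new IllegalArgumentException("unknown option: " + args[0]);
        }
        List<String> trackers = new ArrayList<>();
        for (String name : args[1].split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {   // tolerate stray or doubled commas
                trackers.add(trimmed);
            }
        }
        if (trackers.isEmpty()) {
            throw new IllegalArgumentException("no trackers given");
        }
        return new DecommissionCommand(action, trackers);
    }
}
```

For example, `DecommissionCommand.parse(new String[] {"-decommission", "tracker1,tracker2"})` would yield a DECOMMISSION request for `tracker1` and `tracker2`.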
All running tasks (and completed maps) that were launched on that machine will
be killed and rerun. We can reuse the lost-tracker code for this. Perhaps a
thread should be started on demand (similar to the cleanup queue thread) for a
decommissioning request. These trackers will also be added to the ignore list
(i.e. issued a 'shutdown' upon contact). So a decommission request is
equivalent to lost-tracker + add-to-ignore-list.
Upon a recommission, the trackers will be removed from the ignore list. This
can be done inline.
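A rough sketch of "decommission = lost-tracker + add-to-ignore-list" and an inline recommission. `DecommissionManager` is hypothetical, and the `lostTrackerHandler` callback stands in for the jobtracker's existing lost-tracker code (kill and rerun the tracker's tasks); both are assumptions, not Hadoop APIs. For brevity the handler runs inline here; running it on an on-demand thread, as suggested above, could be done by having the handler hand off to an executor.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; the real jobtracker would wire this into its own state.
final class DecommissionManager {
    private final Set<String> ignoreList = new HashSet<>();
    // Assumed stand-in for the lost-tracker path (kill + rerun tasks, completed maps).
    private final Consumer<String> lostTrackerHandler;

    DecommissionManager(Consumer<String> lostTrackerHandler) {
        this.lostTrackerHandler = lostTrackerHandler;
    }

    // Irrespective of the tracker's current state: ignore it, then rerun its work.
    synchronized void decommission(Collection<String> trackers) {
        for (String tracker : trackers) {
            if (ignoreList.add(tracker)) {   // skip trackers already decommissioned
                lostTrackerHandler.accept(tracker);
            }
        }
    }

    // Recommission is done inline: just drop the trackers from the ignore list.
    synchronized void recommission(Collection<String> trackers) {
        ignoreList.removeAll(trackers);
    }

    // Checked when a tracker makes contact: an ignored tracker is told to shut down.
    synchronized boolean shouldShutdown(String tracker) {
        return ignoreList.contains(tracker);
    }

    // Snapshot of the ignore list, e.g. for persistence or the web UI.
    synchronized Set<String> ignored() {
        return Collections.unmodifiableSet(new HashSet<>(ignoreList));
    }
}
```

Keeping the ignore-list check separate from the lost-tracker handling means recommissioning never has to undo anything: the rerun tasks simply stay rerun, and the tracker is accepted again on its next contact.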
From the web UI, a simple checkbox can be provided against each tracker, along
with an action named 'Decommission' (similar to the actions for jobs on
jobtracker.jsp). On the trackers page, we can provide another section for
decommissioned trackers, with a checkbox for recommissioning them.
Notes:
1) An ACL check should be done before decommissioning and recommissioning.
2) This info needs to be persisted. Upon every decommission/recommission,
persist this info to system.dir/jobtracker.info.
3) Upon restart, the ignore list will also be recovered and loaded (i.e. invoke
jobtracker.decommission(recovered-list) from the recovery manager).
4) These new APIs can be added to the TaskTrackerManager interface, as they
really are tasktracker-level actions.
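One possible shape for notes 2 and 3, assuming the ignore list is stored one tracker name per line. `IgnoreListStore` is hypothetical, and it uses a local `java.nio.file.Path` for simplicity; the real jobtracker would write under system.dir on its configured filesystem, and whether the list shares jobtracker.info with other recovery state is left open here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical persistence of decommissioned trackers; file format is an assumption.
final class IgnoreListStore {
    private final Path file;

    IgnoreListStore(Path file) {
        this.file = file;
    }

    // Called after every decommission/recommission. Writes a temp file and then
    // renames it over the old one, so a crash never leaves a half-written list.
    void persist(Collection<String> ignored) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        List<String> sorted = new ArrayList<>(new TreeSet<>(ignored));
        Files.write(tmp, sorted, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    // Called on restart; the result would be handed to decommission(...).
    Set<String> recover() throws IOException {
        Set<String> ignored = new TreeSet<>();
        if (!Files.exists(file)) {
            return ignored;   // nothing has been decommissioned yet
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String name = line.trim();
            if (!name.isEmpty()) {
                ignored.add(name);
            }
        }
        return ignored;
    }
}
```

The ACL check from note 1 would sit in front of both `persist` and the decommission call, so an unauthorized request changes neither the in-memory list nor the persisted copy.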
----
Thoughts?
> Ability to blacklist tasktracker
> --------------------------------
>
> Key: HADOOP-5643
> URL: https://issues.apache.org/jira/browse/HADOOP-5643
> Project: Hadoop Core
> Issue Type: New Feature
> Affects Versions: 0.20.0
> Reporter: Rajiv Chittajallu
> Assignee: Amar Kamat
>
> It's not always possible to shut down the tasktracker to stop scheduling tasks
> on the node (e.g. you can't log in to the node but the TT is up).
> This can be via
> * mapred.exclude and should be refreshed without restarting the tasktracker
> * hadoop job -fail-tracker <tracker id>
--
This message is automatically generated by JIRA.