[ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473544 ]
dhruba borthakur commented on HADOOP-442:
-----------------------------------------
Regarding comment 5 above, it might make sense to have a separate
thread that checks whether a decommission has completed. It can run on its
own schedule. The ReplicationMonitor thread runs every 3 seconds, which is
too frequent for checking decommissioned nodes.
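
A minimal sketch of the idea, assuming a dedicated daemon thread with its own
interval. Every name here (DecommissionMonitor, ReplicationCheck, checkOnce)
is hypothetical and is not taken from the attached patches or from the
existing namenode code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the HADOOP-442 patch or any real Hadoop class. */
class DecommissionMonitor {
    enum State { IN_PROGRESS, DECOMMISSIONED }

    /** Hypothetical callback: true once the node's blocks are replicated elsewhere. */
    interface ReplicationCheck {
        boolean fullyReplicated(String node);
    }

    private final Map<String, State> nodes = new ConcurrentHashMap<>();
    private final ReplicationCheck check;

    // A single daemon thread, independent of any replication thread.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "DecommissionMonitor");
            t.setDaemon(true);
            return t;
        });

    DecommissionMonitor(ReplicationCheck check) {
        this.check = check;
    }

    void startDecommission(String node) {
        nodes.put(node, State.IN_PROGRESS);
    }

    State stateOf(String node) {
        return nodes.get(node);
    }

    /** One pass: mark nodes whose blocks are safe elsewhere as DECOMMISSIONED. */
    void checkOnce() {
        for (String node : nodes.keySet()) {
            if (nodes.get(node) == State.IN_PROGRESS && check.fullyReplicated(node)) {
                nodes.replace(node, State.IN_PROGRESS, State.DECOMMISSIONED);
            }
        }
    }

    /** Runs checkOnce on its own schedule, e.g. every few minutes. */
    void start(long intervalSeconds) {
        timer.scheduleWithFixedDelay(this::checkOnce,
            intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

The interval is left as a parameter because the right value is an open
question; the only point is that it need not be tied to the 3-second
replication cycle.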
> slaves file should include an 'exclude' section, to prevent "bad" datanodes
> and tasktrackers from disrupting a cluster
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-442
> URL: https://issues.apache.org/jira/browse/HADOOP-442
> Project: Hadoop
> Issue Type: Bug
> Components: conf
> Reporter: Yoram Arnon
> Assigned To: Wendy Chien
> Attachments: hadoop-442-10.patch, hadoop-442-8.patch
>
>
> I recently had a few nodes go bad, such that they were inaccessible to ssh,
> but were still running their Java processes.
> Tasks that executed on them were failing, causing jobs to fail.
> I couldn't stop the Java processes, because of the ssh issue, so I was
> helpless until I could actually power down these nodes.
> Restarting the cluster doesn't help, even when removing the bad nodes from
> the slaves file - they just reconnect and are accepted.
> While we plan to prevent tasks from launching on the same nodes over and over,
> what I'd like is to be able to prevent rogue processes from connecting to the
> masters.
> Ideally, the slaves file will contain an 'exclude' section, which will list
> nodes that shouldn't be accessed, and should be ignored if they try to
> connect. That would also help in configuring the slaves file for a large
> cluster - I'd list the full range of machines in the cluster, then list the
> ones that are down in the 'exclude' section.
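
The admission rule described in the request can be sketched as follows. This
is only an illustration, assuming the include and exclude lists have already
been parsed into sets of host names; HostFilter and mayConnect are
hypothetical names, not existing Hadoop classes or methods:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of include/exclude admission; names are illustrative only. */
class HostFilter {
    private final Set<String> included;
    private final Set<String> excluded;

    HostFilter(Set<String> included, Set<String> excluded) {
        // Defensive copies so later changes to the caller's sets have no effect.
        this.included = new HashSet<>(included);
        this.excluded = new HashSet<>(excluded);
    }

    /** A host may connect only if it is listed and not in the exclude section. */
    boolean mayConnect(String host) {
        return included.contains(host) && !excluded.contains(host);
    }
}
```

With this rule the full range of machines can stay in the include list, and a
bad node is kept out by adding it to the exclude list alone.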