[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978182#action_12978182 ]

Suresh Srinivas commented on HDFS-1547:
---------------------------------------

> Does the dfsadmin -refreshNodes command upload the excludes and includes and 
> the decom file (all three?) into the namenode's in memory state?
It is good to persist this at the namenode. That way, after a namenode 
restart, datanodes that were intended to be out of service do not come back 
into service.
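
For reference, the refresh flow being asked about is: the admin edits the file 
that dfs.hosts.exclude points to, then runs refreshNodes so the namenode 
re-reads the include/exclude files into its in-memory host lists. A minimal 
example (the file path and hostname below are made up):

{noformat}
# contents of the file named by dfs.hosts.exclude (one host per line)
dn42.example.com

# ask the namenode to re-read the include/exclude files
hdfs dfsadmin -refreshNodes
{noformat}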

> when a node is being-decommissioned state, why do you propose that it reduces 
> the frequency of block reports and heartbeats? Is this really needed?...
Sanjay's comment addresses this.

> I really like the fact that you are proposing the decommissioned nodes are 
> not auto shutdown. 
This was my original proposal. After thinking a bit, I see the following issues:
* Not shutting down datanodes changes the intent of HADOOP-442; shutting down 
the datanode ensures that problematic datanodes cannot be used any more.
* Currently the shutdown ensures decommissioned datanodes are not used by the 
namenode. I am concerned that not shutting down the datanode could result in 
the namenode using decommissioned nodes in unintended ways.
* My earlier concern was that there is no way to tell whether a datanode is 
dead because decommission completed or for some other reason. However, the 
namenode has the state that the datanode is decommissioned, so the current 
dead node list in the namenode WebUI could be split into two lists: 
decommissioned and dead.
* The free storage capacity of decommissioned datanodes should not be counted 
towards the cluster's available capacity; only their used capacity should 
count towards the cluster's used capacity (see the sketch after this list).
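
A rough sketch of the capacity accounting in the last point, using the public 
DatanodeInfo accessors; the aggregation loop itself is illustrative, not the 
actual namenode code:

{code:java}
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Illustrative only: exclude the free space of decommissioned nodes from the
// cluster's available capacity, while still counting their used space.
static long[] clusterCapacity(Iterable<DatanodeInfo> liveNodes) {
  long total = 0, used = 0;
  for (DatanodeInfo node : liveNodes) {
    used += node.getDfsUsed();
    if (!node.isDecommissioned()) {
      total += node.getCapacity();   // only non-decommissioned capacity is usable
    }
  }
  return new long[] { total, used };
}
{code}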

Current behavior is:
# A currently registered datanode is decommissioned and is then disallowed 
from communicating with the NN.
# A datanode that had registered with the NN at some point after an NN 
restart, but is not currently registered, is decommissioned when it registers 
with the NN again.
# A datanode that has not registered at all since the NN restart is disallowed 
from registering and is never decommissioned.

By changing behavior (3), most of what I had proposed for the decom file can 
be achieved. It also avoids having two config files, exclude and decom, with 
only a very subtle semantic difference between them.
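
Roughly, the change to (3) would be at datanode registration. This is a 
simplified sketch only; inHostsList, inExcludedHostsList, addDatanode and 
startDecommission stand in for the actual namenode helpers:

{code:java}
// Simplified sketch of registration with behavior (3) changed; helper names
// are illustrative, not the exact namenode code.
void registerDatanode(DatanodeID node) throws DisallowedDatanodeException {
  if (!inHostsList(node)) {
    // Not in the include file: still refuse registration outright.
    throw new DisallowedDatanodeException(node);
  }
  addDatanode(node);               // allow the node to register
  if (inExcludedHostsList(node)) {
    // Instead of rejecting a never-registered excluded node (current
    // behavior 3), register it and immediately start decommission so its
    // replicas are copied off before it is taken out of service.
    startDecommission(node);
  }
}
{code}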


> Improve decommission mechanism
> ------------------------------
>
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
>
>
> The current decommission mechanism, driven by the exclude file, has several 
> issues. This bug proposes some changes to the mechanism for better 
> manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
