[ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973394#action_12973394
 ] 

Suresh Srinivas commented on HDFS-1547:
---------------------------------------

h3. Background
# To decommission datanodes, the datanodes are added to the exclude file on the 
namenode, followed by the "refreshNodes" command.
#* NN starts decommissioning registered datanodes.
#* If a datanode is not registered, then NN excludes it from registering.
#* Removing a datanode from exclude file stops decommissioning.
# Decommissioning is complete when all replicas on the datanode have been 
replicated to other nodes.
# After a node is decommissioned, it can no longer talk to the NN. This results 
in datanode shutdown.

h3. Problems
# The exclude file has overloaded semantics - it is used both to exclude a 
datanode from registering and to decommission it. This is confusing and results 
in the following issues:
#* If a datanode is not registered when it is added to the exclude list, it is 
not allowed to register and hence is never decommissioned.
#* After adding a datanode to the exclude list, restarting the NN results in 
decom not completing, as the datanode is excluded from registration.
#* If, in the above two scenarios, the datanode holds the only replicas of a 
block, the disallowed registration causes missing blocks and corrupt files.
# When decom is done, the datanode shuts down. There is no way to tell whether 
a datanode died before decom completed or shut down because decom completed.

h3. Proposed changes
h4. New decom file
# A new file will be used for listing datanodes to be decommissioned. Adding a 
datanode to this file starts decommissioning; removing it from the file stops 
decommissioning.
# The namenode allows registration from nodes in the decom list, and continues 
decommissioning them if it is not complete.
# Decommissioned datanodes are not automatically shut down. They have to be 
shut down by the administrator.
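
A minimal sketch of the registration check this implies (illustrative names, 
not actual NameNode code): decom-listed nodes may register and keep 
decommissioning, while exclude-listed nodes are still refused.

```java
import java.util.Set;

public class RegistrationPolicy {
    // Proposed check: a host in the decom file may always register (so its
    // decommissioning can continue); the exclude file still blocks registration.
    static boolean allowRegistration(String host,
                                     Set<String> excludeFile,
                                     Set<String> decomFile) {
        if (decomFile.contains(host)) {
            return true;  // decom-listed: register and continue decommissioning
        }
        return !excludeFile.contains(host);  // exclude file blocks registration
    }
}
```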


h4. Datanodes will have the following new states:
# inservice - datanode is registered with the namenode and is providing storage 
service.
# decommission-in-progress: decommissioning has started at the datanode. The 
namenode sends this state to the datanode. Datanodes reduce the frequency of 
heartbeats to Max(heartbeat time, datanode expiry time/2)?
# decommissioned - decommissioning is complete. The namenode sends this state 
to the datanode. In this state the datanode no longer sends block reports to 
the namenode and also reduces the frequency of heartbeats.
# disconnected - datanode is not able to communicate with the namenode.
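
The four states and the proposed reduced heartbeat can be sketched as follows 
(illustrative names and default intervals; the 3s heartbeat and ~10min expiry 
are typical defaults, not values taken from this proposal):

```java
public class DecomStates {
    // The four proposed datanode admin states.
    enum AdminState {
        IN_SERVICE,
        DECOMMISSION_IN_PROGRESS,
        DECOMMISSIONED,
        DISCONNECTED
    }

    // Proposed reduced heartbeat: Max(heartbeat time, datanode expiry time / 2).
    static long reducedHeartbeatSeconds(long heartbeatSeconds, long expirySeconds) {
        return Math.max(heartbeatSeconds, expirySeconds / 2);
    }
}
```

With a 3s heartbeat and a 630s expiry, a decommission-in-progress datanode 
would heartbeat every 315s instead of every 3s.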

h3. I need feedback on a choice we need to make:
# For backward compatibility, continue to support decommissioning via the 
exclude file, with the current semantics. This is in addition to the new way of 
decommissioning a node by adding it to a separate decom file. When a node is in 
both files, the decom file takes precedence.
# Deprecate support for decom using exclude file. This will not be backward 
compatible.
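
Under choice 1, the precedence rule could look like this (an illustrative 
sketch, not actual code; names are hypothetical):

```java
import java.util.Set;

public class DecomPrecedence {
    enum Action { EXCLUDE, DECOMMISSION, NONE }

    // Choice 1: the exclude file keeps its current semantics, but when a host
    // appears in both files, the decom file takes precedence.
    static Action actionFor(String host,
                            Set<String> excludeFile,
                            Set<String> decomFile) {
        if (decomFile.contains(host)) {
            return Action.DECOMMISSION;  // decom file wins when in both
        }
        if (excludeFile.contains(host)) {
            return Action.EXCLUDE;       // legacy exclude-file behavior
        }
        return Action.NONE;
    }
}
```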



> Improve decommission mechanism
> ------------------------------
>
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
>
>
> Current decommission mechanism driven using exclude file has several issues. 
> This bug proposes some changes in the mechanism for better manageability. See 
> the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.