[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973394#action_12973394 ]
Suresh Srinivas commented on HDFS-1547:
---------------------------------------

h3. Background
# To decommission datanodes, they are added to the exclude file on the namenode, followed by the "refreshNodes" command.
#* NN starts decommissioning registered datanodes.
#* If a datanode is not registered, NN prevents it from registering.
#* Removing a datanode from the exclude file stops decommissioning.
# Decommissioning is complete when all the replicas on the datanode have been replicated to other nodes.
# After a node is decommissioned, it can no longer talk to NN. This results in datanode shutdown.

h3. Problems
# The exclude file has overloaded semantics - it is used both to exclude a datanode from registering and to decommission it. This is confusing and results in the following issues:
#* If a datanode is not registered when it is added to the exclude list, it is not allowed to register and hence is never decommissioned.
#* After adding a datanode to the exclude list, restarting NN results in decommissioning not completing, since the datanode is excluded from registration.
#* If the datanode holds the only replicas of a block in the above two scenarios, the disallowed registration causes missing blocks and corrupt files.
# When decommissioning is done, the datanode shuts down. There is no way to tell whether a datanode died before decommissioning completed or shut down because decommissioning completed.

h3. Proposed changes

h4. New decom file
# A new file will be used for listing datanodes to be decommissioned. Adding a datanode to this file starts decommissioning; removing it stops decommissioning.
# Namenode allows registration from nodes in the decom list and continues decommissioning if it is not complete.
# Decommissioned datanodes are not automatically shut down. They have to be shut down by the administrator.

h4. Datanode will have the following new states:
# inservice - datanode is registered with the namenode and is providing storage service.
# decommission-in-progress - decommissioning has started at the datanode.
Namenode sends this state to the datanodes. Datanodes reduce the frequency of heartbeats to Max(heartbeat interval, datanode expiry interval/2)?
# decommissioned - decommissioning is complete. Namenode sends this state to the datanode. In this state the datanode no longer sends block reports to the namenode and also reduces the frequency of heartbeats.
# disconnected - datanode is not able to communicate with the namenode.

h3. I need feedback on a choice we need to make:
# For backward compatibility, continue to support decommissioning via the exclude file, with the current semantics, in addition to the new way of decommissioning a node by adding it to a separate decom file. When a node is in both files, the decom file takes precedence.
# Deprecate support for decommissioning via the exclude file. This is not backward compatible.

> Improve decommission mechanism
> ------------------------------
>
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
>
>
> Current decommission mechanism driven using exclude file has several issues.
> This bug proposes some changes in the mechanism for better manageability. See
> the proposal in the next comment for more details.
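The proposed datanode states and the heartbeat back-off rule can be sketched as below. This is a minimal illustration, not actual HDFS code: the class, enum, and method names are hypothetical, and the back-off formula follows the tentative Max(heartbeat interval, expiry interval/2) proposal above (the trailing "?" in the comment marks it as open for discussion).

```java
// Hypothetical sketch of the proposed datanode admin states and the
// reduced heartbeat interval. Names are illustrative, not HDFS APIs.
public class DecomStates {

    // The four states proposed in the comment above.
    enum DatanodeAdminState {
        IN_SERVICE,               // registered with NN, providing storage service
        DECOMMISSION_IN_PROGRESS, // replicas being copied off the node
        DECOMMISSIONED,           // replication complete; admin shuts the node down
        DISCONNECTED              // unable to communicate with the namenode
    }

    // Tentative back-off: a decommissioning/decommissioned datanode heartbeats
    // at max(normal heartbeat interval, datanode expiry interval / 2).
    static long reducedHeartbeatMillis(long heartbeatMillis, long expiryMillis) {
        return Math.max(heartbeatMillis, expiryMillis / 2);
    }

    public static void main(String[] args) {
        // With HDFS-like defaults (3 s heartbeat, 10.5 min expiry), the
        // reduced interval is half the expiry interval: 315000 ms.
        System.out.println(reducedHeartbeatMillis(3_000L, 630_000L));
    }
}
```

Halving the expiry interval keeps at least one heartbeat inside every expiry window, so NN can still distinguish a slow (decommissioning) node from a dead one.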
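If option 1 of the feedback request is chosen (keep the exclude file alongside the new decom file), the precedence rule for a node listed in both files could look like the following sketch. The class, enum, and method names are hypothetical and only illustrate the stated rule: the decom file wins, so the node may register and is decommissioned rather than excluded outright.

```java
import java.util.Set;

// Hypothetical sketch of the proposed precedence between the new decom
// file and the legacy exclude file; not actual HDFS code.
public class HostFilePolicy {

    enum Action { ALLOW, DECOMMISSION, EXCLUDE }

    static Action actionFor(String node, Set<String> decomFile, Set<String> excludeFile) {
        if (decomFile.contains(node)) {
            // Decom file takes precedence: node may register and is decommissioned.
            return Action.DECOMMISSION;
        }
        if (excludeFile.contains(node)) {
            // Legacy semantics: node is barred from registering.
            return Action.EXCLUDE;
        }
        return Action.ALLOW;
    }
}
```

Checking the decom file first is what prevents the missing-block scenario described under "Problems": a node that still holds the only replicas of a block is allowed to register and drain, instead of being locked out.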