Decommissioning nodes not persisted between NameNode restarts
-------------------------------------------------------------

                 Key: HDFS-1271
                 URL: https://issues.apache.org/jira/browse/HDFS-1271
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
            Reporter: Travis Crawford


Datanodes in the process of being decommissioned should still be decommissioning 
after the namenode restarts. Currently they are marked as dead after a restart.


Details:

Nodes can be safely removed from a cluster by marking them as decommissioned and 
waiting for their data to be replicated elsewhere. This is accomplished by 
adding a node to the file referenced by dfs.hosts.exclude, then refreshing 
nodes.
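
For reference, a minimal sketch of that procedure (file paths and hostnames 
are examples only):

    <!-- hdfs-site.xml: point the NN at an exclude file -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

    # add the datanode to the exclude file, then tell the NN to re-read it
    $ echo dn23.example.com >> /etc/hadoop/conf/dfs.exclude
    $ hadoop dfsadmin -refreshNodes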

Decommissioning means block reports from the decommissioning datanode are no 
longer accepted by the namenode, so for decommissioning to occur the NN must 
have an existing block report. That is, a datanode can transition from: 
live --> decommissioning --> dead. Nodes can NOT transition from: dead --> 
decommissioning --> dead.
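
As a simplified illustration of that rule (a Java sketch of the state logic 
described above, not the actual NameNode code; AdminState and hasBlockReport 
are made-up names):

    // Illustrative only: decommissioning can start only from a node the
    // NN currently considers live, i.e. one with an accepted block report.
    enum AdminState { LIVE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED, DEAD }

    boolean canStartDecommission(AdminState state, boolean hasBlockReport) {
        return state == AdminState.LIVE && hasBlockReport;
    }

    // After an NN restart there is no block report yet, so an excluded
    // node ends up DEAD rather than DECOMMISSION_IN_PROGRESS.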

Operationally this is problematic because manual intervention is required if 
the NN restarts while nodes are decommissioning: either in-house administration 
tools must be more complex, or (more likely) admins have to babysit the 
decommissioning process.

Someone more familiar with the code might have a better idea, but perhaps the 
first block report from a dfs.hosts.exclude host should be accepted?
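
As a rough sketch of what that could look like (hypothetical method and field 
names; not a patch against the actual FSNamesystem):

    // Hypothetical: on the first block report after a restart, if the
    // reporting datanode is in the exclude list, accept the report and
    // resume decommissioning instead of leaving the node dead.
    void processFirstBlockReport(DatanodeDescriptor node, BlockListAsLongs report) {
        acceptBlockReport(node, report);          // NN now has blocks to drain
        if (excludeList.contains(node.getHostName())) {
            startDecommission(node);              // resume live --> decommissioning
        }
    }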
