Decommissioning nodes not persisted between NameNode restarts
-------------------------------------------------------------
Key: HDFS-1271
URL: https://issues.apache.org/jira/browse/HDFS-1271
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Travis Crawford

Datanodes in the process of being decommissioned should still be decommissioning after the namenode restarts. Currently they are marked as dead after a restart.

Details: Nodes can be safely removed from a cluster by marking them as decommissioned and waiting for their data to be replicated elsewhere. This is accomplished by adding a node to the file referenced by dfs.hosts.exclude, then refreshing nodes.

Decommissioning means block reports from the decommissioned datanode are no longer accepted by the namenode, so for decommissioning to occur the NN must already have a block report for the node. That is, a datanode can transition: live --> decommissioning --> dead. Nodes can NOT transition: dead --> decommissioning --> dead.

Operationally this is problematic because intervention is required should the NN restart while nodes are decommissioning: in-house administration tools must be more complex, or, more likely, admins have to babysit the decommissioning process.

Someone more familiar with the code might have a better idea, but perhaps the first block report from dfs.hosts.exclude hosts should be accepted?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
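For reference, the decommissioning procedure described in the report looks roughly like this. The exclude-file path and the datanode hostname below are hypothetical; in a real deployment the file is whatever dfs.hosts.exclude points to in hdfs-site.xml:

```shell
# Sketch of the decommissioning procedure (hypothetical path/hostname).
EXCLUDE_FILE="/tmp/dfs.exclude"

# Add the datanode to be decommissioned to the exclude file.
echo "dn3.example.com" >> "$EXCLUDE_FILE"

# Tell the NameNode to re-read its include/exclude lists; the node then
# enters the decommissioning state while its blocks re-replicate.
# (Guarded so the sketch also runs where no Hadoop install is present.)
command -v hadoop >/dev/null 2>&1 && hadoop dfsadmin -refreshNodes

echo "queued for decommission: $(cat "$EXCLUDE_FILE")"
```

The bug reported here is that this state is held only in NameNode memory: if the NN restarts mid-decommission, the excluded node shows up as dead rather than resuming decommissioning.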