[ https://issues.apache.org/jira/browse/HDFS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518357#comment-14518357 ]
Ming Ma commented on HDFS-8056:
-------------------------------

[~andrew.wang] and others, appreciate any input you might have.

> Decommissioned dead nodes should continue to be counted as dead after NN restart
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-8056
>                 URL: https://issues.apache.org/jira/browse/HDFS-8056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-8056-2.patch, HDFS-8056.patch
>
>
> We had some offline discussion with [~andrew.wang] and [~cmccabe] about this.
> Bringing this up for more input and to get the patch in place.
> Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However,
> after NN restarts, those nodes that were dead before NN restart won't be in
> {{datanodeMap}}. {{DatanodeManager}}'s {{getDatanodeListForReport}} will add
> those dead nodes, but not if they are in the exclude file.
> {noformat}
>     if (listDeadNodes) {
>       for (InetSocketAddress addr : includedNodes) {
>         if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
>           continue;
>         }
>         // The remaining nodes are ones that are referenced by the hosts
>         // files but that we do not know about, ie that we have never
>         // heard from. Eg. an entry that is no longer part of the cluster
>         // or a bogus entry was given in the hosts files
>         //
>         // If the host file entry specified the xferPort, we use that.
>         // Otherwise, we guess that it is the default xfer port.
>         // We can't ask the DataNode what it had configured, because it's
>         // dead.
>         DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
>                 .getAddress().getHostAddress(), addr.getHostName(), "",
>                 addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
>                 defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
>         setDatanodeDead(dn);
>         nodes.add(dn);
>       }
>     }
> {noformat}
> The issue here is that the decommissioned dead node JMX will be different
> after NN restart.
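
To illustrate the inconsistency described above, here is a minimal standalone sketch. It is not the HDFS code or the attached patch: hosts are plain strings, the `known` map stands in for {{datanodeMap}}, and the class name, method name, and `skipExcluded` flag are hypothetical, introduced only for this example.

```java
import java.util.*;

public class DeadNodeReportSketch {
    // Simplified stand-in for the listDeadNodes branch quoted above.
    // "known" maps host -> alive flag and plays the role of datanodeMap.
    // "skipExcluded" is a hypothetical switch: true mimics the quoted check
    // that skips hosts listed in the exclude file.
    static List<String> deadNodes(Set<String> included, Set<String> excluded,
                                  Map<String, Boolean> known, boolean skipExcluded) {
        List<String> dead = new ArrayList<>();
        // Nodes the NN already tracks and considers dead.
        for (Map.Entry<String, Boolean> e : known.entrySet()) {
            if (!e.getValue()) {
                dead.add(e.getKey());
            }
        }
        // Hosts referenced by the include file that the NN has never heard from.
        for (String host : included) {
            if (known.containsKey(host) || (skipExcluded && excluded.contains(host))) {
                continue;
            }
            dead.add(host);
        }
        Collections.sort(dead); // deterministic order for comparison
        return dead;
    }

    public static void main(String[] args) {
        Set<String> included = Set.of("dn1", "dn2");
        Set<String> excluded = Set.of("dn2"); // dn2 was decommissioned and is dead

        // Before restart: dn2 is still tracked, marked dead.
        Map<String, Boolean> before = Map.of("dn1", true, "dn2", false);
        // After restart: only dn1 re-registers, dn2 is unknown to the NN.
        Map<String, Boolean> after = Map.of("dn1", true);

        System.out.println(deadNodes(included, excluded, before, true));  // [dn2]
        System.out.println(deadNodes(included, excluded, after, true));   // []
        System.out.println(deadNodes(included, excluded, after, false));  // [dn2]
    }
}
```

With the exclude check in place, dn2 is reported as dead before the restart but disappears from the report afterwards. Dropping the check (the hypothetical `skipExcluded = false` case) gives the same answer in both states, which is the kind of consistency the issue asks for.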
> It might be better to make it consistent across NN restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)