[ https://issues.apache.org/jira/browse/HDFS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470677#comment-13470677 ]
Jason Lowe commented on HDFS-3224:
----------------------------------

This bug seems benign but is causing issues with ops monitoring scripts because it allows a node to be reported as simultaneously live and dead by the NN web UI and JMX. Here's one scenario:

* Node is registered and appears as a live node
* Node fails badly and starts showing up as a dead node
* Node is re-imaged by ops as a fresh node
* Node rejoins the cluster, and now the same host is reported as both live and dead

Since re-imaging the node causes it to get a new storage ID, the failure to recognize it by name means the NN thinks it's a totally different node, so the node is placed in the datanode map twice under the two storage IDs.

In this case I think we should be calling getDatanodeByName (i.e., where we include the port). This would help us properly distinguish datanodes that are using ephemeral ports (e.g., miniclusters).

> Bug in check for DN re-registration with different storage ID
> -------------------------------------------------------------
>
> Key: HDFS-3224
> URL: https://issues.apache.org/jira/browse/HDFS-3224
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eli Collins
> Priority: Minor
>
> DatanodeManager#registerDatanode checks the host-to-node map using an IP:port
> key, but the map is keyed on IP, so this check will always fail. The check is
> performed to determine whether a DN with the same IP and storage ID has already
> registered, and if so to remove that DN from the map and indicate that, e.g.,
> it's no longer hosting those blocks. This bug has been here forever.
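The key mismatch described above can be sketched as follows. This is a hypothetical, simplified illustration, not the actual Hadoop sources: the class, map, and storage-ID names are invented, and the map stands in for the NN's host-to-node map. The point is that a map keyed on the bare IP can never be hit by a lookup built with an IP:port key, so the re-registration check silently does nothing:

```java
import java.util.HashMap;
import java.util.Map;

public class RegistrationKeyMismatch {
    // Map keyed on the bare IP only, as the issue description states.
    static Map<String, String> host2DatanodeMap = new HashMap<>();
    static {
        // Existing datanode registered under its IP with its old storage ID.
        host2DatanodeMap.put("10.0.0.1", "storage-id-OLD");
    }

    public static void main(String[] args) {
        // Buggy check: the lookup key includes the port, so it can never
        // match an entry keyed on IP alone -- the check always "fails".
        String buggyKey = "10.0.0.1" + ":" + 50010;
        System.out.println("IP:port lookup -> " + host2DatanodeMap.get(buggyKey));

        // A consistent key (here, the bare IP on both sides) finds the old
        // registration so the stale storage ID could be removed.
        System.out.println("IP lookup      -> " + host2DatanodeMap.get("10.0.0.1"));
    }
}
```

The fix either keys and looks up on the bare IP on both sides, or (as suggested in the comment) consistently includes the port on both sides, which also distinguishes multiple datanodes on one host using ephemeral ports, as in miniclusters.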