[ https://issues.apache.org/jira/browse/HDFS-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978371#comment-13978371 ]
Rushabh S Shah commented on HDFS-5773:
--------------------------------------

I wrote a test case.
Steps to reproduce:
1. Create a MiniDFSCluster with 1 namenode and 3 datanodes.
2. Set the heartbeat interval (DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 7) too high.
3. Set the heartbeat recheck interval (DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY) low.
4. Open a file.
5. Sleep long enough that the namenode declares the datanode dead, because the datanode did not heartbeat within the heartbeat recheck interval, and the datanode then sends its block report.
6. This generates an IOException with the following stack trace:
java.io.IOException: Got blockReceivedDeleted message from unregistered or dead node
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blockReceivedAndDeleted(BlockManager.java:2238)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:825)
7. But when the datanode heartbeated to the namenode (after the heartbeat interval), the namenode re-registered the datanode, added it back to the topology, and recovered from the exception.

So according to my test case, the namenode recovered as it should.
I was not able to reproduce the error described in this JIRA, so I am closing it. Feel free to reopen it if it happens again.

> NN may reject formerly dead DNs
> -------------------------------
>
>                 Key: HDFS-5773
>                 URL: https://issues.apache.org/jira/browse/HDFS-5773
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha, 3.0.0, 0.23.10
>            Reporter: Daryn Sharp
>            Assignee: Rushabh S Shah
>            Priority: Critical
>
> If the heartbeat monitor declares a node dead, it may never allow a DN to
> rejoin. The NN will generate messages like "Got blockReceivedDeleted message
> from unregistered or dead node".
> There appears to be a bug where the isAlive flag is not set to true when
> a formerly known DN attempts to rejoin.
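
For readers following the thread, the sequence in steps 5 to 7 of the comment (node declared dead, block message rejected, heartbeat restores the node) can be illustrated with a small toy model. This is only a sketch and is not Hadoop code: the class ToyNameNode, its methods, the single expiry threshold, and the timestamps are all hypothetical names and values chosen for illustration. Only the exception message is taken from the stack trace above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT the Hadoop implementation. All names are hypothetical.
class ToyNameNode {
    // Silence (in ms) after which a node is considered dead (assumed single threshold).
    private final long expiryMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Boolean> alive = new HashMap<>();

    ToyNameNode(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // A heartbeat (re-)registers the node and marks it alive again.
    void heartbeat(String node, long now) {
        lastHeartbeat.put(node, now);
        alive.put(node, true);
    }

    // Periodic check: mark nodes dead if they have been silent for too long.
    void heartbeatCheck(long now) {
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > expiryMillis) {
                alive.put(e.getKey(), false);
            }
        }
    }

    boolean isAlive(String node) {
        return alive.getOrDefault(node, false);
    }

    // Block messages from dead or unknown nodes are rejected.
    void blockReceivedAndDeleted(String node) throws IOException {
        if (!isAlive(node)) {
            throw new IOException(
                "Got blockReceivedDeleted message from unregistered or dead node");
        }
    }

    public static void main(String[] args) throws IOException {
        ToyNameNode nn = new ToyNameNode(1000);
        nn.heartbeat("dn1", 0);        // initial registration
        nn.heartbeatCheck(5000);       // dn1 silent too long: declared dead
        try {
            nn.blockReceivedAndDeleted("dn1");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        nn.heartbeat("dn1", 7000);     // late heartbeat restores the node
        nn.blockReceivedAndDeleted("dn1");
        System.out.println("accepted after re-registration");
    }
}
```

In this model the recovery in step 7 depends on heartbeat() setting the alive flag back to true. The issue description suspects that this reset is missing in some code path; if it were removed here, the node would be rejected on every later block message.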
-- This message was sent by Atlassian JIRA (v6.2#6252)