[ https://issues.apache.org/jira/browse/HDFS-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726485#comment-14726485 ]
Yi Liu commented on HDFS-8995:
------------------------------

+1, thanks Kihwal. Will commit it shortly.

> Flaw in registration bookeeping can make DN die on reconnect
> ------------------------------------------------------------
>
>                 Key: HDFS-8995
>                 URL: https://issues.apache.org/jira/browse/HDFS-8995
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-8995.patch
>
>
> Normally, datanodes re-register with the namenode when it has been
> unreachable for longer than the heartbeat expiration interval and then
> becomes reachable again. A datanode keeps retrying its last RPC call, such
> as an incremental block report or a heartbeat, and when the call finally
> gets through, the namenode tells it to re-register.
> We have observed that some datanodes stay dead in such scenarios. Further
> investigation has revealed that they were told to shut down by the namenode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)