[ https://issues.apache.org/jira/browse/HDFS-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726562#comment-14726562 ]
Hudson commented on HDFS-8995:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #339 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/339/])
HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Flaw in registration bookeeping can make DN die on reconnect
> ------------------------------------------------------------
>
>                 Key: HDFS-8995
>                 URL: https://issues.apache.org/jira/browse/HDFS-8995
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>             Fix For: 2.7.2
>
>         Attachments: HDFS-8995.patch
>
>
> Normally, datanodes re-register with the namenode when it has been
> unreachable for longer than the heartbeat expiration and then becomes
> reachable again. A datanode keeps retrying its last RPC call, such as an
> incremental block report or a heartbeat, and when the call finally gets
> through, the namenode tells it to re-register.
> We have observed that some datanodes stay dead in such scenarios. Further
> investigation has revealed that those were told to shut down by the namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)