[jira] [Commented] (HDFS-8995) Flaw in registration bookeeping can make DN die on reconnect

Sangjin Lee (JIRA) Fri, 20 Nov 2015 15:55:20 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019107#comment-15019107
 ]


Sangjin Lee commented on HDFS-8995:
-----------------------------------

Does this issue exist in 2.6.x? Should this be backported to branch-2.6?

> Flaw in registration bookeeping can make DN die on reconnect
> ------------------------------------------------------------
>
>                 Key: HDFS-8995
>                 URL: https://issues.apache.org/jira/browse/HDFS-8995
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>             Fix For: 2.7.2
>
>         Attachments: HDFS-8995.patch
>
>
> Normally, datanodes re-register with the namenode when it has been 
> unreachable for longer than the heartbeat expiration interval and then 
> becomes reachable again. Each datanode keeps retrying its last RPC call, 
> such as an incremental block report or a heartbeat, and when the call 
> finally gets through, the namenode tells the datanode to re-register.
> We have observed that some datanodes stay dead in such scenarios. Further 
> investigation revealed that those datanodes were told to shut down by the 
> namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
