[ https://issues.apache.org/jira/browse/HDFS-11830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014841#comment-16014841 ]
Weiwei Yang commented on HDFS-11830:
------------------------------------

Hello [~msingh]

Thank you for helping to review. I have addressed most of your comments in the v2 patch, except one:

bq. We should also raise an exception if the endpoint is in any other state apart from HEARTBEAT.

We cannot raise an exception here. In test mode, if we set a short heartbeat interval (1s, for example), the datanode might not have fully transitioned to the {{REGISTER}} state when it receives another response from SCM carrying the {{reregisterCommand}} command. I think simply leaving the state unchanged in this case should be fine. What do you think?

Thank you.

> Ozone: Datanode needs to re-register to SCM if SCM is restarted
> ---------------------------------------------------------------
>
>                 Key: HDFS-11830
>                 URL: https://issues.apache.org/jira/browse/HDFS-11830
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Critical
>         Attachments: HDFS-11830-HDFS-7240.001.patch, HDFS-11830-HDFS-7240.002.patch
>
>
> Problem description:
> # Start NN, DN, SCM
> # Restart SCM and will see following warnings in SCM log
> 17/05/02 00:47:08 WARN node.SCMNodeManager: SCM receive heartbeat from unregistered datanode
> Datanode could not re-establish communication with SCM afterwards. Propose to fix this by adding a new command in HB handling telling datanode to re-register with SCM. Datanode once received this command transits to REGISTER state again to proceed.
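
To illustrate the handling discussed above, here is a minimal sketch of the "ignore unless in HEARTBEAT" behaviour. It is not the code from the attached patches: the class {{ReregisterSketch}}, the {{Endpoint}} type, the {{State}} enum and the {{onReregisterCommand}} method are all hypothetical names chosen for this example, and the atomic compare-and-set is just one possible way to express the transition.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReregisterSketch {
  /** Hypothetical endpoint states; the names are illustrative only. */
  enum State { GETVERSION, REGISTER, HEARTBEAT, SHUTDOWN }

  /** Hypothetical stand-in for a datanode's view of one SCM endpoint. */
  static final class Endpoint {
    private final AtomicReference<State> state;

    Endpoint(State initial) {
      state = new AtomicReference<>(initial);
    }

    State getState() {
      return state.get();
    }

    /**
     * Handles a re-register command received in a heartbeat response.
     * Moves HEARTBEAT -> REGISTER. In any other state the command is
     * ignored instead of raising an exception, so a duplicate command
     * arriving before registration completes is harmless.
     *
     * @return true if the state was changed, false if it was ignored
     */
    boolean onReregisterCommand() {
      return state.compareAndSet(State.HEARTBEAT, State.REGISTER);
    }
  }

  public static void main(String[] args) {
    Endpoint ep = new Endpoint(State.HEARTBEAT);
    // First command: endpoint is in HEARTBEAT, so it moves to REGISTER.
    System.out.println(ep.onReregisterCommand() + " " + ep.getState());
    // Second command arrives before registration finishes: ignored.
    System.out.println(ep.onReregisterCommand() + " " + ep.getState());
  }
}
```

With a 1s heartbeat interval the second call models the duplicate {{reregisterCommand}} case: the endpoint stays in {{REGISTER}} and nothing is thrown.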