[ https://issues.apache.org/jira/browse/HDFS-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jing Zhao updated HDFS-8127:
----------------------------
    Attachment: HDFS-8127.000.patch

Uploaded a patch to fix the issue. Instead of adding an -upgrade option, the patch lets the standby NameNode (SBN) learn directly, through an RPC call, whether the active NameNode (ANN) is in the upgrade state. If the ANN is in the upgrade state, the SBN tries to save its original state into the previous directory. If its original state is corrupted and cannot be recovered, we prompt the user to format the SBN first. Since we still use bootstrapStandby for HA rollback, it should be OK to have an old state generated by the new software.

> NameNode Failover during HA upgrade can cause DataNode to finalize upgrade
> --------------------------------------------------------------------------
>
>                 Key: HDFS-8127
>                 URL: https://issues.apache.org/jira/browse/HDFS-8127
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>   Affects Versions: 2.4.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>            Priority: Blocker
>         Attachments: HDFS-8127.000.patch
>
>
> Currently for HA upgrade (enabled by HDFS-5138), we use {{-bootstrapStandby}}
> to initialize the standby NameNode. The standby NameNode does not have the
> {{previous}} directory, so it does not know that the cluster is in the
> upgrade state. If an NN failover happens, the new ANN will, in response to
> block reports, tell the DNs to finalize the upgrade, making it impossible to
> roll back.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)