[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766268#comment-13766268 ]
Konstantin Shvachko edited comment on HDFS-5138 at 9/13/13 6:12 AM:
--------------------------------------------------------------------

Hey, guys.
Indeed, the scope of this jira should probably be #1. Not to diminish in any way the importance of rolling upgrades.

The NN upgrade happens in loadNamesystem() before RPCServer is started, so SBN won't even see this. Then DNs are asked to upgrade before they are allowed to register. That is, Active NN is in SafeMode and there is nothing for SBN to worry about yet, as the journal is not changing.

With NFS-mounted shared storage the upgrade should be pretty straightforward. We should modify the code to allow it, and then lots of testing of course.
For QJM I am not sure. Would it be easier to let SBN checkpoint from the upgraded NN and start reading the journal from that image?
With -rollback SBN should probably do the same thing.

<Edited: I meant rollback rather than finalize, as it was before>

was (Author: shv):
Hey, guys.
Indeed, the scope of this jira should probably be #1. Not to diminish in any way the importance of rolling upgrades.

The NN upgrade happens in loadNamesystem() before RPCServer is started, so SBN won't even see this. Then DNs are asked to upgrade before they are allowed to register. That is, Active NN is in SafeMode and there is nothing for SBN to worry about yet, as the journal is not changing.

With NFS-mounted shared storage the upgrade should be pretty straightforward. We should modify the code to allow it, and then lots of testing of course.
For QJM I am not sure. Would it be easier to let SBN checkpoint from the upgraded NN and start reading the journal from that image?
With -finalize SBN should probably do the same thing.

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Priority: Blocker
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way
> to get around this was to disable HA and upgrade.
> The NN and the cluster cannot be flipped back to HA until the upgrade is
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned
> back on without involving DNs, things will work, but finalizeUpgrade won't
> work (the NN is in HA and it cannot be in upgrade mode) and DNs' upgrade
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.
> I am marking this as a 2.1.1-beta blocker based on feedback from others. If
> there is a reasonable workaround that does not increase the maintenance window
> greatly, we can lower its priority from blocker to critical.