[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868402#comment-13868402 ]
Todd Lipcon commented on HDFS-5138:
-----------------------------------

+      // This is expected to happen for a stanby NN.
Typo (standby)

+    // Either they all return the same thing or this call fails, so we can
+    // just return the first result.
Would be good to assert that - eg in case one of the JNs crashed in the middle of a previously attempted upgrade sequence.

-   * @param useLock true - enables locking on the storage directory and false
-   *          disables locking
+   * @param isShared whether or not this dir is shared between two NNs. true
+   *          enables locking on the storage directory, false disables locking
I think this doc is now wrong because you inverted the sense of these booleans - we _don't_ lock the shared dir.

+  public synchronized void doFinalizeOfSharedLog() throws IOException {
+  public synchronized boolean canRollBackSharedLog(Storage prevStorage,
Style nit: extra space in the above two methods

+      if (!sd.isShared()) {
+        // This will be done on transition to active.
Worth a LOG.info or even warn here

Currently it seems like whichever SBN starts up first has to be the one who does the transition to active. Maybe a follow-up JIRA could be to relax that constraint? Seems like it should be fine for either one of the NNs to actually do the upgrade - the lock file is just to make sure they agree on the target ctime.

+  dfsadmin -finalizeUpgrade'>>> command while the NNs are running and one of them
+  is active. The active NN at the time this happens will perform the upgrade of
+  the shared log, and both of the NNs will finalize the upgrade in their local
I think here you mean the "finalization of the shared log"

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way
> to get around this was to disable HA and upgrade.
> The NN and the cluster cannot be flipped back to HA until the upgrade is
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned
> back on without involving DNs, things will work, but finalizeUpgrade won't
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.
> I am marking this as a 2.1.1-beta blocker based on feedback from others. If
> there is a reasonable workaround that does not increase maintenance window
> greatly, we can lower its priority from blocker to critical.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
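
For the review comment about asserting that every JN returned the same answer before "just returning the first result", here is a rough sketch of what such a check might look like. It is not taken from the patch: the class name UnanimousResult and the method requireUnanimous are hypothetical, and the real code would presumably throw an IOException and work on whatever response type the quorum call produces.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of the HDFS-5138 patch. */
final class UnanimousResult {
  private UnanimousResult() {}

  /**
   * Returns the first response, but only after verifying that every other
   * response is equal to it. Throws if the collection is empty or if any
   * two responses disagree (for example, because one node stopped partway
   * through an earlier upgrade attempt).
   */
  static <T> T requireUnanimous(Collection<T> responses) {
    if (responses.isEmpty()) {
      throw new IllegalStateException("No responses to compare");
    }
    Iterator<T> it = responses.iterator();
    T first = it.next();
    while (it.hasNext()) {
      T other = it.next();
      if (!Objects.equals(first, other)) {
        throw new IllegalStateException(
            "Responses disagree: " + first + " vs " + other);
      }
    }
    return first;
  }
}
```

A caller would pass the collected per-node answers, e.g. UnanimousResult.requireUnanimous(Arrays.asList(true, true, true)), and treat an exception as a sign that the nodes are in an inconsistent state rather than silently trusting the first reply.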