[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron T. Myers updated HDFS-5138:
---------------------------------
    Attachment: HDFS-5138.patch

{quote}
+ // This is expected to happen for a stanby NN.
Typo (standby)
{quote}
Thanks, fixed.

{quote}
+ // Either they all return the same thing or this call fails, so we can
+ // just return the first result.
Would be good to assert that - eg in case one of the JNs crashed in the middle of a previously attempted upgrade sequence.
{quote}
Sure, done.

{quote}
* @param useLock true - enables locking on the storage directory and false
* disables locking
+ * @param isShared whether or not this dir is shared between two NNs. true
+ * enables locking on the storage directory, false disables locking
I think this doc is now wrong because you inverted the sense of these booleans - we don't lock the shared dir.
{quote}
Good catch. Fixed.

{quote}
+ public synchronized void doFinalizeOfSharedLog() throws IOException {
+ public synchronized boolean canRollBackSharedLog(Storage prevStorage,
Style nit: extra space in the above two methods
{quote}
Fixed.

{quote}
+ if (!sd.isShared()) {
+ // This will be done on transition to active.
Worth a LOG.info or even warn here
{quote}
Added the following:
{code}
LOG.info("Not doing recovery on " + sd + " now. Will be done on "
    + "transition to active.");
{code}

bq. Currently it seems like whichever SBN starts up first has to be the one who does the transition to active. Maybe a follow-up JIRA could be to relax that constraint? Seems like it should be fine for either one of the NNs to actually do the upgrade - the lock file is just to make sure they agree on the target ctime.

I agree this seems like a good idea, and that it can reasonably be done in a follow-up JIRA. If you agree, I'll file it when we commit this one.

{quote}
+ dfsadmin -finalizeUpgrade'>>> command while the NNs are running and one of them
+ is active. The active NN at the time this happens will perform the upgrade of
+ the shared log, and both of the NNs will finalize the upgrade in their local
I think here you mean the "finalization of the shared log"
{quote}
Sure did. Fixed.

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch,
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way
> to get around this was to disable HA and upgrade.
> The NN and the cluster cannot be flipped back to HA until the upgrade is
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned
> back on without involving DNs, things will work, but finalizeUpgrade won't
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.
> I am marking this as a 2.1.1-beta blocker based on feedback from others. If
> there is a reasonable workaround that does not increase maintenance window
> greatly, we can lower its priority from blocker to critical.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)