[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5138:
---------------------------------

    Attachment: HDFS-5138.patch

{quote}
+ // This is expected to happen for a stanby NN.
Typo (standby)
{quote}

Thanks, fixed.

{quote}
+ // Either they all return the same thing or this call fails, so we can
+ // just return the first result.
Would be good to assert that - eg in case one of the JNs crashed in the middle 
of a previously attempted upgrade sequence.
{quote}

Sure, done.
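
For readers following along, here is a minimal sketch of the kind of check being discussed. It is not the code in the patch. The class and method names (QuorumResultCheck, agreedValue) are hypothetical, and it assumes the per-JN responses have already been collected into a plain collection.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of HDFS.
final class QuorumResultCheck {
    private QuorumResultCheck() {}

    /**
     * Returns the single value that every responder agreed on.
     * Throws IllegalStateException if there are no responses or if the
     * responses differ, e.g. because one JN crashed partway through an
     * earlier upgrade attempt and is in a different state from the others.
     */
    static <T> T agreedValue(Collection<T> responses) {
        if (responses.isEmpty()) {
            throw new IllegalStateException("No responses to compare");
        }
        // Collapse duplicates; exactly one distinct value means agreement.
        Set<T> distinct = new HashSet<>(responses);
        if (distinct.size() != 1) {
            throw new IllegalStateException(
                "Responders returned differing results: " + responses);
        }
        return distinct.iterator().next();
    }
}
```

Failing loudly on disagreement, instead of silently returning the first result, surfaces a partially upgraded JN rather than hiding it.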

{quote}
* @param useLock true - enables locking on the storage directory and false
* disables locking
+ * @param isShared whether or not this dir is shared between two NNs. true
+ * enables locking on the storage directory, false disables locking
I think this doc is now wrong because you inverted the sense of these booleans 
- we don't lock the shared dir.
{quote}

Good catch. Fixed.

{quote}
+ public synchronized void doFinalizeOfSharedLog() throws IOException {
+ public synchronized boolean canRollBackSharedLog(Storage prevStorage,
Style nit: extra space in the above two methods
{quote}

Fixed.

{quote}
+ if (!sd.isShared()) {
+ // This will be done on transition to active.
Worth a LOG.info or even warn here
{quote}

Added the following:

{code}
LOG.info("Not doing recovery on " + sd + " now. Will be done on "
                + "transition to active.");
{code}

bq. Currently it seems like whichever SBN starts up first has to be the one who 
does the transition to active. Maybe a follow-up JIRA could be to relax that 
constraint? Seems like it should be fine for either one of the NNs to actually 
do the upgrade - the lock file is just to make sure they agree on the target 
ctime.

Agreed, this seems like a good idea, and it can reasonably be done in a 
follow-up JIRA. If that works for you, I'll file it when we commit this one.

{quote}
+ dfsadmin -finalizeUpgrade'>>> command while the NNs are running and one of them
+ is active. The active NN at the time this happens will perform the upgrade of
+ the shared log, and both of the NNs will finalize the upgrade in their local
I think here you mean the "finalization of the shared log"
{quote}

Sure did. Fixed.

> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
