[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868402#comment-13868402
 ] 

Todd Lipcon commented on HDFS-5138:
-----------------------------------

+          // This is expected to happen for a stanby NN.

Typo (standby)

+      // Either they all return the same thing or this call fails, so we can
+      // just return the first result.

It would be good to assert that, e.g. in case one of the JNs crashed in the 
middle of a previously attempted upgrade sequence.
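
A minimal sketch of the kind of check meant here, assuming the per-JN responses have already been gathered into a map keyed by a JN identifier. The names ConsistentResponse and getSingleResult are hypothetical and do not come from the patch; the real quorum call code may collect results differently.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of the patch: verifies that every
// responder returned the same value before handing back one of them.
final class ConsistentResponse {
  private ConsistentResponse() {}

  /**
   * Returns the value all responders agreed on.
   *
   * @throws IOException if there are no responses, or if any two
   *         responders disagree (e.g. a JN that crashed part way
   *         through an earlier upgrade attempt)
   */
  static <T> T getSingleResult(Map<String, T> responses) throws IOException {
    if (responses.isEmpty()) {
      throw new IOException("No responses received");
    }
    Map.Entry<String, T> first = responses.entrySet().iterator().next();
    for (Map.Entry<String, T> e : responses.entrySet()) {
      if (!Objects.equals(first.getValue(), e.getValue())) {
        throw new IOException("Inconsistent responses: " + first.getKey()
            + " returned " + first.getValue() + " but " + e.getKey()
            + " returned " + e.getValue());
      }
    }
    return first.getValue();
  }
}
```

With a check like this, "just return the first result" stays safe even if the JNs have diverged, since the divergence is reported instead of being silently ignored.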

-     * @param useLock true - enables locking on the storage directory and false
-     *          disables locking
+     * @param isShared whether or not this dir is shared between two NNs. true
+     *          enables locking on the storage directory, false disables locking

I think this doc is now wrong because you inverted the sense of these booleans 
- we _don't_ lock the shared dir.
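
To make the inverted sense concrete, here is a small sketch of the relationship between the two flags as I understand it. The class StorageDirSketch and the method shouldLock are hypothetical names used only for illustration; the actual StorageDirectory code in the patch may be structured differently.

```java
// Hypothetical illustration, not the patch's StorageDirectory class.
final class StorageDirSketch {
  private final boolean isShared;

  /**
   * @param isShared whether or not this dir is shared between two NNs.
   *          true disables locking on the storage directory, false
   *          enables locking
   */
  StorageDirSketch(boolean isShared) {
    this.isShared = isShared;
  }

  /** The old useLock flag corresponds to the negation of isShared. */
  boolean shouldLock() {
    return !isShared;
  }
}
```

The Javadoc wording in the sketch is only a suggestion for how the corrected @param text could read.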

+  public synchronized  void doFinalizeOfSharedLog() throws IOException {
+  public synchronized  boolean canRollBackSharedLog(Storage prevStorage,

Style nit: extra space after "synchronized" in the above two methods.

+          if (!sd.isShared()) {
+            // This will be done on transition to active.

Worth a LOG.info or even a LOG.warn here.

Currently it seems like whichever SBN starts up first has to be the one who 
does the transition to active. Maybe a follow-up JIRA could be to relax that 
constraint? Seems like it should be fine for either one of the NNs to actually 
do the upgrade - the lock file is just to make sure they agree on the target 
ctime.

+  dfsadmin -finalizeUpgrade'>>> command while the NNs are running and one of them
+  is active. The active NN at the time this happens will perform the upgrade of
+  the shared log, and both of the NNs will finalize the upgrade in their local

I think here you mean the "finalization of the shared log", not the "upgrade".


> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on NN for layout upgrade and HA is turned 
> back on without involving DNs, things will work, but finalizeUpgrade won't 
> work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshot.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
