[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882096#comment-13882096
 ] 

Suresh Srinivas commented on HDFS-5138:
---------------------------------------

@Todd, I have had some conversations with [~atm] related to this jira. I had 
brought up one issue about potentially losing editlogs on a JournalNode, and I 
thought that would be addressed before this jira could be committed. I have 
been very busy and have not been able to provide all my comments; reviewing 
this patch has been quite tricky. Here are my nearly complete review comments. 
While some of the issues are minor nits, I do not think this patch and its 
documentation are ready.

I am adding my understanding of the design below. Let me know if I got 
anything wrong.
*Upgrade preparation:*
# New bits are installed on the cluster nodes.
# The cluster is brought down.

*Upgrade:* For an HA setup, choose one of the namenodes to initiate the 
upgrade, and start it with the -upgrade flag.
# NN performs preupgrade for all non-shared storage directories by moving 
current to previous.tmp and creating a new current.
#* Failure here is fine: NN startup fails, and on the next upgrade attempt the 
storage directories are recovered.
# NN performs preupgrade of shared edits (NFS/JournalNodes) over RPC. On each 
JournalNode, current is moved to previous.tmp and a new current is created.
#* If preupgrade fails on one of the JNs and the upgrade is reattempted, the 
editlog directory could be lost on that JN. Restarting the JN does not fix the 
issue.
# NN performs upgrade of non-shared edits by writing the new CTIME to current 
and moving previous.tmp to previous.
# NN performs upgrade of shared edits (NFS/JournalNodes) over RPC. On each 
JournalNode, current gets the new CTIME and previous.tmp is moved to previous.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.
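
To make the failure window in the steps above concrete, here is a minimal 
sketch of the current/previous.tmp/previous transitions for a single storage 
directory, written with plain java.nio.file operations. This is illustrative 
only, not the actual HDFS Storage code; the directory and file names follow 
the description above, and the VERSION file format is simplified.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative sketch of the per-storage-directory upgrade transitions
 *  described above; NOT the actual HDFS Storage implementation. */
public class UpgradeSketch {

    /** Preupgrade: move current to previous.tmp, then create a fresh current.
     *  A failure between the two operations is recoverable by moving
     *  previous.tmp back, which is what recovery on the next attempt does. */
    public static void preUpgrade(Path storageRoot) throws IOException {
        Path current = storageRoot.resolve("current");
        Path previousTmp = storageRoot.resolve("previous.tmp");
        Files.move(current, previousTmp);
        Files.createDirectory(current);
    }

    /** Upgrade: record the new CTIME in current, then rename previous.tmp to
     *  previous, marking the preupgrade state as the rollback target. */
    public static void upgrade(Path storageRoot, long newCTime) throws IOException {
        Path current = storageRoot.resolve("current");
        // Simplified stand-in for the real VERSION file contents.
        Files.write(current.resolve("VERSION"),
                    ("cTime=" + newCTime + "\n").getBytes());
        Files.move(storageRoot.resolve("previous.tmp"),
                   storageRoot.resolve("previous"));
    }
}
```

The hazard called out above is that a JournalNode that fails between these two 
phases and is then asked to preupgrade again no longer has a plain current 
directory to move aside.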

*Rollback:* NN is started with the -rollback flag.
# For all the non-shared directories, the NN checks canRollBack, which 
essentially ensures that a previous directory with the right layout version 
exists.
# For all the shared directories, the NN performs the same canRollBack check 
over RPC.
# NN performs rollback for shared directories (moving previous to current).
#* If rollback fails on one of the JNs, the directories are left in an 
inconsistent state. I think any attempt at retrying rollback will fail and 
will require manually moving files around. I do not think restarting the JN 
fixes this.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.
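
The check-then-move pattern above can be sketched as follows. This is an 
illustrative simplification, not the actual FSImage/NNStorage code; the 
VERSION file format and the removed.tmp name are stand-ins.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Illustrative sketch of the rollback check and move described above;
 *  NOT the actual HDFS rollback implementation. */
public class RollbackSketch {

    /** canRollBack: a previous directory whose VERSION file records the
     *  expected layout version must exist. */
    public static boolean canRollBack(Path storageRoot, int expectedLayoutVersion)
            throws IOException {
        Path version = storageRoot.resolve("previous").resolve("VERSION");
        if (!Files.exists(version)) {
            return false;
        }
        List<String> lines = Files.readAllLines(version);
        return lines.contains("layoutVersion=" + expectedLayoutVersion);
    }

    /** rollback: set current aside, then restore previous. A crash between
     *  the two moves leaves the directory in the inconsistent state noted
     *  above, with neither a usable current nor a previous to retry from. */
    public static void rollback(Path storageRoot) throws IOException {
        Path current = storageRoot.resolve("current");
        Files.move(current, storageRoot.resolve("removed.tmp"));
        Files.move(storageRoot.resolve("previous"), current);
    }
}
```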

*Finalize:* DFSAdmin command is run to finalize the upgrade.
# The active NN performs finalization of the editlog. If the JNs fail to 
finalize, the active NN fails to finalize. However, it is possible that the 
standby finalizes, leaving the cluster in an inconsistent state.
# We need to document that all the JournalNodes must be up. If a JN is 
irrecoverably lost, configuration must be changed to exclude the JN.
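
The partial-finalize hazard above can be sketched as follows. The 
JournalFinalizer interface here is hypothetical, standing in for the 
JournalNode finalize RPC; the point is that journals finalized before a 
failure stay finalized even though the operation as a whole fails.

```java
import java.io.IOException;
import java.util.List;

/** Sketch of the all-or-nothing finalize problem described above. The
 *  JournalFinalizer interface is a made-up stand-in for the JN RPC. */
public class FinalizeSketch {

    interface JournalFinalizer {
        void finalizeUpgrade() throws IOException;
    }

    /** Attempts to finalize every journal in turn. If one journal throws,
     *  the whole call fails, but journals finalized before the failure have
     *  already discarded their previous directory -- the inconsistency
     *  noted above. Returns the number of journals finalized. */
    public static int finalizeAll(List<JournalFinalizer> journals)
            throws IOException {
        int done = 0;
        for (JournalFinalizer j : journals) {
            j.finalizeUpgrade();   // a throw here leaves earlier JNs finalized
            done++;
        }
        return done;
    }
}
```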

Comments on the code in the patch (this list is nearly complete):
# Minor nit: there are some whitespace-only changes
# assertAllResultsEqual - the for loop can just start with i = 1. Also, if the 
collection is of size zero or one, the method can return early. Is there a 
need to do objects.toArray() before these early checks? With that, the 
findbugs exclusion may no longer be necessary.
# Unit tests can be added for the methods isAtLeastOneActive, 
getRpcAddressesForNameserviceId and getProxiesForAllNameNodesInNameservice (I 
am okay if this is done in a separate jira)
# Finalizing upgrade is quite tricky. Consider the following scenarios:
#* One NN is active and the other is standby - works fine
#* One NN is active and the other is down, or all NNs are down - the finalize 
command throws an exception, and the user will not know whether it succeeded 
or failed or what to do next
#* No active NN - throws an exception saying finalize cannot proceed with no 
active NN
# The BlockPoolSliceStorage.java change seems unnecessary
# Why is {{throw new AssertionError("Unreachable code.");}} in 
QuorumJournalManager.java methods?
# FSImage#doRollBack() - when canRollBack is false after checking whether the 
non-shared directories can roll back, an exception should be thrown 
immediately instead of going on to check the shared editlog. Also, logging at 
INFO level when storages can be rolled back would help in debugging.
# FSEditlog#canRollBackSharedLog should accept StorageInfo instead of Storage
# QuorumJournalManager#canRollBack and getJournalCTime can throw AssertionError 
(from DFSUtil.assertAllResultsEqual()). Is that the right exception to expose, 
or should it be IOException?
# Namenode startup throws AssertionError with -rollback option. I think we 
should throw IOException, which is how all the other failures are indicated.
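
The assertAllResultsEqual refactoring suggested above could look like the 
following sketch: return early for zero- or one-element collections and 
iterate pairwise from the second element, which avoids toArray() entirely 
(and with it, possibly, the findbugs exclusion). This is illustrative, not 
the actual DFSUtil code.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.Objects;

/** Sketch of the suggested refactoring; NOT the actual DFSUtil code. */
public class AssertEqualSketch {

    /** Throws AssertionError unless all elements of the collection are
     *  equal. Returns early for 0- or 1-element collections, so no array
     *  copy is needed before the size check. */
    public static void assertAllResultsEqual(Collection<?> objects) {
        if (objects.size() <= 1) {
            return;                 // nothing to compare
        }
        Iterator<?> it = objects.iterator();
        Object first = it.next();
        while (it.hasNext()) {
            if (!Objects.equals(first, it.next())) {
                throw new AssertionError(
                    "Not all results are equal: " + objects);
            }
        }
    }
}
```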


> Support HDFS upgrade in HA
> --------------------------
>
>                 Key: HDFS-5138
>                 URL: https://issues.apache.org/jira/browse/HDFS-5138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Kihwal Lee
>            Assignee: Aaron T. Myers
>            Priority: Blocker
>         Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster, but the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is 
> turned back on without involving DNs, things will work, but finalizeUpgrade 
> won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' 
> upgrade snapshots won't get removed.
> We will need a different way of doing layout upgrade and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others. If 
> there is a reasonable workaround that does not increase the maintenance 
> window greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
