[ https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-11209:
------------------------------
    Attachment: HDFS-11209.04.patch

Delta from v03: removed the unit test change, which could not reproduce the 
original rolling upgrade issue. 

The repro is tricky with MiniDFSCluster because it requires running an 
old-version NN and issuing "hdfs dfsadmin -rollingUpgrade prepare" to create an 
fsimage with the old layout version. After the upgrade, the primary NameNode 
(new software layout version) must be started with the "-rollingUpgrade 
started" option and the secondary NameNode (new software layout version) 
started normally. 
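The repro steps above can be summarized as a tiny state model of the layout version recorded by each node. This is a hypothetical simplification, not Hadoop code: the class and method names are invented, and the -60/-63 values are taken from the manual test and the log in this issue. It only shows the buggy (pre-fix) SNN behavior.

```java
import java.util.Arrays;

// Hypothetical model of the repro; not Hadoop code.
final class RollingUpgradeRepro {
    static final int OLD_SOFTWARE_LV = -60; // Hadoop 2.6 layout version, per this issue
    static final int NEW_SOFTWARE_LV = -63; // Hadoop 2.7.1 layout version, per this issue

    /** Returns {primary VERSION layout version, SNN checkpoint layout version}. */
    static int[] run(boolean finalized) {
        // Step 1: old-version NN + "hdfs dfsadmin -rollingUpgrade prepare":
        // the fsimage/VERSION carry the old layout version.
        int primaryLv = OLD_SOFTWARE_LV;
        // Step 2: new-software primary started with "-rollingUpgrade started".
        // Since HDFS-8432 its VERSION file keeps the old layout until finalize.
        if (finalized) {
            primaryLv = NEW_SOFTWARE_LV;
        }
        // Step 3: new-software SNN started normally; before the fix it stamps
        // its checkpoint with its own software layout version.
        int secondaryLv = NEW_SOFTWARE_LV;
        return new int[] {primaryLv, secondaryLv};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(false))); // [-60, -63]: mismatch
        System.out.println(Arrays.toString(run(true)));  // [-63, -63]: match
    }
}
```

The mismatch appears only while the rolling upgrade is unfinalized, which is why a plain MiniDFSCluster restart does not trigger it.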

The software layout version is determined by a static method of the 
LayoutVersion class, which Mockito cannot mock. It is possible with PowerMock + 
Mockito, so I decided to add the unit test in a separate ticket. I've manually 
tested an upgrade from Hadoop 2.6 to Hadoop 2.7.1 in a non-HA setup, with the 
layout version changing from -60 to -63, and verified that the SNN can 
checkpoint while the primary NN's rolling upgrade is not finalized.
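One common alternative to PowerMock, sketched here only as an option and not what the patch does, is to route the static lookup through an injectable seam so a test can supply a different software layout version. `CheckpointVersionPolicy` is a hypothetical name; production wiring would pass a reference to the real static method on LayoutVersion. Note that HDFS layout versions are negative and a newer layout is more negative (-63 is newer than -60).

```java
import java.util.function.IntSupplier;

// Hypothetical seam, not Hadoop code: the static layout-version lookup is
// hidden behind an IntSupplier so tests can override it without PowerMock.
final class CheckpointVersionPolicy {
    private final IntSupplier softwareLayoutVersion;

    CheckpointVersionPolicy(IntSupplier softwareLayoutVersion) {
        this.softwareLayoutVersion = softwareLayoutVersion;
    }

    int currentSoftwareLayoutVersion() {
        return softwareLayoutVersion.getAsInt();
    }

    /** True if this software's layout is newer (more negative) than the on-disk one. */
    boolean isNewerThan(int onDiskLayoutVersion) {
        return currentSoftwareLayoutVersion() < onDiskLayoutVersion;
    }

    public static void main(String[] args) {
        // A test can pretend to be 2.7.1 (-63) reading a 2.6 (-60) image.
        CheckpointVersionPolicy policy = new CheckpointVersionPolicy(() -> -63);
        System.out.println(policy.isNewerThan(-60)); // true
    }
}
```

The trade-off is an extra constructor parameter in production code in exchange for avoiding a bytecode-manipulating test dependency.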

> SNN can't checkpoint when rolling upgrade is not finalized
> ----------------------------------------------------------
>
>                 Key: HDFS-11209
>                 URL: https://issues.apache.org/jira/browse/HDFS-11209
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rolling upgrades
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Critical
>         Attachments: HDFS-11209.00.patch, HDFS-11209.01.patch, 
> HDFS-11209.02.patch, HDFS-11209.03.patch, HDFS-11209.04.patch
>
>
> A similar problem was fixed in HDFS-7185. A recent change in HDFS-8432 
> brings it back. 
> With HDFS-8432, the primary NN does not update the VERSION file to the new 
> version after running with the "rollingUpgrade" option until the upgrade is 
> finalized. This is to support more downgrade use cases.
> However, the checkpoint on the SNN incorrectly updates the VERSION file 
> when the rolling upgrade is not yet finalized on the primary NN. As a result, 
> the SNN checkpoints successfully but fails to push the image to the primary NN 
> because its version is higher than the primary NN's, as shown below.
> {code}
> 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode 
> (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  Image uploading failed, status: 403, url: 
> http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., 
> message: This namenode has storage info -60:221856466:1444080250181:clusterX 
> but the secondary expected -63:221856466:1444080250181:clusterX
> {code} 
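The 403 in the log comes from the primary comparing its storage info with what the SNN claims, printed as `layoutVersion:namespaceID:cTime:clusterID`. The sketch below is a simplified model of that comparison, not Hadoop's implementation: all names are hypothetical, 200 stands in for any accepted upload, and `checkpointLayoutVersion` is an assumed illustration of the intended behavior (keep the primary's layout version while the rolling upgrade is unfinalized), not the actual patch.

```java
// Simplified model of the image-upload check; not Hadoop code.
final class ImageUploadCheck {
    /** Storage identity as printed in the log: layoutVersion:namespaceID:cTime:clusterID. */
    static String storageInfo(int layoutVersion, int namespaceId, long cTime, String clusterId) {
        return layoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
    }

    /** Primary-side check: accept only if the SNN's expected storage info matches. */
    static int acceptUpload(String primaryInfo, String secondaryExpected) {
        return primaryInfo.equals(secondaryExpected) ? 200 : 403;
    }

    /** Hypothetical SNN rule: keep the primary's layout version until finalize. */
    static int checkpointLayoutVersion(boolean rollingUpgradeFinalized, int primaryLv, int softwareLv) {
        return rollingUpgradeFinalized ? softwareLv : primaryLv;
    }

    public static void main(String[] args) {
        String primary = storageInfo(-60, 221856466, 1444080250181L, "clusterX");
        // Before the fix: the SNN claims its own software layout version.
        String buggy = storageInfo(-63, 221856466, 1444080250181L, "clusterX");
        System.out.println(acceptUpload(primary, buggy)); // 403
        // With the hypothetical rule: the SNN keeps the primary's layout version.
        int lv = checkpointLayoutVersion(false, -60, -63);
        String fixed = storageInfo(lv, 221856466, 1444080250181L, "clusterX");
        System.out.println(acceptUpload(primary, fixed)); // 200
    }
}
```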



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
