[ https://issues.apache.org/jira/browse/HDFS-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17178:
------------------------------
    Affects Version/s: 3.4.0

> BootstrapStandby needs to handle RollingUpgrade 
> ------------------------------------------------
>
>                 Key: HDFS-17178
>                 URL: https://issues.apache.org/jira/browse/HDFS-17178
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.4.0
>            Reporter: Danny Becker
>            Assignee: Danny Becker
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>
> During a rolling upgrade, BootstrapStandby fails with an exception because the
> NameNodes have different NameNode layout versions. This mismatch can safely be
> ignored during a rolling upgrade, because different layout versions are expected
> until the upgrade is finalized.
>  * As a result, NameNodes cannot recover with BootstrapStandby if they go through
> destructive repair before the rolling upgrade has been finalized.
> Error from BootstrapStandby before this change:
> {code:java}
> =====================================================
> About to bootstrap Standby ID nn2 from:
>            Nameservice ID: MTPrime-MWHE01-0
>         Other Namenode ID: nn1
>   Other NN's HTTP address: https://MWHEEEAP002D9A2:81
>   Other NN's IPC  address: MWHEEEAP002D9A2.ap.gbl/10.59.208.18:8020
>              Namespace ID: 895912530
>             Block pool ID: BP-1556042256-10.99.154.61-1663325602669
>                Cluster ID: MWHE01
>            Layout version: -64
>        isUpgradeFinalized: true
> =====================================================
> 2023-08-28T19:35:06,940 ERROR [main] namenode.NameNode: Failed to start namenode.
> java.io.IOException: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81/imagetransfer?getimage=1&txid=25683470&storageInfo=-64:895912530:1663325602669:MWHE01&bootstrapstandby=true failed with status code 403
> Response message:
> This namenode has storage info -63:895912530:1663325602669:MWHE01 but the secondary expected -64:895912530:1663325602669:MWHE01
>         at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:583) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1717) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1819) [hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81{code}
> This happens because the namespaceInfo sent from the proxy node does not
> include the effective layout version, so BootstrapStandby sends a request whose
> storageInfo parameter uses the service layout version. The proxy node then
> refuses the request, because it compares that storageInfo parameter against its
> own storage info, which uses the effective layout version rather than the
> service layout version.
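The following is a minimal sketch of the rejection described above. It is not Hadoop's servlet code, and it rests on a few assumptions:

* The storage-info string is assumed to have the form layoutVersion:namespaceID:cTime:clusterID, which is consistent with the values in the log above.
* The class, record, and method names (StorageInfoCheck, StorageInfo, checkRequest) are hypothetical.
* The proxy's check is modeled as exact equality of all four fields.

Under these assumptions, a request built with the service layout version (-64) is refused by a proxy whose storage info carries its effective layout version (-63).

```java
/**
 * Hypothetical model of the proxy's storage-info check; not Hadoop's actual code.
 * Assumes the colon-separated format layoutVersion:namespaceID:cTime:clusterID.
 */
public class StorageInfoCheck {

    /** Immutable view of the four colon-separated fields. */
    record StorageInfo(int layoutVersion, int namespaceId, long cTime, String clusterId) {

        static StorageInfo parse(String s) {
            String[] parts = s.split(":", 4);
            if (parts.length != 4) {
                throw new IllegalArgumentException("expected 4 fields: " + s);
            }
            return new StorageInfo(Integer.parseInt(parts[0]),
                    Integer.parseInt(parts[1]),
                    Long.parseLong(parts[2]),
                    parts[3]);
        }
    }

    /** Returns the HTTP status this modeled check would produce (200 accept, 403 reject). */
    static int checkRequest(StorageInfo local, StorageInfo requested) {
        // Exact equality: any differing field, including the layout version, is rejected.
        return local.equals(requested) ? 200 : 403;
    }

    public static void main(String[] args) {
        // Proxy NameNode: storage info uses its effective layout version during the rolling upgrade.
        StorageInfo proxy = StorageInfo.parse("-63:895912530:1663325602669:MWHE01");
        // BootstrapStandby: request built with the new software's service layout version.
        StorageInfo request = StorageInfo.parse("-64:895912530:1663325602669:MWHE01");
        System.out.println(checkRequest(proxy, request)); // prints 403
    }
}
```

Only the layout version differs between the two strings, and that single field is enough for the modeled check to reject the request.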
> To fix this, we can modify the proxy.versionRequest() call stack so that the
> proxy node sets the layout version to its effective layout version. We can then
> add logic to BootstrapStandby to properly handle the case where the proxy node
> is in a rolling upgrade.
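Below is one possible shape of the BootstrapStandby side of this fix, written as a hypothetical sketch. The names ProxyNamespaceInfo, layoutVersionForRequest, and rollingUpgradeInProgress are illustrative only and are not Hadoop's API. The merged patch may behave differently.

The sketch makes two assumptions:

* The proxy's versionRequest() response now carries its effective layout version.
* A layout version mismatch is tolerated only while a rolling upgrade is in progress.

```java
/**
 * Hypothetical sketch of the proposed BootstrapStandby decision.
 * Names are illustrative, not Hadoop's actual classes or methods.
 */
public class BootstrapLayoutSketch {

    /** Assumed content of the proxy's versionRequest() response after the change. */
    record ProxyNamespaceInfo(int effectiveLayoutVersion, boolean rollingUpgradeInProgress) {}

    /**
     * Chooses the layout version to put in the storageInfo request parameter,
     * or throws if the mismatch is not explained by a rolling upgrade.
     */
    static int layoutVersionForRequest(int localServiceLayoutVersion, ProxyNamespaceInfo proxy) {
        if (proxy.effectiveLayoutVersion() == localServiceLayoutVersion) {
            return localServiceLayoutVersion; // normal case: versions agree
        }
        if (proxy.rollingUpgradeInProgress()) {
            // Expected during a rolling upgrade: use the proxy's effective layout
            // version so that its storage-info comparison accepts the request.
            return proxy.effectiveLayoutVersion();
        }
        throw new IllegalStateException("Layout version mismatch outside a rolling upgrade: local="
                + localServiceLayoutVersion + ", proxy=" + proxy.effectiveLayoutVersion());
    }

    public static void main(String[] args) {
        ProxyNamespaceInfo proxy = new ProxyNamespaceInfo(-63, true);
        System.out.println(layoutVersionForRequest(-64, proxy)); // prints -63
    }
}
```

Rejecting mismatches outside a rolling upgrade keeps the existing protection against genuinely incompatible NameNodes. It only relaxes the check for the difference that a rolling upgrade is expected to produce.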



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
