Danny Becker created HDFS-17178:
-----------------------------------
Summary: Bootstrap Standby needs to handle RollingUpgrade
Key: HDFS-17178
URL: https://issues.apache.org/jira/browse/HDFS-17178
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Danny Becker
Assignee: Danny Becker
Fix For: 3.3.4
During a rolling upgrade, bootstrapStandby fails with an exception because the
NameNodeLayoutVersions of the two NameNodes differ. This mismatch can safely be
ignored during a rolling upgrade, because different NameNodeLayoutVersions are
expected.
* Without this change, a NameNode that goes through destructive repair before
the rolling upgrade has been finalized cannot recover with BootstrapStandby.
Error during BootstrapStandby before change:
{code:java}
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: MTPrime-MWHE01-0
Other Namenode ID: nn1
Other NN's HTTP address: https://MWHEEEAP002D9A2:81
Other NN's IPC address: MWHEEEAP002D9A2.ap.gbl/10.59.208.18:8020
Namespace ID: 895912530
Block pool ID: BP-1556042256-10.99.154.61-1663325602669
Cluster ID: MWHE01
Layout version: -64
isUpgradeFinalized: true
=====================================================
2023-08-28T19:35:06,940 ERROR [main] namenode.NameNode: Failed to start namenode.
java.io.IOException: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81/imagetransfer?getimage=1&txid=25683470&storageInfo=-64:895912530:1663325602669:MWHE01&bootstrapstandby=true failed with status code 403
Response message:
This namenode has storage info -63:895912530:1663325602669:MWHE01 but the secondary expected -64:895912530:1663325602669:MWHE01
    at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:583) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1717) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1819) [hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
Caused by: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81
{code}
The cause is that the namespaceInfo sent from the proxy node does not include
the effective layout version, so BootstrapStandby sends its request with a
storageinfo param built from the service layout version. The proxy node refuses
the request because it compares the storageinfo param against its own storage
info, which uses the effective layout version, not the service layout version.
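To make the mismatch concrete, here is a minimal, self-contained sketch that rebuilds the two storage info strings from the log above and compares them the way a strict equality check would. The class and method names (StorageInfoMismatch, storageInfo, accepts) are hypothetical and are not the Hadoop API; only the values and the colon-separated format come from the log.

```java
public class StorageInfoMismatch {

    /**
     * Builds a storage info string in the format seen in the log:
     * layoutVersion:namespaceID:cTime:clusterID
     */
    static String storageInfo(int layoutVersion, long namespaceId,
                              long cTime, String clusterId) {
        return layoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
    }

    /**
     * Hypothetical stand-in for the strict check on the serving NameNode:
     * the request is accepted only if the two strings are identical.
     */
    static boolean accepts(String own, String requested) {
        return own.equals(requested);
    }

    public static void main(String[] args) {
        // Sent by BootstrapStandby, built from the service layout version (-64).
        String requested = storageInfo(-64, 895912530L, 1663325602669L, "MWHE01");
        // Held by the proxy node, built from the effective layout version (-63).
        String own = storageInfo(-63, 895912530L, 1663325602669L, "MWHE01");

        System.out.println("requested: " + requested);
        System.out.println("own:       " + own);
        System.out.println("accepted:  " + accepts(own, requested)); // prints "accepted:  false"
    }
}
```

Only the first field differs, which is why the request is rejected with status code 403 even though the namespace ID, cTime and cluster ID all match.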
To fix this, we can modify the proxy.versionRequest() call stack so that it sets
the layout version from the effective layout version on the proxy node. We can
then add logic to BootstrapStandby to handle the case where the proxy node is
in a rolling upgrade.
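The proposed handling could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual patch: BootstrapLayoutCheck, RemoteInfo, layoutVersionAcceptable and layoutVersionForImageRequest are hypothetical names, and the sketch assumes the proxy node reports both its effective layout version and whether it is in a rolling upgrade.

```java
public class BootstrapLayoutCheck {

    /** Hypothetical view of what the proxy (remote) NameNode reports. */
    static final class RemoteInfo {
        final int effectiveLayoutVersion;   // assumed to be reported by the proxy node
        final boolean inRollingUpgrade;     // assumed to be reported by the proxy node

        RemoteInfo(int effectiveLayoutVersion, boolean inRollingUpgrade) {
            this.effectiveLayoutVersion = effectiveLayoutVersion;
            this.inRollingUpgrade = inRollingUpgrade;
        }
    }

    /**
     * Decides whether bootstrapping may proceed. Matching versions are always
     * fine; a mismatch is tolerated only while the remote node is in a rolling
     * upgrade, where differing layout versions are expected.
     */
    static boolean layoutVersionAcceptable(int localServiceLayoutVersion,
                                           RemoteInfo remote) {
        if (remote.effectiveLayoutVersion == localServiceLayoutVersion) {
            return true;
        }
        return remote.inRollingUpgrade;
    }

    /**
     * Layout version to put into the image request, so that it matches what
     * the proxy node compares against (its effective layout version).
     */
    static int layoutVersionForImageRequest(RemoteInfo remote) {
        return remote.effectiveLayoutVersion;
    }

    public static void main(String[] args) {
        RemoteInfo upgrading = new RemoteInfo(-63, true);
        RemoteInfo finalized = new RemoteInfo(-63, false);

        System.out.println(layoutVersionAcceptable(-64, upgrading)); // true
        System.out.println(layoutVersionAcceptable(-64, finalized)); // false
        System.out.println(layoutVersionForImageRequest(upgrading)); // -63
    }
}
```

The point of the design is that the request carries the version the proxy node will compare against, while a genuine mismatch outside a rolling upgrade is still rejected.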