[ https://issues.apache.org/jira/browse/HDFS-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811207#comment-17811207 ]
xiaojunxiang commented on HDFS-17356:
-------------------------------------

[~hiwangzhihui] [~hexiaoqiao] Thanks for your tip, but in my testing, when the Router client nameservice is present in dfs.nameservices, the impact on the NameNode persists even with dfs.federation.router.monitor.localnamenode.enable=false configured, and NameNode startup still fails.

 !screenshot-1.png! 

According to the code, that setting only takes effect on the Router server side, but what we are asking for is a fix for the conflict between the HDFS client configuration and the NameNode (and DataNode) server side.

 !screenshot-2.png! 

> RBF: Add Configuration dfs.federation.router.ns.name Optimization
> -----------------------------------------------------------------
>
>                 Key: HDFS-17356
>                 URL: https://issues.apache.org/jira/browse/HDFS-17356
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfs, rbf
>            Reporter: wangzhihui
>            Priority: Minor
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> When RBF federation is enabled in HDFS, and the HDFS server (NameNode, ZKFC)
> and the RBF client share the same configuration on the same node, the
> exception below occurs and the NameNode fails to start. The cause is that the
> nameservice of the Router service has been added to the dfs.nameservices
> list: when the NameNode starts, it resolves which nameservice the current
> node belongs to, finds multiple matching nameservices that it cannot
> disambiguate, and fails the existing validation, so startup aborts.
> Currently we can only work around this by keeping separate hdfs-site.xml
> files for the Router client and the NameNode, but splitting the
> configuration hinders unified management of the cluster configuration. We
> therefore propose a new solution that addresses this problem more cleanly.
> {code:java}
> 2023-10-30 15:53:24,613 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
> 2023-10-30 15:53:24,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
> 2023-10-30 15:53:24,760 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
> 2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
> 2023-10-30 15:53:24,868 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address. Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
>     at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1257)
>     at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1158)
>     at org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1113)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.getNameServiceId(NameNode.java:1822)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1005)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
> 2023-10-30 15:53:24,870 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address. Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
> 2023-10-30 15:53:24,874 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> {code}
>
> hdfs-site.xml
> {code:xml}
> <property>
>   <name>dfs.nameservices</name>
>   <value>mycluster1,mycluster2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>node1.com:8888</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster1</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn1</name>
>   <value>node1.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn2</name>
>   <value>node2.com:50070</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster2</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn1</name>
>   <value>node3.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn2</name>
>   <value>node4.com:50070</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.ns-fed</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> <property>
>   <name>dfs.client.failover.random.order</name>
>   <value>true</value>
> </property>
> {code}
>
> Solution
> Add a dfs.federation.router.ns.name configuration key in hdfs-site.xml to mark the Router nameservice names, and filter those nameservices out during NameNode or ZKFC startup to avoid this issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
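The filtering proposed in the Solution above could be sketched roughly as follows. This is an illustrative sketch only, not existing Hadoop code: the class RouterNsFilter, the method filterRouterNameservices, and the way the dfs.federation.router.ns.name value is passed in are all hypothetical; the real patch would apply the same idea inside DFSUtil's nameservice resolution.

{code:java}
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed fix: before the NameNode or ZKFC
 * resolve which nameservice the local node belongs to, drop every
 * nameservice named by the proposed dfs.federation.router.ns.name key
 * from the dfs.nameservices list, so Router-only nameservices such as
 * "ns-fed" can no longer match the local node's address.
 */
public class RouterNsFilter {

    /**
     * Remove the Router-only nameservices from the configured set.
     *
     * @param nameservices  the parsed value of dfs.nameservices
     * @param routerNsNames comma-separated value of the proposed
     *                      dfs.federation.router.ns.name key (may be null)
     * @return the nameservices that belong to real NameNodes
     */
    static Set<String> filterRouterNameservices(Collection<String> nameservices,
                                                String routerNsNames) {
        Set<String> result = new LinkedHashSet<>(nameservices);
        if (routerNsNames != null) {
            for (String ns : routerNsNames.split(",")) {
                result.remove(ns.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // dfs.nameservices from the example configuration in this issue
        Set<String> nss = filterRouterNameservices(
            Arrays.asList("mycluster1", "mycluster2", "ns-fed"),
            "ns-fed"); // value of the proposed dfs.federation.router.ns.name
        System.out.println(nss); // only the NameNode nameservices remain
    }
}
{code}

With "ns-fed" filtered out before DFSUtil.getNameServiceId runs, only mycluster1 and mycluster2 remain as candidates, so the "multiple addresses that match local node's address" check would no longer trip over the Router nameservice.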