[ https://issues.apache.org/jira/browse/HDFS-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811892#comment-17811892 ]
xiaojunxiang edited comment on HDFS-17356 at 1/29/24 2:08 PM:
--------------------------------------------------------------

[~tasanuma], [~hiwangzhihui], [~hexiaoqiao], thanks for your advice. I have found three solutions to this problem, which I summarize below.

1. Option 1: Configure dfs.nameservice.id=<current ns> or dfs.ha.namenode.id=<current nn> on each node (see the config sketch below).
 - advantage: No new development is needed; the RouterServer, NameNode, and HDFSClient can be deployed on the same node.
 - disadvantage: Different nodes need different dfs.nameservice.id configurations.
2. Option 2: Develop the new configuration "dfs.federation.router.ns.name" as suggested by this JIRA.
 - advantage: The RouterServer, NameNode, and HDFSClient can be deployed on the same node, and different nodes can use the same configuration.
 - disadvantage: New development is needed.
3. Option 3: Constrained deployment pattern, where the Router and NameNode are deployed on different nodes.
 - advantage: No new development is needed, and different nodes can use the same configuration.
 - disadvantage: The Router and NameNode must be deployed on different nodes.

In summary, if you have more than 5 hosts in your cluster and you dislike that different nodes cannot use the same configuration, then Option 3 is the simplest and best solution.

!screenshot-5.png!
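For illustration, here is a minimal sketch of what the per-node override in Option 1 could look like, assuming the example cluster quoted below and a host that runs NameNode nn1 of mycluster1. The property names come from the error message itself; the concrete values are assumptions and would differ on every NameNode/ZKFC host.

{code:xml}
<!-- Option 1 sketch (assumed values): pin this host to a single nameservice and -->
<!-- namenode id so the startup logic no longer sees multiple matching addresses. -->
<property>
  <name>dfs.nameservice.id</name>
  <value>mycluster1</value>
</property>
<property>
  <name>dfs.ha.namenode.id</name>
  <value>nn1</value>
</property>
{code}

Because these values are host-specific, each NameNode/ZKFC host ends up with its own hdfs-site.xml, which is exactly the per-node configuration drift that Options 2 and 3 avoid.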
> RBF: Add Configuration dfs.federation.router.ns.name Optimization
> ------------------------------------------------------------------
>
>                 Key: HDFS-17356
>                 URL: https://issues.apache.org/jira/browse/HDFS-17356
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfs, rbf
>            Reporter: wangzhihui
>            Priority: Minor
>         Attachments: image-2024-01-29-18-04-55-391.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png
>
>
> When RBF federation is enabled in HDFS, and the HDFS server and RBFClient share the same configuration while the HDFS server (NameNode, ZKFC) and the RBFClient run on the same node, the following exception occurs and the NameNode fails to start. The reason is that the NS of the Router service has been added to the dfs.nameservices list. When the NameNode starts, it looks up the NS that the current node belongs to, but it finds multiple matching NS entries that it cannot disambiguate, so the existing validation fails and the NameNode startup aborts.
> Currently, we can only solve this problem by isolating the hdfs-site.xml of the RouterClient from that of the NameNode. However, splitting the configuration this way is not conducive to unified management of the cluster configuration. Therefore, we propose a new solution that addresses this problem better.
> {code:java}
> 2023-10-30 15:53:24,613 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
> 2023-10-30 15:53:24,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
> 2023-10-30 15:53:24,760 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
> 2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
> 2023-10-30 15:53:24,868 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address. Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
>         at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1257)
>         at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1158)
>         at org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1113)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.getNameServiceId(NameNode.java:1822)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1005)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
> 2023-10-30 15:53:24,870 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address.
> Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
> 2023-10-30 15:53:24,874 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> {code}
>
> hdfs-site.xml
> {code:java}
> <property>
>   <name>dfs.nameservices</name>
>   <value>mycluster1,mycluster2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>node1.com:8888</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster1</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn1</name>
>   <value>node1.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn2</name>
>   <value>node2.com:50070</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster2</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn1</name>
>   <value>node3.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn2</name>
>   <value>node4.com:50070</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.ns-fed</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> <property>
>   <name>dfs.client.failover.random.order</name>
>   <value>true</value>
> </property>
> {code}
>
> Solution
> Add a dfs.federation.router.ns.name configuration in hdfs-site.xml to mark the Router NS name, and filter out the Router NS during NameNode or ZKFC startup to avoid this issue.
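As a rough sketch of the proposed solution rather than a committed implementation, the new marker property could sit next to the existing federation entries; the property name dfs.federation.router.ns.name is the one proposed in this issue, and the value ns-fed is taken from the example configuration above.

{code:xml}
<!-- Sketch of the proposed marker: tells NameNode/ZKFC which entry in -->
<!-- dfs.nameservices is the Router nameservice and should be skipped. -->
<property>
  <name>dfs.federation.router.ns.name</name>
  <value>ns-fed</value>
</property>
{code}

With such a marker, NameNode and ZKFC startup can exclude ns-fed when resolving which nameservice the local node belongs to, so on node1.com only mycluster1 would match and the "multiple addresses that match local node's address" check would no longer fail.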