[ 
https://issues.apache.org/jira/browse/HDFS-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811892#comment-17811892
 ] 

xiaojunxiang edited comment on HDFS-17356 at 1/29/24 2:08 PM:
--------------------------------------------------------------

[~tasanuma], [~hiwangzhihui], [~hexiaoqiao] thanks for your advice. I have 
found three solutions to this problem and will summarize them here.

1. Option 1: Configure dfs.nameservice.id=<current ns> or 
dfs.ha.namenode.id=<current nn>
   - advantage: No new development is needed; RouterServer, NameNode and 
HDFSClient can be deployed on the same node
   - disadvantage: Different nodes need different dfs.nameservice.id 
configurations

2. Option 2: Develop the new configuration "dfs.federation.router.ns.name" as 
suggested in this Jira.
   - advantage: RouterServer, NameNode and HDFSClient can be deployed on the 
same node, and different nodes can use the same configuration
   - disadvantage: New development is needed

3. Option 3: Constrained deployment pattern: the Router and NameNode must be 
deployed on different nodes
   - advantage: No new development is needed, and different nodes can use the 
same configuration
   - disadvantage: The Router and NameNode must be deployed on different nodes

In summary, if you have more than 5 hosts in your cluster and do not want 
different nodes to need different configurations, then Option 3 is the 
simplest and best solution.
 !screenshot-5.png! 
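
To make the trade-off between the options concrete, here is a minimal Java sketch of the lookup that fails. It is a simplified model written for this comment, not the Hadoop source: the class and method names are hypothetical, hosts are matched by plain string prefix instead of real address resolution, and the RPC port 8020 for mycluster1 is an assumption, since the posted hdfs-site.xml only lists http-address for that nameservice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of how a NameNode picks its nameservice. */
public class NameserviceResolution {

    /**
     * Returns the nameservice that has an RPC address on localHost, or null.
     * If dfs.nameservice.id is set, only that nameservice is considered.
     * Throws when more than one configured address matches the local host.
     */
    public static String resolveNameservice(Map<String, String> conf, String localHost) {
        String pinned = conf.get("dfs.nameservice.id");
        List<String> matches = new ArrayList<>();
        for (String rawNs : conf.getOrDefault("dfs.nameservices", "").split(",")) {
            String ns = rawNs.trim();
            if (ns.isEmpty() || (pinned != null && !pinned.equals(ns))) {
                continue; // skip empty entries and nameservices excluded by the pin
            }
            for (String nn : conf.getOrDefault("dfs.ha.namenodes." + ns, "").split(",")) {
                String addr = conf.get("dfs.namenode.rpc-address." + ns + "." + nn.trim());
                if (addr != null && addr.startsWith(localHost + ":")) {
                    matches.add(ns);
                }
            }
        }
        if (matches.size() > 1) {
            throw new IllegalArgumentException(
                "Configuration has multiple addresses that match local node's address");
        }
        return matches.isEmpty() ? null : matches.get(0);
    }

    /** Shared configuration: NameNode nn1 and Router r1 both live on node1.com. */
    public static Map<String, String> sampleConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.nameservices", "mycluster1,ns-fed");
        conf.put("dfs.ha.namenodes.mycluster1", "nn1,nn2");
        conf.put("dfs.namenode.rpc-address.mycluster1.nn1", "node1.com:8020"); // assumed port
        conf.put("dfs.namenode.rpc-address.mycluster1.nn2", "node2.com:8020"); // assumed port
        conf.put("dfs.ha.namenodes.ns-fed", "r1");
        conf.put("dfs.namenode.rpc-address.ns-fed.r1", "node1.com:8888");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = sampleConf();
        try {
            resolveNameservice(conf, "node1.com"); // Router and NameNode share node1
        } catch (IllegalArgumentException e) {
            System.out.println("Startup fails: " + e.getMessage());
        }
        // Option 3: node2.com hosts no Router, so the shared config is unambiguous.
        System.out.println("node2 resolves to: " + resolveNameservice(conf, "node2.com"));
        // Option 1: pin the nameservice on node1, at the cost of a per-node setting.
        conf.put("dfs.nameservice.id", "mycluster1");
        System.out.println("node1 resolves to: " + resolveNameservice(conf, "node1.com"));
    }
}
```

In this model, node1.com fails with the shared configuration because both mycluster1.nn1 and ns-fed.r1 match the local host, node2.com resolves without any extra setting (Option 3), and node1.com resolves once dfs.nameservice.id is pinned (Option 1).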



> RBF: Add Configuration dfs.federation.router.ns.name Optimization
> -----------------------------------------------------------------
>
>                 Key: HDFS-17356
>                 URL: https://issues.apache.org/jira/browse/HDFS-17356
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfs, rbf
>            Reporter: wangzhihui
>            Priority: Minor
>         Attachments: image-2024-01-29-18-04-55-391.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png
>
>
>     When RBF federation is enabled in HDFS, and the HDFS server (NameNode, 
> ZKFC) and RBFClient share the same configuration and run on the same node, 
> the following exception occurs and NameNode fails to start. The reason is 
> that the NS of the Router service has been added to the dfs.nameservices 
> list. When NameNode starts, it looks up the NS that the current node belongs 
> to. However, it finds multiple matching NS, cannot tell which one is its 
> own, and fails the existing verification logic, so NameNode startup fails. 
> Currently, we can only solve this problem by keeping separate hdfs-site.xml 
> files for RouterClient and NameNode. However, splitting the configuration 
> this way makes unified management of cluster configuration harder. 
> Therefore, we propose a new solution to solve this problem better.
> {code:java}
> 2023-10-30 15:53:24,613 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> registered UNIX signal handlers for [TERM, HUP, INT]
> 2023-10-30 15:53:24,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> createNameNode []
> 2023-10-30 15:53:24,760 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
> Loaded properties from hadoop-metrics2.properties
> 2023-10-30 15:53:24,842 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 2023-10-30 15:53:24,842 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system 
> started
> 2023-10-30 15:53:24,868 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
> addresses that match local node's address. Please configure the system with 
> dfs.nameservice.id and dfs.ha.namenode.id
>         at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1257)
>         at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1158)
>         at 
> org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1113)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getNameServiceId(NameNode.java:1822)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1005)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
> 2023-10-30 15:53:24,870 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: org.apache.hadoop.HadoopIllegalArgumentException: Configuration has 
> multiple addresses that match local node's address. Please configure the 
> system with dfs.nameservice.id and dfs.ha.namenode.id
> 2023-10-30 15:53:24,874 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG: {code}
>  
> hdfs-site.xml
> {code:java}
> <property>
>   <name>dfs.nameservices</name>
>   <value>mycluster1,mycluster2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>node1.com:8888</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster1</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn1</name>
>   <value>node1.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn2</name>
>   <value>node2.com:50070</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster2</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn1</name>
>   <value>node3.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn2</name>
>   <value>node4.com:50070</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.ns-fed</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> <property>
>   <name>dfs.client.failover.random.order</name>
>   <value>true</value>
> </property> {code}
>  
> Solution
> Add a dfs.federation.router.ns.name configuration in hdfs-site.xml to mark 
> the Router NS name, and filter out the Router NS during NameNode or ZKFC 
> startup to avoid this issue.
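
The filtering step proposed in the description could look roughly like the Java sketch below. This is only an illustration under stated assumptions: dfs.federation.router.ns.name is the key proposed in this issue rather than an existing Hadoop setting, the class and method names are hypothetical, and accepting a comma-separated list of Router NS names is my assumption, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of filtering the Router NS out of dfs.nameservices. */
public class RouterNsFilter {

    static final String NAMESERVICES_KEY = "dfs.nameservices";
    // Proposed in this issue; treated here as an assumption, not an existing key.
    static final String ROUTER_NS_KEY = "dfs.federation.router.ns.name";

    /** Splits a comma-separated value into trimmed, non-empty entries. */
    public static List<String> splitCsv(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    /** Returns the nameservices a NameNode or ZKFC would consider at startup. */
    public static List<String> nameservicesWithoutRouter(Map<String, String> conf) {
        Set<String> routerNs = new HashSet<>(splitCsv(conf.get(ROUTER_NS_KEY)));
        List<String> kept = new ArrayList<>();
        for (String ns : splitCsv(conf.get(NAMESERVICES_KEY))) {
            if (!routerNs.contains(ns)) {
                kept.add(ns); // keep only real NameNode nameservices
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(NAMESERVICES_KEY, "mycluster1,mycluster2,ns-fed");
        conf.put(ROUTER_NS_KEY, "ns-fed");
        System.out.println(nameservicesWithoutRouter(conf)); // [mycluster1, mycluster2]
    }
}
```

When the proposed key is absent, the sketch returns dfs.nameservices unchanged, so existing deployments would behave as they do today.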



