[ https://issues.apache.org/jira/browse/HDFS-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhongkun Wu updated HDFS-17602:
-------------------------------
Description:
We have seventy thousand nodes in our production Hadoop cluster, and we use RBF for namespace federation. When a mount point uses the SPACE order in RBF and we write to that mount point, the write fails and leaves an empty file.
----
We dug into the code and found the root cause. When a file is created through RBF, the create RPC is followed by many addBlock RPCs until the write is done. The RBF space resolver can choose an irrelevant namespace, so the client writes data to the wrong location.
----
These are the code fragments:

!image-2024-08-12-10-08-58-271.png!
In MultipleDestinationMountTableResolver.java we invoke orderedResolver.getFirstNamespace(path, mountTableResult);
This in turn invokes the following function in RouterResolver.java:
!image-2024-08-12-10-12-48-428.png!
which leads to the chooseFirstNamespace function in AvailableSpaceResolver.java:
!image-2024-08-12-10-14-20-580.png!

The path parameter is the destination where we want to create the file, and the loc parameter is the mount point we configured.

This function chooses the namespace with the most available space among all the namespaces in the State Store, not among the namespaces configured for the mount point of our destination.

As a result, it can return a namespace that is not one of the namespaces configured for the destination path.

Because that namespace is not configured for our destination path, the router ends up choosing the first namespace it sees, which is not really the one with the most available space among the namespaces configured for our destination.

was:
We have seventy thousand nodes in our production Hadoop cluster, and we use RBF for namespace federation. When a mount point uses the SPACE order in RBF and we write to that mount point, the write fails and leaves an empty file.
----
We dug into the code and found the root cause. When a file is created through RBF, the create RPC is followed by many addBlock RPCs until the write is done.
The RBF space resolver can choose an irrelevant namespace, so the client writes data to the wrong location.
----
These are the code fragments:

!image-2024-08-12-10-08-58-271.png!
In MultipleDestinationMountTableResolver.java we invoke orderedResolver.getFirstNamespace(path, mountTableResult);
This in turn invokes the following function in RouterResolver.java:
!image-2024-08-12-10-12-48-428.png!
which leads to the chooseFirstNamespace function in AvailableSpaceResolver.java:
!image-2024-08-12-10-14-20-580.png!

The path parameter is the destination where we want to create the file, and the loc parameter is the mount point we configured.

This function chooses the namespace with the most available space among all the namespaces in the State Store, not among the namespaces configured for the mount point of our destination.

As a result, it can return a namespace that is not one of the namespaces configured for the destination path.

!image-2024-08-12-10-25-42-863.png!
In the log above, we get a namespace that is not configured for our destination path, so the router ends up choosing the first namespace it sees, which is not really the one with the most available space among the namespaces configured for our destination.

> RBF: Fix mount point with SPACE order can not find the available namespace.
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-17602
>                 URL: https://issues.apache.org/jira/browse/HDFS-17602
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: router
>            Reporter: Zhongkun Wu
>            Assignee: Zhongkun Wu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: image-2024-08-12-10-08-54-031.png,
> image-2024-08-12-10-08-58-271.png, image-2024-08-12-10-12-48-428.png,
> image-2024-08-12-10-14-20-580.png, image-2024-08-12-10-25-26-003.png,
> image-2024-08-12-10-25-42-863.png