[ 
https://issues.apache.org/jira/browse/HDFS-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongkun Wu updated HDFS-17602:
-------------------------------
    Description: 
We have seventy thousand nodes in our production Hadoop cluster and use RBF for
namespace federation. When a mount point uses the SPACE order, writing to that
mount point fails and leaves an empty file.
----
We dug into the code and found the root cause. When a file is created through
RBF, the create RPC is invoked first, and then the addBlock RPC is invoked
repeatedly until the write is done. The RBF space resolver can choose an
irrelevant namespace, so the client writes data to the wrong location.
----
These are the code fragments:

 

!image-2024-08-12-10-08-58-271.png!

In MultipleDestinationMountTableResolver.java we invoke
orderedResolver.getFirstNamespace(path, mountTableResult);
which then invokes this function in RouterResolver.java:
!image-2024-08-12-10-12-48-428.png!
That brings us to the chooseFirstNamespace function in
AvailableSpaceResolver.java:
!image-2024-08-12-10-14-20-580.png!
 
The path parameter is the destination where we want to create the file.
The loc parameter is the mount point we set.
 
This function chooses the namespace with the most available space among all the
namespaces in the StateStore, not only among the namespaces configured for the
mount point of our destination.
 
As a result, we can get a namespace unrelated to the namespaces we set for the
destination path.

 

 

Because the returned namespace is not one we set for our destination path, it
then chooses the first namespace it sees, which is not necessarily the one with
the most available space among the namespaces we set for our destination.
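
The difference between the two selection behaviours described above can be
sketched as follows. This is only an illustration, not the Hadoop RBF code: the
class SpaceOrderSketch, both method names, and the namespace ids ns1, ns2 and
ns3 are hypothetical, and the assumption is that the resolver already holds a
list of namespaces ranked by available space.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; these names do not exist in Hadoop RBF. */
final class SpaceOrderSketch {

  /**
   * Behaviour described in the report: take the head of the cluster-wide
   * ranking, whether or not the mount point lists that namespace.
   */
  static String chooseFromAll(List<String> rankedBySpace) {
    return rankedBySpace.isEmpty() ? null : rankedBySpace.get(0);
  }

  /**
   * Constrained variant (an assumption about the intended behaviour): walk
   * the same ranking but accept only namespaces that the mount point
   * actually lists as destinations.
   */
  static String chooseFromMountPoint(List<String> rankedBySpace,
                                     Set<String> mountDestinations) {
    for (String ns : rankedBySpace) {
      if (mountDestinations.contains(ns)) {
        return ns;
      }
    }
    return null; // no ranked namespace belongs to this mount point
  }

  public static void main(String[] args) {
    // ns3 has the most free space overall but is not a destination here.
    List<String> ranked = Arrays.asList("ns3", "ns1", "ns2");
    Set<String> destinations = new LinkedHashSet<>(Arrays.asList("ns1", "ns2"));

    System.out.println(chooseFromAll(ranked));
    System.out.println(chooseFromMountPoint(ranked, destinations));
  }
}
```

With a ranking of ns3, ns1, ns2 and a mount point whose destinations are ns1
and ns2, the unconstrained choice returns ns3, which the mount point does not
contain, while the constrained choice returns ns1.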

 

 
 
 

  was:
We have seventy thousand nodes in our production Hadoop cluster and use RBF for
namespace federation. When a mount point uses the SPACE order, writing to that
mount point fails and leaves an empty file.
----
We dug into the code and found the root cause. When a file is created through
RBF, the create RPC is invoked first, and then the addBlock RPC is invoked
repeatedly until the write is done. The RBF space resolver can choose an
irrelevant namespace, so the client writes data to the wrong location.
----
These are the code fragments:

 

!image-2024-08-12-10-08-58-271.png!

In MultipleDestinationMountTableResolver.java we invoke
orderedResolver.getFirstNamespace(path, mountTableResult);
which then invokes this function in RouterResolver.java:
!image-2024-08-12-10-12-48-428.png!
That brings us to the chooseFirstNamespace function in
AvailableSpaceResolver.java:
!image-2024-08-12-10-14-20-580.png!
 
The path parameter is the destination where we want to create the file.
The loc parameter is the mount point we set.
 
This function chooses the namespace with the most available space among all the
namespaces in the StateStore, not only among the namespaces configured for the
mount point of our destination.
 
As a result, we can get a namespace unrelated to the namespaces we set for the
destination path.

 

!image-2024-08-12-10-25-42-863.png!

The log above shows that the returned namespace is not one we set for our
destination path. It then chooses the first namespace it sees, which is not
necessarily the one with the most available space among the namespaces we set
for our destination.

 

 
 
 


> RBF: Fix mount point with SPACE order can not find the available namespace.
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-17602
>                 URL: https://issues.apache.org/jira/browse/HDFS-17602
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: router
>            Reporter: Zhongkun Wu
>            Assignee: Zhongkun Wu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: image-2024-08-12-10-08-54-031.png, 
> image-2024-08-12-10-08-58-271.png, image-2024-08-12-10-12-48-428.png, 
> image-2024-08-12-10-14-20-580.png, image-2024-08-12-10-25-26-003.png, 
> image-2024-08-12-10-25-42-863.png
>
>
> We have seventy thousand nodes in our production Hadoop cluster and use RBF
> for namespace federation. When a mount point uses the SPACE order, writing to
> that mount point fails and leaves an empty file.
> ----
> We dug into the code and found the root cause. When a file is created through
> RBF, the create RPC is invoked first, and then the addBlock RPC is invoked
> repeatedly until the write is done. The RBF space resolver can choose an
> irrelevant namespace, so the client writes data to the wrong location.
> ----
> These are the code fragments:
>  
> !image-2024-08-12-10-08-58-271.png!
> In MultipleDestinationMountTableResolver.java we invoke
> orderedResolver.getFirstNamespace(path, mountTableResult);
> which then invokes this function in RouterResolver.java:
> !image-2024-08-12-10-12-48-428.png!
> That brings us to the chooseFirstNamespace function in
> AvailableSpaceResolver.java:
> !image-2024-08-12-10-14-20-580.png!
>  
> The path parameter is the destination where we want to create the file.
> The loc parameter is the mount point we set.
>  
> This function chooses the namespace with the most available space among all
> the namespaces in the StateStore, not only among the namespaces configured for
> the mount point of our destination.
>  
> As a result, we can get a namespace unrelated to the namespaces we set for the
> destination path.
>  
>  
> Because the returned namespace is not one we set for our destination path, it
> then chooses the first namespace it sees, which is not necessarily the one
> with the most available space among the namespaces we set for our destination.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
