[
https://issues.apache.org/jira/browse/HDFS-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhongkun Wu updated HDFS-17602:
-------------------------------
Description:
We have seventy thousand nodes in our production Hadoop cluster, and we use RBF
for namespace federation. When a mount point uses the SPACE order, writes to
that mount point fail and leave an empty file behind.
----
We dug into the code and found the root cause. When a file is created through
RBF, the create RPC is invoked first, and then the addBlock RPC is invoked
repeatedly until the write is done. The RBF space resolver can choose an
unrelated namespace, so the client writes data to the wrong location.
----
These are the code fragments:
!image-2024-08-12-10-08-58-271.png!
In MultipleDestinationMountTableResolver.java we invoke
orderedResolver.getFirstNamespace(path, mountTableResult);
which then invokes this function in RouterResolver.java:
!image-2024-08-12-10-12-48-428.png!
That brings us to the chooseFirstNamespace function in
AvailableSpaceResolver.java:
!image-2024-08-12-10-14-20-580.png!
The path parameter is the destination where we want to create the file.
The loc parameter is the mount point we configured.
This function chooses the namespace with the most available space among all the
namespaces in the StateStore, which is not the same set as the namespaces
configured for the mount point of our destination.
As a result, we get a namespace unrelated to the namespaces configured for the
destination path.
We get a namespace that we did not configure for the destination path, so the
resolver then chooses the first namespace it sees, which is not actually the
one with the most available space among the namespaces configured for our
destination.
was:
We have seventy thousand nodes in our production Hadoop cluster, and we use RBF
for namespace federation. When a mount point uses the SPACE order, writes to
that mount point fail and leave an empty file behind.
----
We dug into the code and found the root cause. When a file is created through
RBF, the create RPC is invoked first, and then the addBlock RPC is invoked
repeatedly until the write is done. The RBF space resolver can choose an
unrelated namespace, so the client writes data to the wrong location.
----
These are the code fragments:
!image-2024-08-12-10-08-58-271.png!
In MultipleDestinationMountTableResolver.java we invoke
orderedResolver.getFirstNamespace(path, mountTableResult);
which then invokes this function in RouterResolver.java:
!image-2024-08-12-10-12-48-428.png!
That brings us to the chooseFirstNamespace function in
AvailableSpaceResolver.java:
!image-2024-08-12-10-14-20-580.png!
The path parameter is the destination where we want to create the file.
The loc parameter is the mount point we configured.
This function chooses the namespace with the most available space among all the
namespaces in the StateStore, which is not the same set as the namespaces
configured for the mount point of our destination.
As a result, we get a namespace unrelated to the namespaces configured for the
destination path.
!image-2024-08-12-10-25-42-863.png!
In the log above, we get a namespace that we did not configure for the
destination path, so the resolver then chooses the first namespace it sees,
which is not actually the one with the most available space among the
namespaces configured for our destination.
> RBF: Fix mount point with SPACE order can not find the available namespace.
> ---------------------------------------------------------------------------
>
> Key: HDFS-17602
> URL: https://issues.apache.org/jira/browse/HDFS-17602
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: router
> Reporter: Zhongkun Wu
> Assignee: Zhongkun Wu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: image-2024-08-12-10-08-54-031.png,
> image-2024-08-12-10-08-58-271.png, image-2024-08-12-10-12-48-428.png,
> image-2024-08-12-10-14-20-580.png, image-2024-08-12-10-25-26-003.png,
> image-2024-08-12-10-25-42-863.png
>
>
> We have seventy thousand nodes in our production Hadoop cluster, and we use
> RBF for namespace federation. When a mount point uses the SPACE order, writes
> to that mount point fail and leave an empty file behind.
> ----
> We dug into the code and found the root cause. When a file is created through
> RBF, the create RPC is invoked first, and then the addBlock RPC is invoked
> repeatedly until the write is done. The RBF space resolver can choose an
> unrelated namespace, so the client writes data to the wrong location.
> ----
> These are the code fragments:
>
> !image-2024-08-12-10-08-58-271.png!
> In MultipleDestinationMountTableResolver.java we invoke
> orderedResolver.getFirstNamespace(path, mountTableResult);
> which then invokes this function in RouterResolver.java:
> !image-2024-08-12-10-12-48-428.png!
> That brings us to the chooseFirstNamespace function in
> AvailableSpaceResolver.java:
> !image-2024-08-12-10-14-20-580.png!
>
> The path parameter is the destination where we want to create the file.
> The loc parameter is the mount point we configured.
>
> This function chooses the namespace with the most available space among all
> the namespaces in the StateStore, which is not the same set as the namespaces
> configured for the mount point of our destination.
>
> As a result, we get a namespace unrelated to the namespaces configured for
> the destination path.
>
>
> We get a namespace that we did not configure for the destination path, so the
> resolver then chooses the first namespace it sees, which is not actually the
> one with the most available space among the namespaces configured for our
> destination.
>
>
>
>
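The selection problem described in the issue can be illustrated with a small, self-contained sketch. This is not the Hadoop code and not the actual fix: the class SpaceOrderSketch, both method names, and the namespace IDs and space figures are hypothetical, chosen only to contrast picking the namespace with the most available space among all known namespaces against picking it among the destinations configured for the mount point.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hadoop.
class SpaceOrderSketch {

    // Picks the namespace with the most available space among ALL known
    // namespaces, ignoring which ones the mount point actually targets.
    static Optional<String> chooseAmongAll(Map<String, Long> availableSpace) {
        return availableSpace.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    // Picks the namespace with the most available space among the
    // destinations configured for the mount point only.
    static Optional<String> chooseAmongDestinations(
            Map<String, Long> availableSpace, Set<String> destinations) {
        return availableSpace.entrySet().stream()
            .filter(e -> destinations.contains(e.getKey()))
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Made-up available space per namespace (arbitrary units).
        Map<String, Long> availableSpace = new LinkedHashMap<>();
        availableSpace.put("ns0", 100L);
        availableSpace.put("ns1", 300L);
        availableSpace.put("ns2", 900L); // not a destination of the mount point

        // The mount point is assumed to target only ns0 and ns1.
        Set<String> destinations = Set.of("ns0", "ns1");

        // Unrestricted choice lands on a namespace outside the mount point.
        System.out.println(chooseAmongAll(availableSpace).orElse("none"));
        // Restricted choice stays within the configured destinations.
        System.out.println(
            chooseAmongDestinations(availableSpace, destinations).orElse("none"));
    }
}
```

With these made-up figures, the unrestricted choice returns ns2, which the mount point does not target, while the restricted choice returns ns1. Returning an Optional leaves the "no configured destination has a space report" case to the caller rather than silently picking some other namespace.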