[ 
https://issues.apache.org/jira/browse/HDFS-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498531#comment-17498531
 ] 

Fengnan Li commented on HDFS-16487:
-----------------------------------

Thanks for the discussion [~ayushtkn] [~elgoiri] !

A big part of HDFS-13506 is setting the right owner/group and permissions on 
the physical HDFS dir/file. These values need to be passed to the Router, which 
then has to doAs the user to create the path with the right permissions. 

Internally we haven't turned on HDFS-15554 since some services are creating 
mounts before creating the dirs.

However, even with both patches there is still one contradiction: HDFS 
paths/files are generally created by clients, while Router mounts are created 
by RouterAdmin. If we bundle the two together, we either make the Router know 
the clients' behavior precisely at each dir level, with all the information 
needed to create the path (permissions and even ACLs), or we grant clients 
RouterAdmin access (which is what one of our internal services does). Neither 
option is ideal.

The context for this rethinking is:

We are backing up all data sets from the primary datacenter to a secondary 
datacenter. About 10k Hive tables make up a big part of that data. These 
tables come with various owners/groups. One service constantly runs jobs to 
copy the data. For each table, partitions live on different HDFS clusters and 
Router mounts specify their locations. Initially we only created the partition 
mounts, like:

table/2018 -> HDFS A

table/2022 -> HDFS B

When the copy service starts, it lists the dirs for one table and the Router 
returns all of these mounts. The client thinks the 2018 partition already 
exists and starts to create 2018/01, which then fails with 
NoSuchFileException. From the client's perspective, the listing returned the 
wrong results.
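
To make the scenario concrete, here is a minimal, self-contained sketch (plain 
Java, no Hadoop dependencies) that contrasts a listing built from raw mount 
points with one that only includes a mount when its destination path exists. 
It is not the Router's actual code: the class, method names, and the physical 
paths "/table/2018" and "/table/2022" are assumptions made up for illustration, 
and a real check would need an RPC to each destination cluster.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only. All names here are hypothetical, not Router APIs.
class MountListingSketch {

    /** Destination of a child mount point: a subcluster name and a physical path. */
    static final class MountTarget {
        final String subcluster;
        final String path;

        MountTarget(String subcluster, String path) {
            this.subcluster = subcluster;
            this.path = path;
        }
    }

    /** Listing as a plain union of subcluster children and raw mount points. */
    static Set<String> listWithRawMounts(Set<String> subclusterChildren,
                                         Map<String, MountTarget> childMounts) {
        Set<String> result = new TreeSet<>(subclusterChildren);
        result.addAll(childMounts.keySet());
        return result;
    }

    /** Listing that adds a mount point only if its destination path exists. */
    static Set<String> listWithVerifiedMounts(Set<String> subclusterChildren,
                                              Map<String, MountTarget> childMounts,
                                              Map<String, Set<String>> existingPaths) {
        Set<String> result = new TreeSet<>(subclusterChildren);
        for (Map.Entry<String, MountTarget> entry : childMounts.entrySet()) {
            MountTarget target = entry.getValue();
            // Stand-in for an existence check against the destination cluster.
            Set<String> paths = existingPaths.getOrDefault(target.subcluster, Set.of());
            if (paths.contains(target.path)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // table/2018 -> HDFS A (dir not created yet), table/2022 -> HDFS B (dir exists)
        Map<String, MountTarget> mounts = Map.of(
            "2018", new MountTarget("A", "/table/2018"),
            "2022", new MountTarget("B", "/table/2022"));
        Map<String, Set<String>> existing = Map.of("B", Set.of("/table/2022"));

        System.out.println(listWithRawMounts(Set.of(), mounts));
        System.out.println(listWithVerifiedMounts(Set.of(), mounts, existing));
    }
}
```

With the raw union, the dangling 2018 mount shows up in the listing even 
though nothing exists behind it; with the verified variant only 2022 is 
returned. The trade-off is one extra existence check per child mount point.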

> RBF: getListing uses raw mount table points
> -------------------------------------------
>
>                 Key: HDFS-16487
>                 URL: https://issues.apache.org/jira/browse/HDFS-16487
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Fengnan Li
>            Assignee: Aihua Xu
>            Priority: Major
>
> In getListing, the result is a union of the subclusters' results and the 
> mount points. However, these two are different concepts, and the latter is 
> internal to the Router. It is quite possible that the actual path doesn't 
> exist in the dest HDFS yet. 
> Can we choose a different strategy that checks each child mount point and 
> confirms that the HDFS path exists in the dest cluster? If so, we can add it; 
> otherwise we should skip this mount because it confuses clients. (Clients 
> could directly create a subdir under a dangling mount point)


