[ 
https://issues.apache.org/jira/browse/HADOOP-15891?focusedWorklogId=478247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478247
 ]

ASF GitHub Bot logged work on HADOOP-15891:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Sep/20 01:49
            Start Date: 03/Sep/20 01:49
    Worklog Time Spent: 10m 
      Work Description: JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-686183399


   @umamaheswararao Thanks for the comments. Please see the replies inline.
   > Hi @JohnZZGithub, I got a few other points to discuss.
   > 
   > 1. We have exposed the getMountPoints API. It seems we can't return any 
mount points for REGEX-based links because you would not know the real target 
fs until you got src paths to resolve. What should we do for this API?
   
   It's a great question. I guess most callers of getMountPoints want to 
traverse all the file systems to perform some operation, e.g. 
setVerifyChecksum(). We didn't see issues on our internal YARN + HDFS and 
YARN + GCS clusters; the usage patterns include, but are not limited to, MR, 
Spark, Presto, and Vertica loading. But it's possible that some users rely on 
this API. I could see two options going forward:
   1. Return a MountPoint with a special FileSystem for regex mount points. We 
could cache the file systems already initialized under the regex mount point 
and perform the operation on them. For target file systems that only 
materialize later, we could record past calls from callers and replay them on 
the new children, or just not support that (see the sketch after this list).
   2. Indicate that we don't support such APIs for regex mount points.
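   For option 1, a minimal sketch of the caching idea (class and method names 
here are illustrative, not from the patch):

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

import org.apache.hadoop.fs.FileSystem;

// Hypothetical cache behind a regex mount point's special FileSystem:
// it remembers fs-wide operations and replays them on target file
// systems that are only resolved (and initialized) later.
class RegexMountPointFsCache {
  // Target filesystems already materialized by past path resolutions.
  private final Map<URI, FileSystem> resolved = new ConcurrentHashMap<>();
  // Fs-wide operations (e.g. setVerifyChecksum) recorded for replay.
  private final List<Consumer<FileSystem>> pendingOps =
      new CopyOnWriteArrayList<>();

  // Called when a src path resolves to a new target filesystem.
  void onResolved(URI targetUri, FileSystem fs) {
    if (resolved.putIfAbsent(targetUri, fs) == null) {
      // Replay previously requested fs-wide operations on the new child.
      for (Consumer<FileSystem> op : pendingOps) {
        op.accept(fs);
      }
    }
  }

  // Record an fs-wide operation and apply it to every known child.
  void applyToAll(Consumer<FileSystem> op) {
    pendingOps.add(op);
    for (FileSystem fs : resolved.values()) {
      op.accept(fs);
    }
  }
}
```

   getMountPoints() could then return a single synthetic mount point whose 
FileSystem delegates fs-wide calls to applyToAll().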
   And to extend the topic a little bit: this kind of ViewFileSystem API (an 
API that tries to visit all child file systems) caused several problems for 
us. E.g. setVerifyChecksum() initialized a file system for a mount point the 
user never intended to use, and the initialization failed because it requires 
credentials, which the user didn't have since they never meant to visit that 
mount point. We developed a LazyChRootedFileSystem (not public) on top of 
every target file system to do lazy initialization for path-based APIs, but 
it's hard to tackle APIs that take no path argument. So to summarize: we see 
cases where users want to prevent these non-path-based APIs from triggering 
actions on every child file system, while in the meantime some users (though 
rare in our scenarios) might want these APIs applied to all child file 
systems. It's hard to satisfy both needs.
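   A rough sketch of the lazy-initialization idea (LazyChRootedFileSystem 
itself is internal, so the names below are illustrative):

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical lazy wrapper around one target filesystem: the real fs
// is created only when a path-based call actually visits the mount.
class LazyTargetFs {
  private final URI targetUri;
  private final Configuration conf;
  private volatile FileSystem fs; // created on first path-based call

  LazyTargetFs(URI targetUri, Configuration conf) {
    this.targetUri = targetUri;
    this.conf = conf;
  }

  private FileSystem fs() throws IOException {
    FileSystem result = fs;
    if (result == null) {
      synchronized (this) {
        if (fs == null) {
          // Credentials are only needed here, i.e. when the mount
          // point is actually visited by a path-based operation.
          fs = FileSystem.get(targetUri, conf);
        }
        result = fs;
      }
    }
    return result;
  }

  FSDataInputStream open(Path p) throws IOException {
    return fs().open(p); // path-based APIs trigger initialization
  }
}
```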
   
   > 2. The other API is getDelegationTokenIssuers. Applications like YARN use 
this API to get delegation tokens from all child file systems. This also will 
not work for REGEX-based mount points.
    We did see an issue with addDelegationTokens in a secure Hadoop cluster, 
but the problem we met was that not all normal mount points are secure, so the 
API caused problems when it tried to initialize all children's file systems. 
We took a workaround by making it path-based. As for getDelegationTokens, I 
guess the problem is similar; we didn't see issues only because it isn't used 
in our setup. Could we make it path-based too? Or we could take the approach 
stated for point 1. A path-scoped sketch follows below.
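   What I mean by path-based, as a sketch (illustrative only; resolvePath() 
and addDelegationTokens() are existing FileSystem APIs, but this helper is 
not):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

class PathScopedTokens {
  // Collect delegation tokens only from the filesystems backing the
  // given paths, instead of initializing every child in the mount table.
  static List<Token<?>> addDelegationTokens(
      FileSystem viewFs, Path[] paths, String renewer, Credentials creds)
      throws IOException {
    List<Token<?>> collected = new ArrayList<>();
    for (Path p : paths) {
      // Resolve through the mount table to the concrete target path,
      // then ask only that target filesystem for tokens.
      Path resolved = viewFs.resolvePath(p);
      FileSystem target = resolved.getFileSystem(viewFs.getConf());
      Token<?>[] tokens = target.addDelegationTokens(renewer, creds);
      if (tokens != null) {
        for (Token<?> t : tokens) {
          collected.add(t);
        }
      }
    }
    return collected;
  }
}
```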
   
   > 3. Another question is how these child filesystem objects get closed. 
There was an issue with [ViewFileSystem#close | 
https://issues.apache.org/jira/browse/HADOOP-15565 ]. I would like to know how 
that gets addressed in this case, as we don't keep anything in InnerCache.
    Could we make the inner cache a thread-safe structure and track all the 
file systems opened under regex mount points? A sketch of the idea follows 
below.
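   A minimal sketch of such a thread-safe cache (names are illustrative; this 
is not the existing InnerCache):

```java
import java.io.IOException;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.fs.FileSystem;

// Hypothetical thread-safe registry of filesystems opened under regex
// mount points, so ViewFileSystem#close can release them (cf.
// HADOOP-15565).
class RegexInnerCache {
  private final Map<URI, FileSystem> cache = new ConcurrentHashMap<>();

  // Track every filesystem opened under a regex mount point.
  void track(URI uri, FileSystem fs) throws IOException {
    FileSystem prev = cache.putIfAbsent(uri, fs);
    if (prev != null && prev != fs) {
      // Another thread won the race; close the duplicate instance.
      fs.close();
    }
  }

  // Called from ViewFileSystem#close to close all tracked children.
  synchronized void closeAll() throws IOException {
    IOException first = null;
    for (FileSystem fs : cache.values()) {
      try {
        fs.close();
      } catch (IOException e) {
        if (first == null) {
          first = e; // report the first failure after closing the rest
        }
      }
    }
    cache.clear();
    if (first != null) {
      throw first;
    }
  }
}
```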
   
   These are really great points, thanks a lot.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 478247)
    Time Spent: 3h  (was: 2h 50m)

> Provide Regex Based Mount Point In Inode Tree
> ---------------------------------------------
>
>                 Key: HADOOP-15891
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15891
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: viewfs
>            Reporter: zhenzhao wang
>            Assignee: zhenzhao wang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HADOOP-15891.015.patch, HDFS-13948.001.patch, 
> HDFS-13948.002.patch, HDFS-13948.003.patch, HDFS-13948.004.patch, 
> HDFS-13948.005.patch, HDFS-13948.006.patch, HDFS-13948.007.patch, 
> HDFS-13948.008.patch, HDFS-13948.009.patch, HDFS-13948.011.patch, 
> HDFS-13948.012.patch, HDFS-13948.013.patch, HDFS-13948.014.patch, HDFS-13948_ 
> Regex Link Type In Mont Table-V0.pdf, HDFS-13948_ Regex Link Type In Mount 
> Table-v1.pdf
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> This jira is created to support regex-based mount points in the Inode Tree. 
> We noticed that a mount point only supports a fixed target path. However, we 
> might have use cases where the target needs to refer to some fields from the 
> source. E.g. we might want a mapping of /cluster1/user1 => 
> /cluster1-dc1/user-nn-user1, i.e. we want to refer to the `cluster` and 
> `user` fields in the source to construct the target. It's impossible to 
> achieve this with the current link types. Though we could set up one-to-one 
> mappings, the mount table would become bloated if we have thousands of 
> users. Besides, a regex mapping gives us more flexibility. So we are going 
> to build a regex-based mount point whose target can refer to groups from the 
> src regex mapping. 
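For illustration, a minimal standalone sketch of the capture-group 
substitution the description refers to (plain java.util.regex, independent of 
the actual viewfs configuration keys):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMountDemo {
  public static void main(String[] args) {
    // Source pattern with named groups for the cluster and user fields.
    Pattern src = Pattern.compile("^/(?<cluster>\\w+)/(?<user>\\w+)");
    Matcher m = src.matcher("/cluster1/user1");
    if (m.find()) {
      // The target template refers to the captured groups.
      String target = "/" + m.group("cluster") + "-dc1/user-nn-"
          + m.group("user");
      System.out.println(target); // prints /cluster1-dc1/user-nn-user1
    }
  }
}
```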


