JohnZZGithub edited a comment on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-686183399


   @umamaheswararao  Thanks for the comments. Please see the reply inline.
   > Hi @JohnZZGithub, I got few other points to discuss.
   > 
   > 1. We have exposed getMountPoints API. It seems we can't return any mount 
points from REGEX based because you would not know until you got src paths to 
resolve and find real target fs. What should we do for this API?
   
   That's a great question. I'd guess most callers of getMountPoints want to 
traverse all the file systems to perform some operation, e.g. 
setVerifyChecksum(). We didn't see issues on our internal Yarn + HDFS and 
Yarn + GCS clusters, where the usage patterns include, but are not limited to, 
MR, Spark, Presto, and Vertica loading. Still, it's possible that some users 
rely on these APIs. I can see two options going forward:
   1. Return a MountPoint with a special FileSystem for regex mount points. We 
could cache the file systems already initialized under the regex mount point 
and perform the operation on them. For file systems that might appear in the 
future, we could either record the past calls from callers and try to apply 
them, or simply not support that case. 
   2. Indicate that we don't support such APIs for regex mount points.
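   To make option 1 a bit more concrete, here is a rough sketch in plain Java. 
ChildFs and RegexMountPoint are made-up names for illustration, not classes 
from this PR or from Hadoop, and the target template syntax is an assumption. 
The mount point remembers every target it has resolved so far, applies a 
non-path call to those, and replays the last call on targets resolved later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a child file system; not a Hadoop class. */
final class ChildFs {
    final String uri;
    boolean verifyChecksum = true;

    ChildFs(String uri) {
        this.uri = uri;
    }
}

/** Sketch of a regex mount point that tracks the targets resolved so far. */
final class RegexMountPoint {
    private final Pattern srcPattern;
    private final String targetTemplate; // assumed syntax, e.g. "hdfs://nn-$1/"
    private final Map<String, ChildFs> resolved = new HashMap<>();
    private Boolean pendingVerifyChecksum; // last non-path call, replayed later

    RegexMountPoint(String srcRegex, String targetTemplate) {
        this.srcPattern = Pattern.compile(srcRegex);
        this.targetTemplate = targetTemplate;
    }

    /** Resolves a source path; returns null if the regex does not match. */
    synchronized ChildFs resolve(String srcPath) {
        Matcher m = srcPattern.matcher(srcPath);
        if (!m.lookingAt()) {
            return null;
        }
        // Substitute the captured groups into the target template.
        String target = srcPattern.matcher(m.group()).replaceFirst(targetTemplate);
        ChildFs fs = resolved.get(target);
        if (fs == null) {
            fs = new ChildFs(target);
            if (pendingVerifyChecksum != null) {
                fs.verifyChecksum = pendingVerifyChecksum; // replay the earlier call
            }
            resolved.put(target, fs);
        }
        return fs;
    }

    /** Non-path API: applies to known targets and is remembered for new ones. */
    synchronized void setVerifyChecksum(boolean verify) {
        pendingVerifyChecksum = verify;
        for (ChildFs fs : resolved.values()) {
            fs.verifyChecksum = verify;
        }
    }

    /** What a getMountPoints-style call could report for this regex entry. */
    synchronized List<ChildFs> getResolvedSoFar() {
        return new ArrayList<>(resolved.values());
    }
}
```

   The obvious limitation is that a caller only ever sees the targets that 
have been resolved at the time of the call.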
   To extend the topic a little: this kind of ViewFileSystem API (one that 
tries to visit all file systems) has caused several problems for us. E.g. 
setVerifyChecksum() initialized a file system for a mount point the user didn't 
want to use at all, and the initialization failed because it requires 
credentials, which the user didn't have since they never meant to visit that 
mount point. We developed a LazyChRootedFileSystem (not public) on top of every 
target file system to do lazy initialization for path-based APIs, but it's hard 
to handle APIs that take no path. To summarize, we see cases where users want 
to prevent these non-path-based APIs from triggering actions on every child 
file system. Meanwhile, some users (though rare in our scenarios) might want 
these APIs applied to all child file systems. I feel it's hard to satisfy both 
needs.
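   For reference, the lazy-initialization idea looks roughly like the sketch 
below. This is not our internal LazyChRootedFileSystem; TargetFs and 
LazyTargetFs are hypothetical names and the interface is cut down to a single 
path-based method. The real target is only created on the first path-based 
call, so a mount point that is never visited never asks for credentials.

```java
import java.util.function.Supplier;

/** Hypothetical, minimal view of a target file system; not a Hadoop class. */
interface TargetFs {
    boolean exists(String path);
}

/** Defers creating the real target until the first path-based call. */
final class LazyTargetFs implements TargetFs {
    private final Supplier<TargetFs> factory;
    private volatile TargetFs delegate;

    LazyTargetFs(Supplier<TargetFs> factory) {
        this.factory = factory;
    }

    boolean isInitialized() {
        return delegate != null;
    }

    /** Double-checked locking so the factory runs at most once. */
    private TargetFs get() {
        TargetFs fs = delegate;
        if (fs == null) {
            synchronized (this) {
                fs = delegate;
                if (fs == null) {
                    fs = factory.get(); // may need credentials; runs on first use only
                    delegate = fs;
                }
            }
        }
        return fs;
    }

    @Override
    public boolean exists(String path) {
        return get().exists(path);
    }
}
```

   A non-path API has no natural point at which to call get(), which is 
exactly the gap described above.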
   
   > 2. Other API is getDelegationTokenIssuers. Applications like YARN uses 
this API to get all child fs delegation tokens. This also will not work for 
REGEX based mount points.
    
   We did see an issue with addDelegationTokens in a secure Hadoop cluster. 
The problem we hit is that not all normal mount points are secure, so the API 
caused a failure when it tried to initialize all child file systems. We worked 
around it by making it path-based. As for getDelegationTokens, I guess the 
problem is similar; we didn't see issues because it's not used. Could we make 
it path-based too? Or we could take the approach described for problem one.
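   A simplified sketch of what I mean by path-based, in plain Java. Mount, 
fetchToken() and PathBasedTokens are hypothetical names, not Hadoop APIs, and 
the prefix matching is deliberately naive. Only the mount points that the 
given paths resolve to are touched; every other child file system stays 
uninitialized.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mount entry; fetchToken() stands in for initializing the child. */
final class Mount {
    final String srcPrefix;
    final String targetUri;
    boolean initialized;

    Mount(String srcPrefix, String targetUri) {
        this.srcPrefix = srcPrefix;
        this.targetUri = targetUri;
    }

    String fetchToken() {
        initialized = true; // in reality this is where credentials are needed
        return "token-for-" + targetUri;
    }
}

final class PathBasedTokens {
    /** Collects one token per target, only for mounts the paths resolve to. */
    static Map<String, String> collect(List<Mount> mounts, List<String> paths) {
        Map<String, String> tokens = new LinkedHashMap<>();
        for (String path : paths) {
            for (Mount m : mounts) {
                if (path.startsWith(m.srcPrefix)) { // naive prefix match
                    if (!tokens.containsKey(m.targetUri)) {
                        tokens.put(m.targetUri, m.fetchToken());
                    }
                    break; // first matching mount wins in this sketch
                }
            }
        }
        return tokens;
    }
}
```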
   
   > 3. Other question is how this child filesystem objects gets closed. There 
was an issue with [ViewFileSystem#close | 
https://issues.apache.org/jira/browse/HADOOP-15565 ]. I would like to know how 
that get addressed in this case as don't keep anything in InnerCache.
   
   Could we make the inner cache a thread-safe structure and use it to track 
all the file systems opened under regex mount points? 
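   Something along these lines, as a sketch only. TrackedFsCache is a 
hypothetical name, java.io.Closeable stands in for the child file system, and 
this is not how InnerCache is implemented today. Every file system created for 
a regex mount point is registered in a concurrent map, and close() drains the 
map and closes whatever was opened.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of a thread-safe cache that tracks opened children for close(). */
final class TrackedFsCache implements Closeable {
    private final ConcurrentHashMap<String, Closeable> open = new ConcurrentHashMap<>();

    /** Returns the cached child for this target, creating it at most once. */
    Closeable getOrCreate(String targetUri, Function<String, Closeable> factory) {
        return open.computeIfAbsent(targetUri, factory);
    }

    int size() {
        return open.size();
    }

    /** Closes every tracked child; the first failure is rethrown at the end. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (String key : open.keySet()) {
            Closeable fs = open.remove(key);
            if (fs == null) {
                continue; // already removed by a concurrent close()
            }
            try {
                fs.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```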
   
   These are really great points, thanks a lot.
   
   
   
   
   
   
   
   




