[ 
https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919497#comment-16919497
 ] 

Steve Loughran commented on HADOOP-16540:
-----------------------------------------

* It's (user, prefix, auth), not just prefix and auth; bear that in mind.
* given your example use case of S3, I'd like to know a lot more about what you 
are considering here and why
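
To illustrate the first point, here is a minimal sketch of a cache key built 
from (user, scheme, authority). CacheKeySketch and its nested Key class are 
hypothetical stand-ins, not Hadoop's actual FileSystem cache code; the only 
thing it tries to show is that the path never enters the key, so two prefixes 
in one bucket share an instance while two users do not.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

public class CacheKeySketch {
    /** Hypothetical key: user + scheme + authority. The path is ignored. */
    public static final class Key {
        final String user;
        final String scheme;
        final String authority;

        public Key(String user, URI uri) {
            this.user = user;
            this.scheme = uri.getScheme() == null
                ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
            this.authority = uri.getAuthority() == null
                ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return user.equals(k.user)
                && scheme.equals(k.scheme)
                && authority.equals(k.authority);
        }

        @Override
        public int hashCode() {
            return Objects.hash(user, scheme, authority);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the cache: key -> some expensive FS instance.
        Map<Key, String> cache = new HashMap<>();
        cache.computeIfAbsent(new Key("alice", URI.create("s3a://bucket/logs")),
            k -> "instance-1");
        // Same user, same bucket, different prefix: reuses instance-1.
        cache.computeIfAbsent(new Key("alice", URI.create("s3a://bucket/tmp")),
            k -> "instance-2");
        // Different user, same bucket: a separate entry.
        cache.computeIfAbsent(new Key("bob", URI.create("s3a://bucket/logs")),
            k -> "instance-3");
        System.out.println(cache.size()); // prints 2
    }
}
```

A prefix-level cache would have to add some part of the path to this key, 
which is where the cost of extra instances per bucket comes in.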

S3A FS instances are fairly expensive: thread and HTTP pools, DynamoDB pools, 
AWS transfer managers... you don't want to have more than one per bucket if 
you can avoid it. It may be better to support some tuning within the store, as 
HADOOP-16396 did for S3Guard authoritative mode.

That leaves different user credentials as the main justification, or similar 
things like encryption keys to use on different paths. True? Or maybe seek 
policies?

If so, it'll be fun trying to work out how to deal with operations which span 
paths.

All work has to be against hadoop trunk; you'll also need to make sure that it 
works with delegation tokens for job submission, including S3A DTs. That is 
non-trivial, as it is another place which uses (token identifier + FS URI) as 
the map key. Only one DT per bucket is going to be collected or provided, 
regardless of how many FS instances are in the cache. So please, get familiar 
with that code before starting to do things with fairly major implications.
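
A rough sketch of why that matters, assuming tokens are collected into a map 
keyed by a per-filesystem service name derived from scheme and authority. 
TokenCollectionSketch, serviceName() and the "issued-for:" strings are 
hypothetical and only stand in for the real token code: however many prefixes 
of a bucket are passed in, one token comes out.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenCollectionSketch {
    /** Hypothetical service name: scheme and authority only, no path. */
    public static String serviceName(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /**
     * Collect one placeholder token per service name. A second URI with the
     * same service name is skipped because a token is already present.
     */
    public static Map<String, String> collect(List<URI> uris) {
        Map<String, String> tokens = new LinkedHashMap<>();
        for (URI uri : uris) {
            tokens.putIfAbsent(serviceName(uri), "issued-for:" + uri);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = collect(Arrays.asList(
            URI.create("s3a://bucket/teamA"),
            URI.create("s3a://bucket/teamB")));
        // Two prefixes, one bucket: a single token, issued for the first URI.
        System.out.println(tokens);
    }
}
```

If prefix-level FS instances carried different credentials, the second 
prefix's credentials would never make it into the job under this scheme.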

> Pluggable Filesystem Caching Support in FileSystem Class
> --------------------------------------------------------
>
>                 Key: HADOOP-16540
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16540
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 3.3.0
>            Reporter: Arun Ravi M V
>            Priority: Major
>
> Provide an option to use a custom cache class in the FileSystem class. 
> Currently, caching is enabled by default and uses the URI scheme and 
> authority value to determine whether to create a new FS instance for the 
> given URI or to fetch an already existing one from the cache.
> In the case of the AWS S3 FS implementation, the authority of an S3 path is 
> the bucket name, i.e. the FileSystem object is cached at the bucket level. 
> Providing custom caching logic would let the user cache it at some prefix 
> level and provide more flexibility.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
