[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class

2019-09-03 Thread Arun Ravi M V (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921436#comment-16921436
 ] 

Arun Ravi M V commented on HADOOP-16540:


{quote}We'll also need some credential factory API to take some (operation, 
UGI, source, dest) params and return the creds for that operation. Unless you 
really want to give the clients full credentials, you will be needing some 
credential factory service over RPC there.{quote}
Yes, I agree with you on this. An operation-context-based credential factory 
API is what I should be looking for. I will close this ticket as WONTFIX.

> Pluggable Filesystem Caching Support in FileSystem Class
> 
>
> Key: HADOOP-16540
> URL: https://issues.apache.org/jira/browse/HADOOP-16540
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 3.3.0
> Reporter: Arun Ravi M V
> Priority: Major
>
> Provide an option to use a custom cache class in the FileSystem class. 
> Currently, caching is enabled by default and uses the URI scheme and 
> authority to determine whether to create a new FS instance for the given URI 
> or to fetch an existing one from the cache.
> In the case of the AWS S3 FS implementation, the authority of an S3 path is 
> the bucket name, i.e. the FileSystem object is cached at the bucket level. 
> Providing custom caching logic would let the user cache at some prefix level 
> and give more flexibility.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class

2019-09-03 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921385#comment-16921385
 ] 

Steve Loughran commented on HADOOP-16540:
-

I think you are in trouble here, and that pluggable FS caching is not the 
solution. There are some big assumptions in application code that an FS 
instance can be used across any path in the FS, and that permission checks are 
done in the server. Once you have an instance you can pass in any path (or, as 
HADOOP-16482 implies, any S3 URI).

HADOOP-16445 is working on separate signers for S3 and DDB/STS; Sidd is looking 
at being more adaptive here. And in the proposal linked off HADOOP-16456 
I've discussed having a per-request context which would go end-to-end across an 
operation, so you could create a signer/set of creds per request (more 
specifically, you'd need to cache them, as 
org.apache.hadoop.fs.s3a.auth.delegation.ILoadTestSessionCredentials shows the 
limits there), and those credentials would go round with the read/write/rename, 
etc. We'll also need some credential factory API to take some (operation, UGI, 
source, dest) params and return the creds for that operation. Unless you really 
want to give the clients full credentials, you will need some credential 
factory service over RPC there.
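
To make the shape of such a credential factory concrete, here is a minimal 
sketch in plain Java. Nothing in it is an existing Hadoop API: the names 
OperationContext, CredentialFactory and CachingCredentialFactory are 
hypothetical, the user is a plain string standing in for a UGI, and 
credentials are plain strings standing in for whatever an RPC service would 
return. It only illustrates mapping (operation, user, source, dest) to 
credentials and caching the result so that not every request triggers a new 
issuance. Credential expiry is deliberately ignored.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class CredentialFactorySketch {

    /** Hypothetical operation context: (operation, user, source, dest). */
    static final class OperationContext {
        final String operation;
        final String user;
        final String source;
        final String dest; // may be null for single-path operations

        OperationContext(String operation, String user, String source, String dest) {
            this.operation = Objects.requireNonNull(operation);
            this.user = Objects.requireNonNull(user);
            this.source = Objects.requireNonNull(source);
            this.dest = dest;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof OperationContext)) return false;
            OperationContext c = (OperationContext) o;
            return operation.equals(c.operation) && user.equals(c.user)
                && source.equals(c.source) && Objects.equals(dest, c.dest);
        }

        @Override
        public int hashCode() {
            return Objects.hash(operation, user, source, dest);
        }
    }

    /** Hypothetical factory API; a real one would probably sit behind an RPC service. */
    interface CredentialFactory {
        String credentialsFor(OperationContext ctx);
    }

    /** Wraps another factory and issues credentials at most once per distinct context. */
    static final class CachingCredentialFactory implements CredentialFactory {
        private final CredentialFactory delegate;
        private final Map<OperationContext, String> cache = new ConcurrentHashMap<>();

        CachingCredentialFactory(CredentialFactory delegate) {
            this.delegate = Objects.requireNonNull(delegate);
        }

        @Override
        public String credentialsFor(OperationContext ctx) {
            return cache.computeIfAbsent(ctx, delegate::credentialsFor);
        }
    }

    public static void main(String[] args) {
        AtomicInteger issued = new AtomicInteger();
        // Stand-in for the remote service: counts how often credentials are issued.
        CredentialFactory remote = ctx -> "token-" + issued.incrementAndGet() + "-for-" + ctx.user;
        CredentialFactory factory = new CachingCredentialFactory(remote);

        factory.credentialsFor(new OperationContext("read", "alice", "s3a://bucket/dataset-a/f", null));
        factory.credentialsFor(new OperationContext("read", "alice", "s3a://bucket/dataset-a/f", null));
        System.out.println("credentials issued: " + issued.get());
    }
}
```

The second call above is served from the cache because both contexts are 
equal. An operation that spans paths, such as a rename, carries both source 
and dest in its context, so it gets its own cache entry.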

I think you should get involved with those bits of work, so you can make sure 
it helps meet your needs.

I think we should close this JIRA as a WONTFIX; it doesn't do what you need.




[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class

2019-08-30 Thread Arun Ravi M V (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919525#comment-16919525
 ] 

Arun Ravi M V commented on HADOOP-16540:


Yes, you are right, the main reason for this ticket is user credentials. I have 
a situation where a large number of datasets (a few thousand) are located in a 
single S3 bucket. I am trying to introduce role-based access control here; AWS 
policies have size limitations and cannot be used as the only solution. In this 
case, I would like to define caching per dataset (bucket + root S3 prefix) 
instead of doing it at the S3 bucket level.
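
The difference between the two caching granularities can be sketched in plain 
Java as follows. This is an illustration only, not Hadoop's FileSystem cache: 
the class and method names are hypothetical, keys are simple strings, the 
"root prefix" is assumed to be the first path segment, and the user component 
of the real cache key is left out.

```java
import java.net.URI;
import java.util.Locale;

final class CacheKeySketch {

    /** Bucket-level key: scheme + authority, so every path in a bucket shares one entry. */
    static String bucketKey(URI uri) {
        return uri.getScheme().toLowerCase(Locale.ROOT) + "://" + uri.getAuthority();
    }

    /** Hypothetical dataset-level key: scheme + authority + first path segment. */
    static String datasetKey(URI uri) {
        String path = uri.getPath() == null ? "" : uri.getPath();
        String[] segments = path.split("/");
        // segments[0] is empty because the path starts with "/".
        String root = segments.length > 1 ? segments[1] : "";
        return bucketKey(uri) + "/" + root;
    }

    public static void main(String[] args) {
        URI a = URI.create("s3a://shared-bucket/dataset-a/part-0000.parquet");
        URI b = URI.create("s3a://shared-bucket/dataset-b/part-0000.parquet");
        System.out.println("same bucket key:  " + bucketKey(a).equals(bucketKey(b)));
        System.out.println("same dataset key: " + datasetKey(a).equals(datasetKey(b)));
    }
}
```

With bucket-level keys both URIs resolve to the same cache entry; with 
dataset-level keys they resolve to different entries, which is what would 
allow a different set of credentials per dataset.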




[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class

2019-08-30 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919497#comment-16919497
 ] 

Steve Loughran commented on HADOOP-16540:
-

* it's (user, prefix, auth), not just prefix and auth; bear that in mind
* given your example use case of S3, I'd like to know a lot more about what you 
are considering here and why

S3A FS instances are fairly expensive: thread and HTTP pools, DynamoDB pools, 
AWS transfer managers... you don't want to have >1 per bucket if you can avoid 
it. It may be better to support some tuning within the store, as HADOOP-16396 
did for S3Guard authoritative mode.

That leaves different user credentials as the main justification, or similar 
things like encryption keys to use on different paths. True? Or maybe seek 
policies?

If so, it'll be fun trying to work out how to deal with operations which span 
paths.

All work has to be against Hadoop trunk; you'll also need to make sure that it 
works with delegation tokens for job submit, including S3A DTs. That is 
non-trivial, as it is another place which uses (token identifier + FS URI) as 
the map key. Only one DT per bucket is going to be collected or provided, 
regardless of how many instances are in the cache. So please, get familiar with 
that code before starting to do things with fairly major implications.
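
The delegation token concern can be illustrated with a simplified sketch. It 
is not Hadoop's token collection code: the class and method names are 
hypothetical, tokens are plain strings, and the service name is assumed to be 
just scheme + bucket. The point is only that when tokens are keyed by a 
per-bucket name, several cached instances for the same bucket still yield a 
single token.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TokenCollectionSketch {

    /** Simplified stand-in for a filesystem's service name: scheme + bucket, no path. */
    static String serviceName(URI fsUri) {
        return fsUri.getScheme() + "://" + fsUri.getAuthority();
    }

    /** Collects at most one (hypothetical) token string per service name. */
    static Map<String, String> collectTokens(List<URI> cachedInstances) {
        Map<String, String> tokens = new LinkedHashMap<>();
        for (URI uri : cachedInstances) {
            // A token already held for this service is kept, so later instances add nothing.
            tokens.putIfAbsent(serviceName(uri), "token-issued-for-" + uri.getPath());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<URI> instances = Arrays.asList(
            URI.create("s3a://shared-bucket/dataset-a"),
            URI.create("s3a://shared-bucket/dataset-b"));
        System.out.println("tokens collected: " + collectTokens(instances).size());
    }
}
```

Under these assumptions, the instance cached for dataset-b never contributes a 
token of its own, so per-prefix credentials would not survive job submission 
without further changes.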
