[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class
[ https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921436#comment-16921436 ]

Arun Ravi M V commented on HADOOP-16540:

{quote}We'll also need some credential factory API to take some (operation, UGI, source, dest) params and return the creds for that operation. Unless you really want to give the clients full credentials, you will be needing some credential factory service over RPC there.{quote}

Yes, I agree with you on this. An operation-context-based credential factory API is what I should be looking for. I will close this ticket as WONTFIX.

> Pluggable Filesystem Caching Support in FileSystem Class
>
> Key: HADOOP-16540
> URL: https://issues.apache.org/jira/browse/HADOOP-16540
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 3.3.0
> Reporter: Arun Ravi M V
> Priority: Major
>
> Provide an option to use a custom cache class in the FileSystem class.
> Currently, caching is enabled by default and uses the URI scheme and
> authority to determine whether to create a new FS instance for the given
> URI or to fetch an existing one from the cache.
> In the case of the AWS S3 FS implementation, the authority of an S3 path
> is the bucket name, i.e. the FileSystem object is cached at the bucket
> level. Custom caching logic would let the user cache it at some prefix
> level and provide more flexibility.

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class
[ https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921385#comment-16921385 ]

Steve Loughran commented on HADOOP-16540:

I think you are in trouble here, and pluggable FS caching is not the solution.

There are some big assumptions in application code that an FS instance can be used across any path in the FS, and that permission checks are done in the server. Once you have an instance you can pass in any path (or, as HADOOP-16482 implies, any S3 URI).

HADOOP-16445 is working on separate signers for S3 and DDB/STS; Sidd is looking at being more adaptive here. In the proposal linked off HADOOP-16456 I've discussed having a per-request context which would go end-to-end across an operation, so you could create a signer or set of credentials per request (more specifically, you'd need to cache them, as org.apache.hadoop.fs.s3a.auth.delegation.ILoadTestSessionCredentials shows the limits there), and those credentials would travel with the read/write/rename, etc.

We'll also need some credential factory API to take some (operation, UGI, source, dest) params and return the creds for that operation. Unless you really want to give the clients full credentials, you will be needing some credential factory service over RPC there.

I think you should get involved with those bits of work, so you can make sure they help meet your needs.

I think we should close this JIRA as a WONTFIX; it doesn't do what you need.
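The credential factory idea in the comment above, (operation, UGI, source, dest) in and credentials out, can be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing Hadoop API: `FsOperation`, `ScopedCredentials`, `CredentialFactory` and `PrefixCredentialFactory` are hypothetical names, the user is a plain string standing in for a UGI, and the toy implementation runs in-process, whereas the comment expects a service reached over RPC.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical operation kinds; the comment names read/write/rename. */
enum FsOperation { READ, WRITE, RENAME }

/** Hypothetical credentials scoped to one operation on specific paths. */
final class ScopedCredentials {
    final String user;
    final FsOperation operation;
    final String source;
    final String dest; // null for single-path operations

    ScopedCredentials(String user, FsOperation operation, String source, String dest) {
        this.user = user;
        this.operation = operation;
        this.source = source;
        this.dest = dest;
    }
}

/** Hypothetical factory API: (operation, user, source, dest) -> credentials. */
interface CredentialFactory {
    ScopedCredentials credentialsFor(FsOperation op, String user, String source, String dest);
}

/** Toy implementation (an assumption): each user is granted one path prefix. */
final class PrefixCredentialFactory implements CredentialFactory {
    private final Map<String, String> grantedPrefixByUser;

    PrefixCredentialFactory(Map<String, String> grantedPrefixByUser) {
        this.grantedPrefixByUser = new HashMap<>(grantedPrefixByUser);
    }

    @Override
    public ScopedCredentials credentialsFor(FsOperation op, String user, String source, String dest) {
        String prefix = grantedPrefixByUser.get(user);
        // Both ends of an operation that spans paths must be inside the grant.
        boolean sourceOk = prefix != null && source.startsWith(prefix);
        boolean destOk = dest == null || (prefix != null && dest.startsWith(prefix));
        if (!sourceOk || !destOk) {
            throw new SecurityException(user + " may not " + op + " " + source
                + (dest == null ? "" : " -> " + dest));
        }
        return new ScopedCredentials(user, op, source, dest);
    }
}

final class CredentialFactoryDemo {
    public static void main(String[] args) {
        Map<String, String> grants = new HashMap<>();
        grants.put("alice", "s3a://data-lake/sales/");
        CredentialFactory factory = new PrefixCredentialFactory(grants);

        ScopedCredentials creds = factory.credentialsFor(
            FsOperation.READ, "alice", "s3a://data-lake/sales/2019/part-0.parquet", null);
        System.out.println("granted " + creds.operation + " on " + creds.source);
    }
}
```

Because the check sees the source and the destination together, a rename that crosses dataset boundaries is rejected as a whole, which a per-prefix FileSystem cache alone could not express.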
[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class
[ https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919525#comment-16919525 ]

Arun Ravi M V commented on HADOOP-16540:

Yes, you are right, the main reason for this ticket is user credentials. I have a situation where a large number of datasets (a few thousand) are located in a single S3 bucket. I am trying to introduce role-based access control here, but AWS policies have size limits and cannot be the only solution. In this case, I would like to define caching per dataset (bucket + root S3 prefix) instead of at the S3 bucket level.
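To make the difference between bucket-level and dataset-level caching concrete, here is a small sketch of the two cache keys. It is not Hadoop's FileSystem cache code: `CacheKeys`, `bucketKey` and `datasetKey` are hypothetical names, keys are plain strings, and treating the first path segment as the dataset's "root prefix" is an assumption made only for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers that build cache keys as plain strings. */
final class CacheKeys {
    private CacheKeys() {}

    /** Bucket-level key: user + scheme + authority (the bucket name for S3). */
    static String bucketKey(String user, URI uri) {
        return user + "@" + uri.getScheme() + "://" + uri.getAuthority();
    }

    /** Dataset-level key: the bucket key plus the first path segment (assumed root prefix). */
    static String datasetKey(String user, URI uri) {
        String[] segments = uri.getPath().split("/");
        // segments[0] is the empty string before the leading '/'.
        String root = segments.length > 1 ? segments[1] : "";
        return bucketKey(user, uri) + "/" + root;
    }
}

final class CacheKeyDemo {
    public static void main(String[] args) {
        URI sales = URI.create("s3a://data-lake/sales/2019/part-0.parquet");
        URI hr = URI.create("s3a://data-lake/hr/payroll.csv");

        // Stand-ins for cached FS instances; the values are just labels here.
        Map<String, String> perBucket = new HashMap<>();
        Map<String, String> perDataset = new HashMap<>();
        for (URI uri : new URI[] {sales, hr}) {
            perBucket.computeIfAbsent(CacheKeys.bucketKey("alice", uri), k -> "fs for " + k);
            perDataset.computeIfAbsent(CacheKeys.datasetKey("alice", uri), k -> "fs for " + k);
        }
        System.out.println("bucket-level instances:  " + perBucket.size());
        System.out.println("dataset-level instances: " + perDataset.size());
    }
}
```

With a few thousand datasets in one bucket, the dataset-level key would mean a few thousand cached instances per user, which is the cost raised elsewhere in this thread.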
[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class
[ https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919497#comment-16919497 ]

Steve Loughran commented on HADOOP-16540:

* It's (user, prefix, auth), not just prefix and auth; bear that in mind.
* Given your example use case of S3, I'd like to know a lot more about what you are considering here and why.

S3A FS instances are fairly expensive: thread and HTTP pools, DynamoDB pools, AWS transfer managers... you don't want to have more than one per bucket if you can avoid it. It may be better to support some tuning within the store, as HADOOP-16396 did for S3Guard authoritative mode.

That leaves different user credentials as the main justification, or similar things like encryption keys to use on different paths. True? Or maybe seek policies? If so, it'll be fun trying to work out how to deal with operations which span paths.

All work has to be against Hadoop trunk; you'll also need to make sure that it works with delegation tokens for job submit, including S3A DTs. That is non-trivial, as it is another place which uses (token identifier + FS URI) as the map. Only one DT per bucket is going to be collected or provided, regardless of how many instances are in the cache.

So please, get familiar with that code before starting to do things with fairly major implications.
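The delegation token concern can be illustrated with a toy map. This is a deliberately simplified sketch, not Hadoop's token collection code: `TokenCollector` is a hypothetical class, tokens are plain strings, and keying only by the filesystem URI (scheme + bucket) is a simplifying assumption standing in for the (token identifier + FS URI) map mentioned in the thread.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical collector that keeps at most one token per filesystem URI. */
final class TokenCollector {
    private final Map<String, String> tokensByService = new LinkedHashMap<>();

    /** Assumed service name: scheme + bucket, with no path component. */
    static String serviceName(URI path) {
        return path.getScheme() + "://" + path.getAuthority();
    }

    /** Stores the token unless one already exists for this bucket; returns true if stored. */
    boolean collect(URI path, String token) {
        return tokensByService.putIfAbsent(serviceName(path), token) == null;
    }

    /** Looks up the single token held for the bucket containing this path, or null. */
    String tokenFor(URI path) {
        return tokensByService.get(serviceName(path));
    }

    int size() {
        return tokensByService.size();
    }
}

final class TokenCollectorDemo {
    public static void main(String[] args) {
        TokenCollector collector = new TokenCollector();
        URI sales = URI.create("s3a://data-lake/sales/");
        URI hr = URI.create("s3a://data-lake/hr/");

        // Two prefix-level instances of the same bucket offer their tokens.
        collector.collect(sales, "token-for-sales");
        collector.collect(hr, "token-for-hr");

        System.out.println("tokens held: " + collector.size());
        System.out.println("token used for hr: " + collector.tokenFor(hr));
    }
}
```

Under this assumption the second prefix-level instance's token is never collected, so a submitted job would use the first instance's credentials for every path in the bucket.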