[ https://issues.apache.org/jira/browse/HADOOP-17198?focusedWorklogId=640670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640670 ]
ASF GitHub Bot logged work on HADOOP-17198:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Aug/21 12:00
            Start Date: 23/Aug/21 12:00
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request #3260:
URL: https://github.com/apache/hadoop/pull/3260#discussion_r693902136



##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########

@@ -1576,6 +1576,49 @@ Why explicitly declare a bucket bound to the central endpoint? It ensures
 that if the default endpoint is changed to a new region, data store in
 US-east is still reachable.
 
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3A
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level. For more information about why to use and
+how to create them make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following per bucket configuration property:
+```xml
+<property>
+  <name>fs.s3a.sample-bucket.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures all access to the `sample-bucket` bucket for S3A, and in turn S3,
+to go through the new Access Point ARN. So, for example `s3a://sample-bucket/key` will now use your
+configured ARN when getting data from S3 instead of your bucket.
+
+You can also use an Access Point name as a path URI such as `s3a://finance-team-access/key`, by
+configuring the `.accesspoint.arn` property as a per-bucket override:
+```xml
+<property>
+  <name>fs.s3a.finance-team-access.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Before using Access Points make sure you're not impacted by the following:
+- `ListObjectsV1` is not supported, arguably you shouldn't use it if you can;
+- The endpoint for S3 requests will automatically change from `s3.amazonaws.com` to use
+`s3-accesspoint.REGION.amazonaws.{com | com.cn}` depending on the Access Point ARN. This **only**
+happens if the `fs.s3a.endpoint` property isn't set. The endpoint property overwrites any changes,
+this is intentional so FIPS or DualStack endpoints can be set. While considering endpoints,
+if you have any custom signers that use the host endpoint property make sure to update them if
+needed;
+- Access Point names don't have to be globally unique, in the same way that bucket names have to.
+This means you may end up in a situation where you have 2 Access Points with the same name. If you

Review comment:
   nit use "two"

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########

@@ -1576,6 +1576,49 @@ Why explicitly declare a bucket bound to the central endpoint? It ensures
 that if the default endpoint is changed to a new region, data store in
 US-east is still reachable.
 
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3A
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level. For more information about why to use and
+how to create them make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following per bucket configuration property:
+```xml
+<property>
+  <name>fs.s3a.sample-bucket.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures all access to the `sample-bucket` bucket for S3A, and in turn S3,
+to go through the new Access Point ARN. So, for example `s3a://sample-bucket/key` will now use your
+configured ARN when getting data from S3 instead of your bucket.
+
+You can also use an Access Point name as a path URI such as `s3a://finance-team-access/key`, by
+configuring the `.accesspoint.arn` property as a per-bucket override:
+```xml
+<property>
+  <name>fs.s3a.finance-team-access.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Before using Access Points make sure you're not impacted by the following:
+- `ListObjectsV1` is not supported, arguably you shouldn't use it if you can;

Review comment:
   cut the "arguably" as it will only puzzle the reader. Best to say "this is deprecated on AWS S3 for performance reasons"

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########

@@ -1576,6 +1576,49 @@ Why explicitly declare a bucket bound to the central endpoint? It ensures
 that if the default endpoint is changed to a new region, data store in
 US-east is still reachable.
 
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3A
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level.
For more information about why to use and
+how to create them make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following per bucket configuration property:
+```xml
+<property>
+  <name>fs.s3a.sample-bucket.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures all access to the `sample-bucket` bucket for S3A, and in turn S3,

Review comment:
   now we only support per-bucket config, this text is duplicate/confusing...there's no way someone could set the global binding, so only need to cover per-bucket

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########

@@ -2570,6 +2614,11 @@ protected S3ListResult continueListObjects(S3ListRequest request,
             OBJECT_CONTINUE_LIST_REQUEST,
             () -> {
               if (useListV1) {
+                if (accessPoint != null) {
+                  // AccessPoints are not compatible with V1List
+                  throw new InvalidRequestException("ListV1 is not supported by AccessPoints");

Review comment:
   Actually, I think we could just fail and let whoever is editing the settings deal with it. v1 is not the default, and the only place we recommend it is for 3rd party implementations. If someone changes the list option, things fail.
   
   but propose: including the config option in the text, e.g. "v1 list API configured in" + LIST_VERSION + " is not supported by access points"

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
##########

@@ -257,17 +257,19 @@ private static void skipIfS3GuardAndS3CSEEnabled(Configuration conf) {
   }
 
   /**
-   * Either skip if PathIOE occurred due to S3CSE and S3Guard
-   * incompatibility or throw the PathIOE.
+   * Either skip if PathIOE occurred due to exception which contains a message which signals

Review comment:
   nit: "Either" is no longer needed

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEncryptionSSEKMSUserDefinedKey.java
##########

@@ -39,12 +39,14 @@ protected Configuration createConfiguration() {
     // get the KMS key for this test.
     Configuration c = new Configuration();
     String kmsKey = c.get(SERVER_SIDE_ENCRYPTION_KEY);
-    if (StringUtils.isBlank(kmsKey) || !c.get(SERVER_SIDE_ENCRYPTION_ALGORITHM)
-        .equals(S3AEncryptionMethods.CSE_KMS.name())) {
-      skip(SERVER_SIDE_ENCRYPTION_KEY + " is not set for " +
-          SSE_KMS.getMethod() + " or CSE-KMS algorithm is used instead of " +
-          "SSE-KMS");
+
+    skipIfKmsKeyIdIsNotSet(c);
+    // FS is not available at this point so checking CSE like this

Review comment:
   can just call `skipIfCSEIsEnabled`

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AClientSideEncryptionKms.java
##########

@@ -56,6 +57,11 @@ protected Configuration createConfiguration() {
   protected void maybeSkipTest() {
     skipIfEncryptionTestsDisabled(getConfiguration());
     skipIfKmsKeyIdIsNotSet(getConfiguration());
+    // Skip if CSE is not configured as an algorithm
+    String encryption = getConfiguration().get(Constants.SERVER_SIDE_ENCRYPTION_ALGORITHM, "");
+    if (!encryption.equals(S3AEncryptionMethods.CSE_KMS.getMethod())) {
+      skip("CSE encryption has been set");

Review comment:
   error text is wrong

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
##########

@@ -257,17 +257,19 @@ private static void skipIfS3GuardAndS3CSEEnabled(Configuration conf) {
   }
 
   /**
-   * Either skip if PathIOE occurred due to S3CSE and S3Guard
-   * incompatibility or throw the PathIOE.
+   * Either skip if PathIOE occurred due to exception which contains a message which signals
+   * an incompatibility or throw the PathIOE.
    *
    * @param ioe PathIOE being parsed.
-   * @throws PathIOException Throws PathIOE if it doesn't relate to S3CSE
-   * and S3Guard incompatibility.
+   * @param messages messages found in the PathIOE that trigger a test to skip
+   * @throws PathIOException Throws PathIOE if it doesn't relate to any message in {@code messages}.
    */
-  public static void maybeSkipIfS3GuardAndS3CSEIOE(PathIOException ioe)
+  public static void maybeSkipIfIOEContainsMessage(PathIOException ioe, String ...messages)

Review comment:
   nit, remove the `maybe` as the `if` indicates it happens sometimes

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 640670)
    Time Spent: 7.5h  (was: 7h 20m)

> Support S3 Access Points
> ------------------------
>
>                 Key: HADOOP-17198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17198
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Bogdan Stolojan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Improve VPC integration by supporting access points for buckets
> https://docs.aws.amazon.com/AmazonS3/latest/dev/access-points.html
> Not sure how to do this *at all*;

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
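[Editor's appendix] The review above discusses two behaviours: resolving a per-bucket `fs.s3a.<bucket>.accesspoint.arn` override, and failing fast when the v1 object-listing API is used with a bucket bound to an Access Point, with the error text naming the `fs.s3a.list.version` option as Steve proposes. The following is a minimal, self-contained sketch of that logic only; the `AccessPointConfigSketch` class and its method names are invented for illustration and this is not the actual `S3AFileSystem` implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative model (not Hadoop code) of per-bucket Access Point
 * configuration and the proposed v1-list rejection.
 */
public class AccessPointConfigSketch {

  /** Real S3A option name for the list API version. */
  static final String LIST_VERSION = "fs.s3a.list.version";

  private final Map<String, String> conf = new HashMap<>();

  public void set(String key, String value) {
    conf.put(key, value);
  }

  /** Return the Access Point ARN bound to a bucket, or null if none. */
  public String accessPointArn(String bucket) {
    return conf.get("fs.s3a." + bucket + ".accesspoint.arn");
  }

  /**
   * Reject the v1 listing API when the bucket is bound to an Access
   * Point; the message names the config option, as the review suggests.
   */
  public void checkListVersion(String bucket, int listVersion) {
    if (listVersion == 1 && accessPointArn(bucket) != null) {
      throw new IllegalArgumentException(
          "v1 list API configured in " + LIST_VERSION
              + " is not supported by access points");
    }
  }
}
```

A bucket with no `.accesspoint.arn` entry resolves to null and may still use v1 listing; only the combination of a bound Access Point and a v1 list request is rejected.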