[ https://issues.apache.org/jira/browse/HADOOP-17198?focusedWorklogId=640670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640670 ]
ASF GitHub Bot logged work on HADOOP-17198:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Aug/21 12:00
            Start Date: 23/Aug/21 12:00
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request #3260:
URL: https://github.com/apache/hadoop/pull/3260#discussion_r693902136



##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########

@@ -1576,6 +1576,49 @@ Why explicitly declare a bucket bound to the central endpoint? It ensures
 that if the default endpoint is changed to a new region, data store in
 US-east is still reachable.
 
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3A
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level. For more information about why to use and
+how to create them make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following per bucket configuration property:
+```xml
+<property>
+  <name>fs.s3a.sample-bucket.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures all access to the `sample-bucket` bucket for S3A, and in turn S3,
+to go through the new Access Point ARN. So, for example `s3a://sample-bucket/key` will now use your
+configured ARN when getting data from S3 instead of your bucket.
+
+You can also use an Access Point name as a path URI such as `s3a://finance-team-access/key`, by
+configuring the `.accesspoint.arn` property as a per-bucket override:
+```xml
+<property>
+  <name>fs.s3a.finance-team-access.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Before using Access Points make sure you're not impacted by the following:
+- `ListObjectsV1` is not supported, arguably you shouldn't use it if you can;
+- The endpoint for S3 requests will automatically change from `s3.amazonaws.com` to use
+`s3-accesspoint.REGION.amazonaws.{com | com.cn}` depending on the Access Point ARN. This **only**
+happens if the `fs.s3a.endpoint` property isn't set. The endpoint property overwrites any changes,
+this is intentional so FIPS or DualStack endpoints can be set. While considering endpoints,
+if you have any custom signers that use the host endpoint property make sure to update them if
+needed;
+- Access Point names don't have to be globally unique, in the same way that bucket names have to.
+This means you may end up in a situation where you have 2 Access Points with the same name. If you

Review comment:
   nit use "two"

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########

@@ -1576,6 +1576,49 @@ Why explicitly declare a bucket bound to the central endpoint? It ensures
 that if the default endpoint is changed to a new region, data store in
 US-east is still reachable.
 
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3A
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level. For more information about why to use and
+how to create them make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following per bucket configuration property:
+```xml
+<property>
+  <name>fs.s3a.sample-bucket.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures all access to the `sample-bucket` bucket for S3A, and in turn S3,
+to go through the new Access Point ARN. So, for example `s3a://sample-bucket/key` will now use your
+configured ARN when getting data from S3 instead of your bucket.
+
+You can also use an Access Point name as a path URI such as `s3a://finance-team-access/key`, by
+configuring the `.accesspoint.arn` property as a per-bucket override:
+```xml
+<property>
+  <name>fs.s3a.finance-team-access.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Before using Access Points make sure you're not impacted by the following:
+- `ListObjectsV1` is not supported, arguably you shouldn't use it if you can;

Review comment:
   cut the "arguably" as it will only puzzle the reader. Best to say "this is deprecated on AWS S3 for performance reasons"

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########

@@ -1576,6 +1576,49 @@ Why explicitly declare a bucket bound to the central endpoint? It ensures
 that if the default endpoint is changed to a new region, data store in
 US-east is still reachable.
 
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3A
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level.
For more information about why to use and
+how to create them make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following per bucket configuration property:
+```xml
+<property>
+  <name>fs.s3a.sample-bucket.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures all access to the `sample-bucket` bucket for S3A, and in turn S3,

Review comment:
   now we only support per-bucket config, this text is duplicate/confusing...there's no way someone could set the global binding, so only need to cover per-bucket

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########

@@ -2570,6 +2614,11 @@ protected S3ListResult continueListObjects(S3ListRequest request,
             OBJECT_CONTINUE_LIST_REQUEST,
             () -> {
               if (useListV1) {
+                if (accessPoint != null) {
+                  // AccessPoints are not compatible with V1List
+                  throw new InvalidRequestException("ListV1 is not supported by AccessPoints");

Review comment:
   Actually, I think we could just fail and let whoever is editing the settings deal with it. v1 is not the default, and the only place we recommend it is for 3rd party implementations. If someone changes the list option, things fail.
   
   but propose: including the config option in the text, e.g. "v1 list API configured in" + LIST_VERSION + " is not supported by access points"

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
##########

@@ -257,17 +257,19 @@ private static void skipIfS3GuardAndS3CSEEnabled(Configuration conf) {
   }
 
   /**
-   * Either skip if PathIOE occurred due to S3CSE and S3Guard
-   * incompatibility or throw the PathIOE.
+   * Either skip if PathIOE occurred due to exception which contains a message which signals

Review comment:
   nit: "Either" is no longer needed

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEncryptionSSEKMSUserDefinedKey.java
##########

@@ -39,12 +39,14 @@ protected Configuration createConfiguration() {
     // get the KMS key for this test.
     Configuration c = new Configuration();
     String kmsKey = c.get(SERVER_SIDE_ENCRYPTION_KEY);
-    if (StringUtils.isBlank(kmsKey) || !c.get(SERVER_SIDE_ENCRYPTION_ALGORITHM)
-        .equals(S3AEncryptionMethods.CSE_KMS.name())) {
-      skip(SERVER_SIDE_ENCRYPTION_KEY + " is not set for " +
-          SSE_KMS.getMethod() + " or CSE-KMS algorithm is used instead of " +
-          "SSE-KMS");
+
+    skipIfKmsKeyIdIsNotSet(c);
+    // FS is not available at this point so checking CSE like this

Review comment:
   can just call `skipIfCSEIsEnabled`

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AClientSideEncryptionKms.java
##########

@@ -56,6 +57,11 @@ protected Configuration createConfiguration() {
   protected void maybeSkipTest() {
     skipIfEncryptionTestsDisabled(getConfiguration());
     skipIfKmsKeyIdIsNotSet(getConfiguration());
+    // Skip if CSE is not configured as an algorithm
+    String encryption = getConfiguration().get(Constants.SERVER_SIDE_ENCRYPTION_ALGORITHM, "");
+    if (!encryption.equals(S3AEncryptionMethods.CSE_KMS.getMethod())) {
+      skip("CSE encryption has been set");

Review comment:
   error text is wrong

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
##########

@@ -257,17 +257,19 @@ private static void skipIfS3GuardAndS3CSEEnabled(Configuration conf) {
   }
 
   /**
-   * Either skip if PathIOE occurred due to S3CSE and S3Guard
-   * incompatibility or throw the PathIOE.
+   * Either skip if PathIOE occurred due to exception which contains a message which signals
+   * an incompatibility or throw the PathIOE.
    *
    * @param ioe PathIOE being parsed.
-   * @throws PathIOException Throws PathIOE if it doesn't relate to S3CSE
-   * and S3Guard incompatibility.
+   * @param messages messages found in the PathIOE that trigger a test to skip
+   * @throws PathIOException Throws PathIOE if it doesn't relate to any message in {@code messages}.
    */
-  public static void maybeSkipIfS3GuardAndS3CSEIOE(PathIOException ioe)
+  public static void maybeSkipIfIOEContainsMessage(PathIOException ioe, String ...messages)

Review comment:
   nit, remove the `maybe` as the `if` indicates it happens sometimes

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 640670)
    Time Spent: 7.5h  (was: 7h 20m)

> Support S3 Access Points
> ------------------------
>
>                 Key: HADOOP-17198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17198
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Bogdan Stolojan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Improve VPC integration by supporting access points for buckets
> https://docs.aws.amazon.com/AmazonS3/latest/dev/access-points.html
> Not sure how to do this *at all*;

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
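[Editor's appendix] The review above discusses two behaviours: resolving a per-bucket `fs.s3a.<bucket>.accesspoint.arn` override, and failing fast when the v1 object-listing API is used with a bucket bound to an Access Point, with the error text naming the `fs.s3a.list.version` option as Steve proposes. The following is a minimal, self-contained sketch of that logic only; the `AccessPointConfigSketch` class and its method names are invented for illustration and this is not the actual `S3AFileSystem` implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative model (not Hadoop code) of per-bucket Access Point
 * configuration and the proposed v1-list rejection.
 */
public class AccessPointConfigSketch {

  /** Real S3A option name for the list API version. */
  static final String LIST_VERSION = "fs.s3a.list.version";

  private final Map<String, String> conf = new HashMap<>();

  public void set(String key, String value) {
    conf.put(key, value);
  }

  /** Return the Access Point ARN bound to a bucket, or null if none. */
  public String accessPointArn(String bucket) {
    return conf.get("fs.s3a." + bucket + ".accesspoint.arn");
  }

  /**
   * Reject the v1 listing API when the bucket is bound to an Access
   * Point; the message names the config option, as the review suggests.
   */
  public void checkListVersion(String bucket, int listVersion) {
    if (listVersion == 1 && accessPointArn(bucket) != null) {
      throw new IllegalArgumentException(
          "v1 list API configured in " + LIST_VERSION
              + " is not supported by access points");
    }
  }
}
```

A bucket with no `.accesspoint.arn` entry resolves to null and may still use v1 listing; only the combination of a bound Access Point and a v1 list request is rejected.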