mukund-thakur commented on a change in pull request #3260:
URL: https://github.com/apache/hadoop/pull/3260#discussion_r683387906
##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########
@@ -1576,6 +1576,81 @@ Why explicitly declare a bucket bound to the central endpoint?
 It ensures that if the default endpoint is changed to a new region, data store
 in US-east is still reachable.
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3a

Review comment:
   nit: using S3A.


##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/performance/AbstractS3ACostTest.java
##########
@@ -125,6 +126,13 @@ public AbstractS3ACostTest(
   public Configuration createConfiguration() {
     Configuration conf = super.createConfiguration();
     String bucketName = getTestBucketName(conf);
+    // If AccessPoint ARN is set guarded tests are skipped
+    String apArn = conf.get(ACCESS_POINT_ARN, "");
+    if (isGuarded() && !apArn.isEmpty()) {
+      LOG.warn("Skipping test since AccessPoint ARN is set and is incompatible with S3Guard.");
+      Assume.assumeFalse(true);

Review comment:
   nit: maybe put the warning in assumeFalse only.
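A minimal sketch of that nit, assuming JUnit 4's `Assume.assumeFalse(String message, boolean b)` overload; `conf`, `isGuarded()` and `ACCESS_POINT_ARN` come from the surrounding test class. The skip reason then travels with the assumption failure itself, so no separate `LOG.warn` is needed:

```java
// Sketch only: fold the warning text into the assumption call.
String apArn = conf.get(ACCESS_POINT_ARN, "");
Assume.assumeFalse(
    "Skipping test since AccessPoint ARN is set and is incompatible with S3Guard.",
    isGuarded() && !apArn.isEmpty());
```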
##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########
@@ -400,6 +410,14 @@ public void initialize(URI name, Configuration originalConf)
     LOG.debug("Initializing S3AFileSystem for {}", bucket);
     // clone the configuration into one with propagated bucket options
     Configuration conf = propagateBucketOptions(originalConf, bucket);
+
+    String apArn = conf.getTrimmed(ACCESS_POINT_ARN, "");

Review comment:
   Yes use ACCESS_POINT_ARN_DEFAULT as suggested by steve above as well.
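A hedged sketch of that change, assuming an `ACCESS_POINT_ARN_DEFAULT` constant is added next to `ACCESS_POINT_ARN` in `Constants`; the names follow the discussion, not necessarily the merged patch:

```java
// In org.apache.hadoop.fs.s3a.Constants (hypothetical final form):
public static final String ACCESS_POINT_ARN = "fs.s3a.accesspoint.arn";
public static final String ACCESS_POINT_ARN_DEFAULT = "";

// In S3AFileSystem.initialize(): use the shared default rather than an
// inline "" literal, so the default lives in exactly one place.
String apArn = conf.getTrimmed(ACCESS_POINT_ARN, ACCESS_POINT_ARN_DEFAULT);
```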
##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########
@@ -1576,6 +1576,81 @@ Why explicitly declare a bucket bound to the central endpoint?
 It ensures that if the default endpoint is changed to a new region, data store
 in US-east is still reachable.
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3a
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level. For more information about why to use them
+make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following configuration property:
+```xml
+<property>
+  <name>fs.s3a.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures **all access** to S3a, and in turn S3, to go through that ARN.
+So for example `s3a://yourbucket/key` will now use your configured ARN when getting data from S3
+instead of your bucket. The flip side to this is that if you're working with multiple buckets
+`s3a://yourbucket` and `s3a://yourotherbucket` both of their requests will go through the same
+Access Point ARN. To configure different Access Point ARNs, per bucket overrides can be used with
+access point names instead of bucket names as such:
+
+- Let's assume you have an existing workflow with the following paths `s3a://data-bucket`,
+`s3a://output-bucket` and you want to work with a new Access Point called `finance-accesspoint`. All
+you would then need to add is the following per bucket configuration change:
+```xml
+<property>
+  <name>fs.s3a.bucket.finance-accesspoint.accesspoint.arn</name>
+  <value> arn:aws:s3:eu-west-1:123456789101:accesspoint/finance-accesspoint </value>
+</property>
+```
+
+While keeping the global `accesspoint.arn` property set to empty `" "` which is the default.

Review comment:
   I think we discussed it to be "" above so the docs should be consistent as well.


##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########
@@ -1576,6 +1576,81 @@ Why explicitly declare a bucket bound to the central endpoint?
 It ensures that if the default endpoint is changed to a new region, data store
 in US-east is still reachable.
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3a
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage which
+improves VPC integration with S3 and simplifies your data's permission model because different
+policies can be applied now on the Access Point level. For more information about why to use them
+make sure to read the official documentation.
+
+Accessing data through an access point, is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN property using the following configuration property:
+```xml
+<property>
+  <name>fs.s3a.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures **all access** to S3a, and in turn S3, to go through that ARN.
+So for example `s3a://yourbucket/key` will now use your configured ARN when getting data from S3
+instead of your bucket. The flip side to this is that if you're working with multiple buckets
+`s3a://yourbucket` and `s3a://yourotherbucket` both of their requests will go through the same
+Access Point ARN. To configure different Access Point ARNs, per bucket overrides can be used with
+access point names instead of bucket names as such:
+
+- Let's assume you have an existing workflow with the following paths `s3a://data-bucket`,
+`s3a://output-bucket` and you want to work with a new Access Point called `finance-accesspoint`. All
+you would then need to add is the following per bucket configuration change:
+```xml
+<property>
+  <name>fs.s3a.bucket.finance-accesspoint.accesspoint.arn</name>

Review comment:
   So using this we can configure access to one bucket via ARN and other buckets directly?


##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AClientSideEncryptionKms.java
##########
@@ -56,6 +57,7 @@ protected Configuration createConfiguration() {
   protected void maybeSkipTest() {
     skipIfEncryptionTestsDisabled(getConfiguration());
    skipIfKmsKeyIdIsNotSet(getConfiguration());
+    skipIfCSEIsNotEnabled(getConfiguration());

Review comment:
   Should this be a part of this patch?


##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AConfiguration.java
##########
@@ -361,6 +362,14 @@ public void shouldBeAbleToSwitchOnS3PathStyleAccessViaConfigProperty()
       // isn't in the same region as the s3 client default. See
       // http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
       assertEquals(HttpStatus.SC_MOVED_PERMANENTLY, e.getStatusCode());
+    } catch (final IllegalArgumentException e) {
+      // Path style addressing does not work with AP ARNs
+      if (conf.get(ACCESS_POINT_ARN, "").isEmpty()) {
+        LOG.error("Caught unexpected exception: ", e);
+        throw e;
+      }
+

Review comment:
   nit: move to a string constant. "ARN of type accesspoint cannot be passed as a bucket"
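A sketch of what that nit could look like, with the expected SDK message hoisted into a constant; the constant name and the `contains` check are illustrative, not from the patch, and `conf`, `LOG` and `ACCESS_POINT_ARN` come from the surrounding test class:

```java
// Hypothetical constant for the expected SDK error text.
private static final String AP_ILLEGAL_ACCESS =
    "ARN of type accesspoint cannot be passed as a bucket";

// Inside the catch (final IllegalArgumentException e) branch of the test:
// only swallow the expected access-point addressing failure, rethrow
// anything else so unrelated bugs still fail the test.
if (conf.get(ACCESS_POINT_ARN, "").isEmpty()
    || !e.getMessage().contains(AP_ILLEGAL_ACCESS)) {
  LOG.error("Caught unexpected exception: ", e);
  throw e;
}
```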
##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/ArnResource.java
##########
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import javax.annotation.Nonnull;
+
+import com.amazonaws.arn.Arn;
+import com.amazonaws.regions.RegionUtils;
+
+/**
+ * Represents an Arn Resource, this can be an accesspoint or bucket.
+ */
+public class ArnResource {
+
+  /**
+   * Resource name.
+   */
+  private final String name;
+
+  /**
+   * Resource owner account id.
+   */
+  private final String ownerAccountId;
+
+  /**
+   * Resource region.
+   */
+  private final String region;
+
+  /**
+   * Full Arn for the resource.
+   */
+  private final String fullArn;
+
+  /**
+   * Partition for the resource. Allowed partitions: aws, aws-cn, aws-us-gov
+   */
+  private final String partition;
+
+  /**
+   * Because of the different ways an endpoint can be constructed depending on partition we're
+   * relying on the AWS SDK to produce the endpoint. In this case we need a region key of the form
+   * {@code String.format("accesspoint-%s", awsRegion)}
+   */
+  private final String accessPointRegionKey;
+
+  private ArnResource(String name, String owner, String region, String partition, String fullArn) {
+    this.name = name;
+    this.ownerAccountId = owner;
+    this.region = region;
+    this.partition = partition;
+    this.fullArn = fullArn;
+    this.accessPointRegionKey = String.format("accesspoint-%s", region);
+  }
+
+  /**
+   * Resource name.
+   * @return resource name.
+   */
+  public String getName() {
+    return name;
+  }
+
+  /**
+   * Return owner's account id.
+   */
+  public String getOwnerAccountId() {
+    return ownerAccountId;
+  }
+
+  /**
+   * Resource region.
+   * @return resource region.
+   */
+  public String getRegion() {
+    return region;
+  }
+
+  /**
+   * Full arn for resource.
+   * @return arn for resource.
+   */
+  public String getFullArn() {
+    return fullArn;
+  }
+
+  /**
+   * Formatted endpoint for the resource.
+   * @return resource endpoint.
+   */
+  public String getEndpoint() {
+    return RegionUtils.getRegion(accessPointRegionKey)
+        .getServiceEndpoint("s3");
+  }
+
+  /**
+   * Parses the passed `arn` string into a full ArnResource
+   * @param arn - string representing an Arn resource
+   * @throws IllegalArgumentException - if the Arn is malformed or any of the region, accountId and
+   * resource name properties are empty
+   * @return new ArnResource instance
+   */
+  @Nonnull
+  public static ArnResource accessPointFromArn(String arn) throws IllegalArgumentException {
+    Arn parsed = Arn.fromString(arn);
+
+    if (parsed.getRegion().isEmpty() || parsed.getAccountId().isEmpty() ||
+        parsed.getResourceAsString().isEmpty()) {
+      throw new IllegalArgumentException(arn);

Review comment:
   nit: A bit of detailed error messaging would be nice.
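One possible shape for that error message, as a sketch only; the exact wording is up to the patch author:

```java
// Sketch: name the ARN and the required fields instead of echoing the
// raw string alone, so the failure is self-explanatory in logs.
if (parsed.getRegion().isEmpty() || parsed.getAccountId().isEmpty()
    || parsed.getResourceAsString().isEmpty()) {
  throw new IllegalArgumentException(String.format(
      "Access Point Arn %s has an invalid format or is missing required"
          + " fields: region, account id and resource name must all be set",
      arn));
}
```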
##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
##########
@@ -1507,4 +1507,13 @@ public static void skipIfKmsKeyIdIsNotSet(Configuration configuration) {
     }
   }
 
+  /**
+   * Skip if a test doesn't use CSE.
+   */
+  public static void skipIfCSEIsNotEnabled(Configuration configuration) {

Review comment:
   Okay ignore my previous comment. Refactoring this is good.


##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestArnResource.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import com.amazonaws.regions.Regions;
+import org.junit.Assert;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+
+public class TestArnResource extends Assert {
+  private final static Logger LOG = LoggerFactory.getLogger(TestArnResource.class);
+
+  @Test
+  public void parseAccessPointFromArn() throws IllegalArgumentException {
+    describe("Parse AccessPoint ArnResource from arn string");
+
+    String accessPoint = "testAp";
+    String accountId = "123456789101";
+    String[][] regionPartitionEndpoints = new String[][] {
+        { Regions.EU_WEST_1.getName(), "aws", "s3-accesspoint.eu-west-1.amazonaws.com" },
+        { Regions.US_GOV_EAST_1.getName(), "aws-us-gov", "s3-accesspoint.us-gov-east-1.amazonaws.com" },
+        { Regions.CN_NORTH_1.getName(), "aws-cn", "s3-accesspoint.cn-north-1.amazonaws.com.cn" },
+    };
+
+    for (String[] testPair : regionPartitionEndpoints) {
+      String region = testPair[0];
+      String partition = testPair[1];
+      String endpoint = testPair[2];
+
+      // arn:partition:service:region:account-id:resource-type/resource-id
+      String arn = String.format("arn:%s:s3:%s:%s:accesspoint/%s", partition, region, accountId,
+          accessPoint);
+
+      ArnResource resource = ArnResource.accessPointFromArn(arn);
+      assertEquals(arn, resource.getFullArn());
+      assertEquals(accessPoint, resource.getName());
+      assertEquals(accountId, resource.getOwnerAccountId());
+      assertEquals(region, resource.getRegion());
+      assertEquals(endpoint, resource.getEndpoint());
+    }

Review comment:
   Add error messages in assert statements. Debugging tests through command line will be easier.
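For instance, a minimal sketch using JUnit 4's `assertEquals(String message, Object expected, Object actual)` overload, with `arn`, `resource` and the other locals taken from the loop body above:

```java
// Sketch: each assertion names the field under test, so a command-line
// failure identifies the mismatched value without a debugger.
assertEquals("full ARN should round-trip", arn, resource.getFullArn());
assertEquals("access point name", accessPoint, resource.getName());
assertEquals("owner account id", accountId, resource.getOwnerAccountId());
assertEquals("region", region, resource.getRegion());
assertEquals("endpoint for region " + region, endpoint, resource.getEndpoint());
```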