**bogthe** commented on a change in pull request #3260:
URL: https://github.com/apache/hadoop/pull/3260#discussion_r682739268
**File path:** `hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md`

````diff
@@ -1576,6 +1576,81 @@
 Why explicitly declare a bucket bound to the central endpoint? It ensures
 that if the default endpoint is changed to a new region, data stored in
 US-east is still reachable.
+## <a name="accesspoints"></a>Configuring S3 AccessPoints usage with S3a
+S3a now supports [S3 Access Point](https://aws.amazon.com/s3/features/access-points/) usage, which
+improves VPC integration with S3 and simplifies your data's permission model, because different
+policies can now be applied at the Access Point level. For more information about why to use them,
+make sure to read the official documentation.
+
+Accessing data through an access point is done by using its ARN, as opposed to just the bucket name.
+You can set the Access Point ARN using the following configuration property:
+```xml
+<property>
+  <name>fs.s3a.accesspoint.arn</name>
+  <value> {ACCESSPOINT_ARN_HERE} </value>
+  <description>Configure S3a traffic to use this AccessPoint</description>
+</property>
+```
+
+Be mindful that this configures **all access** to S3a, and in turn S3, to go through that ARN.
+So for example `s3a://yourbucket/key` will now use your configured ARN when getting data from S3
+instead of your bucket. The flip side to this is that if you're working with multiple buckets
````

**Review comment:** I like it. Will update PR

**File path:** `hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java`

```diff
@@ -2570,6 +2614,11 @@ protected S3ListResult continueListObjects(S3ListRequest request,
         OBJECT_CONTINUE_LIST_REQUEST,
         () -> {
           if (useListV1) {
+            if (accessPoint != null) {
+              // AccessPoints are not compatible with V1List
+              throw new InvalidRequestException("ListV1 is not supported by AccessPoints");
```

**Review comment:** Yep, good idea, upgrading it is!
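The warning above — that a single `fs.s3a.accesspoint.arn` value routes *all* S3A traffic through one ARN — is usually addressed with S3A's per-bucket option pattern (`fs.s3a.bucket.BUCKET.option`), which `propagateBucketOptions` resolves at initialization. The sketch below illustrates that fallback logic with plain JDK types; it is not Hadoop's `Configuration` class, and the class and method names (`AccessPointConfigSketch`, `resolveArn`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (plain JDK, not Hadoop's Configuration): resolve an Access Point
// ARN for a bucket, preferring a per-bucket key over the global fs.s3a.accesspoint.arn,
// in the spirit of S3A's per-bucket option propagation.
public class AccessPointConfigSketch {

    static final String GLOBAL_KEY = "fs.s3a.accesspoint.arn";

    /** Returns the per-bucket ARN if set, else the global ARN, else null. */
    static String resolveArn(Map<String, String> conf, String bucket) {
        String perBucket = conf.get("fs.s3a.bucket." + bucket + ".accesspoint.arn");
        if (perBucket != null && !perBucket.trim().isEmpty()) {
            return perBucket.trim();
        }
        String global = conf.get(GLOBAL_KEY);
        return (global == null || global.trim().isEmpty()) ? null : global.trim();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(GLOBAL_KEY, "arn:aws:s3:eu-west-1:123456789012:accesspoint/shared-ap");
        conf.put("fs.s3a.bucket.logs.accesspoint.arn",
                 "arn:aws:s3:eu-west-1:123456789012:accesspoint/logs-ap");

        // "logs" resolves to its own Access Point; other buckets fall back to the global ARN.
        System.out.println(resolveArn(conf, "logs"));
        System.out.println(resolveArn(conf, "data"));
    }
}
```

With this scoping, only the buckets you explicitly bind to an Access Point change behaviour, instead of every `s3a://` URI in the job.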
**File path:** `hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java`

```diff
@@ -400,6 +410,14 @@ public void initialize(URI name, Configuration originalConf)
     LOG.debug("Initializing S3AFileSystem for {}", bucket);
     // clone the configuration into one with propagated bucket options
     Configuration conf = propagateBucketOptions(originalConf, bucket);
+
+    String apArn = conf.getTrimmed(ACCESS_POINT_ARN, "");
+    if (!apArn.isEmpty()) {
+      accessPoint = ArnResource.accessPointFromArn(apArn);
+      LOG.info("Using AccessPoint ARN \"{}\" for bucket {}", apArn, bucket);
```

**Review comment:** good point

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
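The `initialize()` change above hands the raw configuration string to `ArnResource.accessPointFromArn`. As a rough illustration of what that parsing involves — not Hadoop's actual implementation — the sketch below splits an Access Point ARN of the standard AWS form `arn:aws:s3:<region>:<account-id>:accesspoint/<name>` into the fields the filesystem needs. The class name `ArnParseSketch` is hypothetical.

```java
// Illustrative only: minimal parsing of an S3 Access Point ARN of the standard form
// arn:aws:s3:<region>:<account-id>:accesspoint/<name>. This is a sketch of the fields
// S3AFileSystem.initialize() needs, not Hadoop's ArnResource class.
public class ArnParseSketch {

    /** Returns {region, accountId, accessPointName}; throws on a malformed ARN. */
    static String[] parseAccessPointArn(String arn) {
        // An ARN has six colon-separated fields; for an access point the
        // last field is "accesspoint/<name>".
        String[] parts = arn.split(":", 6);
        if (parts.length != 6
                || !"arn".equals(parts[0])
                || !"s3".equals(parts[2])
                || !parts[5].startsWith("accesspoint/")) {
            throw new IllegalArgumentException("Not an S3 Access Point ARN: " + arn);
        }
        String name = parts[5].substring("accesspoint/".length());
        return new String[] { parts[3], parts[4], name };
    }

    public static void main(String[] args) {
        String[] fields = parseAccessPointArn(
            "arn:aws:s3:eu-west-1:123456789012:accesspoint/test-ap");
        // region, account id, access point name
        System.out.println(fields[0] + " " + fields[1] + " " + fields[2]);
    }
}
```

Validating the ARN once at initialization, as the patch does, means a bad value fails fast with a clear message instead of surfacing later as an opaque request error.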