[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638319#comment-17638319 ]
ASF GitHub Bot commented on HADOOP-18073:
-----------------------------------------

passaro opened a new pull request, #5163:
URL: https://github.com/apache/hadoop/pull/5163

### Description of PR

This is an initial draft PR containing all the changes implemented so far to upgrade S3A to the AWS SDK v2.

Note that this is still a work in progress. We plan to keep contributing to it to fill existing gaps and to update the SDK when missing features are released (e.g. support for Client-side Encryption and the public release of the new Transfer Manager, currently in preview). In the meantime, this PR should provide a view of the whole set of changes and start a conversation on the remaining open questions and on how to handle breaking changes that affect S3A.

The new document at `hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md` discusses the key changes contained in this PR and is the suggested starting point for the review.

Further open questions to be discussed:

1. The region logic. Previously, if an endpoint was configured but no region, the region was parsed from the endpoint. If the configured endpoint was the standard us-east-1 endpoint, the region was set to null so that the SDK could figure it out. If no endpoint was configured, the region was set to us-east-1 and `.withForceGlobalBucketAccessEnabled` was set. In SDK v2 there’s no cross-region access, so the correct region of the bucket needs to be set. We now get the region of the bucket using a head bucket request and set it. In general, the guidance for the new SDK is to set only the region and let the SDK determine the endpoint.
2. Bucket probes. These are currently done with doesBucketExist and doesBucketExistV2. Why do we need these two separate levels? There is no doesBucketExist operation in SDK V2, so it will need to be replaced with a HeadBucket/GetBucketACL. Also consider that, with the new region logic, we will need to do a HeadBucket while configuring the client if the region isn’t specified.
3. Progress Listeners. SDK V2 currently does not support attaching progress listeners to requests outside the Transfer Manager. We use them in Put and UploadPart in S3ABlockOutputStream. Are they required for the upgrade?
4. ACLs. LogDeliveryWrite, which is a bucket-level ACL, is no longer supported in SDK V2. S3A seems to use ACLs at the object level only. Can this ACL be removed?
5. Transfer Manager. You can no longer set a threshold for when to use the Transfer Manager. The default is 8MB.

### How was this patch tested?

Ran `mvn -Dparallel-tests -DtestsThreadCount=8 clean verify` in `eu-west-2`.

The following tests are currently failing:

|Test Suite |Test Name |Reason |
|-

> Upgrade AWS SDK to v2
> ---------------------
>
> Key: HADOOP-18073
> URL: https://issues.apache.org/jira/browse/HADOOP-18073
> Project: Hadoop Common
> Issue Type: Task
> Components: auth, fs/s3
> Affects Versions: 3.3.1
> Reporter: xiaowei sun
> Assignee: Ahmar Suhail
> Priority: Major
> Labels: pull-request-available
> Attachments: Upgrading S3A to SDKV2.pdf
>
>
> This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java V1 to AWS SDK for Java V2.
> Original use case:
> {quote}We would like to access s3 with AWS SSO, which is supported in software.amazon.awssdk:sdk-core:2.*.
> In particular, from [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], when to set 'fs.s3a.aws.credentials.provider', it must be "com.amazonaws.auth.AWSCredentialsProvider". We would like to support "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which supports AWS SSO, so users only need to authenticate once.
> {quote}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
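
For reviewers who want the pre-upgrade region logic from open question 1 in one place, here is a rough, dependency-free sketch of the three cases as the PR description states them. It is not the S3A code: `RegionSketch`, `Resolution` and `parseRegion` are hypothetical names, the parser only understands hosts shaped like `s3.<region>.amazonaws.com`, and letting an explicitly configured region win when an endpoint is also set is an assumption rather than something the description says.

```java
public class RegionSketch {
    // Standard (us-east-1) endpoint mentioned in the PR description.
    static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

    /** Hypothetical result holder; a null region means "let the SDK decide". */
    static final class Resolution {
        final String region;
        final boolean forceGlobalBucketAccess;

        Resolution(String region, boolean forceGlobalBucketAccess) {
            this.region = region;
            this.forceGlobalBucketAccess = forceGlobalBucketAccess;
        }
    }

    /** Mirrors the three SDK v1 cases listed in the description. */
    static Resolution resolve(String endpoint, String configuredRegion) {
        if (endpoint == null || endpoint.isEmpty()) {
            // No endpoint: us-east-1 plus forced global bucket access.
            return new Resolution("us-east-1", true);
        }
        if (configuredRegion != null && !configuredRegion.isEmpty()) {
            // Assumption: an explicitly configured region is used as-is.
            return new Resolution(configuredRegion, false);
        }
        if (stripScheme(endpoint).equals(CENTRAL_ENDPOINT)) {
            // Standard endpoint: leave the region unset for the SDK.
            return new Resolution(null, false);
        }
        // Endpoint but no region: parse the region from the endpoint.
        return new Resolution(parseRegion(endpoint), false);
    }

    static String stripScheme(String endpoint) {
        return endpoint.replaceFirst("^https?://", "");
    }

    /** Simplified parser (assumption): expects s3.<region>.amazonaws.com. */
    static String parseRegion(String endpoint) {
        String[] parts = stripScheme(endpoint).split("\\.");
        if (parts.length >= 4 && parts[0].equals("s3")) {
            return parts[1];
        }
        return null; // unrecognised host shape
    }
}
```

Under SDK v2 none of these fallbacks give cross-region access, which is why the description replaces them with a head bucket lookup of the bucket's actual region.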