[ 
https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638319#comment-17638319
 ] 

ASF GitHub Bot commented on HADOOP-18073:
-----------------------------------------

passaro opened a new pull request, #5163:
URL: https://github.com/apache/hadoop/pull/5163

   ### Description of PR
   
   This is an initial draft PR containing all the changes implemented so far to 
upgrade S3A to the AWS SDK v2. Note that this is still a work in progress and 
we plan to further contribute to it to fill existing gaps and update the SDK 
when missing features are released (e.g. support for Client-side Encryption and 
public release of the new Transfer Manager, currently in preview). 
   
   In the meantime, this PR should provide a view of the whole set of changes 
and start a conversation on the remaining open questions and on how to handle 
breaking changes that affect S3A.
   
   The new document at 
`hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md`
   discusses the key changes contained in this PR and is the suggested starting 
point for the review. 
   
   Further open questions to be discussed:
   
   1. The region logic. Previously, if an endpoint was configured but no 
region, S3A parsed the region from the endpoint. If the configured endpoint was 
the standard us-east-1 endpoint, it set the region to null and let the SDK 
work out the region. If no endpoint was configured, it set the region to 
us-east-1 and set `.withForceGlobalBucketAccessEnabled`. SDK v2 has no 
cross-region access, so the correct region of the bucket must be set on the 
client. We therefore now look up the region of the bucket with a HeadBucket 
request and set it. In general, the guidance for the new SDK is to set only 
the region and let the SDK determine the endpoint.
   
   2. Bucket probes. These are currently done with doesBucketExist and 
doesBucketExistV2. Why do we need these two separate levels? SDK V2 has no 
doesBucketExist operation, so it will need to be replaced with a 
HeadBucket/GetBucketACL. Also consider that, with the new region logic, we 
will need to do a HeadBucket while configuring the client if the region isn’t 
specified.
   
   3. Progress Listeners. SDK V2 currently does not support attaching progress 
listeners on requests outside the Transfer Manager. We use them in Put and 
UploadPart in S3ABlockOutputStream. Are they required for the upgrade?
   
   4. ACLs. LogDeliveryWrite, which is a bucket level ACL, is no longer 
supported in the SDK V2. S3A seems to use ACLs at the object level only. Can 
this ACL be removed?
   
   5. Transfer Manager. You can no longer set a threshold for when to use the 
Transfer Manager. The default is 8MB.
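
   To make the region question in item 1 easier to discuss, here is a minimal 
sketch of the two decision paths as plain Java, with no SDK dependency. It is 
not the S3A implementation: the class and method names, the endpoint regular 
expression, the rule that an explicitly configured region wins, and the 
`Supplier` standing in for a HeadBucket lookup are all assumptions made for 
illustration.

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of S3A or the AWS SDK. */
final class RegionResolutionSketch {

    /** Region to configure (null means "let the SDK decide") plus the global-access flag. */
    static final class Decision {
        final String region;
        final boolean forceGlobalBucketAccess;

        Decision(String region, boolean forceGlobalBucketAccess) {
            this.region = region;
            this.forceGlobalBucketAccess = forceGlobalBucketAccess;
        }
    }

    static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

    // Assumed endpoint shapes: s3.<region>.amazonaws.com or s3-<region>.amazonaws.com.
    // Real endpoints have more variants (dualstack, FIPS, and so on); this is a simplification.
    private static final Pattern REGION_IN_ENDPOINT =
        Pattern.compile("^s3[.-]([a-z0-9-]+)\\.amazonaws\\.com$");

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** Previous behaviour, following the three cases described in item 1. */
    static Decision legacy(String endpoint, String region) {
        if (!isBlank(region)) {
            // Assumption: an explicitly configured region is used as-is.
            return new Decision(region.trim(), false);
        }
        if (isBlank(endpoint)) {
            // No endpoint: default to us-east-1 and enable global bucket access.
            return new Decision("us-east-1", true);
        }
        String host = endpoint.trim()
            .replaceFirst("^https?://", "")
            .replaceFirst("/+$", "");
        if (host.equals(CENTRAL_ENDPOINT)) {
            // Standard us-east-1 endpoint: leave the region unset for the SDK.
            return new Decision(null, false);
        }
        // Otherwise try to parse the region out of the endpoint host name.
        Matcher m = REGION_IN_ENDPOINT.matcher(host);
        return new Decision(m.matches() ? m.group(1) : null, false);
    }

    /** New behaviour: use the configured region, else ask the probe (a stand-in for HeadBucket). */
    static String v2(String region, Supplier<String> bucketRegionProbe) {
        return isBlank(region) ? bucketRegionProbe.get() : region.trim();
    }
}
```

   In this sketch the probe is only invoked when no region is configured, which 
mirrors the point in item 2 that a HeadBucket is needed during client 
configuration only in that case.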
   
   
   ### How was this patch tested?
   
   Ran `mvn -Dparallel-tests -DtestsThreadCount=8 clean verify` in `eu-west-2`.
   
   The following tests are currently failing:
   
   |Test Suite |Test Name |Reason |
   |-

> Upgrade AWS SDK to v2
> ---------------------
>
>                 Key: HADOOP-18073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18073
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: auth, fs/s3
>    Affects Versions: 3.3.1
>            Reporter: xiaowei sun
>            Assignee: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Upgrading S3A to SDKV2.pdf
>
>
> This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java 
> V1 to AWS SDK for Java V2.
> Original use case:
> {quote}We would like to access S3 with AWS SSO, which is supported in 
> software.amazon.awssdk:sdk-core:2.*.
> In particular, according to 
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html],
> when 'fs.s3a.aws.credentials.provider' is set, it must be a 
> "com.amazonaws.auth.AWSCredentialsProvider". We would like to support 
> "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider", which 
> supports AWS SSO, so users only need to authenticate once.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

