[jira] [Resolved] (HADOOP-18338) Unable to access data from S3 bucket over a vpc endpoint - 400 bad request

Steve Loughran (Jira) Wed, 13 Jul 2022 04:50:17 -0700


     [ 
https://issues.apache.org/jira/browse/HADOOP-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran resolved HADOOP-18338.
-------------------------------------
    Resolution: Not A Problem

If you change the endpoint, S3A no longer knows which region to sign requests with.

See HADOOP-17705 and set the region explicitly (fs.s3a.endpoint.region, or per bucket fs.s3a.bucket.<bucket>.endpoint.region).
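A minimal sketch of that fix for the PySpark setup described below. It rests on three assumptions. First, the region option from HADOOP-17705 is `fs.s3a.endpoint.region`. Second, it is scoped to one bucket through S3A's per-bucket `fs.s3a.bucket.<bucket>.` prefix. Third, the endpoint is an S3 interface (PrivateLink) hostname of the form `bucket.vpce-….s3.<region>.vpce.amazonaws.com`. The helper name `s3a_vpce_conf` is hypothetical. It only builds the settings and does not talk to Spark or AWS.

```python
import re

# Assumed hostname shape of an S3 interface (PrivateLink) endpoint, e.g.
# bucket.vpce-0abc123.s3.us-east-1.vpce.amazonaws.com; captures the region label.
_VPCE_HOST = re.compile(r"\.s3\.([a-z0-9-]+)\.vpce\.amazonaws\.com$")


def s3a_vpce_conf(bucket: str, endpoint: str) -> dict:
    """Hypothetical helper: per-bucket S3A settings for an S3 interface endpoint.

    S3A cannot infer the signing region from a VPC endpoint hostname, so the
    region is taken from the hostname and set explicitly alongside the endpoint.
    """
    # Strip an optional scheme and trailing slash to get the bare hostname.
    host = endpoint.split("://", 1)[-1].rstrip("/")
    match = _VPCE_HOST.search(host)
    if match is None:
        raise ValueError(f"not an S3 interface endpoint hostname: {host}")
    region = match.group(1)
    # "spark.hadoop." passes the option through to the Hadoop configuration;
    # "fs.s3a.bucket.<bucket>." limits it to this one bucket.
    prefix = f"spark.hadoop.fs.s3a.bucket.{bucket}"
    return {
        f"{prefix}.endpoint": endpoint,
        f"{prefix}.endpoint.region": region,
    }
```

Each returned key/value pair can be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`. Scoping the settings per bucket leaves other buckets on the default endpoint and region resolution.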

> Unable to access data from S3 bucket over a vpc endpoint - 400 bad request
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-18338
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18338
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, fs/s3
>            Reporter: Aarti
>            Priority: Major
>         Attachments: spark_s3.txt, spark_s3_vpce_error.txt
>
>
> We are trying to write to an S3 bucket whose policy restricts access to specific IAM users, 
> requires SSE, and limits the allowed endpoints. The policy lists two endpoints: 
> a gateway endpoint and an interface endpoint.
>  
> When we use the gateway endpoint, which is the general one: 
> [https://s3.us-east-1.amazonaws.com|https://s3.us-east-1.amazonaws.com/] => 
> the Spark code executes successfully and writes to the S3 bucket.
> But when we use the interface endpoint (which we are required to use): 
> [https://bucket.vpce-<>.s3.us-east-1.vpce.amazonaws.com|https://bucket.vpce-%3C%3E.s3.us-east-1.vpce.amazonaws.com/]
>  => the Spark code throws this error:
>  
> py4j.protocol.Py4JJavaError: An error occurred while calling o91.save.
> : org.apache.hadoop.fs.s3a.AWSBadRequestException: doesBucketExist on <BUCKET 
> NAME>: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request 
> (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request 
> ID: BA67GFNR0Q127VFM; S3 Extended Request ID: 
> BopO6Cn1hNzXdWh89hZlnl/QyTJef/1cxmptuP6f4yH7tqfMO36s/7mF+q8v6L5+FmYHXbFdEss=; 
> Proxy: null), S3 Extended Request ID: 
> BopO6Cn1hNzXdWh89hZlnl/QyTJef/1cxmptuP6f4yH7tqfMO36s/7mF+q8v6L5+FmYHXbFdEss=:400
>  Bad Request: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 
> 400 Bad Request; Request ID: BA67GFNR0Q127VFM; S3 Extended Request ID: 
> BopO6Cn1hNzXdWh89hZlnl/QyTJef/1cxmptuP6f4yH7tqfMO36s/7mF+q8v6L5+FmYHXbFdEss=; 
> Proxy: null)
>  
> Attaching the PySpark code and the exception trace:
>   [^spark_s3.txt]
>   [^spark_s3_vpce_error.txt]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

