[ https://issues.apache.org/jira/browse/HADOOP-19347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-19347:
------------------------------------
Summary: S3A: AWS SDK deleteObjects() and S3Store.deleteObjects() don't
handle 500 failures of individual objects (was: AWS SDK deleteObjects() and
S3Store.deleteObjects() don't handle 500 failures of individual objects)
> S3A: AWS SDK deleteObjects() and S3Store.deleteObjects() don't handle 500
> failures of individual objects
> --------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-19347
> URL: https://issues.apache.org/jira/browse/HADOOP-19347
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.4.1
> Reporter: Steve Loughran
> Priority: Minor
>
> S3Store.deleteObjects() encountered a 500 error and didn't recover.
> We normally assume that 500 errors are already retried by the SDK, so our
> own retry logic doesn't bother.
> The root cause is that the 500 errors can surface within the bulk delete
> (see the sketch below):
> * The delete POST returns 200, so the SDK is happy
> * but one of the rows in the response reports the S3Error "InternalError":
> {{Code=InternalError, Message=We encountered an internal error. Please try
> again.}}
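> As a minimal illustration (AWS SDK v2 for Java; request construction and
> client setup elided, class and method names hypothetical), the per-key
> errors are only visible on the response object, never as a thrown exception:
> {code}
> import software.amazon.awssdk.services.s3.S3Client;
> import software.amazon.awssdk.services.s3.model.DeleteObjectsRequest;
> import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
> import software.amazon.awssdk.services.s3.model.S3Error;
>
> public class BulkDeleteProbe {
>
>   /** Show where per-row failures of a bulk delete surface. */
>   public static void probe(S3Client s3, DeleteObjectsRequest request) {
>     // The POST itself returns 200, so deleteObjects() returns normally
>     // and the SDK's request-level retry logic never fires.
>     DeleteObjectsResponse response = s3.deleteObjects(request);
>     // Failures of individual objects only surface here, row by row.
>     for (S3Error error : response.errors()) {
>       System.out.printf("key=%s code=%s message=%s%n",
>           error.key(), error.code(), error.message());
>     }
>   }
> }
> {code}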
> Proposed:
> * The bulk delete invoker must map "InternalError" to AWSStatus500Exception
> and throw that (a sketch follows this list).
> * Add a retry policy for bulk deletes which considers AWSStatus500Exception
> as retriable. We currently don't retry on the assumption that the SDK will,
> which it does for the base request, but clearly not for individual rows of
> a multi-object delete.
> * Maybe also consider the possibility that a partial 503 response could be
> generated? That is: only part of the delete throttled?
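> A rough sketch of the first two points, assuming the response-scanning
> shape above; the exception type and wiring are illustrative stand-ins (a
> plain IOException rather than the real AWSStatus500Exception), not the
> actual S3A code:
> {code}
> import java.io.IOException;
>
> import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
> import software.amazon.awssdk.services.s3.model.S3Error;
>
> public final class BulkDeleteErrorMapper {
>
>   private static final String INTERNAL_ERROR = "InternalError";
>
>   private BulkDeleteErrorMapper() {
>   }
>
>   /**
>    * Raise a (retriable) exception if any row of an otherwise-successful
>    * bulk delete reported InternalError, so that a bulk delete retry
>    * policy treating it as retriable can resubmit the request.
>    */
>   public static void rethrowInternalErrors(DeleteObjectsResponse response)
>       throws IOException {
>     for (S3Error error : response.errors()) {
>       if (INTERNAL_ERROR.equals(error.code())) {
>         // The real fix would throw AWSStatus500Exception here and mark
>         // it as retriable in the bulk delete retry policy.
>         throw new IOException("InternalError deleting key "
>             + error.key() + ": " + error.message());
>       }
>     }
>   }
> }
> {code}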
> {code}
> Caused by: org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: [S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/, Code=InternalError, Message=We encountered an internal error. Please try again.)]
> at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:3186)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeysS3(S3AFileSystem.java:3422)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:3481)
> at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.removeKeys(S3AFileSystem.java:2558)
> at org.apache.hadoop.fs.s3a.impl.RenameOperation.lambda$removeSourceObjects$3(RenameOperation.java:625)
> at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:165)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
> {code}