Steve Loughran created HADOOP-19347:
---------------------------------------
Summary: AWS SDK deleteObjects() and S3Store.deleteObjects() don't
handle 500 failures of individual objects
Key: HADOOP-19347
URL: https://issues.apache.org/jira/browse/HADOOP-19347
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran
S3Store.deleteObjects() encountered a 500 error and didn't recover.
We normally assume that 500 errors have already been retried by the SDK, so our
own retry logic doesn't retry them.
The root cause is that the 500 errors can surface within the bulk delete.
* The delete POST returns 200, so SDK is happy
* but the response reports the S3Error "InternalError" for one of the rows in
the request:
{{Code=InternalError, Message=We encountered an internal error. Please try
again.}}
Proposed:
* The bulk delete invoker must map "InternalError" to AWSStatus500Exception and
throw that.
* Add a retry policy for bulk deletes which considers AWSStatus500Exception as
retriable. We currently don't retry, on the assumption that the SDK will,
which it does for base retries, but clearly not for multi-object delete.
* Maybe also consider the possibility that a partial 503 response could be
generated. That is: could only part of the delete be throttled?
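
A minimal sketch of the first two proposals, to make the intended behaviour concrete. It is not the S3A implementation: KeyError, Status500Exception and deleteWithRetry are hypothetical stand-ins (the real classes are the SDK's S3Error and S3A's AWSStatus500Exception, whose constructors differ), and the bulkDelete function stands in for the POST that returns 200 with per-key error rows. It assumes that resending only the keys which reported InternalError is acceptable, and it omits the backoff a real retry policy would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch only: none of these types are the real SDK or S3A classes. */
final class BulkDeleteRetrySketch {

  /** Hypothetical stand-in for one per-key error row in a 200 response. */
  static final class KeyError {
    final String key;
    final String code;

    KeyError(String key, String code) {
      this.key = key;
      this.code = code;
    }
  }

  /** Hypothetical stand-in for AWSStatus500Exception. */
  static final class Status500Exception extends RuntimeException {
    final List<KeyError> failed;

    Status500Exception(List<KeyError> failed) {
      super("InternalError reported for " + failed.size() + " key(s)");
      this.failed = failed;
    }
  }

  /**
   * Issue a bulk delete, resending only the keys whose error row has the
   * code "InternalError". Any other per-key errors are treated as
   * non-retriable and returned to the caller.
   *
   * @param bulkDelete assumed callback: takes keys, returns per-key errors
   * @throws Status500Exception if InternalError persists after maxAttempts
   */
  static List<KeyError> deleteWithRetry(
      List<String> keys,
      Function<List<String>, List<KeyError>> bulkDelete,
      int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    List<KeyError> permanent = new ArrayList<>();
    List<String> pending = keys;
    for (int attempt = 1; ; attempt++) {
      List<KeyError> internal = new ArrayList<>();
      for (KeyError e : bulkDelete.apply(pending)) {
        if ("InternalError".equals(e.code)) {
          internal.add(e);      // retriable: the store asked us to try again
        } else {
          permanent.add(e);     // e.g. AccessDenied: retrying won't help
        }
      }
      if (internal.isEmpty()) {
        return permanent;
      }
      if (attempt == maxAttempts) {
        throw new Status500Exception(internal);
      }
      // A real policy would sleep with backoff here before retrying.
      List<String> retry = new ArrayList<>();
      for (KeyError e : internal) {
        retry.add(e.key);
      }
      pending = retry;
    }
  }
}
```

A partially throttled delete could be handled the same way by also treating a per-key throttling code as retriable. Whether S3 ever returns such a row is the open question in the last bullet.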
{code}
Caused by: org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: [S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/, Code=InternalError, Message=We encountered an internal error. Please try again.)]
  at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:3186)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeysS3(S3AFileSystem.java:3422)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:3481)
  at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.removeKeys(S3AFileSystem.java:2558)
  at org.apache.hadoop.fs.s3a.impl.RenameOperation.lambda$removeSourceObjects$3(RenameOperation.java:625)
  at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:165)
  at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
{code}