FrankChen021 commented on code in PR #19394:
URL: https://github.com/apache/druid/pull/19394#discussion_r3227001338
##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentMover.java:
##########
@@ -185,90 +183,75 @@ private void safeMove(
}
/**
- * Copies an object and after that checks that the object is present at the
target location, via a separate API call.
- * If it is not, an exception is thrown, and the object is not deleted at
the old location. This "paranoic" check
- * is added after it was observed that S3 may report a successful move, and
the object is not found at the target
- * location.
+ * Copies an S3 object to a target location and deletes the source.
+ * S3 has been strongly consistent since December 2020, so no post-copy
existence check is needed.
*/
- private void selfCheckingMove(
+ private void moveObject(
String s3Bucket,
String targetS3Bucket,
String s3Path,
String targetS3Path,
String copyMsg
- ) throws IOException, SegmentLoadingException
+ ) throws SegmentLoadingException
{
if (s3Bucket.equals(targetS3Bucket) && s3Path.equals(targetS3Path)) {
log.info("No need to move file[s3://%s/%s] onto itself", s3Bucket,
s3Path);
return;
}
final ServerSideEncryptingAmazonS3 s3Client = this.s3ClientSupplier.get();
- if (s3Client.doesObjectExist(s3Bucket, s3Path)) {
- ListObjectsV2Request request = ListObjectsV2Request.builder()
- .bucket(s3Bucket)
- .prefix(s3Path)
- .maxKeys(1)
- .build();
- final ListObjectsV2Response listResult = s3Client.listObjectsV2(request);
- // Using contents().size() instead of keyCount as, in some cases
- // it is observed that even though the contents returns some data
- // keyCount is still zero.
- if (listResult.contents().size() == 0) {
- // should never happen
- throw new ISE("Unable to list object [s3://%s/%s]", s3Bucket, s3Path);
- }
- final S3Object objectSummary = listResult.contents().get(0);
- if (objectSummary.storageClass() != null &&
- objectSummary.storageClass().equals(ObjectStorageClass.GLACIER)) {
- throw S3Exception.builder()
- .message(StringUtils.format(
- "Cannot move file[s3://%s/%s] of storage class glacier,
skipping.",
- s3Bucket,
- s3Path
- ))
- .build();
- } else {
- log.info("Moving file %s", copyMsg);
- CopyObjectRequest.Builder copyRequestBuilder =
CopyObjectRequest.builder()
- .sourceBucket(s3Bucket)
- .sourceKey(s3Path)
- .destinationBucket(targetS3Bucket)
- .destinationKey(targetS3Path);
- if (!config.getDisableAcl()) {
- final String headerValue = S3Utils.grantFullControlHeaderValue(
- S3Utils.grantFullControlToBucketOwner(s3Client, targetS3Bucket)
+ final HeadObjectResponse sourceMetadata;
+ try {
+ sourceMetadata = s3Client.getObjectMetadata(s3Bucket, s3Path);
+ }
+ catch (S3Exception e) {
+ if (e.statusCode() == 404 &&
"NoSuchKey".equals(S3Utils.getS3ErrorCode(e))) {
Review Comment:
[P2] Treat any HEAD 404 as a missing source
The idempotent already-moved path now only runs when headObject fails with
status 404 and error code NoSuchKey. S3 HEAD failures for absent objects can
surface as a generic 404/NotFound response, and other Druid S3 code handles
missing HEAD results by status alone. In that case this code rethrows before
checking whether the target object already exists, so a retry or competing move
where the source was deleted after a successful copy can incorrectly fail
instead of returning the moved segment. Please key this fallback off
statusCode() == 404 rather than requiring NoSuchKey.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]