Re: [PR] perf: Streamline s3 backend (druid)

via GitHub Tue, 12 May 2026 07:01:02 -0700


FrankChen021 commented on code in PR #19394:
URL: https://github.com/apache/druid/pull/19394#discussion_r3227001338



##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentMover.java:
##########
@@ -185,90 +183,75 @@ private void safeMove(
   }
 
   /**
-   * Copies an object and after that checks that the object is present at the 
target location, via a separate API call.
-   * If it is not, an exception is thrown, and the object is not deleted at 
the old location. This "paranoic" check
-   * is added after it was observed that S3 may report a successful move, and 
the object is not found at the target
-   * location.
+   * Copies an S3 object to a target location and deletes the source.
+   * S3 has been strongly consistent since December 2020, so no post-copy 
existence check is needed.
    */
-  private void selfCheckingMove(
+  private void moveObject(
       String s3Bucket,
       String targetS3Bucket,
       String s3Path,
       String targetS3Path,
       String copyMsg
-  ) throws IOException, SegmentLoadingException
+  ) throws SegmentLoadingException
   {
     if (s3Bucket.equals(targetS3Bucket) && s3Path.equals(targetS3Path)) {
       log.info("No need to move file[s3://%s/%s] onto itself", s3Bucket, 
s3Path);
       return;
     }
     final ServerSideEncryptingAmazonS3 s3Client = this.s3ClientSupplier.get();
-    if (s3Client.doesObjectExist(s3Bucket, s3Path)) {
-      ListObjectsV2Request request = ListObjectsV2Request.builder()
-          .bucket(s3Bucket)
-          .prefix(s3Path)
-          .maxKeys(1)
-          .build();
-      final ListObjectsV2Response listResult = s3Client.listObjectsV2(request);
-      // Using contents().size() instead of keyCount as, in some cases
-      // it is observed that even though the contents returns some data
-      // keyCount is still zero.
-      if (listResult.contents().size() == 0) {
-        // should never happen
-        throw new ISE("Unable to list object [s3://%s/%s]", s3Bucket, s3Path);
-      }
-      final S3Object objectSummary = listResult.contents().get(0);
-      if (objectSummary.storageClass() != null &&
-          objectSummary.storageClass().equals(ObjectStorageClass.GLACIER)) {
-        throw S3Exception.builder()
-            .message(StringUtils.format(
-                "Cannot move file[s3://%s/%s] of storage class glacier, 
skipping.",
-                s3Bucket,
-                s3Path
-            ))
-            .build();
-      } else {
-        log.info("Moving file %s", copyMsg);
-        CopyObjectRequest.Builder copyRequestBuilder = 
CopyObjectRequest.builder()
-            .sourceBucket(s3Bucket)
-            .sourceKey(s3Path)
-            .destinationBucket(targetS3Bucket)
-            .destinationKey(targetS3Path);
-        if (!config.getDisableAcl()) {
-          final String headerValue = S3Utils.grantFullControlHeaderValue(
-              S3Utils.grantFullControlToBucketOwner(s3Client, targetS3Bucket)
+    final HeadObjectResponse sourceMetadata;
+    try {
+      sourceMetadata = s3Client.getObjectMetadata(s3Bucket, s3Path);
+    }
+    catch (S3Exception e) {
+      if (e.statusCode() == 404 && 
"NoSuchKey".equals(S3Utils.getS3ErrorCode(e))) {

Review Comment:
   [P2] Treat any HEAD 404 as a missing source
   
   The idempotent already-moved path now only runs when headObject fails with 
status 404 and error code NoSuchKey. S3 HEAD failures for absent objects can 
surface as a generic 404/NotFound response, and other Druid S3 code handles 
missing HEAD results by status alone. In that case this code rethrows before 
checking whether the target object already exists, so a retry or competing move 
where the source was deleted after a successful copy can incorrectly fail 
instead of returning the moved segment. Please key this fallback off 
statusCode() == 404 rather than requiring NoSuchKey.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] perf: Streamline s3 backend (druid)

Reply via email to