amogh-jahagirdar commented on code in PR #5459:
URL: https://github.com/apache/iceberg/pull/5459#discussion_r939745493


##########
core/src/main/java/org/apache/iceberg/CatalogUtil.java:
##########
@@ -151,20 +164,28 @@ private static void deleteFiles(FileIO io, Set<ManifestFile> allManifests) {
        .run(
            manifest -> {
              try (ManifestReader<?> reader = ManifestFiles.open(manifest, io)) {
+                List<String> pathsToDelete = Lists.newArrayList();
                for (ManifestEntry<?> entry : reader.entries()) {
                  // intern the file path because the weak key map uses identity (==) instead of
                  // equals
                  String path = entry.file().path().toString().intern();
                  Boolean alreadyDeleted = deletedFiles.putIfAbsent(path, true);
                  if (alreadyDeleted == null || !alreadyDeleted) {
                    try {
-                      io.deleteFile(path);
+                      if (io instanceof SupportsBulkOperations) {
+                        pathsToDelete.add(path);
+                      } else {
+                        io.deleteFile(path);
+                      }
                    } catch (RuntimeException e) {
                      // this may happen if the map of deleted files gets cleaned up by gc
                      LOG.warn("Delete failed for data file: {}", path, e);
                    }
                  }
                }
+                if (io instanceof SupportsBulkOperations) {

Review Comment:
   My rationale was that, to ensure best-effort deletion, we already have an `onFailure` + `suppressFailureWhenFinished` hook at the task level in case a bulk deletion failure gets thrown. However, on a second pass, the error message is confusing in the context of a batch deletion failure: "Failed to get deleted files: this may cause orphaned data files". Since at the task level we are parallelizing over reading the manifests, only reading a manifest is expected to be a potential failure; now that we're introducing bulk deletion as another potential failure, the error message no longer makes sense in that case.
   
   To avoid the confusing messaging, I think it now makes sense to explicitly catch any exception from `deleteFiles` and just log it. Will update.
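   The approach described above can be sketched as follows. This is a minimal, self-contained illustration of the pattern, not the actual Iceberg code: `FileIO`, `SupportsBulkOperations`, and `deleteAll` here are simplified stand-ins for the real interfaces in `org.apache.iceberg.io`, and the log is just an in-memory list.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Sketch: buffer paths when the IO supports bulk deletes, then catch and log
   // any bulk-delete failure rather than letting it propagate to the task-level
   // onFailure handler, whose "Failed to get deleted files" message would be
   // misleading for a deletion failure.
   public class BulkDeleteSketch {
   
     // Simplified stand-in for org.apache.iceberg.io.FileIO
     interface FileIO {
       void deleteFile(String path);
     }
   
     // Simplified stand-in for org.apache.iceberg.io.SupportsBulkOperations
     interface SupportsBulkOperations extends FileIO {
       void deleteFiles(Iterable<String> paths);
     }
   
     // In-memory stand-in for the SLF4J logger used in CatalogUtil
     static final List<String> log = new ArrayList<>();
   
     static void deleteAll(FileIO io, List<String> paths) {
       if (io instanceof SupportsBulkOperations) {
         try {
           // one bulk call instead of N single deletes
           ((SupportsBulkOperations) io).deleteFiles(paths);
         } catch (RuntimeException e) {
           // explicitly catch and log, per the review comment above
           log.add("Bulk delete failed: " + e.getMessage());
         }
       } else {
         for (String path : paths) {
           try {
             io.deleteFile(path);
           } catch (RuntimeException e) {
             log.add("Delete failed for data file: " + path);
           }
         }
       }
     }
   
     public static void main(String[] args) {
       // an IO whose bulk delete always fails, to show the failure is only logged
       SupportsBulkOperations failing =
           new SupportsBulkOperations() {
             @Override
             public void deleteFile(String path) {}
   
             @Override
             public void deleteFiles(Iterable<String> paths) {
               throw new RuntimeException("boom");
             }
           };
       deleteAll(failing, List.of("a", "b"));
       System.out.println(log.get(0));
     }
   }
   ```
   
   With this shape, a bulk-delete failure surfaces as a warning log rather than as a task failure with an unrelated message.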



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
