amogh-jahagirdar commented on code in PR #5981:
URL: https://github.com/apache/iceberg/pull/5981#discussion_r996389820


##########
core/src/main/java/org/apache/iceberg/ReachableFileCleanup.java:
##########
@@ -84,19 +76,65 @@ public void cleanFiles(TableMetadata beforeExpiration, 
TableMetadata afterExpira
     deleteFiles(manifestListsToDelete, "manifest list");
   }
 
-  private Set<ManifestFile> readManifests(Set<Snapshot> snapshots) {
-    Set<ManifestFile> manifestFiles = Sets.newHashSet();
-    for (Snapshot snapshot : snapshots) {
-      try (CloseableIterable<ManifestFile> manifestFilesForSnapshot = 
readManifestFiles(snapshot)) {
-        for (ManifestFile manifestFile : manifestFilesForSnapshot) {
-          manifestFiles.add(manifestFile.copy());
-        }
-      } catch (IOException e) {
-        throw new RuntimeIOException(
-            e, "Failed to close manifest list: %s", 
snapshot.manifestListLocation());
-      }
+  private Set<ManifestFile> currentManifests(
+      Set<Snapshot> snapshots, Set<ManifestFile> manifestsToDelete) {
+    Set<ManifestFile> currentManifests = ConcurrentHashMap.newKeySet();
+    if (manifestsToDelete.isEmpty()) {
+      return currentManifests;

Review Comment:
   I think this is ultimately a naming problem, I struggled to find a concise 
name since it's really doing multiple things:
   
   1.) The function is reading the current manifests and pruning out the 
candidate manifests to delete at the same time. The reason it is done this way 
is ultimately for this procedure we are optimizing for memory usage, and the 
current manifests are expected to be a larger set on average compared to the 
manifests to delete. 
   
   2.) Following 1. what this means is the procedure should aim to minimize the 
amount of the current table state to read. The procedure can stop reading 
manifests in the current table state if all the candidate manifests have been 
pruned.
   
   >Generally speaking, that's not true to say that if there are no manifests 
to delete, there are no current manifests.
   
   3.) You're definitely correct here but from a "deletion correctness"  
standpoint this state is still good because if there are no manifests to delete 
this means that there are no data files we can actually delete and subsequent 
findFilesToDelete operations are guaranteed to be no-ops.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to