amogh-jahagirdar commented on code in PR #5981:
URL: https://github.com/apache/iceberg/pull/5981#discussion_r996201966


##########
core/src/main/java/org/apache/iceberg/ReachableFileCleanup.java:
##########
@@ -85,19 +79,60 @@ public void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpira
   }
 
   private Set<ManifestFile> readManifests(Set<Snapshot> snapshots) {
-    Set<ManifestFile> manifestFiles = Sets.newHashSet();
-    for (Snapshot snapshot : snapshots) {
-      try (CloseableIterable<ManifestFile> manifestFilesForSnapshot = readManifestFiles(snapshot)) {
-        for (ManifestFile manifestFile : manifestFilesForSnapshot) {
-          manifestFiles.add(manifestFile.copy());
-        }
-      } catch (IOException e) {
-        throw new RuntimeIOException(
-            e, "Failed to close manifest list: %s", snapshot.manifestListLocation());
-      }
-    }
+    Set<ManifestFile> manifests = ConcurrentHashMap.newKeySet();
+    Tasks.foreach(snapshots)
+        .retry(3)
+        .stopOnFailure()
+        .throwFailureWhenFinished()
+        .executeWith(planExecutorService)
+        .onFailure(
+            (snapshot, exc) ->
+                LOG.warn(
+                    "Failed to determine manifests for snapshot {}", snapshot.snapshotId(), exc))
+        .run(
+            snapshot -> {
+              try (CloseableIterable<ManifestFile> manifestFilesForSnapshot =
+                  readManifestFiles(snapshot)) {
+                for (ManifestFile manifestFile : manifestFilesForSnapshot) {
+                  manifests.add(manifestFile.copy());
+                }
+              } catch (IOException e) {
+                throw new RuntimeIOException(
+                    e, "Failed to close manifest list: %s", snapshot.manifestListLocation());
+              }
+            });
+
+    return manifests;
+  }
+
+  private Set<ManifestFile> manifestFilesToDelete(
+      Set<ManifestFile> currentManifests, Set<Snapshot> expiredSnapshots) {

Review Comment:
   Yeah, my thinking was that in the worst case we need to read the current
manifests for the data files anyway, and structuring the code this way lets
the set of current manifests be reused during the reachable data file
analysis. But it's more important to consider the average/common case, and I
think I can structure the code in a readable way that still reuses the
computed set of current manifests. Will update.
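
To make the reuse idea concrete, here is a minimal sketch, not the code from this PR. It assumes manifests and data files can be modeled as plain path strings, and every name in it (`ManifestReuseSketch`, `manifestsBySnapshot`, `dataFilesByManifest`, `dataFilesToDelete`) is hypothetical. The set of current manifests is computed once, used to find the manifests that are safe to delete, and only consulted again for data files when there is something to delete.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: manifests and data files are modeled as plain strings,
// not as Iceberg ManifestFile/DataFile objects.
public class ManifestReuseSketch {

  // Collects the manifests referenced by the given snapshots.
  static Set<String> readManifests(
      Map<Long, List<String>> manifestsBySnapshot, Set<Long> snapshotIds) {
    Set<String> manifests = new HashSet<>();
    for (Long snapshotId : snapshotIds) {
      manifests.addAll(manifestsBySnapshot.getOrDefault(snapshotId, List.of()));
    }
    return manifests;
  }

  // A manifest can be deleted only if no current snapshot still references it.
  static Set<String> manifestsToDelete(
      Set<String> currentManifests, Set<String> expiredManifests) {
    Set<String> toDelete = new HashSet<>(expiredManifests);
    toDelete.removeAll(currentManifests);
    return toDelete;
  }

  // Reuses the already computed set of current manifests. In the common case
  // where nothing is deletable, the current manifests are never scanned.
  static Set<String> dataFilesToDelete(
      Map<String, List<String>> dataFilesByManifest,
      Set<String> currentManifests,
      Set<String> manifestsToDelete) {
    Set<String> candidates = new HashSet<>();
    for (String manifest : manifestsToDelete) {
      candidates.addAll(dataFilesByManifest.getOrDefault(manifest, List.of()));
    }
    if (candidates.isEmpty()) {
      return candidates;
    }
    // Worst case: drop every data file still reachable from a current manifest.
    for (String manifest : currentManifests) {
      candidates.removeAll(dataFilesByManifest.getOrDefault(manifest, List.of()));
    }
    return candidates;
  }
}
```

The early return is the part that favors the average case: reading current manifests for data files is deferred until a deletable manifest actually contributes candidates.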



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

