amogh-jahagirdar commented on code in PR #5981:
URL: https://github.com/apache/iceberg/pull/5981#discussion_r996466631
##########
core/src/main/java/org/apache/iceberg/ReachableFileCleanup.java:
##########
@@ -84,19 +76,65 @@ public void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpira
     deleteFiles(manifestListsToDelete, "manifest list");
   }
-  private Set<ManifestFile> readManifests(Set<Snapshot> snapshots) {
-    Set<ManifestFile> manifestFiles = Sets.newHashSet();
-    for (Snapshot snapshot : snapshots) {
-      try (CloseableIterable<ManifestFile> manifestFilesForSnapshot = readManifestFiles(snapshot)) {
-        for (ManifestFile manifestFile : manifestFilesForSnapshot) {
-          manifestFiles.add(manifestFile.copy());
-        }
-      } catch (IOException e) {
-        throw new RuntimeIOException(
-            e, "Failed to close manifest list: %s", snapshot.manifestListLocation());
-      }
+  private Set<ManifestFile> currentManifests(
+      Set<Snapshot> snapshots, Set<ManifestFile> manifestsToDelete) {
+    Set<ManifestFile> currentManifests = ConcurrentHashMap.newKeySet();
+    if (manifestsToDelete.isEmpty()) {
+      return currentManifests;
Review Comment:
I renamed it to pruneManifestsToDelete and added some inline comments, although
the naming is still not great since the returned set is the current manifests.
I think another way to make it more explicit, and to avoid mutating the state
of the candidate manifests to delete, is to have the function return a
Pair<Set<ManifestFile>, Set<ManifestFile>>, where the first entry is the
current manifests and the second is the manifests that can safely be removed.
These two sets can then be used when determining the data files to delete later
on. I initially avoided this since the Pair seemed a bit more complex than just
mutating the state.
Let me know which approach you find preferable @kbendick @rdblue
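
For illustration, the Pair-returning alternative could look roughly like the sketch below. This is not the code in the PR: plain strings stand in for ManifestFile, each retained snapshot is modelled as a set of manifest paths, and the class name, the partitionManifests method, and the small Pair class are all hypothetical (the Pair here is a local stand-in, not Iceberg's own utility class). The point it shows is that the candidate set is copied instead of mutated, and both results come back together.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ManifestPartitionSketch {

  /** Hypothetical minimal pair type, used only to keep this sketch self-contained. */
  static final class Pair<A, B> {
    private final A first;
    private final B second;

    Pair(A first, B second) {
      this.first = first;
      this.second = second;
    }

    A first() {
      return first;
    }

    B second() {
      return second;
    }
  }

  /**
   * Splits manifests into (current, safely deletable) without modifying the inputs.
   *
   * @param manifestsPerSnapshot manifest paths referenced by each retained snapshot
   *     (assumption: strings stand in for ManifestFile)
   * @param candidates manifests considered for deletion; this set is never mutated
   */
  static Pair<Set<String>, Set<String>> partitionManifests(
      List<Set<String>> manifestsPerSnapshot, Set<String> candidates) {
    Set<String> current = new HashSet<>();
    // Work on a copy so the caller's candidate set keeps its original contents.
    Set<String> deletable = new HashSet<>(candidates);
    for (Set<String> manifests : manifestsPerSnapshot) {
      current.addAll(manifests);
      // Anything still referenced by a retained snapshot is not safe to delete.
      deletable.removeAll(manifests);
    }
    return new Pair<>(
        Collections.unmodifiableSet(current), Collections.unmodifiableSet(deletable));
  }

  public static void main(String[] args) {
    Set<String> candidates = Set.of("m1.avro", "m2.avro", "m3.avro");
    List<Set<String>> retained = List.of(Set.of("m2.avro", "m4.avro"));

    Pair<Set<String>, Set<String>> result = partitionManifests(retained, candidates);
    // TreeSet is used only to print in a stable order.
    System.out.println("current:   " + new TreeSet<>(result.first()));
    System.out.println("deletable: " + new TreeSet<>(result.second()));
  }
}
```

The trade-off is the one described above: the caller has to unpack two sets, but the candidate set passed in is left untouched, so later steps can rely on both results without tracking hidden mutation.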
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]