shangxinli commented on code in PR #592:
URL: https://github.com/apache/iceberg-cpp/pull/592#discussion_r3035738693


##########
src/iceberg/update/expire_snapshots.h:
##########
@@ -159,6 +173,47 @@ class ICEBERG_EXPORT ExpireSnapshots : public 
PendingUpdate {
   Result<std::unordered_set<int64_t>> UnreferencedSnapshotIdsToRetain(
       const SnapshotToRef& refs) const;
 
+  /// \brief Clean up files no longer referenced after snapshot expiration.
+  ///
+  /// Implements the "reachable file cleanup" strategy from Java's 
ReachableFileCleanup:
+  /// 1. Collect manifests from expired and retained snapshots
+  /// 2. Prune manifests still referenced by retained snapshots
+  /// 3. Find data files only in manifests being deleted (if kAll)
+  /// 4. Remove data files still reachable from retained manifests
+  /// 5. Delete orphaned manifests, manifest lists, and statistics files
+  ///
+  /// All deletions are best-effort: failures are suppressed to avoid blocking
+  /// metadata updates (matching Java's suppressFailureWhenFinished behavior).
+  ///
+  /// Branch/tag awareness: retained_snapshot_ids includes all snapshots 
referenced
+  /// by any branch or tag, as computed by Apply(). This prevents deleting 
files
+  /// that are still reachable from any ref.
+  ///
+  /// TODO(shangxinli): Add multi-threaded file deletion support.
+  /// TODO(shangxinli): Add IncrementalFileCleanup strategy for linear 
ancestry.
+  Status CleanExpiredFiles(const std::vector<int64_t>& expired_snapshot_ids);
+
+  /// \brief Read manifest paths from a single snapshot.
+  /// Best-effort: returns OK even if the snapshot or its manifests can't be 
read.
+  Status ReadManifestsForSnapshot(int64_t snapshot_id,
+                                  std::unordered_set<std::string>& 
manifest_paths);
+
+  /// \brief Find data files to delete by reading live entries from manifests 
being
+  /// deleted, then subtracting files still reachable from retained manifests.
+  /// If a retained manifest cannot be read, returns an empty set to prevent
+  /// accidental data loss.
+  Result<std::unordered_set<std::string>> FindDataFilesToDelete(
+      const std::unordered_set<std::string>& manifests_to_delete,
+      const std::unordered_set<std::string>& retained_manifests);
+
+  /// \brief Create a ManifestReader for the given ManifestFile.
+  Result<std::shared_ptr<ManifestReader>> MakeManifestReader(

Review Comment:
    Done — MakeManifestReader is now a free function in the anonymous namespace



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to