vamsikarnika commented on code in PR #621: URL: https://github.com/apache/incubator-xtable/pull/621#discussion_r1907496267
########## xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionTarget.java: ########## @@ -211,18 +222,34 @@ public void syncFilesForDiff(DataFilesDiff dataFilesDiff) { @Override public void completeSync() { - transaction - .expireSnapshots() - .expireOlderThan( - Instant.now().minus(snapshotRetentionInHours, ChronoUnit.HOURS).toEpochMilli()) - .deleteWith(this::safeDelete) // ensures that only metadata files are deleted - .cleanExpiredFiles(true) - .commit(); + boolean useInternalIcebergCleaner = useInternalCleaner(); + ExpireSnapshots expireSnapshots = + transaction + .expireSnapshots() + .expireOlderThan( + Instant.now().minus(snapshotRetentionInHours, ChronoUnit.HOURS).toEpochMilli()) + .cleanExpiredFiles(!useInternalIcebergCleaner); // is internal cleaner is enabled, disable iceberg cleaner + List<Snapshot> removedSnapshots = expireSnapshots.apply(); + expireSnapshots.commit(); transaction.commitTransaction(); + // after commit is complete, clean up the manifest files + if (useInternalIcebergCleaner) { + cleanExpiredSnapshots(removedSnapshots); Review Comment: To clean up expired snapshots, we're relying on table's current state and list of removed snapshots. Another approach would be compare metadata before expire snapshots is committed and after. This would give us the exact changes expire snapshots commit has made (Iceberg uses this approach). But the problem with this approach is how to get the intermediate TableMetadata before commit is made, since Transaction class doesn't expose intermediate state before transaction is committed. One way is casting Transaction to BaseTransaction which exposes TransactionTableOperations which provides intermediate TableMetadata ``` TableMetadata beforeExpiration = ((BaseTransaction) transaction).underlyingOps().refresh(); transaction .expireSnapshots() .expireOlderThan( Instant.now().minus(snapshotRetentionInHours, ChronoUnit.HOURS).toEpochMilli()) .deleteWith(this::safeDelete) // ensures that only metadata files are deleted .cleanExpiredFiles(true) .commit(); TableMetadata afterExpiration = ((BaseTransaction) transaction).underlyingOps().refresh(); // do metadata clean based on before and after expiration metadata cleanUp(beforeExpiration, afterExpiration) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@xtable.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org