amogh-jahagirdar commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1877273362
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -363,6 +363,10 @@ private ManifestFile filterManifest(
}
private boolean canContainDeletedFiles(ManifestFile manifest, boolean
trustManifestReferences) {
+ if (manifest.minSequenceNumber() > 0 && manifest.minSequenceNumber() <
minSequenceNumber) {
+ return true;
+ }
Review Comment:
Sorry for the delay, yes I believe you are right that the last condition
specifically at
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L427
is redundant, we can remove that! There is no longer any need to rely on that
condition because the only time we'd hit that point is the manifest needs to be
rewritten for correctness based on the other criteria. Then when the manifest
does get rewritten if there happen to be aged out deletes, we will rewrite them
here
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L475
OK so breaking down the pros/cons for eagerly removing aged out deletes
during commit where a manifest *only* has aged out deletes:
The pro of eager removal of aged out deletes is that space consumed by
metadata is reduced, and also planning times are at least slightly reduced.
The downside of eager removal is that even if a manifest does not need to be
rewritten, the commit process is incurring additional latency and risk of
failure (the IO involved) for writing out a new manifest.
IMO I feel like the upside for reclaiming some metadata storage/reducing
plan times, doesn't justify the risk of failure/additional latency being
incurred by the process on the commit path. Maybe in the most extreme case
where some large percentage of manifests _only_ have aged out deletes, is it
worth it?
I think you brought up good point though on rewrite manifests, the Java API
itself will just retain all the delete manifests but I do think the spark
action does support rewriting delete manifests
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java#L179
Perhaps it would be better for rewrite manifests to remove the aged out
deletes? At that point the work for removing the aged out deletes is more
intentional in the context of table maintenance and not on the general commit
path for `MergingSnapshotProducer` implementations? I'll need to think more on
this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]