jasonf20 commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1862492365
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -363,6 +363,10 @@ private ManifestFile filterManifest(
}
private boolean canContainDeletedFiles(ManifestFile manifest, boolean trustManifestReferences) {
+ if (manifest.minSequenceNumber() > 0 && manifest.minSequenceNumber() < minSequenceNumber) {
+ return true;
+ }
Review Comment:
If that's the case, doesn't the last condition here do nothing? Execution
can only reach this check if one of the earlier checks was already true:
```java
deletePaths.contains(file.location())
    || deleteFiles.contains(file)
    || dropPartitions.contains(file.specId(), file.partition())
    || (isDelete
        && entry.isLive()
        && entry.dataSequenceNumber() > 0
        && entry.dataSequenceNumber() < minSequenceNumber);
```
It seems like `dropDeleteFilesOlderThan` perhaps has no effect anymore
(unless maybe `allDeletesReferenceManifests` gets set to false or something).
I think not removing by `minSequenceNumber` leaves stale delete files in place
that never get applied to any data files at query time, so it's not the end of
the world, but it does lead to some wasted storage and slightly longer scan
planning times.
Assuming we want to keep this behaviour, perhaps we should just stop using
`dropDeleteFilesOlderThan` in `MergingSnapshotProducer`?
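
To illustrate the gating described above, here is a minimal, self-contained sketch. It is not the Iceberg implementation: `DropGateSketch`, `Entry`, `Manifest`, `countDropped`, and the `gateOnSequence` flag are hypothetical names, and the manifest-level gate is reduced to a path lookup plus the sequence-number check from the diff. It only shows that the entry-level age condition is evaluated when the manifest-level gate lets the manifest through.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; these are NOT the Iceberg classes.
final class DropGateSketch {
  static final class Entry {
    final String path;
    final boolean isDelete;
    final boolean live;
    final long dataSeq;

    Entry(String path, boolean isDelete, boolean live, long dataSeq) {
      this.path = path;
      this.isDelete = isDelete;
      this.live = live;
      this.dataSeq = dataSeq;
    }
  }

  static final class Manifest {
    final long minSeq;
    final List<Entry> entries;

    Manifest(long minSeq, List<Entry> entries) {
      this.minSeq = minSeq;
      this.entries = entries;
    }
  }

  // Manifest-level gate. gateOnSequence toggles the check added in the diff.
  static boolean canContainDeletedFiles(
      Manifest m, Set<String> deletePaths, long minSequenceNumber, boolean gateOnSequence) {
    if (gateOnSequence && m.minSeq > 0 && m.minSeq < minSequenceNumber) {
      return true;
    }
    // Assumption: stand-in for the path, file, and partition checks.
    return m.entries.stream().anyMatch(e -> deletePaths.contains(e.path));
  }

  // Entry-level predicate, mirroring the shape of the quoted condition.
  static boolean shouldDrop(Entry e, Set<String> deletePaths, long minSequenceNumber) {
    return deletePaths.contains(e.path)
        || (e.isDelete && e.live && e.dataSeq > 0 && e.dataSeq < minSequenceNumber);
  }

  // Entries are inspected only if the gate opens the manifest.
  static long countDropped(
      Manifest m, Set<String> deletePaths, long minSequenceNumber, boolean gateOnSequence) {
    if (!canContainDeletedFiles(m, deletePaths, minSequenceNumber, gateOnSequence)) {
      return 0; // manifest kept as-is; the age condition is never evaluated
    }
    return m.entries.stream()
        .filter(e -> shouldDrop(e, deletePaths, minSequenceNumber))
        .count();
  }

  public static void main(String[] args) {
    Manifest m = new Manifest(3L, List.of(
        new Entry("old-delete.parquet", true, true, 3L),
        new Entry("new-delete.parquet", true, true, 9L)));
    System.out.println(countDropped(m, Set.of(), 5L, false)); // gate without sequence check
    System.out.println(countDropped(m, Set.of(), 5L, true));  // gate with sequence check
  }
}
```

In this model, with no explicit deletes and no sequence check in the gate, the old delete file is never dropped, even though the entry-level condition would match it.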
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]