amogh-jahagirdar commented on code in PR #13222:
URL: https://github.com/apache/iceberg/pull/13222#discussion_r2183765078
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -1130,6 +1132,11 @@ protected ManifestReader<DataFile> newManifestReader(ManifestFile manifest) {
protected Set<DataFile> newFileSet() {
return DataFileSet.create();
}
+
+ @Override
Review Comment:
This is in `DataFileFilterManager`, which is nested in
`MergingSnapshotProducer`. I think this change makes sense:
`DataFileFilterManager` probably shouldn't support removing dangling deletes,
since it won't even be reading those delete entries.
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -920,6 +920,8 @@ protected Map<String, String> summary() {
@Override
public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
+ Set<DataFile> filesToBeDeleted = filterManager.filesToBeDeleted();
Review Comment:
Yeah, +1. I'm not entirely confident that `filesToBeDeleted()` is a sufficient
source of truth for passing to `removeDanglingDeletesFor`. Does
`filesToBeDeleted()` encompass entries that match row filters, path-based
deletes, and partition-based drops?
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -224,7 +235,9 @@ List<ManifestFile> filterManifests(Schema tableSchema, List<ManifestFile> manife
   private boolean canTrustManifestReferences(List<ManifestFile> manifests) {
     Set<String> manifestLocations = manifests.stream().map(ManifestFile::path).collect(Collectors.toSet());
-    return allDeletesReferenceManifests && manifestLocations.containsAll(manifestsWithDeletes);
Review Comment:
@nastra I think the additional check is fine, but is there a situation where
`allDeletesReferenceManifests` is true and `manifestsWithDeletes` is empty?
Every delete operation would either invalidate `allDeletesReferenceManifests`
or, if it's a `delete(File f)` where the file has a defined manifest location,
we'd add to the manifestLocations set.
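
To make the question concrete, here is a minimal sketch of the bookkeeping as
I understand it. It is a simplified stand-in, not the real
`ManifestFilterManager`: the class name `ManifestTrustSketch` and the methods
`deleteWithoutManifestReference` and `deleteFile` are hypothetical, and only
the return expression is copied from the removed line in the diff. One case it
surfaces is a manager with no deletes registered at all, where the flag is
still true and the set is still empty, so `containsAll` passes vacuously.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for the state discussed above; NOT the real ManifestFilterManager.
final class ManifestTrustSketch {
  // Assumed to start as true and flip once a delete cannot be tied to a manifest.
  private boolean allDeletesReferenceManifests = true;
  // Manifest locations recorded by file-based deletes.
  private final Set<String> manifestsWithDeletes = new HashSet<>();

  // Hypothetical: a delete by path, row filter, or partition carries no manifest location.
  void deleteWithoutManifestReference() {
    allDeletesReferenceManifests = false;
  }

  // Hypothetical: a delete of a file object whose manifest location may be null.
  void deleteFile(String manifestLocation) {
    if (manifestLocation == null) {
      allDeletesReferenceManifests = false;
    } else {
      manifestsWithDeletes.add(manifestLocation);
    }
  }

  // Same expression as the removed line in the diff.
  boolean canTrustManifestReferences(Set<String> manifestLocations) {
    return allDeletesReferenceManifests && manifestLocations.containsAll(manifestsWithDeletes);
  }

  boolean hasTrackedManifests() {
    return !manifestsWithDeletes.isEmpty();
  }

  public static void main(String[] args) {
    Set<String> manifests = Set.of("m1.avro");

    // No deletes registered: flag is true, set is empty, containsAll is vacuously true.
    ManifestTrustSketch noDeletes = new ManifestTrustSketch();
    System.out.println(noDeletes.canTrustManifestReferences(manifests));
    System.out.println(noDeletes.hasTrackedManifests());

    // File-based delete whose manifest is in the list being filtered.
    ManifestTrustSketch tracked = new ManifestTrustSketch();
    tracked.deleteFile("m1.avro");
    System.out.println(tracked.canTrustManifestReferences(manifests));

    // Any delete without a manifest reference invalidates the flag.
    ManifestTrustSketch untracked = new ManifestTrustSketch();
    untracked.deleteWithoutManifestReference();
    System.out.println(untracked.canTrustManifestReferences(manifests));
  }
}
```

If that first case can be reached in practice, the additional check changes
behavior; if some caller guarantees at least one delete before this method
runs, it is purely defensive.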
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]