stevenzwu commented on code in PR #13222:
URL: https://github.com/apache/iceberg/pull/13222#discussion_r2183270318
##########
core/src/test/java/org/apache/iceberg/TestRewriteFiles.java:
##########
@@ -777,4 +778,40 @@ public void testNewDeleteFile() {
.rewriteFiles(Sets.newSet(FILE_A), Sets.newSet(FILE_A2)),
branch);
}
+
+ @TestTemplate
+ public void deleteDataFileAlsoRemovesDV() {
Review Comment:
nit: delete -> remove
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -155,6 +161,11 @@ void caseSensitive(boolean newCaseSensitive) {
this.caseSensitive = newCaseSensitive;
}
+ protected void removeDanglingDeletesFor(Set<DataFile> dataFiles) {
Review Comment:
Should we move this inside `DeleteFileFilterManager`, since it is only
applicable there?
In `MergingSnapshotProducer`, we can declare the fields with the specific data
and delete filter manager classes instead of the base class `ManifestFilterManager`.
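A minimal sketch of the suggested layout, assuming heavily simplified, hypothetical stand-ins for the real classes: files are plain path strings, and the constructors and members differ from Iceberg's. The point it shows is that `removeDanglingDeletesFor` lives only on the delete-file manager, and the producer can call it because its fields use the concrete subclass types rather than the generic base class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; not Iceberg's actual classes or signatures.
public class FilterManagerSketch {

  // Base class keeps only logic shared by data and delete manifests.
  abstract static class ManifestFilterManager<F> {
    private final Set<F> filesToDelete = new HashSet<>();

    void delete(F file) {
      filesToDelete.add(file);
    }

    Set<F> filesToBeDeleted() {
      return filesToDelete;
    }
  }

  static final class DataFileFilterManager extends ManifestFilterManager<String> {}

  // Only delete manifests can hold DVs, so the dangling-DV hook lives here.
  static final class DeleteFileFilterManager extends ManifestFilterManager<String> {
    private final Set<String> removedDataFilePaths = new HashSet<>();

    void removeDanglingDeletesFor(Set<String> dataFilePaths) {
      removedDataFilePaths.addAll(dataFilePaths);
    }

    Set<String> removedDataFilePaths() {
      return removedDataFilePaths;
    }
  }

  // Typing the fields with the concrete subclasses lets the producer call
  // removeDanglingDeletesFor without exposing it on the shared base class.
  static final class Producer {
    private final DataFileFilterManager filterManager = new DataFileFilterManager();
    private final DeleteFileFilterManager deleteFilterManager = new DeleteFileFilterManager();

    void deleteDataFile(String path) {
      filterManager.delete(path);
    }

    void apply() {
      deleteFilterManager.removeDanglingDeletesFor(filterManager.filesToBeDeleted());
    }

    DeleteFileFilterManager deleteFilterManager() {
      return deleteFilterManager;
    }
  }
}
```

With this split, `DataFileFilterManager` never sees a method that is meaningless for data manifests.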
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -452,6 +468,11 @@ private boolean manifestHasDeletedFiles(
return false;
}
+ private boolean isDanglingDV(DeleteFile file) {
+ return ContentFileUtil.isDV(file)
+ && dataFilePathsWithDanglingDVs.contains(file.referencedDataFile());
Review Comment:
`removedDataFilePaths` seems more accurate. These data files may or may not have
dangling DVs.
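To illustrate the naming point, here is a small sketch assuming a hypothetical `DeleteFileStub` record in place of Iceberg's `DeleteFile`, with a boolean flag standing in for `ContentFileUtil.isDV`. The set holds every removed data file path; a path that has no DV simply never matches, so the set itself is not a set of "files with dangling DVs".

```java
import java.util.HashSet;
import java.util.Set;

public class DanglingDvCheck {
  // Hypothetical stand-in for DeleteFile with only the fields this check needs.
  public record DeleteFileStub(String path, boolean isDV, String referencedDataFile) {}

  // Paths of all data files removed in this commit, whether or not a DV points at them.
  private final Set<String> removedDataFilePaths;

  public DanglingDvCheck(Set<String> removedDataFilePaths) {
    // Copy into a HashSet, which tolerates contains(null) unlike Set.of(...).
    this.removedDataFilePaths = new HashSet<>(removedDataFilePaths);
  }

  // A DV is dangling when the data file it references was removed.
  public boolean isDanglingDV(DeleteFileStub file) {
    return file.isDV() && removedDataFilePaths.contains(file.referencedDataFile());
  }
}
```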
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -1130,6 +1132,11 @@ protected ManifestReader<DataFile> newManifestReader(ManifestFile manifest) {
protected Set<DataFile> newFileSet() {
return DataFileSet.create();
}
+
+ @Override
Review Comment:
I am confused here. Is this method defined in the base class
`SnapshotProducer`?
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -920,6 +920,8 @@ protected Map<String, String> summary() {
@Override
public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
+ Set<DataFile> filesToBeDeleted = filterManager.filesToBeDeleted();
Review Comment:
Is this enough? This will include files explicitly deleted via `delete(F
file)`, but it won't include the data files removed via `deleteExpression` or
`dropPartition`, which are evaluated in the `filterManifests` step in line 927.
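A simplified sketch of the gap being described, assuming hypothetical stand-ins: files are path/partition pairs, and a `Predicate` with the made-up method name `deleteWhere` stands in for an expression delete (Iceberg's real row-filter semantics are more involved). Before filtering, only explicit deletes are known; files removed by partition drops or expressions only become known while the manifests are filtered.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class RemovedFilesSketch {
  // Hypothetical stand-in for DataFile.
  public record DataFileStub(String path, String partition) {}

  public static class FilterManager {
    private final Set<String> explicitDeletes = new HashSet<>();
    private final Set<String> droppedPartitions = new HashSet<>();
    private Predicate<DataFileStub> deleteExpression = f -> false;
    // Filled in only while filtering manifests.
    private final Set<String> removedDuringFilter = new HashSet<>();

    public void delete(String path) {
      explicitDeletes.add(path);
    }

    public void deleteWhere(Predicate<DataFileStub> expr) {
      deleteExpression = deleteExpression.or(expr);
    }

    public void dropPartition(String partition) {
      droppedPartitions.add(partition);
    }

    // Analogue of filesToBeDeleted(): explicit deletes only, known up front.
    public Set<String> filesToBeDeleted() {
      return explicitDeletes;
    }

    // Analogue of filterManifests(): evaluates every condition and returns survivors.
    public List<DataFileStub> filterManifests(List<DataFileStub> liveFiles) {
      List<DataFileStub> kept = new ArrayList<>();
      for (DataFileStub file : liveFiles) {
        boolean remove = explicitDeletes.contains(file.path())
            || droppedPartitions.contains(file.partition())
            || deleteExpression.test(file);
        if (remove) {
          removedDuringFilter.add(file.path());
        } else {
          kept.add(file);
        }
      }
      return kept;
    }

    // Complete only after filterManifests has run.
    public Set<String> removedDataFilePaths() {
      return removedDuringFilter;
    }
  }
}
```

In this sketch, reading the removed set after `filterManifests` covers all three removal paths, while `filesToBeDeleted()` covers only the first.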
--
This is an automated message from the Apache Git Service.