rdblue commented on a change in pull request #1105:
URL: https://github.com/apache/iceberg/pull/1105#discussion_r440505661
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -166,41 +125,66 @@ protected void failMissingDeletePaths() {
protected void deleteByRowFilter(Expression expr) {
this.deleteExpression = expr;
filterManager.deleteByRowFilter(expr);
+ // if a delete file matches the row filter, then it can be deleted because
the rows will also be deleted
+ deleteFilterManager.deleteByRowFilter(expr);
}
/**
* Add a partition tuple to drop from the table during the delete phase.
*/
protected void dropPartition(StructLike partition) {
+ // dropping the data in a partition also drops all deletes in the partition
filterManager.dropPartition(partition);
+ deleteFilterManager.dropPartition(partition);
}
/**
- * Add a specific path to be deleted in the new snapshot.
+ * Add a specific data file to be deleted in the new snapshot.
*/
protected void delete(DataFile file) {
filterManager.delete(file);
}
+ /**
+ * Add a specific delete file to be deleted in the new snapshot.
+ */
+ protected void delete(DeleteFile file) {
+ deleteFilterManager.delete(file);
+ }
+
/**
* Add a specific path to be deleted in the new snapshot.
*/
protected void delete(CharSequence path) {
+ // because this is a location, it may be a data file or a delete file and
must be added to both filter managers
filterManager.delete(path);
+ deleteFilterManager.delete(path);
Review comment:
Good point. I think we should remove this and make it so the older
path-based delete only deletes data files. I haven't exposed
`delete(DeleteFile)` through any of the public APIs on purpose for the reason
you cite: it would un-delete data.
I think it is probably better to automatically delete in most circumstances,
like removing all deletes where the data files must also be deleted or removing
delete files if there are no data files that could match by sequence number.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]