aokolnychyi commented on a change in pull request #1105:
URL: https://github.com/apache/iceberg/pull/1105#discussion_r440258020
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -166,41 +125,66 @@ protected void failMissingDeletePaths() {
protected void deleteByRowFilter(Expression expr) {
this.deleteExpression = expr;
filterManager.deleteByRowFilter(expr);
+ // if a delete file matches the row filter, then it can be deleted because
the rows will also be deleted
+ deleteFilterManager.deleteByRowFilter(expr);
}
/**
* Add a partition tuple to drop from the table during the delete phase.
*/
protected void dropPartition(StructLike partition) {
+ // dropping the data in a partition also drops all deletes in the partition
filterManager.dropPartition(partition);
+ deleteFilterManager.dropPartition(partition);
}
/**
- * Add a specific path to be deleted in the new snapshot.
+ * Add a specific data file to be deleted in the new snapshot.
*/
protected void delete(DataFile file) {
filterManager.delete(file);
}
+ /**
+ * Add a specific delete file to be deleted in the new snapshot.
+ */
+ protected void delete(DeleteFile file) {
+ deleteFilterManager.delete(file);
+ }
+
/**
* Add a specific path to be deleted in the new snapshot.
*/
protected void delete(CharSequence path) {
+ // because this is a location, it may be a data file or a delete file and
must be added to both filter managers
filterManager.delete(path);
+ deleteFilterManager.delete(path);
}
/**
- * Add a file to the new snapshot.
+ * Add a data file to the new snapshot.
*/
protected void add(DataFile file) {
+ addedFilesSummary.addedFile(spec, file);
hasNewFiles = true;
newFiles.add(file);
}
+ /**
+ * Add a delete file to the new snapshot.
+ */
+ protected void add(DeleteFile file) {
+ addedFilesSummary.addedFile(spec, file);
Review comment:
Will it be useful to track the number of delete files separately? I find
it extremely useful to look at snapshot summaries on large tables as it gives
an overview of the current state. Knowing the number of delete files separately
from the number of data files seems reasonable and can guide users to perform
compactions.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]