Hi iceberg dev,

Junjie Chen and I have made some progress on the Rewrite Action for format v2. There will be two kinds of Rewrite Action:
1. The first one rewrites equality delete rows into position delete rows. The PoC PR is here: https://github.com/apache/iceberg/pull/2216
2. The second one removes all deletes when rewriting. The PR is: https://github.com/apache/iceberg/pull/2303

The reason Junjie and I raised the priority of RewriteAction is that several companies in Asia are running PoCs that write CDC/upsert events into iceberg tables and then read them with batch flink/spark/presto jobs. The biggest bottleneck is small delete/data files: because the streaming job checkpoints periodically, it produces many small data, equality delete, and position delete files in the underlying filesystem, and that hurts read performance.

As for the implementation of RewriteAction, I think we are confident we can accomplish it. The key problem is: how do we handle the conflicts between the RewriteFiles txn and the RowDelta txn? I filed an issue here: https://github.com/apache/iceberg/issues/2308

In my opinion, the RewriteFiles action never changes the data set of the iceberg table; I mean it does not add, remove, or change even a single row. So from a database developer's perspective, it should not conflict with the normal write actions, because there is no key/row overlap between the two actions. But in the iceberg implementation we have to handle the conflicts, because the RewriteAction and the RowDelta txn share the same increasing sequence number.
Let's discuss the case from issue #2308. The original table has the following data set at sequence number 1:

Seq1: (RowDelta 1)
  INSERT, <1, A>
  INSERT, <2, B>
  DELETE, <1, A>

If the RewriteAction commits before the following RowDelta, we get the following operations with these sequence numbers (reading from the latest snapshot finally returns the empty set):

Seq2: (Rewrite)
  INSERT, <2, B>
Seq3: (RowDelta 2)
  DELETE, <2, B>

While if the RowDelta commits before the RewriteAction, we get the following operations with these sequence numbers (reading from the latest snapshot finally returns <2, B>):

Seq2: (RowDelta 2)
  DELETE, <2, B>
Seq3: (Rewrite)
  INSERT, <2, B>

Summary: As we can see, different commit orders produce different data sets in the iceberg table, which is not the semantics a user would expect. So I'm considering letting the RewriteFilesAction commit its txn without producing a new auto-increasing sequence number (the RewriteAction would use the largest sequence number among the existing files it rewrites). Then the results will always be consistent, regardless of the commit order.

Since this change touches the iceberg table format/spec, I'd like to hear your opinions. What do you think?

Thanks.
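
To make the ordering problem concrete, here is a minimal Python sketch of the case from issue #2308. It is only a toy model, not Iceberg code: the names `DataFile`, `EqDeleteFile`, `scan`, and `run` are hypothetical, and it assumes just one rule, that an equality delete applies only to data files with a strictly smaller sequence number. The rewrite output is taken to be the live rows of Seq1, i.e. `<2, B>`, as planned before RowDelta 2 commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    seq: int        # sequence number assigned to the data file
    rows: tuple     # rows stored in the file


@dataclass(frozen=True)
class EqDeleteFile:
    seq: int        # sequence number assigned to the delete file
    rows: tuple     # rows removed by equality


def scan(data_files, delete_files):
    """Return live rows. Assumed rule: an equality delete only applies
    to data files whose sequence number is strictly smaller."""
    live = []
    for data in data_files:
        for row in data.rows:
            deleted = any(
                row in d.rows and d.seq > data.seq for d in delete_files
            )
            if not deleted:
                live.append(row)
    return live


def run(order, rewrite_keeps_old_seq):
    """Replay the rewrite and RowDelta 2 in the given commit order.

    rewrite_keeps_old_seq=False: the rewrite takes a fresh sequence number.
    rewrite_keeps_old_seq=True: the rewrite reuses the largest sequence
    number among the files it rewrote (the idea proposed in this mail).
    """
    source_seq = 1               # Seq1 holds the files being rewritten
    rewritten = ((2, "B"),)      # live rows of Seq1 after DELETE <1, A>
    last_seq = 1
    data_files, delete_files = [], []
    for op in order:
        if op == "rewrite":
            if rewrite_keeps_old_seq:
                seq = source_seq
            else:
                last_seq += 1
                seq = last_seq
            data_files.append(DataFile(seq, rewritten))
        elif op == "row_delta":
            last_seq += 1
            delete_files.append(EqDeleteFile(last_seq, ((2, "B"),)))
        else:
            raise ValueError(f"unknown operation: {op}")
    return scan(data_files, delete_files)


if __name__ == "__main__":
    for keep in (False, True):
        for order in (["rewrite", "row_delta"], ["row_delta", "rewrite"]):
            print(keep, order, run(order, keep))
```

With a fresh sequence number for the rewrite, the two commit orders disagree in this model; when the rewrite reuses the old sequence number, both orders give the same result.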