Hi all,

In April I wrote a formal specification for COW tables (https://github.com/Vanlightly/table-formats-tlaplus/tree/main/hudi/v5_spec/basic_cow), and since then I have been looking at going back and adding MOR, as well as archival and compaction.
I've read the code and the docs, and there's something I can't figure out about timeline archival: how does Hudi prevent the archive process from archiving "live" instants? If, for example, I have a primary key table with two file groups, and "min commits to keep" (hoodie.keep.min.commits) is 20, but the last 20 commits all relate to file group 2, then the commits of file group 1 would be archived, making file group 1 unreadable.

Delta Lake handles log cleaning via checkpointing: once a checkpoint has been written to the Delta log, prior entries can be removed. With Hudi, however, it seems you choose an arbitrary number of commits to keep, so I am left wondering how this can be safe. I am sure I have missed something; I've sketched the scenario concretely in the P.S. below.

Thanks in advance,
Jack Vanlightly
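P.S. To make the scenario concrete, here is a minimal Python sketch of the situation I am describing. It assumes (possibly wrongly, which is the heart of my question) that archival simply trims everything older than the newest N instants. The Commit class, MIN_COMMITS_TO_KEEP constant, and timeline contents are all hypothetical illustration, not Hudi code.

    # Hypothetical model of the scenario -- NOT Hudi's actual archival logic.
    # Assumption: archival keeps only the newest `MIN_COMMITS_TO_KEEP` instants
    # on the active timeline, as a naive reading of hoodie.keep.min.commits
    # would suggest.

    from dataclasses import dataclass

    @dataclass
    class Commit:
        ts: int          # instant time (monotonic, for simplicity)
        file_group: int  # the file group this commit wrote to

    MIN_COMMITS_TO_KEEP = 20

    # File group 1 receives commits 1-5, then file group 2 receives
    # commits 6-25 (the last 20 commits).
    timeline = [Commit(ts, 1) for ts in range(1, 6)] + \
               [Commit(ts, 2) for ts in range(6, 26)]

    # Naive archival: everything older than the newest N instants is archived.
    active = timeline[-MIN_COMMITS_TO_KEEP:]
    archived = timeline[:-MIN_COMMITS_TO_KEEP]

    print("archived instants:", [c.ts for c in archived])          # 1-5
    print("file groups on active timeline:",
          {c.file_group for c in active})                          # {2} only

    # Every instant that wrote to file group 1 is now archived. If a reader
    # needed an active-timeline instant to locate file group 1's latest file
    # slice, that file group would be unreadable -- the safety question above.

If archival is smarter than this naive model, that difference is exactly what I am trying to pin down for the spec.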