Thanks zouxxyy for starting this discussion.

## API

First of all, I would like to define the API. Could you add a separate section that explains the API clearly, instead of placing it in the compatibility section?

The 'delete-map.enabled' option looks confusing to me. In Delta [1], its name is 'enableDeletionVectors', so I think `deletion-vectors.enabled` may be better.

## Format

> The first version only supports file.format = parquet, and more formats will
> be supported in the future.

I think our design is unrelated to the file format, so why not make it work for ORC too?

## DeleteMap index file encoding

It is better to separate the file -> offset mapping from the bitmaps. Because the file offsets are the metadata of the delete files, they are read during planning. We can store this metadata in IndexFileMeta.

[1] https://delta.io/blog/2023-07-05-deletion-vectors/

Best,
Jingsong

On Thu, Jan 25, 2024 at 5:35 PM zouxxyy <[email protected]> wrote:
>
> Hi, Paimon Devs,
>
> I’d like to start a discussion about PIP-16[1].
>
> Position delete is a solution to implement the Merge-On-Read (MOR) structure,
> which has been adopted by other formats such as Iceberg and Delta.
> By combining with Paimon's LSM tree, we can create a new position deletion
> mode unique to Paimon.
> Under this mode, extra overhead (lookup and write delete file) will be
> introduced during writing, but during reading, data can be directly retrieved
> using "data + filter with position delete", avoiding additional merge costs
> between different files.
> Furthermore, this mode can be easily integrated into native engine solutions
> like Spark + Gluten in the future, thereby significantly enhancing read
> performance.
>
> Look forward to your question and suggestions.
>
> Best,
> zouxxyy
>
> [1] https://cwiki.apache.org/confluence/x/Tws4EQ
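
To make the suggestion under "DeleteMap index file encoding" more concrete, here is a rough Python sketch of keeping the file -> offset mapping apart from the bitmap bytes. It is only an illustration under stated assumptions, not the PIP-16 format: `BitmapRange`, `write_index`, and `read_deleted` are hypothetical names, and a packed list of uint32 positions stands in for a real compressed bitmap.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class BitmapRange:
    """Hypothetical metadata entry: where one data file's bitmap lives in the blob."""
    offset: int
    length: int


def encode_positions(positions: set[int]) -> bytes:
    # Stand-in for a real bitmap encoding: sorted little-endian uint32 values.
    ordered = sorted(positions)
    return struct.pack(f"<{len(ordered)}I", *ordered)


def write_index(deletes: dict[str, set[int]]) -> tuple[bytes, dict[str, BitmapRange]]:
    """Return (blob, meta): the blob holds only bitmaps, meta maps file -> range."""
    blob = bytearray()
    meta: dict[str, BitmapRange] = {}
    for data_file, positions in sorted(deletes.items()):
        payload = encode_positions(positions)
        meta[data_file] = BitmapRange(offset=len(blob), length=len(payload))
        blob += payload
    return bytes(blob), meta


def read_deleted(blob: bytes, meta: dict[str, BitmapRange], data_file: str) -> set[int]:
    """Decode only the slice belonging to one data file."""
    entry = meta.get(data_file)
    if entry is None:
        return set()  # no deletes recorded for this file
    chunk = blob[entry.offset:entry.offset + entry.length]
    return set(struct.unpack(f"<{entry.length // 4}I", chunk))
```

With this split, planning only needs the small `meta` mapping (which is the part that could be stored in IndexFileMeta), while a reader opens the blob and decodes just the slice for the data file it is scanning.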
