Thanks zouxxyy for starting this discussion.

## API

First of all, I would like to pin down the API. Could you add a
separate section that explains the API clearly, instead of placing it
in the compatibility section?

The name 'delete-map.enabled' looks confusing to me. In Delta [1], its
name is 'enableDeletionVectors', so maybe
`deletion-vectors.enabled` would be better?

## Format

> The first version only supports file.format = parquet , and more formats will 
> be supported in the future.

I think our design is independent of the file format, so why not make it work for ORC too?

## DeleteMap index file encoding

It is better to store the file -> offset mapping separately from the
bitmaps. The file offsets are metadata of the delete files and are
read during planning, so we can store them in IndexFileMeta.
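
To illustrate the separation, here is a rough sketch, not a proposed
implementation. The class `DeletionIndexSketch`, its `Range` type, and
the method names are hypothetical, and `java.util.BitSet` stands in for
whatever bitmap encoding the PIP ends up using. The `meta` map plays
the role of what could live in IndexFileMeta, and `blob` plays the role
of the index file body.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: bitmaps in one blob, file -> (offset, length) kept apart. */
public class DeletionIndexSketch {

    /** Location of one data file's bitmap inside the shared blob. */
    public static final class Range {
        public final int offset;
        public final int length;

        Range(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Small metadata, cheap to read during planning (assumed to go in IndexFileMeta). */
    public final Map<String, Range> meta = new LinkedHashMap<>();

    /** Concatenated serialized bitmaps, only touched when a data file is actually read. */
    public final byte[] blob;

    public DeletionIndexSketch(Map<String, BitSet> deletedRowsPerFile) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, BitSet> e : deletedRowsPerFile.entrySet()) {
            byte[] bytes = e.getValue().toByteArray();
            // Record where this file's bitmap starts and how long it is.
            meta.put(e.getKey(), new Range(out.size(), bytes.length));
            out.write(bytes, 0, bytes.length);
        }
        this.blob = out.toByteArray();
    }

    /** Reads a single bitmap using only its recorded range; empty if the file has no deletes. */
    public BitSet readBitmap(String dataFile) {
        Range r = meta.get(dataFile);
        if (r == null) {
            return new BitSet();
        }
        return BitSet.valueOf(Arrays.copyOfRange(blob, r.offset, r.offset + r.length));
    }
}
```

With this layout, planning only needs `meta` to know which data files
have deletes and where their bitmaps are; the bitmap bytes are fetched
per file at read time.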

[1] https://delta.io/blog/2023-07-05-deletion-vectors/

Best,
Jingsong


On Thu, Jan 25, 2024 at 5:35 PM zouxxyy <[email protected]> wrote:
>
> Hi, Paimon Devs, I’d like to start a discussion about PIP-16[1].
>
> Position delete is a solution to implement the Merge-On-Read (MOR) structure, 
> which has been adopted by other formats such as Iceberg and Delta.
> By combining with Paimon's LSM tree, we can create a new position deletion 
> mode unique to Paimon.
> Under this mode, extra overhead (lookup and write delete file) will be 
> introduced during writing, but during reading, data can be directly retrieved 
> using "data + filter with position delete", avoiding additional merge costs 
> between different files.
> Furthermore, this mode can be easily integrated into native engine solutions 
> like Spark + Gluten in the future, thereby significantly enhancing read 
> performance.
>
> Look forward to your question and suggestions.
>
> Best, zouxxyy
>
> [1] https://cwiki.apache.org/confluence/x/Tws4EQ
