Re: [DISCUSS] PIP-36: Introduce Incremental Clustering for Paimon Append Table

lei li Mon, 22 Sep 2025 21:24:30 -0700

Hi Jingsong,

Thank you very much for your feedback!


I’ll reuse `compact` to do incremental clustering, and the images have been 
updated.

Best,
Lei Li

> On Sep 23, 2025, at 10:40, Jingsong Li <[email protected]> wrote:
> 
> Hi Lei,
> 
> Thanks for starting this discussion.
> 
> 1. incremental-cluster.enabled could instead be clustering.incremental = true.
> 2. I think we can reuse `compact`. CALL sys.compact is OK.
> 3. Please update your images; I cannot see them.
> 
> Best,
> Jingsong
> 
> On Tue, Sep 23, 2025 at 10:37 AM Jingsong Li <[email protected]> wrote:
>> 
>> Correct link should be:
>> 
>> https://cwiki.apache.org/confluence/display/PAIMON/PIP-36%3A+Introduce+Incremental+Clustering+for+Paimon+Append+Table
>> 
>> On Fri, Sep 19, 2025 at 5:41 PM lei li <[email protected]> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> 
>>> I'd like to start a discussion about PIP-36: Introduce Incremental 
>>> Clustering for Paimon Append Table [1].
>>> 
>>> 
>>> Paimon currently supports ordering append tables using an SFC 
>>> (Space-Filling Curve) [2]. The resulting data layout typically delivers 
>>> better performance for queries that target clustering keys. However, with 
>>> the current SortCompact, each run rewrites the entire dataset even when 
>>> neither the data nor the clustering keys have changed, which is extremely 
>>> costly. To address this, we plan to introduce a more flexible mechanism, 
>>> Incremental Clustering. On each run, it selects only a specific subset of 
>>> files to cluster, avoiding a full rewrite. This enables low-cost, 
>>> sort-based optimization of the data layout and improves query 
>>> performance. In addition, with Incremental Clustering you can adjust 
>>> clustering keys without rewriting existing data: the layout evolves as 
>>> clustering runs and gradually converges to an optimal state, which 
>>> significantly reduces the decision-making complexity around data layout.
>>> 
>>> 
>>> Incremental Clustering supports:
>>> 
>>>  *   Incremental clustering, minimizing write amplification as far as 
>>> possible.
>>>  *   Small-file compaction; rewrites respect target-file-size.
>>>  *   Changing clustering keys; newly ingested data is clustered 
>>> according to the latest clustering keys.
>>>  *   A full mode; when selected, the entire dataset is reclustered.
>>> 
>>> The detailed design and PoC results can be seen in PIP-36 [1].
>>> 
>>> 
>>> Looking forward to your feedback, thanks!
>>> 
>>> 
>>> [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-36%3A+Introduce+Incremental+Clustering+for+Paimon+Append+Table
>>> [2] https://paimon.apache.org/docs/master/maintenance/dedicated-compaction/#sort-compact
>>> 
>>> Best,
>>> 
>>> Lei Li
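
For readers following the thread, the core idea in the proposal ("on each run, it selects only a specific subset of files to cluster", with an optional full mode) can be illustrated roughly as below. This is only a sketch under assumptions, not the design in PIP-36 and not Paimon code: `DataFile`, `pick_files`, the `clustered_by` field, and the `max_bytes` budget are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical file metadata; not a Paimon class."""
    name: str
    size_bytes: int
    # Clustering keys the file was last sorted by; empty tuple = never clustered.
    clustered_by: Tuple[str, ...] = ()


def pick_files(
    files: Iterable[DataFile],
    current_keys: Sequence[str],
    full: bool = False,
    max_bytes: Optional[int] = None,
) -> List[DataFile]:
    """Return the files one clustering run would rewrite (assumed policy)."""
    files = list(files)
    if full:
        # Full mode: recluster the entire dataset.
        return files
    picked: List[DataFile] = []
    total = 0
    for f in files:
        if f.clustered_by == tuple(current_keys):
            continue  # already laid out by the current keys; leave untouched
        if max_bytes is not None and total + f.size_bytes > max_bytes:
            continue  # over this run's budget; a later run can pick it up
        picked.append(f)
        total += f.size_bytes
    return picked
```

With such a policy, files already sorted by the current keys are never rewritten, files written under older keys are picked up gradually, and a per-run budget bounds write amplification. The actual selection strategy is described in the PIP-36 document.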
