Re: [DISCUSS] PIP-36: Introduce Incremental Clustering for Paimon Append Table

Jingsong Li Mon, 22 Sep 2025 19:41:36 -0700

Hi Lei,

Thanks for starting this discussion.


1. The option incremental-cluster.enabled could instead be clustering.incremental = true.
2. I think we can reuse `compact`. CALL sys.compact is OK.
3. Please update your images; I cannot see them.

Best,
Jingsong

On Tue, Sep 23, 2025 at 10:37 AM Jingsong Li <[email protected]> wrote:
>
> Correct link should be:
>
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-36%3A+Introduce+Incremental+Clustering+for+Paimon+Append+Table
>
> On Fri, Sep 19, 2025 at 5:41 PM lei li <[email protected]> wrote:
> >
> > Hi everyone,
> >
> >
> > I'd like to start a discussion about PIP-36: Introduce Incremental 
> > Clustering for Paimon Append Table [1].
> >
> >
> > Paimon currently supports ordering append tables using SFC (Space-Filling 
> > Curve)[2]. The resulting data layout typically delivers better performance 
> > for queries that target clustering keys. However, with the current 
> > SortCompact, even when neither the data nor the clustering keys have 
> > changed, each run still rewrites the entire dataset, which is extremely 
> > costly. To address this, we plan to introduce a more flexible, incremental 
> > clustering mechanism—Incremental Clustering. On each run, it selects only a 
> > specific subset of files to cluster, avoiding a full rewrite. This enables 
> > low-cost, sort-based optimization of the data layout and improves query 
> > performance. In addition, with Incremental Clustering, you can adjust 
> > clustering keys without rewriting existing data: the layout evolves 
> > dynamically with each clustering run and gradually converges to an optimal 
> > state, significantly reducing the decision-making complexity around data layout.
> >
> >
> > Incremental Clustering aims to:
> >
> >   *   Support incremental clustering, minimizing write amplification as far 
> > as possible.
> >   *   Support small-file compaction; during rewrites, respect 
> > target-file-size.
> >   *   Support changing clustering keys; newly ingested data is clustered 
> > according to the latest clustering keys.
> >   *   Provide a full mode; when selected, the entire dataset is reclustered.
> >
> >
> > The detailed design and PoC results can be seen in PIP-36[1].
> >
> >
> > Looking forward to your feedback, thanks!
> >
> >
> > [1] 
> > https://cwiki.apache.org/confluence/display/PAIMON/PIP-36%3A+Introduce+Incremental+Clustering+for+Paimon+Append+Table
> >
> > [2] 
> > https://paimon.apache.org/docs/master/maintenance/dedicated-compaction/#sort-compact
> >
> > Best,
> >
> > Lei Li
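
To make the "selects only a specific subset of files" idea from the proposal above concrete, here is a rough sketch. It is not the PIP-36 design or any Paimon API: the DataFile type, the `clustered` flag, and the pick_files function are hypothetical names invented for illustration, and the selection rule (never-clustered files plus files below target-file-size) is an assumption. The actual strategy is described in the PIP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    # Hypothetical file descriptor for illustration; not a Paimon class.
    name: str
    size_bytes: int
    clustered: bool  # True once some earlier clustering run has rewritten the file


def pick_files(files, target_file_size, full=False):
    """Return the files a single clustering run would rewrite (assumed rule)."""
    if full:
        # Full mode: the entire dataset is reclustered.
        return list(files)
    picked = []
    for f in files:
        newly_ingested = not f.clustered           # never sorted by any clustering run
        too_small = f.size_bytes < target_file_size  # small-file compaction candidate
        if newly_ingested or too_small:
            picked.append(f)
    # Already-clustered, full-size files are left untouched, so changing the
    # clustering keys does not force a rewrite of existing data.
    return picked


files = [
    DataFile("a", 128, True),   # clustered and large enough: skipped
    DataFile("b", 10, True),    # clustered but small: picked for compaction
    DataFile("c", 128, False),  # newly ingested: picked for clustering
]
incremental = [f.name for f in pick_files(files, target_file_size=100)]
everything = [f.name for f in pick_files(files, target_file_size=100, full=True)]
```

With these three sample files, the incremental run picks only "b" and "c", while full mode picks all three. Under this assumed rule, write amplification per run scales with the amount of new or undersized data rather than with the size of the table.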
