Hi everyone,
I'd like to start a discussion about PIP-36: Introduce Incremental Clustering for Paimon Append Table [1].

Paimon currently supports ordering append tables using an SFC (Space-Filling Curve) [2]. The resulting data layout typically delivers better performance for queries that target the clustering keys. However, with the current SortCompact, each run rewrites the entire dataset even when neither the data nor the clustering keys have changed, which is extremely costly.

To address this, we plan to introduce a more flexible mechanism: Incremental Clustering. On each run, it selects only a specific subset of files to cluster, avoiding a full rewrite. This enables low-cost, sort-based optimization of the data layout and improves query performance.

In addition, with Incremental Clustering you can adjust the clustering keys without rewriting existing data. The layout evolves with each clustering run and gradually converges to an optimal state, which significantly reduces the decision-making complexity around data layout.

Incremental Clustering is designed to:

* Cluster incrementally, keeping write amplification as low as possible.
* Compact small files; rewrites respect target-file-size.
* Support changing clustering keys; newly ingested data is clustered according to the latest clustering keys.
* Provide a full mode; when selected, the entire dataset is reclustered.

The detailed design and PoC results can be seen in PIP-36 [1].

Looking forward to your feedback, thanks!

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-36%3A+Introduce+Incremental+Clustering+for+Paimon+Append+Table
[2] https://paimon.apache.org/docs/master/maintenance/dedicated-compaction/#sort-compact

Best,
Lei Li
