Hi Wencong,

Mostly looks good to me.

"it will automatically determine the algorithm based on the number of
columns in 'sink.clustering.by-columns'."

Please describe this clearly in the `Description`.

Best,
Jingsong

On Mon, Apr 22, 2024 at 2:36 PM Wencong Liu <[email protected]> wrote:
>
> Hi devs,
>
> I'm proposing a new feature to introduce range partitioning and sorting in
> append scalable table writing for Flink. The goal is to optimize query
> performance by reducing data scans on large datasets.
>
> The proposal includes:
>
> 1. Configurable range partitioning and sorting during data writing which
> allows for a more efficient data distribution strategy.
>
> 2. Introduction of new configurations that will enable users to specify
> columns for comparison, choose a comparison algorithm for range
> partitioning, and further sort each partition if required.
>
> 3. Detailed explanation of the division of processing steps when range
> partitioning is enabled and the conditional inclusion of the sorting phase.
>
> Looking forward to discussing this in the upcoming PIP [1].
>
> Best regards,
> Wencong Liu
>
> [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-21%3A+Introduce+Range+Partition+And+Sort+in+Append+Scalable+Table+Batch+Writing+for+Flink
