Re: [DISCUSS] PIP-21: Introduce Range Partition And Sort in Append Scalable Table Batch Writing for Flink

Jingsong Li Mon, 22 Apr 2024 00:08:29 -0700

Hi Wencong,

Mostly looks good to me.


"it will automatically determine the algorithm based on the number of
columns in 'sink.clustering.by-columns'. "

Please describe this clearly in the `Description`.
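For instance, the `Description` could spell the rule out as a small function. The Python sketch below is only an illustration. The function name `choose_clustering_strategy` and the thresholds are assumptions and are not stated in this thread: 'order' for one column, 'zorder' for two to four, and 'hilbert' for five or more.

```python
def choose_clustering_strategy(by_columns: str) -> str:
    """Pick a comparison algorithm from a comma-separated column list.

    Hypothetical helper; the thresholds below are assumed for illustration:
    one column -> plain value comparison ("order"), fewer than five columns
    -> Z-order curve ("zorder"), five or more -> Hilbert curve ("hilbert").
    """
    columns = [c.strip() for c in by_columns.split(",") if c.strip()]
    if not columns:
        raise ValueError("'sink.clustering.by-columns' must name at least one column")
    if len(columns) == 1:
        return "order"
    if len(columns) < 5:  # assumed threshold
        return "zorder"
    return "hilbert"


print(choose_clustering_strategy("user_id"))    # order
print(choose_clustering_strategy("a, b, c"))    # zorder
print(choose_clustering_strategy("a,b,c,d,e"))  # hilbert
```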

Best,
Jingsong

On Mon, Apr 22, 2024 at 2:36 PM Wencong Liu wrote:
>
> Hi devs,
>
> I'm proposing a new feature to introduce range partitioning and sorting when
> writing append scalable tables in Flink. The goal is to optimize query
> performance by reducing data scans on large datasets.
>
> The proposal includes:
>
> 1. Configurable range partitioning and sorting during data writing, which
> allows for a more efficient data distribution strategy.
>
> 2. New configurations that enable users to specify the columns for
> comparison, choose a comparison algorithm for range partitioning, and
> further sort each partition if required.
>
> 3. A detailed explanation of how processing is divided into steps when range
> partitioning is enabled, and of the conditions under which the sorting phase
> is included.
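The three items above can be illustrated with a single-process Python sketch, under stated assumptions. Keys are compared lexicographically on the configured columns, which is the plain "order" style; Z-order and Hilbert encodings are not shown. Range boundaries come from a random sample. The names `range_partition`, `sample_size`, and `sort_in_partition` are hypothetical and are not options from the PIP. In a real Flink job these steps would presumably run as separate operators with a shuffle between them; that split is not modeled here.

```python
import bisect
import random
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def range_partition(
    records: Sequence[Record],
    columns: Sequence[str],
    num_partitions: int,
    sort_in_partition: bool = True,  # hypothetical switch for the optional sort phase
    sample_size: int = 1000,         # hypothetical sample size
    seed: int = 0,
) -> List[List[Record]]:
    """Distribute records into contiguous key ranges, then optionally sort each range."""

    def key(r: Record) -> tuple:
        # Lexicographic comparison on the configured columns ("order" style).
        return tuple(r[c] for c in columns)

    # 1. Sample keys and sort them to estimate the key distribution.
    rng = random.Random(seed)
    sample = rng.sample(list(records), min(sample_size, len(records)))
    sample_keys = sorted(key(r) for r in sample)

    # 2. Choose num_partitions - 1 boundaries at evenly spaced sample quantiles.
    boundaries = (
        [sample_keys[i * len(sample_keys) // num_partitions] for i in range(1, num_partitions)]
        if sample_keys
        else []
    )

    # 3. Route every record to the range its key falls into.
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for r in records:
        partitions[bisect.bisect_right(boundaries, key(r))].append(r)

    # 4. Optional local sort so each partition's output is ordered by the key.
    if sort_in_partition:
        for p in partitions:
            p.sort(key=key)
    return partitions


# Usage: 100 rows whose "v" values are a permutation of 0..99.
rows = [{"id": i, "v": (i * 37) % 100} for i in range(100)]
parts = range_partition(rows, ["v"], num_partitions=4)
print([len(p) for p in parts])  # [25, 25, 25, 25]
```

Sampling keeps the boundary computation cheap on large inputs. The local sort is kept separate so it can be skipped when only the range distribution is wanted.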
>
> Looking forward to discussing this in the upcoming PIP [1].
>
> Best regards,
> Wencong Liu
>
> [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-21%3A+Introduce+Range+Partition+And+Sort+in+Append+Scalable+Table+Batch+Writing+for+Flink
