[ https://issues.apache.org/jira/browse/SPARK-54785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Boumalhab updated SPARK-54785:
------------------------------------------
    Description: This PR introduces three new aggregate functions
(`kll_merge_agg_bigint`, `kll_merge_agg_float`, `kll_merge_agg_double`) that
efficiently merge serialized KLL sketches (binary values) across rows. The
existing scalar `kll_sketch_merge_*` functions can merge only two sketches at a
time. The new aggregate variants can merge an arbitrary number of pre-computed
sketches, whether they come from different partitions, time windows, or
datasets, in a single query. This is essential for distributed analytics
workflows in which sketches are computed independently and later combined to
obtain global quantile estimates. The implementation follows the same design
patterns as the existing KLL aggregate functions: it accepts an optional k
parameter and handles NULL input values.
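To illustrate why an aggregate merge generalizes the two-argument scalar merge, here is a minimal Python sketch. It does not use Spark or Apache DataSketches. `ToyKllSketch` and `kll_merge_agg` are hypothetical names. The sketch is a simplified KLL-style compactor: it uses a flat per-level capacity `k` instead of KLL's shrinking capacities, and it has no binary serialization format. Skipping `None` inputs is an assumption about how NULL rows are treated, and the default `k=200` is an arbitrary choice for the toy.

```python
import random
from typing import Iterable, List, Optional


class ToyKllSketch:
    """Simplified KLL-style quantile sketch, for illustration only.

    Level h holds items of weight 2**h. When a level exceeds k items it is
    sorted and every other item (random offset) is promoted to level h + 1.
    Real KLL shrinks capacities at lower levels; this toy uses a flat k.
    """

    def __init__(self, k: int = 200, seed: int = 0) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.levels: List[List[float]] = [[]]
        self._rng = random.Random(seed)

    def update(self, value: float) -> None:
        self.levels[0].append(value)
        self._compact()

    def _compact(self) -> None:
        h = 0
        while h < len(self.levels):
            buf = self.levels[h]
            if len(buf) > self.k:
                buf.sort()
                # Leave one item behind when the count is odd so that the
                # total weight (number of inputs seen) is preserved exactly.
                leftover = [buf.pop()] if len(buf) % 2 else []
                offset = self._rng.randint(0, 1)
                self.levels[h] = leftover
                if h + 1 == len(self.levels):
                    self.levels.append([])
                self.levels[h + 1].extend(buf[offset::2])
            h += 1

    def merge(self, other: "ToyKllSketch") -> None:
        # Pairwise merge: concatenate level by level, then re-compact.
        while len(self.levels) < len(other.levels):
            self.levels.append([])
        for h, buf in enumerate(other.levels):
            self.levels[h].extend(buf)
        self._compact()

    def count(self) -> int:
        return sum(len(buf) << h for h, buf in enumerate(self.levels))

    def quantile(self, q: float) -> float:
        items = sorted((v, 1 << h) for h, buf in enumerate(self.levels) for v in buf)
        if not items:
            raise ValueError("empty sketch")
        target, seen = q * self.count(), 0
        for value, weight in items:
            seen += weight
            if seen >= target:
                return value
        return items[-1][0]


def kll_merge_agg(sketches: Iterable[Optional[ToyKllSketch]],
                  k: int = 200) -> Optional[ToyKllSketch]:
    # Aggregate merge: fold any number of rows into one accumulator,
    # skipping None (assumed NULL semantics). Returns None if all are None.
    acc: Optional[ToyKllSketch] = None
    for sketch in sketches:
        if sketch is None:
            continue
        if acc is None:
            acc = ToyKllSketch(k)
        acc.merge(sketch)
    return acc


# Usage: build one sketch per "partition", then merge them all at once.
rng = random.Random(42)
values = list(range(10_000))
rng.shuffle(values)
sketches: List[Optional[ToyKllSketch]] = []
for i in range(4):
    s = ToyKllSketch(k=200, seed=i)
    for v in values[i::4]:
        s.update(float(v))
    sketches.append(s)
merged = kll_merge_agg(sketches + [None])  # the None stands in for a NULL row
print(merged.count())  # 10000
print(merged.quantile(0.5))  # close to 5000, not exact
```

In this toy, the aggregate keeps one accumulator and folds each row into it with the same pairwise merge step. That is why it scales to any number of rows, whereas combining N sketches with only a two-argument merge requires chaining N - 1 calls.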

> Add support for binary sketch aggregations in KLL
> -------------------------------------------------
>
>                 Key: SPARK-54785
>                 URL: https://issues.apache.org/jira/browse/SPARK-54785
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0, 4.1.1
>            Reporter: Christopher Boumalhab
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
