[ 
https://issues.apache.org/jira/browse/SPARK-54785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Boumalhab updated SPARK-54785:
------------------------------------------
    Description: 
This PR introduces three new aggregate functions (`kll_merge_agg_bigint`, 
`kll_merge_agg_float`, `kll_merge_agg_double`) that enable efficient merging of 
multiple KLL sketch binaries across rows. While the existing scalar 
`kll_sketch_merge_*` functions can only merge two sketches at a time, these new 
aggregate variants can merge an arbitrary number of pre-computed sketches from 
different partitions, time windows, or datasets in a single query. This is 
essential for distributed analytics workflows where sketches are computed 
independently and later aggregated to obtain global quantile estimates. The 
implementation follows the same design patterns as the existing KLL aggregate 
functions, accepting an optional k parameter and properly handling NULL values.
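
The relationship between the two-argument merge and the aggregate merge can be sketched in plain Python. This is only an illustration under stated assumptions, not the Spark implementation: `ToySketch`, `toy_sketch`, `merge_two`, `merge_agg`, and `quantile` are hypothetical names, the toy sketch is an exact sorted list rather than a real KLL sketch, and the NULL behavior shown (skip NULL rows, return NULL when every row is NULL) is an assumption about what "properly handling NULL values" means.

```python
from functools import reduce
from typing import Iterable, List, Optional

# Toy stand-in for a serialized KLL sketch: a sorted list of every value seen.
# A real KLL sketch is a compact, approximate summary; this one is exact and
# unbounded, and is used only to show the shape of the aggregation.
ToySketch = List[int]


def toy_sketch(values: Iterable[int]) -> ToySketch:
    """Build a toy sketch from raw values (hypothetical helper)."""
    return sorted(values)


def merge_two(a: ToySketch, b: ToySketch) -> ToySketch:
    """Scalar-style merge: combines exactly two sketches."""
    return sorted(a + b)


def merge_agg(sketches: Iterable[Optional[ToySketch]]) -> Optional[ToySketch]:
    """Aggregate-style merge: folds merge_two over any number of rows.

    Assumption: None (NULL) rows are skipped, and the result is None
    when no non-NULL sketch is present.
    """
    present = [s for s in sketches if s is not None]
    if not present:
        return None
    return reduce(merge_two, present)


def quantile(sketch: ToySketch, rank: float) -> int:
    """Return the value at the given normalized rank (0.0 to 1.0)."""
    if not sketch:
        raise ValueError("empty sketch has no quantiles")
    idx = min(int(rank * len(sketch)), len(sketch) - 1)
    return sketch[idx]


# One pre-computed sketch per row, e.g. one per partition or time window.
rows = [toy_sketch([1, 2, 3]), None, toy_sketch([10, 20]), toy_sketch([4, 5, 6, 7])]
merged = merge_agg(rows)
median = quantile(merged, 0.5)
```

With only a two-argument merge, the caller has to chain the calls by hand for each additional sketch; the aggregate form performs the same fold over however many rows the group contains.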

 

https://github.com/apache/spark/pull/53548


> Add support for binary sketch aggregations in KLL
> -------------------------------------------------
>
>                 Key: SPARK-54785
>                 URL: https://issues.apache.org/jira/browse/SPARK-54785
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0, 4.1.1
>            Reporter: Christopher Boumalhab
>            Priority: Minor


