[
https://issues.apache.org/jira/browse/SPARK-54785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christopher Boumalhab updated SPARK-54785:
------------------------------------------
Description: This PR introduces three new aggregate functions
(`kll_merge_agg_bigint`, `kll_merge_agg_float`, `kll_merge_agg_double`) that
enable efficient merging of multiple KLL sketch binaries across rows. While the
existing scalar `kll_sketch_merge_*` functions can only merge two sketches at a
time, these new aggregate variants can merge an arbitrary number of
pre-computed sketches from different partitions, time windows, or datasets in a
single query. This is essential for distributed analytics workflows where
sketches are computed independently and later aggregated to obtain global
quantile estimates. The implementation follows the same design patterns as the
existing KLL aggregate functions, accepting an optional k parameter and
properly handling NULL values.
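
As a rough illustration of the difference between pairwise and aggregate merging, the sketch below uses a toy stand-in for a KLL sketch (a plain sorted list, so it is exact rather than approximate). The names `build_sketch`, `merge_two`, `merge_agg`, and `quantile` are hypothetical and are not Spark APIs. Skipping NULL rows and returning NULL for an all-NULL input is an assumption about what "properly handling NULL values" means, and the optional k parameter is not modeled.

```python
from functools import reduce
from typing import Iterable, List, Optional

# Toy stand-in for a serialized KLL sketch: it keeps every value (exact),
# whereas a real KLL sketch keeps a bounded, compacted sample.
Sketch = List[int]


def build_sketch(values: Iterable[int]) -> Sketch:
    """Build a toy sketch from raw values (hypothetical helper)."""
    return sorted(values)


def merge_two(a: Sketch, b: Sketch) -> Sketch:
    """Pairwise merge, similar in spirit to the scalar kll_sketch_merge_* functions."""
    return sorted(a + b)


def merge_agg(rows: Iterable[Optional[Sketch]]) -> Optional[Sketch]:
    """Aggregate merge over any number of rows.

    Assumption: None (NULL) rows are skipped, and an input with no
    non-NULL rows yields None.
    """
    present = [s for s in rows if s is not None]
    if not present:
        return None
    return reduce(merge_two, present)


def quantile(sketch: Sketch, rank: float) -> int:
    """Nearest-rank lookup on the exact toy data."""
    idx = min(len(sketch) - 1, int(rank * len(sketch)))
    return sketch[idx]


# One pre-computed sketch per partition; one row is NULL.
partitions = [
    build_sketch(range(0, 50)),
    None,
    build_sketch(range(50, 100)),
    build_sketch([100]),
]
merged = merge_agg(partitions)
median = quantile(merged, 0.5)
```

With only the pairwise form, merging these rows would take a chain of nested two-argument calls; the aggregate form folds over however many rows the group contains.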
> Add support for binary sketch aggregations in KLL
> -------------------------------------------------
>
> Key: SPARK-54785
> URL: https://issues.apache.org/jira/browse/SPARK-54785
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.1.0, 4.1.1
> Reporter: Christopher Boumalhab
> Priority: Minor