[ 
https://issues.apache.org/jira/browse/CALCITE-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved CALCITE-7330.
-------------------------------------------
    Resolution: Fixed

Fixed via 
[{{585cbda}}|https://github.com/apache/calcite/commit/585cbda5270c3619960e3e63d3142ce0f3553c4c],
 thanks to the reviewers for their help!

> AggregateCaseToFilterRule should not be applied on aggregate functions that 
> don't skip NULL inputs
> --------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7330
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7330
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> h2. Problem
> _AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate 
> functions where _NULL_ values have semantic significance.
> The rule transforms:
> {noformat}
> AGG(CASE WHEN condition THEN value ELSE NULL END)
> {noformat}
> to:
> {noformat}
> AGG(value) FILTER (WHERE condition)
> {noformat}
> This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX, 
> COUNT) which ignore {_}NULL{_}s.
> However, custom user-defined aggregate functions (UDAFs) may treat NULL 
> values as semantically significant inputs, making this transformation 
> incorrect (an example are Postgres array aggregate functions which are mostly 
> not ignoring NULL values, see the 
> [doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
>  The former passes NULL to the aggregate, the latter filters out those rows 
> entirely.
> AggregateCall.ignoreNulls() Is Not Enough
> At first I thought that the existing _AggregateCall.ignoreNulls()_ could be 
> used, but this field represents whether the user explicitly specified _IGNORE 
> NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for 
> window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_ 
> nodes, so it's not modeling what we are interested into here.
> My proposal is to add a _SqlAggFunction.skipsNullInputs()_ method to express 
> the semantic null-handling behavior of aggregate function types:
>  - Returns true by default (standard SQL behavior, all built-in aggregates 
> ignore NULLs), so *+fully backward compatible+*
>  - Custom UDAFs can override to return false to indicate NULLs are 
> semantically significant
>  - _AggregateCaseToFilterRule_ checks this method before applying the 
> transformation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to