[
https://issues.apache.org/jira/browse/CALCITE-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alessandro Solimando updated CALCITE-7330:
------------------------------------------
Description:
h2. Problem
_AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate
functions where _NULL_ values have semantic significance.
The rule transforms:
{noformat}
AGG(CASE WHEN condition THEN value ELSE NULL END)
{noformat}
to:
{noformat}
AGG(value) FILTER (WHERE condition)
{noformat}
This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX,
COUNT) which ignore {_}NULL{_}s.
However, custom user-defined aggregate functions (UDAFs) may treat NULL values
as semantically significant inputs, making this transformation incorrect (an
example are Postgres array aggregate functions which are mostly not ignoring
NULL values, see the
[doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
The former passes NULL to the aggregate, the latter filters out those rows
entirely.
AggregateCall.ignoreNulls() Is Not Enough
At first I thought that the existing _AggregateCall.ignoreNulls()_ could be
used, but this field represents whether the user explicitly specified _IGNORE
NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for
window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_
nodes, so it's not modeling what we are interested into here.
My proposal is to add a _SqlAggFunction.skipsNullInputs()_ method to express
the semantic null-handling behavior of aggregate function types:
- Returns true by default (standard SQL behavior, all built-in aggregates
ignore NULLs), so *+fully backward compatible+*
- Custom UDAFs can override to return false to indicate NULLs are semantically
significant
- _AggregateCaseToFilterRule_ checks this method before applying the
transformation
was:
h2. Problem
_AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate
functions where _NULL_ values have semantic significance.
The rule transforms:
{noformat}
AGG(CASE WHEN condition THEN value ELSE NULL END)
{noformat}
to:
{noformat}
AGG(value) FILTER (WHERE condition)
{noformat}
This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX,
COUNT) which ignore {_}NULL{_}s.
However, custom user-defined aggregate functions (UDAFs) may treat NULL values
as semantically significant inputs, making this transformation incorrect (an
example are Postgres array aggregate functions which are mostly not ignoring
NULL values, see the
[doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
The former passes NULL to the aggregate, the latter filters out those rows
entirely.
AggregateCall.ignoreNulls() Is Not Enough
At first I thought that the existing _AggregateCall.ignoreNulls()_ could be
used, but this field represents whether the user explicitly specified _IGNORE
NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for
window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_
nodes, so it's not modeling what we are interested into here.
My proposal is to add a _SqlAggFunction.ignoresNulls()_ method to express the
semantic null-handling behavior of aggregate function types:
- Returns true by default (standard SQL behavior, all built-in aggregates
ignore NULLs), so *+fully backward compatible+*
- Custom UDAFs can override to return false to indicate NULLs are semantically
significant
- _AggregateCaseToFilterRule_ checks this method before applying the
transformation
> AggregateCaseToFilterRule should not be applied on aggregate functions that
> don't ignore NULLs
> ----------------------------------------------------------------------------------------------
>
> Key: CALCITE-7330
> URL: https://issues.apache.org/jira/browse/CALCITE-7330
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Fix For: 1.42.0
>
>
> h2. Problem
> _AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate
> functions where _NULL_ values have semantic significance.
> The rule transforms:
> {noformat}
> AGG(CASE WHEN condition THEN value ELSE NULL END)
> {noformat}
> to:
> {noformat}
> AGG(value) FILTER (WHERE condition)
> {noformat}
> This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX,
> COUNT) which ignore {_}NULL{_}s.
> However, custom user-defined aggregate functions (UDAFs) may treat NULL
> values as semantically significant inputs, making this transformation
> incorrect (an example are Postgres array aggregate functions which are mostly
> not ignoring NULL values, see the
> [doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
> The former passes NULL to the aggregate, the latter filters out those rows
> entirely.
> AggregateCall.ignoreNulls() Is Not Enough
> At first I thought that the existing _AggregateCall.ignoreNulls()_ could be
> used, but this field represents whether the user explicitly specified _IGNORE
> NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for
> window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_
> nodes, so it's not modeling what we are interested into here.
> My proposal is to add a _SqlAggFunction.skipsNullInputs()_ method to express
> the semantic null-handling behavior of aggregate function types:
> - Returns true by default (standard SQL behavior, all built-in aggregates
> ignore NULLs), so *+fully backward compatible+*
> - Custom UDAFs can override to return false to indicate NULLs are
> semantically significant
> - _AggregateCaseToFilterRule_ checks this method before applying the
> transformation
--
This message was sent by Atlassian Jira
(v8.20.10#820010)