Alessandro Solimando created CALCITE-7125:
---------------------------------------------
Summary: Impossible to get a plan with partial aggregate push-down
via IntersectToDistinctRule
Key: CALCITE-7125
URL: https://issues.apache.org/jira/browse/CALCITE-7125
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.40.0
Reporter: Alessandro Solimando
CALCITE-6893 changed _IntersectToDistinctRule_ to rewrite "INTERSECT DISTINCT"
queries using a tagging plus filtering strategy rather than pushing aggregates
into the "UNION" inputs.
Rationale was to let other rules (_AGGREGATE_UNION_TRANSPOSE_ is mentioned in
the ticket description) produce the plan with partial aggregation push-down,
and let the CBO do the choice.
Consider the following query:
{noformat}
SELECT empno FROM emp
INTERSECT
SELECT empno FROM emp
{noformat}
_IntersectToDistinctRule_, as per CALCITE-6893, would rewrite it as follows:
{noformat}
LogicalProject(EMPNO=[$0])
LogicalFilter(condition=[AND(>($1, 0), >($2, 0))])
LogicalAggregate(group=[{0}], count_i0=[COUNT() FILTER (WHERE i = 0)],
count_i1=[COUNT() FILTER (WHERE i = 1)])
LogicalUnion(all=[true])
LogicalProject(EMPNO=[$0], i=[0]) // "tag" i=0
TableScan(emp)
LogicalProject(EMPNO=[$0], i=[1]) // "tag" i=1
TableScan(emp)
{noformat}
This structure introduces:
- A tagging column "i" added via projection
- Two Projects wrapping each input
- A single Aggregate with COUNT() FILTER expressions over those tags
- A final filter requiring count_iN > 0, for N in "1..#operands"
With this rewrite, the planner:
- Can no longer push aggregates into the union inputs
- Cannot apply rules like
_AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_ or
_AGGREGATE_PROJECT_MERGE_/_AggregateProjectMergeRule_
- Fails to recover the earlier, sometimes more efficient plan shape:
{noformat}
LogicalAggregate(group=[{0}])
LogicalUnion(all=[true])
LogicalAggregate(group=[{0}])
...
LogicalAggregate(group=[{0}])
...
{noformat}
This prevents early row reduction and downstream rule composition, even when
all relevant transformation rules are enabled.
Prior to CALCITE-6893, the same query with the same planner configuration
produced the optimized form above. That plan is no longer reachable with the
current {_}IntersectToDistinctRule{_}, and no combination of core rules
(including _AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_,
_PROJECT_SET_OP_TRANSPOSE_/_ProjectSetOpTransposeRule_, etc.) can recover it.
Unless I am mistaken and someone can suggest further rules that can lead to the
same plan as before, this seems like a regression.
My proposal would be to either revert the change, or to at least expose the
"old" behavior via a configuration flag.
My understanding is that, the original rewrite proposed in CALCITE-6893 was
supposed to rewritable (by means of
_AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_ rule). The rewrite
was incorrect and the currently implemented form has been suggested, but it
hasn't been discussed if the "new" plan shape (post CALCITE-6893) could be
transformed into the "old" plan shape (pre CALCITE-6893).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)