[
https://issues.apache.org/jira/browse/CALCITE-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898087#comment-17898087
]
Julian Hyde commented on CALCITE-6691:
--------------------------------------
Ah, I see. That suggests that SqlToRelConverter is making some invalid
assumptions when translating QUALIFY. The minimal test case should then be in
SqlToRelConverterTest. (I wouldn't put it in RelToSqlConverterTest because that
adds other components to the mix.)
> Qualify expressions not updated when merging two Project
> --------------------------------------------------------
>
> Key: CALCITE-6691
> URL: https://issues.apache.org/jira/browse/CALCITE-6691
> Project: Calcite
> Issue Type: Bug
> Reporter: Lino Rosa
> Priority: Major
>
> Given a query like the below
> {code:sql}
> WITH t0 AS (
> SELECT
> name,
> salary
> FROM mytable
> ),
> t1 AS (
> SELECT
> t0.name
> FROM t0
> qualify row_number() over (partition by t0.name order by t0.salary desc)
> = 1
> )
> select name from t1
> {code}
> We end up with the *wrong* query plan below:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
> LogicalFilter(condition=[$1])
> LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> LogicalTableScan(table=[[mytable]])
> {noformat}
> Under {{PARTITION BY $0}}, the {{$0}} should be referencing {{t0.name}} which
> on the select list on the same node is {{name=[$5]}}. So something is off.
> ----
> After some debugging I can trace back the issue.
> Here is the query plan *unsquashed*:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
> LogicalProject(name=[$0])
> LogicalFilter(condition=[$1])
> LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> LogicalProject(name=[$5], salary=[$6])
> LogicalTableScan(table=[[mytable]])
> {noformat}
> And here it is *squashed*:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
> LogicalFilter(condition=[$1])
> LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> LogicalTableScan(table=[[mytable]])
> {noformat}
> This is the relevant section before squashing:
> {noformat}
> (1) LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> (2) LogicalProject(name=[$5], salary=[$6])
> (3) LogicalTableScan(table=[[mytable]])
> {noformat}
> When it's generating {{(1)}} from {{SqlToRelConverter}}, Calcite realizes it
> has an opportunity to squash {{(1)}} and {{(2)}} together. It will keep
> {{(1)}} and delete {{(2)}}. But to do that it will first have to replace
> references from {{(1)}} and rewrite them to push past the parent project
> {{(2)}} (See RelOptUtil#pushPastProject).
> It does this in steps, first it goes over the projected columns (in this case
> only {{name}}) then over any qualify expressions.
> The rough logic is straightforward. It takes {{name=[$0]}} from {{(1)}} and
> replaces that with whatever sits on index {{$0}} on {{(2)}}. So {{name=[$0]}}
> becomes {{name=[$5]}}. Then it removes {{(2)}} from the query plan.
> At this point we have this:
> {noformat}
> (1) LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> (3) LogicalTableScan(table=[[mytable]])
> {noformat}
> Now Calcite will do the same for the {{QualifyExpression}} which still has
> the wrong indexes. This is where it goes wrong. It eagerly removed {{(2)}}
> from the plan so there's no way for it to even know it there was another
> projection in there. It won't do anything and the indexes for
> {{QualifyExpression}} are now all wrong.
> More specifically right in this section inside {{RelBuilder#Project}},
> {{frame.rel}} will not be an instance of {{Project}} because we just removed
> that project (it was {{(2)}}). It will be an instance of {{TableScan}} at
> this point.
> {code:java}
> // Do not merge projection when top projection has correlation variables
> bloat:
> if (frame.rel instanceof Project
> && config.bloat() >= 0
> && variables.isEmpty()) {
> final Project project = (Project) frame.rel;
> {code}
> ----
> I suppose the fix would be to delay removing the parent {{Project}} from the
> plan to after we've processed the entire {{Project}}. But I don't know that
> much about the project to try it. I'm happy to give it a shot if any of the
> maintainers point me in the right direction.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)