Arnaud Jegou created CALCITE-7144:
-------------------------------------
Summary: LIMIT should not be pushed through projections containing
window functions
Key: CALCITE-7144
URL: https://issues.apache.org/jira/browse/CALCITE-7144
Project: Calcite
Issue Type: Bug
Affects Versions: 1.40.0
Reporter: Arnaud Jegou
_This is an issue with the RelBuilder, we have not noticed this behaviour with
rules_
When adding a limit/offset with no collation, the RelBuilder will try to push
it down to a lower sort with no limit or offset, even if there is a project in
between
Basically instead of generating the following:
{code:java}
LogicalSort(fetch=[10])
LogicalProject(xxx)
LogicalSort(sort0=[$1], dir0=[ASC]) {code}
it will generate:
{code:java}
LogicalProject(xxx)
LogicalSort(fetch=[10], sort0=[$1], dir0=[ASC]) {code}
This is valid for "regular" projects, but not if the project contains a window
function as in this case the limit would be applied before the aggregation and
produce invalid results
What actually happens in the code is, in
[RelBuilder.sortLimit:|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3662]
# if there is [no
collation|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3687]
# and the current top node [is a
project|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3706]
# and the input of that project [is a
Sort|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3708]
# and that sort has [no limit or
offset|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3710]
# then the "new" limit and offset are [pushed down into that
sort|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3711]
{code:java}
if (fieldCollations.isEmpty()) { // 1
// ...
if (top instanceof Project) { // 2
final Project project = (Project) top;
if (project.getInput() instanceof Sort) { // 3
final Sort sort2 = (Sort) project.getInput();
if (sort2.offset == null && sort2.fetch == null) { // 4
final RelNode sort =
struct.sortFactory.createSort(sort2.getInput(),
sort2.collation, offsetNode, fetchNode); // 5
replaceTop(
struct.projectFactory.createProject(sort,
project.getHints(),
project.getProjects(),
Pair.right(project.getNamedProjects()),
project.getVariablesSet()));
return this; {code}
One possible fix would simply be to check that the project has no window
function in 3) by replacing the condition with
{code:java}
if (!project.containsOver() && project.getInput() instanceof Sort) { {code}
I am not 100% sure that this is the right fix but it looks correct, and the
tests do run against a version containing this change without errors
Here is a test case that covers this scenario:
{code:java}
/**
* Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-XXX">[CALCITE-XXX]
* RelBuilder pushes limits through window functions which is invalid</a>.
* The test checks that the RelBuilder does not merge a limit through a
Project
* that contains a windowed aggregate function into a lower sort. */
@Test void testLimitOverProjectWithWindowFunctions() {
final Function<RelBuilder, RelNode> f = b -> b.scan("EMP")
.sort(1)
.project(b.field("DEPTNO"),
b.aggregateCall(SqlStdOperatorTable.ROW_NUMBER)
.over()
.partitionBy()
.orderBy(b.field("EMPNO"))
.rowsUnbounded()
.as("x"))
.limit(0, 10)
.build();
final String expected = ""
+ "LogicalSort(fetch=[10])\n"
+ " LogicalProject(DEPTNO=[$7], x=[ROW_NUMBER() OVER (ORDER BY $0)])\n"
+ " LogicalSort(sort0=[$1], dir0=[ASC])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
assertThat(f.apply(createBuilder()), hasTree(expected));
}{code}
And I opened a pr to show what the fix would look like, if the proposed fix is
correct: [https://github.com/apache/calcite/pull/4503] (but feel free to
discard it and create a new one if need be)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)