[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351432#comment-17351432
 ] 

Julian Hyde commented on CALCITE-4617:
--------------------------------------

[~rubenql], I'm glad you found CALCITE-1507. It's definitely related.

I get confused when people use terms like "non-preserved side" and 
"count-preserving". I prefer to think in terms of EMP and DEPT. So, in the 
query that is shown in CALCITE-1507, which we will call query 1,
{code:java}
select d.deptno, empno
    from sales.dept d
    left join sales.emp e using (deptno)
    order by d.deptno offset 1{code}
we know that one row of {{d}} might have 1 or more rows from {{e}}. If the 
first deptno is 10, we want to skip just the first employee in dept 10, not the 
whole of dept 10. So pushing down the offset to the {{d}} side is not valid.

Now consider a query that has {{order by e.deptno}} rather than {{order by 
d.deptno}}, which we will call query 2:
{code:java}
select d.deptno, e.empno
    from sales.dept d
    left join sales.emp e using (deptno)
    order by e.deptno offset 1{code}
We know that every row from {{e}} will appear exactly once in the output. 
Therefore we can push the sort and the offset down to the {{e}} side. We have 
to retain the sort, with no offset, after the join. Fixing the plan for this 
query should be the goal of this bug.

Now consider two inner join queries, query 3,
{code:java}
select d.deptno, e.empno
    from sales.dept d
    join sales.emp e using (deptno)
    order by d.deptno offset 1{code}
and query 4,
{code:java}
select d.deptno, e.empno
    from sales.dept d
    join sales.emp e using (deptno)
    order by e.deptno offset 1{code}
These queries differ only in their {{order by}} key. But because of the join 
condition, and because the query is inner, {{e.deptno}} is always equal to 
{{d.deptno}}, and therefore the queries 3 and 4 are equivalent. If we know (by 
virtue of a foreign key constraint) that every row of {{e}} appears exactly 
once in the output, it will be safe to push the sort and offset to the {{e}} 
side. Recognizing that sort keys are equivalent could be done in another JIRA 
case, but we should not do it in this one.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-4617
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4617
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.26.0
>            Reporter: Sylvain Crozon
>            Assignee: Ruben Q L
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.27.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
>     LogicalJoin(condition=[=($0, $9)], joinType=[right])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
>     LogicalJoin(condition=[=($0, $9)], joinType=[right])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>       LogicalSort(offset=[2], fetch=[10])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to