[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-06-01 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355207#comment-17355207
 ] 

Julian Hyde commented on CALCITE-4617:
--

What I would like to see here is for someone to think through ALL of the cases, 
and come up with a theory for when it should be valid to push down offset 
and/or limit. I came up with a case to prove that the thinking hasn't been done.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.28.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-06-01 Thread duan xiong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355154#comment-17355154
 ] 

duan xiong commented on CALCITE-4617:
-

[~sylvaincrozon] Sorry. I have mistaked empno  for deptno . So up sql's lack of 
'top empno 11'.  
{code:java}
EnumerableLimit(fetch=[11])
 EnumerableTableScan(table=[[scott, EMP]]){code}
should be:
{code:java}
EnumerableLimit(sort0={$1},fetch=[11])
 EnumerableTableScan(table=[[scott, EMP]]){code}
right ? I know you want to describe limit+offset. Just this a small problem.  
And or I misunderstood ?

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.28.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-06-01 Thread Sylvain Crozon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355112#comment-17355112
 ] 

Sylvain Crozon commented on CALCITE-4617:
-

[~nobigo] the query doesn't specify we need to get the top 10 rows by deptno, 
nor is it what the test should be about. We just want to make sure that we 
don't skip rows we shouldn't be skipping. I added {{order by empno}} to make 
sure the results are consistent, but for this test, the order by clause doesn't 
matter.

[~rubenql] I reverted the changes with {{offset+fetch}}, but kept the test with 
the query from Julian. Are there more tests you would like to add?

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.28.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-06-01 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354875#comment-17354875
 ] 

Ruben Q L commented on CALCITE-4617:


[~nobigo], I'm just describing the current behavior of the rule.
There might be some scenarios where the rule is not applied that could 
theoretically be eligible, but we need to be careful that changing the rule 
will *not* produce wrong results for certain plans. IMHO, the scope of the 
current ticket should be limited to fixing the described issue (wrong results 
in certain plans when {{offset}} is involved), and not trying to enlarge the 
applicability of the rule (which may cause other issues as side effects). Of 
course, we can always create a separate for the latter.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.28.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-31 Thread duan xiong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354716#comment-17354716
 ] 

duan xiong commented on CALCITE-4617:
-

Hi, [~rubenql] . You mentioned above that :
 - If the input is already sorted and we are not reducing the number of tuples, 
we bail out

What do you mean? Because if  the input is already sorted by join condition, we 
still can push offset down.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.28.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-31 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354462#comment-17354462
 ] 

Stamatis Zampetakis commented on CALCITE-4617:
--

It seems that the PR still needs a bit of work, so I will defer this to 1.28.0. 
We are already a bit late with 1.27.0, so I want to avoid delaying the release 
much longer. 

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354350#comment-17354350
 ] 

Ruben Q L commented on CALCITE-4617:


[~sylvaincrozon] I have the impression that your original patch was in the 
right track, we just required more unit tests to verify explicitly that some 
missing scenarios behave correctly.

[~julianhyde], we need to remember that {{SortJoinTransposeRule}} has a limited 
applicability, and there are a few conditions that must be satisfied in order 
for the rule to be applied (because otherwise it might produce wrong results):
- If join is not a left or right outer, we bail out
- If sort is not a trivial order-by, and if there is any sort column that is 
not part of the input where the sort is pushed, we bail out
- If sort has an offset, and if the non-preserved side of the join is not 
count-preserving against the join condition, we bail out
- If the input is already sorted and we are not reducing the number of tuples, 
we bail out

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-30 Thread duan xiong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354170#comment-17354170
 ] 

duan xiong commented on CALCITE-4617:
-

{code:java}
// ### [CALCITE-4617] Wrong offset when SortJoinTransposeRule pushes a sort 
node with an offset
select d.deptno, empno
 from "scott".dept d
 right join "scott".emp e using (deptno)
 order by empno limit 10 offset 1;
++---+
| DEPTNO | EMPNO |
++---+
| 30 | 7499 |
| 30 | 7521 |
| 20 | 7566 |
| 30 | 7654 |
| 30 | 7698 |
| 10 | 7782 |
| 20 | 7788 |
| 10 | 7839 |
| 30 | 7844 |
| 20 | 7876 |
++---+
(10 rows)
!ok
EnumerableCalc(expr#0..2=[{inputs}], proj#0..1=[{exprs}])
  EnumerableLimit(offset=[1], fetch=[10])
EnumerableSort(sort0=[$1], dir0=[ASC])
  EnumerableHashJoin(condition=[=($0, $2)], joinType=[right])
EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0])
  EnumerableTableScan(table=[[scott, DEPT]])
EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
  EnumerableLimit(fetch=[11])
EnumerableTableScan(table=[[scott, EMP]])
!plan
{code}
[~sylvaincrozon] Hi. This result is not valid.  Because the table scott.emp 
fetch 11 row not order by deptno ? This just get 11 rows in scott.emp, not 
ensure this rows is 'top deptno 11'. or I misunderstood

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-30 Thread Sylvain Crozon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354077#comment-17354077
 ] 

Sylvain Crozon commented on CALCITE-4617:
-

The use cases I had in mind yesterday (1 row from the input where we push the 
sort node resulting in multiple rows after the join) doesn't apply here because 
CALCITE-1507 ensures that the columns on the other side of the join are unique 
for the columns used in the join condition. So this rule only pushes 
offset/limit if we know that each row of the input where we push the sort node 
results in 1 row only after the join.

Then I think it's always safe to push offset/limit and I can revert the change 
I made yesterday.

In your example, shouldn't the result be (30, 1) and (null, 2) since it's a 
right join and the emp table is on the right side?
We push the offset/limit on the right side (emp table) and we'll end up with 
the right result because depno is unique on the left side.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354071#comment-17354071
 ] 

Julian Hyde commented on CALCITE-4617:
--

I think there are some situations where it is valid to push, and some where it 
is not. We should push where it is valid. Whether it is valid depends on the 
join type (inner, left, right, full), uniqueness of inputs on the join key, and 
referential integrity. So there are several combinations to consider.

I am surprised that my query 0 is handled differently than the query at the top 
of this bug. I don’t see a reason to treat offset differently to offset+limit.  

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-29 Thread Sylvain Crozon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353842#comment-17353842
 ] 

Sylvain Crozon commented on CALCITE-4617:
-

[~julianhyde] I believe the right thing to do would be to not push the offset 
or limit past the join. There is no way to know what rows from the left side of 
the join will match rows from the right side and so we don't know whether it's 
safe to skip an entire row on the right side or not. 

But not pushing the limit inside the join would mean we'd retrieve more than we 
need. Intuitively I think we could push {{fetch=sort.limit+sort.fetch}} to make 
sure we fetch enough to account for both once the datasets are merged. 
Something like:
{code:java}
Sort(offset=1, limit=10) 
  Join
Scan
Sort(limit=11)
  Scan{code}
 

I updated the PR with this logic and added a test case with the example you 
suggested. 

 

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-29 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353694#comment-17353694
 ] 

Ruben Q L commented on CALCITE-4617:


[~julianhyde], if I am not mistaken, your proposed query 0 would return the 
correct result, because {{SortJoinTransposeRule}} will not be applied due to 
this control:
{code}
...
// If the input is already sorted and we are not reducing the number of tuples,
// we bail out
if (RelMdUtil.checkInputForCollationAndLimit(mq, join.getRight(),
rightCollation, sort.offset, sort.fetch)) {
  return;
}
{code}

But you're right, it is better to have a explicit test case for this.

[~sylvaincrozon], could you please enlarge the test coverage of 
testSortJoinTranspose in {{RelOptRulesTest}}? See examples in [~julianhyde]'s 
comments

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-28 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353540#comment-17353540
 ] 

Julian Hyde commented on CALCITE-4617:
--

[~rubenql], No, I meant "left" in query 2. It is a different case because of 
the uniqueness of {{emp.deptno}}.

Now consider query 0,
{code:sql}
select d.deptno, empno
from sales.dept d
right join sales.emp e using (deptno)
offset 1
{code}
which is basically the query that this case is about. Suppose that {{emp}} has 
(empno, deptno) rows (1, 30), (2, 10) and {{dept}} has (deptno) rows (20), (30).

The query should return either of the rows (20, null) and (30, 1) and (skipping 
the other row because of the offset). But with this transformation, it will 
skip the {{emp}} row (1, 30), and therefore may wrongly emit a result of (30, 
null).

In short, this transformation is not valid. I do agree that the bug fix makes 
an invalid transformation slightly better. But we haven't finished the 
analysis, so we're not "done".

I could be persuaded that this fix could go into 1.27 only if we have a valid 
test case.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-28 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353095#comment-17353095
 ] 

Ruben Q L commented on CALCITE-4617:


[~zabetak] IMO ready to go, I'll proceed with the merge.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353067#comment-17353067
 ] 

Stamatis Zampetakis commented on CALCITE-4617:
--

[~rubenql] is this ready to go or should we push in the next release?

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-26 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351616#comment-17351616
 ] 

Ruben Q L commented on CALCITE-4617:


Thanks for the clear explanation, [~julianhyde]. Just nit-picking, if I am not 
mistaken in the context of {{SortJoinTransposeRule}}, query 2 should rather be 
a {{RIGHT JOIN}}, otherwise the rule would not be applied.
It's clear that the patch makes sense and solves a real bug, I'll merge it soon.


> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351432#comment-17351432
 ] 

Julian Hyde commented on CALCITE-4617:
--

[~rubenql], I'm glad you found CALCITE-1507. It's definitely related.

I get confused when people use terms like "non-preserved side" and 
"count-preserving". I prefer to think in terms of EMP and DEPT. So, in the 
query that is shown in CALCITE-1507, which we will call query 1,
{code:java}
select d.deptno, empno
from sales.dept d
left join sales.emp e using (deptno)
order by d.deptno offset 1{code}
we know that one row of {{d}} might have 1 or more rows from {{e}}. If the 
first deptno is 10, we want to skip just the first employee in dept 10, not the 
whole of dept 10. So pushing down the offset to the {{d}} side is not valid.

Now consider a query that has {{order by e.deptno}} rather than {{order by 
d.deptno}}, which we will call query 2:
{code:java}
select d.deptno, e.empno
from sales.dept d
left join sales.emp e using (deptno)
order by e.deptno offset 1{code}
We know that every row from {{e}} will appear exactly once in the output. 
Therefore we can push the sort and the offset down to the {{e}} side. We have 
to retain the sort, with no offset, after the join. Fixing the plan for this 
query should be the goal of this bug.

Now consider two inner join queries, query 3,
{code:java}
select d.deptno, e.empno
from sales.dept d
join sales.emp e using (deptno)
order by d.deptno offset 1{code}
and query 4,
{code:java}
select d.deptno, e.empno
from sales.dept d
join sales.emp e using (deptno)
order by e.deptno offset 1{code}
These queries differ only in their {{order by}} key. But because of the join 
condition, and because the query is inner, {{e.deptno}} is always equal to 
{{d.deptno}}, and therefore the queries 3 and 4 are equivalent. If we know (by 
virtue of a foreign key constraint) that every row of {{e}} appears exactly 
once in the output, it will be safe to push the sort and offset to the {{e}} 
side. Recognizing that sort keys are equivalent could be done in another JIRA 
case, but we should not do it in this one.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Sylvain Crozon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351154#comment-17351154
 ] 

Sylvain Crozon commented on CALCITE-4617:
-

I updated the commit message

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351139#comment-17351139
 ] 

Ruben Q L commented on CALCITE-4617:


Ok, let's keep the fetch out of the scope of the current ticket (we can always 
create a separate issue in the future if required).
PR looks in a better shape, [~sylvaincrozon] could you please add you name 
between parenthesis at the end of the commit message: {{[CALCITE-xyz] bla bla 
(Firstname Lastname)}} to comply with out c[ontributing 
guidelines|https://calcite.apache.org/develop/#contributing]?

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Sylvain Crozon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351083#comment-17351083
 ] 

Sylvain Crozon commented on CALCITE-4617:
-

I don't have a strong opinion on {{fetch}}. I considered removing it as well, 
but thought there might be edge cases and that it would be safer to leave it, 
although I can't think of any.

 

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351062#comment-17351062
 ] 

Ruben Q L commented on CALCITE-4617:


Thanks for the clarification [~sylvaincrozon], it makes sense.
To go one step further... what about {{sort.fetch}}? Should we apply the same 
logic and use it only once too (push it to the inner sort and use {{null}} in 
the top sort, as the PR proposes with the {{offset}}? Even if a duplicated 
{{fetch}} is not as critical as a duplicated {{offset}} (it does not result in 
wrong results), it does result in a less optimized plan since the same LIMIT is 
executed twice. WDYT?

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Sylvain Crozon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351048#comment-17351048
 ] 

Sylvain Crozon commented on CALCITE-4617:
-

[~rubenql] my understanding of 
https://issues.apache.org/jira/browse/CALCITE-1507 is that it was preventing 
pushing the offset past the join when it would result in the wrong results 
because the join may discard a row from the input based on the join condition. 

This doesn't change that behaviour, but makes sure that if we are able to push 
the offset past the join, that we only apply it once (in the join's input where 
it makes sense) rather than both on the join's input and then again after the 
join, which results in skipping twice as many rows. 

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350926#comment-17350926
 ] 

Ruben Q L commented on CALCITE-4617:


BTW, I just found an old (related?) ticket that was supposed to deal with 
offset issues in SortJoinTransposeRule: CALCITE-1507
{code}
// 3) If sort has an offset, and if the non-preserved side
// of the join is not count-preserving against the join
// condition, we bail out
{code}

[~sylvaincrozon], [~julianhyde] do you think that patch was wrong / incomplete? 
It seems the current PR is "fixing" precisely one of the tests 
(testSortJoinTranspose6) added by CALCITE-1507. Should we focus on fixing 
CALCITE-1507's logic (if it needs to be fixed) rather than just using {{null}} 
offset? Also, notice that there is another test added by CALCITE-1507 
(testSortJoinTranspose7) that verifies that OFFSET is cannot be pushed (and the 
rule is not applied).


> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-25 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350906#comment-17350906
 ] 

Ruben Q L commented on CALCITE-4617:


[~sylvaincrozon] could you please add more unit tests to the PR?

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Assignee: Ruben Q L
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.
>  
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>   LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on 
> top of the join should just have a null offset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-21 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349444#comment-17349444
 ] 

Julian Hyde commented on CALCITE-4617:
--

No question that there's a bug here, and the PR fixes one case of the bug. But 
whoever reviews this PR needs to think through the cases where the join has a 
selectivity exactly one, less than one, more than one. More test cases may be 
needed.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. When the sort node has an offset, 
> we should only apply it once, otherwise we end up skipping twice as many rows 
> as we should. The sort node added on top of the join should have a null 
> offset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-21 Thread Sylvain Crozon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349440#comment-17349440
 ] 

Sylvain Crozon commented on CALCITE-4617:
-

I believe the fix is quite straightforward for this, I submitted a PR: 
[https://github.com/apache/calcite/pull/2417]

 

But not sure if there are use cases I'm not thinking of where this fix wouldn't 
work

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. }}
> When the sort node has an offset, we should only apply it once, otherwise we 
> end up skipping twice as many rows as we should. The sort node added on top 
> of the join should have a null offset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-21 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349441#comment-17349441
 ] 

Julian Hyde commented on CALCITE-4617:
--

Commit messages should NEVER start with the word 'fix'.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. }}
> When the sort node has an offset, we should only apply it once, otherwise we 
> end up skipping twice as many rows as we should. The sort node added on top 
> of the join should have a null offset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4617) Wrong offset when SortJoinTransposeRule pushes a sort node with an offset

2021-05-21 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349439#comment-17349439
 ] 

Julian Hyde commented on CALCITE-4617:
--

Can you give a simple example SQL query that gives the wrong results? (I 
understand the case, but it will make it easier for others to understand.)

Please remove the formatting from the first paragraph of the description.

> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Sylvain Crozon
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{The SortJoinTransposeRule will push a sort node past a join, and then 
> duplicate the sort node on top of the join. }}
> When the sort node has an offset, we should only apply it once, otherwise we 
> end up skipping twice as many rows as we should. The sort node added on top 
> of the join should have a null offset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)