[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-09-04 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761850#comment-17761850
 ] 

Ruben Q L commented on CALCITE-5952:


bq. My new proposal is "SemiJoinJoinTransposeRule should check if JoinType does 
support pushing predicates into its inputs"

Sounds good (y)   
(nitpick: I'd simplify it a bit by using just "supports" instead of "does 
support")

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>  Labels: pull-request-available
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-09-04 Thread Leonid Chistov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761846#comment-17761846
 ] 

Leonid Chistov commented on CALCITE-5952:
-

[~rubenql] 

Yes, in your example we push semi-join to the left side of the left join.

When I was writing "should not push SemiJoin to the right (left) input of the 
Left (Right) Join", I thought that everyone would read it as:

"should not push to the right input of Left Join and to the left input of the 
Right Join".

=

My new proposal is "SemiJoinJoinTransposeRule should check if JoinType does 
support pushing predicates into its inputs"

=

>  perhaps it would be nice to add these examples of Left/Right Join where the 
>rule can be applied

OK

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>  Labels: pull-request-available
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-09-04 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761830#comment-17761830
 ] 

Ruben Q L commented on CALCITE-5952:


{quote}Maybe just "SemiJoinJoinTransposeRule should not push SemiJoin to the 
right (left) input of the Left (Right) Join"
{quote}
Mmmm, I think that is not entirely accurate, there are cases where Left (Right) 
Join can apply the rule: if the SemiJoin keys contains keys from X (Join's LHS 
in SemiJoinJoinTransposeRule example), then the rule can be applied on a 
LeftJoin (but not on a RightJoin), and vice-versa; right?
{noformat}
LogicalJoin(condition: emp.bonusId = bonus.id, joinType=[semi])
  LogicalJoin(condition: emp.deptId = dept.id, joinType=[left])
LogicalTableScan(table=[[scott, EMP]])
LogicalTableScan(table=[[scott, DEPT]])
  LogicalTableScan(table=[[scott, BONUS]]) 
=> valid transformation
LogicalJoin(condition: emp.deptId = dept.id, joinType=[left])
  LogicalJoin(condition: emp.bonusId = bonus.id, joinType=[semi])
LogicalTableScan(table=[[scott, EMP]])
LogicalTableScan(table=[[scott, BONUS]]) 
  LogicalTableScan(table=[[scott, DEPT]])
{noformat}
If they do not exist, perhaps it would be nice to add these examples of 
Left/Right Join where the rule can be applied (still after your patch) in 
RelOptRulesTest.java, wdyt?

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>  Labels: pull-request-available
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-09-04 Thread Leonid Chistov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761825#comment-17761825
 ] 

Leonid Chistov commented on CALCITE-5952:
-

[~rubenql]

Maybe just "SemiJoinJoinTransposeRule should not push SemiJoin to the right 
(left) input of the Left (Right) Join"

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>  Labels: pull-request-available
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-09-04 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761799#comment-17761799
 ] 

Ruben Q L commented on CALCITE-5952:


Thanks [~lchistov1987].
IMO both tests are complementary (and relevant), if I had to chose, I'd prefer 
the runtime test; but the more tests, the better.
Normally, the process that we follow to fix a bug is:
- Produce a unit test that shows the bug, i.e. it fails.
- Correct the issue.
- The test from the 1st point now passes.

In terms of the test for the 1st point, IMO it is better to show the issue on 
an actual query (ideally in SQL, but I agree that Quidem tests do have some 
limitation and there is some discussion going on). So I think it is more clear 
(and closer to a real-life scenario) producing a query plan that "suffers from 
the problem", either it throws and exception or it produces a wrong the result.
I don't know what others might think, but these are my 2 cents on this topic.

Regarding the PR, it looks good. Perhaps the Jira/PR tittle (and the commit 
message) is not entirely accurate: it talks about LeftJoin, but the problem can 
also occur with RightJoin. What about something along the lines: 
"SemiJoinJoinTransposeRule must not be applied if SemiJoin has keys from Join's 
LHS/RHS and Join type does not support pushing predicates into its left/right 
input"? (a bit long, I know, but I cant come up with a better title, 
suggestions are welcome :)  

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>  Labels: pull-request-available
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-09-04 Thread Leonid Chistov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761757#comment-17761757
 ] 

Leonid Chistov commented on CALCITE-5952:
-

[~rubenql] 

I have prepared a Pull Request: [https://github.com/apache/calcite/pull/3407] 
and added the runtime test also. But I don't really like it since it looks like 
this approach will force test authors to implement tests for optimization rules 
twice - one test for plan correctness and other for runtime correctness.

Also, there is a new email thread about the same question 
[https://lists.apache.org/thread/r4yhn77m92fodmz8xm6k40sfd130w7hh,] which ends 
with an idea that we probably should establish proper infrastructure for 
specific rules testing in Quidem test.

WDYT?

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-08-23 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758083#comment-17758083
 ] 

Ruben Q L commented on CALCITE-5952:


The rule-based test that you proposed is totally relevant and should be 
included. My idea was to complement it with another "result-based" test. 
I don't know if you can "force" the bug with a quidem test, maybe you can try. 
Otherwise, perhaps it is easier to reproduce the bug in somewhere like 
{{EnumerableJoinTest.java}}? (it already contains other RelBuilder-based join 
tests, some of them adding/removing specific rules, where the resulted rows are 
checked).

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-08-23 Thread Leonid Chistov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758074#comment-17758074
 ] 

Leonid Chistov commented on CALCITE-5952:
-

[~rubenql]

Can you suggest which Calcite infrastructure to use to write such test (or 
where to look for the analogous examples) that will do the planning and execute 
the plan?

I have tried to look on Quidem tests, but they seem to be more 
frontend-oriented, I didn't find a way to make them run with specific 
optimization enabled.

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-08-23 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757957#comment-17757957
 ] 

Ruben Q L commented on CALCITE-5952:


I meant showing e.g. a query/plan with employee, department, bonus values, with 
the rows returned by the problematic plan vs the rows that should be returned.

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-08-23 Thread Leonid Chistov (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757955#comment-17757955
 ] 

Leonid Chistov commented on CALCITE-5952:
-

[~rubenql] 

The expected result is mentioned in issue description (I have called it 
"original plan", meaning that plan that is constructed by RelBuilder is not 
supposed to be changed by SEMI_JOIN_JOIN_TRANSPOSE):

 
{code:java}
LogicalJoin(condition=[=($9, $12)], joinType=[semi])
  LogicalJoin(condition=[=($7, $8)], joinType=[left])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalTableScan(table=[[scott, DEPT]])
  LogicalTableScan(table=[[scott, BONUS]])  {code}
 

 

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule

2023-08-23 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757919#comment-17757919
 ] 

Ruben Q L commented on CALCITE-5952:


Would it be possible to have a test showing the expected results vs actual 
results?

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.35.0
>Reporter: Leonid Chistov
>Assignee: Leonid Chistov
>Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function relFn = b -> b
>   .scan("EMP")
>   .scan("DEPT")
>   .join(JoinRelType.LEFT,
>   b.equals(
>   b.field(2, 0, "DEPTNO"),
>   b.field(2, 1, "DEPTNO"))
>   )
>   .scan("BONUS")
>   // semi join only relates to RHS fields of left join
>   .semiJoin(b.equals(
>   b.field(2, 0, "DNAME"),
>   b.field(2, 1, "JOB")))
>   .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)