[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761799#comment-17761799
 ] 

Ruben Q L commented on CALCITE-5952:
------------------------------------

Thanks [~lchistov1987].
IMO both tests are complementary (and relevant), if I had to chose, I'd prefer 
the runtime test; but the more tests, the better.
Normally, the process that we follow to fix a bug is:
- Produce a unit test that shows the bug, i.e. it fails.
- Correct the issue.
- The test from the 1st point now passes.

In terms of the test for the 1st point, IMO it is better to show the issue on 
an actual query (ideally in SQL, but I agree that Quidem tests do have some 
limitation and there is some discussion going on). So I think it is more clear 
(and closer to a real-life scenario) producing a query plan that "suffers from 
the problem", either it throws and exception or it produces a wrong the result.
I don't know what others might think, but these are my 2 cents on this topic.

Regarding the PR, it looks good. Perhaps the Jira/PR tittle (and the commit 
message) is not entirely accurate: it talks about LeftJoin, but the problem can 
also occur with RightJoin. What about something along the lines: 
"SemiJoinJoinTransposeRule must not be applied if SemiJoin has keys from Join's 
LHS/RHS and Join type does not support pushing predicates into its left/right 
input"? (a bit long, I know, but I cant come up with a better title, 
suggestions are welcome :)  

> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-5952
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5952
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.35.0
>            Reporter: Leonid Chistov
>            Assignee: Leonid Chistov
>            Priority: Major
>              Labels: pull-request-available
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function<RelBuilder, RelNode> relFn = b -> b
>       .scan("EMP")
>       .scan("DEPT")
>       .join(JoinRelType.LEFT,
>           b.equals(
>               b.field(2, 0, "DEPTNO"),
>               b.field(2, 1, "DEPTNO"))
>       )
>       .scan("BONUS")
>       // semi join only relates to RHS fields of left join
>       .semiJoin(b.equals(
>           b.field(2, 0, "DNAME"),
>           b.field(2, 1, "JOB")))
>       .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to