[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761850#comment-17761850 ] Ruben Q L commented on CALCITE-5952: bq. My new proposal is "SemiJoinJoinTransposeRule should check if JoinType does support pushing predicates into its inputs" Sounds good (y) (nitpick: I'd simplify it a bit by using just "supports" instead of "does support") > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > Labels: pull-request-available > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761846#comment-17761846 ] Leonid Chistov commented on CALCITE-5952: - [~rubenql] Yes, in your example we push semi-join to the left side of the left join. When I was writing "should not push SemiJoin to the right (left) input of the Left (Right) Join", I thought that everyone would read it as: "should not push to the right input of Left Join and to the left input of the Right Join". = My new proposal is "SemiJoinJoinTransposeRule should check if JoinType does support pushing predicates into its inputs" = > perhaps it would be nice to add these examples of Left/Right Join where the >rule can be applied OK > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > Labels: pull-request-available > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761830#comment-17761830 ] Ruben Q L commented on CALCITE-5952: {quote}Maybe just "SemiJoinJoinTransposeRule should not push SemiJoin to the right (left) input of the Left (Right) Join" {quote} Mmmm, I think that is not entirely accurate, there are cases where Left (Right) Join can apply the rule: if the SemiJoin keys contains keys from X (Join's LHS in SemiJoinJoinTransposeRule example), then the rule can be applied on a LeftJoin (but not on a RightJoin), and vice-versa; right? {noformat} LogicalJoin(condition: emp.bonusId = bonus.id, joinType=[semi]) LogicalJoin(condition: emp.deptId = dept.id, joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) => valid transformation LogicalJoin(condition: emp.deptId = dept.id, joinType=[left]) LogicalJoin(condition: emp.bonusId = bonus.id, joinType=[semi]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, BONUS]]) LogicalTableScan(table=[[scott, DEPT]]) {noformat} If they do not exist, perhaps it would be nice to add these examples of Left/Right Join where the rule can be applied (still after your patch) in RelOptRulesTest.java, wdyt? > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > Labels: pull-request-available > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761825#comment-17761825 ] Leonid Chistov commented on CALCITE-5952: - [~rubenql] Maybe just "SemiJoinJoinTransposeRule should not push SemiJoin to the right (left) input of the Left (Right) Join" > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > Labels: pull-request-available > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761799#comment-17761799 ] Ruben Q L commented on CALCITE-5952: Thanks [~lchistov1987]. IMO both tests are complementary (and relevant), if I had to chose, I'd prefer the runtime test; but the more tests, the better. Normally, the process that we follow to fix a bug is: - Produce a unit test that shows the bug, i.e. it fails. - Correct the issue. - The test from the 1st point now passes. In terms of the test for the 1st point, IMO it is better to show the issue on an actual query (ideally in SQL, but I agree that Quidem tests do have some limitation and there is some discussion going on). So I think it is more clear (and closer to a real-life scenario) producing a query plan that "suffers from the problem", either it throws and exception or it produces a wrong the result. I don't know what others might think, but these are my 2 cents on this topic. Regarding the PR, it looks good. Perhaps the Jira/PR tittle (and the commit message) is not entirely accurate: it talks about LeftJoin, but the problem can also occur with RightJoin. What about something along the lines: "SemiJoinJoinTransposeRule must not be applied if SemiJoin has keys from Join's LHS/RHS and Join type does not support pushing predicates into its left/right input"? (a bit long, I know, but I cant come up with a better title, suggestions are welcome :) > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > Labels: pull-request-available > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761757#comment-17761757 ] Leonid Chistov commented on CALCITE-5952: - [~rubenql] I have prepared a Pull Request: [https://github.com/apache/calcite/pull/3407] and added the runtime test also. But I don't really like it since it looks like this approach will force test authors to implement tests for optimization rules twice - one test for plan correctness and other for runtime correctness. Also, there is a new email thread about the same question [https://lists.apache.org/thread/r4yhn77m92fodmz8xm6k40sfd130w7hh,] which ends with an idea that we probably should establish proper infrastructure for specific rules testing in Quidem test. WDYT? > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758083#comment-17758083 ] Ruben Q L commented on CALCITE-5952: The rule-based test that you proposed is totally relevant and should be included. My idea was to complement it with another "result-based" test. I don't know if you can "force" the bug with a quidem test, maybe you can try. Otherwise, perhaps it is easier to reproduce the bug in somewhere like {{EnumerableJoinTest.java}}? (it already contains other RelBuilder-based join tests, some of them adding/removing specific rules, where the resulted rows are checked). > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758074#comment-17758074 ] Leonid Chistov commented on CALCITE-5952: - [~rubenql] Can you suggest which Calcite infrastructure to use to write such test (or where to look for the analogous examples) that will do the planning and execute the plan? I have tried to look on Quidem tests, but they seem to be more frontend-oriented, I didn't find a way to make them run with specific optimization enabled. > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757957#comment-17757957 ] Ruben Q L commented on CALCITE-5952: I meant showing e.g. a query/plan with employee, department, bonus values, with the rows returned by the problematic plan vs the rows that should be returned. > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757955#comment-17757955 ] Leonid Chistov commented on CALCITE-5952: - [~rubenql] The expected result is mentioned in issue description (I have called it "original plan", meaning that plan that is constructed by RelBuilder is not supposed to be changed by SEMI_JOIN_JOIN_TRANSPOSE): {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757919#comment-17757919 ] Ruben Q L commented on CALCITE-5952: Would it be possible to have a test showing the expected results vs actual results? > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)