[jira] [Commented] (CALCITE-5732) EnumerableJoin should not return rows matching condition 'null = null'

2023-06-01 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728250#comment-17728250
 ] 

Ruben Q L commented on CALCITE-5732:


I see. The problem is that both algorithms check for a null join key, and in 
that case they "skip it", so it works as expected on this case.
However, in case of a composite join key, it will never be null (but it can 
contain null elements inside), so the MergeJoin/HashJoin algorithm goes on. 
Since the key Comparator/EqualityComparer that they use is a "standard one", 
they will consider that two nulls are equal, which is wrong for these join 
scenarios.

> EnumerableJoin should not return rows matching condition 'null = null'
> --
>
> Key: CALCITE-5732
> URL: https://issues.apache.org/jira/browse/CALCITE-5732
> Project: Calcite
>  Issue Type: Bug
>  Components: linq4j
>Reporter: Viggo Chen
>Priority: Major
> Fix For: 1.35.0
>
>
> In `EnumerableJoinTest#testMergeJoinWithCompositeKeyAndNullValues`, the query 
> is like 
> {code:java}
> select 
>   emps.empid
> from 
>   emps a join emps b
> on a.deptno = b.deptno
> and a.commission = b.commission;{code}
> and the data is like 
> {code:java}
>   INSERT INTO "emps" VALUES (100, 10, 'Bill', 1, 1000);
>   INSERT INTO "emps" VALUES (200, 20, 'Eric', 8000, 500);
>   INSERT INTO "emps" VALUES (150, 10, 'Sebastian', 7000, null);
>   INSERT INTO "emps" VALUES (110, 10, 'Theodore', 11500, 250); {code}
> And row with empid = 150 is in expected result. Is this the expected result 
> of join with null condition.
> Whats more hash join result with condition a.deptno = b.deptno and 
> a.commission = b.commission is same as merge join. And if there is just one 
> condition a.commission = b.commission, the result do not include empid = 150.
>  
> Here is a unit test for it
> {code:java}
> @Test void testHashJoinWithCompositeKeyAndNullValues() {
>   // Both join side 'commission' a limited to null, so a.commission = 
> b.commission should always be false.
>   // So all columns in right table b are expected to be null, this sql should 
> result in 0 rows.
>   final String sql = "select * from\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as a\n"
>   + " left join\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as b\n"
>   + " on a.salary = b.salary and a.commission = b.commission\n"
>   + " where b.empid is not null";
>   CalciteAssert.that()
>   .with(CalciteConnectionProperty.LEX, Lex.JAVA)
>   .with(CalciteConnectionProperty.FORCE_DECORRELATE, false)
>   .withSchema("s", new ReflectiveSchema(new HrSchemaBig()))
>   .query(sql)
>   .withHook(Hook.PLANNER, (Consumer) planner -> {
> planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
> planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
>   })
>   .explainContains("EnumerableHashJoin")
>   .returnsCount(0)
>   ;
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5732) EnumerableJoin should not return rows matching condition 'null = null'

2023-05-31 Thread Viggo Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728189#comment-17728189
 ] 

Viggo Chen commented on CALCITE-5732:
-

[~rubenql] Yes, code to get join key is like 
{code:java}
final TKey outerKey = outerKeySelector.apply(outer);
if (outerKey == null) {
  innerEnumerable = null;
} else {...} {code}
If  join on simple key, outerKey can be null. If it's composite key, outerKey 
can never be null.

> EnumerableJoin should not return rows matching condition 'null = null'
> --
>
> Key: CALCITE-5732
> URL: https://issues.apache.org/jira/browse/CALCITE-5732
> Project: Calcite
>  Issue Type: Bug
>  Components: linq4j
>Reporter: Viggo Chen
>Priority: Major
> Fix For: 1.35.0
>
>
> In `EnumerableJoinTest#testMergeJoinWithCompositeKeyAndNullValues`, the query 
> is like 
> {code:java}
> select 
>   emps.empid
> from 
>   emps a join emps b
> on a.deptno = b.deptno
> and a.commission = b.commission;{code}
> and the data is like 
> {code:java}
>   INSERT INTO "emps" VALUES (100, 10, 'Bill', 1, 1000);
>   INSERT INTO "emps" VALUES (200, 20, 'Eric', 8000, 500);
>   INSERT INTO "emps" VALUES (150, 10, 'Sebastian', 7000, null);
>   INSERT INTO "emps" VALUES (110, 10, 'Theodore', 11500, 250); {code}
> And row with empid = 150 is in expected result. Is this the expected result 
> of join with null condition.
> Whats more hash join result with condition a.deptno = b.deptno and 
> a.commission = b.commission is same as merge join. And if there is just one 
> condition a.commission = b.commission, the result do not include empid = 150.
>  
> Here is a unit test for it
> {code:java}
> @Test void testHashJoinWithCompositeKeyAndNullValues() {
>   // Both join side 'commission' a limited to null, so a.commission = 
> b.commission should always be false.
>   // So all columns in right table b are expected to be null, this sql should 
> result in 0 rows.
>   final String sql = "select * from\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as a\n"
>   + " left join\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as b\n"
>   + " on a.salary = b.salary and a.commission = b.commission\n"
>   + " where b.empid is not null";
>   CalciteAssert.that()
>   .with(CalciteConnectionProperty.LEX, Lex.JAVA)
>   .with(CalciteConnectionProperty.FORCE_DECORRELATE, false)
>   .withSchema("s", new ReflectiveSchema(new HrSchemaBig()))
>   .query(sql)
>   .withHook(Hook.PLANNER, (Consumer) planner -> {
> planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
> planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
>   })
>   .explainContains("EnumerableHashJoin")
>   .returnsCount(0)
>   ;
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5732) EnumerableJoin should not return rows matching condition 'null = null'

2023-05-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728105#comment-17728105
 ] 

Ruben Q L commented on CALCITE-5732:


[~viggoc] just to confirm: the bug occurs with both HashJoin and MergeJoin, but 
only when the join condition is composite (if it is simple, then nulls are 
correctly handled)?

> EnumerableJoin should not return rows matching condition 'null = null'
> --
>
> Key: CALCITE-5732
> URL: https://issues.apache.org/jira/browse/CALCITE-5732
> Project: Calcite
>  Issue Type: Bug
>  Components: linq4j
>Reporter: Viggo Chen
>Priority: Major
> Fix For: 1.35.0
>
>
> In `EnumerableJoinTest#testMergeJoinWithCompositeKeyAndNullValues`, the query 
> is like 
> {code:java}
> select 
>   emps.empid
> from 
>   emps a join emps b
> on a.deptno = b.deptno
> and a.commission = b.commission;{code}
> and the data is like 
> {code:java}
>   INSERT INTO "emps" VALUES (100, 10, 'Bill', 1, 1000);
>   INSERT INTO "emps" VALUES (200, 20, 'Eric', 8000, 500);
>   INSERT INTO "emps" VALUES (150, 10, 'Sebastian', 7000, null);
>   INSERT INTO "emps" VALUES (110, 10, 'Theodore', 11500, 250); {code}
> And row with empid = 150 is in expected result. Is this the expected result 
> of join with null condition.
> Whats more hash join result with condition a.deptno = b.deptno and 
> a.commission = b.commission is same as merge join. And if there is just one 
> condition a.commission = b.commission, the result do not include empid = 150.
>  
> Here is a unit test for it
> {code:java}
> @Test void testHashJoinWithCompositeKeyAndNullValues() {
>   // Both join side 'commission' a limited to null, so a.commission = 
> b.commission should always be false.
>   // So all columns in right table b are expected to be null, this sql should 
> result in 0 rows.
>   final String sql = "select * from\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as a\n"
>   + " left join\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as b\n"
>   + " on a.salary = b.salary and a.commission = b.commission\n"
>   + " where b.empid is not null";
>   CalciteAssert.that()
>   .with(CalciteConnectionProperty.LEX, Lex.JAVA)
>   .with(CalciteConnectionProperty.FORCE_DECORRELATE, false)
>   .withSchema("s", new ReflectiveSchema(new HrSchemaBig()))
>   .query(sql)
>   .withHook(Hook.PLANNER, (Consumer) planner -> {
> planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
> planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
>   })
>   .explainContains("EnumerableHashJoin")
>   .returnsCount(0)
>   ;
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5732) EnumerableJoin should not return rows matching condition 'null = null'

2023-05-30 Thread Viggo Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727757#comment-17727757
 ] 

Viggo Chen commented on CALCITE-5732:
-

[~julianhyde] The bug only appears when there are more than one join on 
condition.

 

> EnumerableJoin should not return rows matching condition 'null = null'
> --
>
> Key: CALCITE-5732
> URL: https://issues.apache.org/jira/browse/CALCITE-5732
> Project: Calcite
>  Issue Type: Bug
>  Components: linq4j
>Reporter: Viggo Chen
>Priority: Major
> Fix For: 1.35.0
>
>
> In `EnumerableJoinTest#testMergeJoinWithCompositeKeyAndNullValues`, the query 
> is like 
> {code:java}
> select 
>   emps.empid
> from 
>   emps a join emps b
> on a.deptno = b.deptno
> and a.commission = b.commission;{code}
> and the data is like 
> {code:java}
>   INSERT INTO "emps" VALUES (100, 10, 'Bill', 1, 1000);
>   INSERT INTO "emps" VALUES (200, 20, 'Eric', 8000, 500);
>   INSERT INTO "emps" VALUES (150, 10, 'Sebastian', 7000, null);
>   INSERT INTO "emps" VALUES (110, 10, 'Theodore', 11500, 250); {code}
> And row with empid = 150 is in expected result. Is this the expected result 
> of join with null condition.
> Whats more hash join result with condition a.deptno = b.deptno and 
> a.commission = b.commission is same as merge join. And if there is just one 
> condition a.commission = b.commission, the result do not include empid = 150.
>  
> Here is a unit test for it
> {code:java}
> @Test void testHashJoinWithCompositeKeyAndNullValues() {
>   // Both join side 'commission' a limited to null, so a.commission = 
> b.commission should always be false.
>   // So all columns in right table b are expected to be null, this sql should 
> result in 0 rows.
>   final String sql = "select * from\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as a\n"
>   + " left join\n"
>   + " (select empid, salary, commission from emps where commission is 
> null) as b\n"
>   + " on a.salary = b.salary and a.commission = b.commission\n"
>   + " where b.empid is not null";
>   CalciteAssert.that()
>   .with(CalciteConnectionProperty.LEX, Lex.JAVA)
>   .with(CalciteConnectionProperty.FORCE_DECORRELATE, false)
>   .withSchema("s", new ReflectiveSchema(new HrSchemaBig()))
>   .query(sql)
>   .withHook(Hook.PLANNER, (Consumer) planner -> {
> planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
> planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
>   })
>   .explainContains("EnumerableHashJoin")
>   .returnsCount(0)
>   ;
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)