[jira] [Commented] (CALCITE-5953) AggregateCaseToFilterRule may make inaccurate SUM transformation
[ https://issues.apache.org/jira/browse/CALCITE-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758383#comment-17758383 ] Zoltan Haindrich commented on CALCITE-5953: --- I would also find it more natural if SUM would follow the mathematical definition (so that an empty sum is 0) I believe when `SUM` was placed next to the other aggregates like `MIN` they have made some generalizations. I was thinking about possible alternatives to fixing this: * compute it from multiple aggs {code:sql} consider: select SUM(CASE WHEN x=1 THEN 2 ELSE 0 END from t; select CASE WHEN s>0 THEN 2*s ELSE NULLIF(cnt,0) END from (select count(1) filter (where x=1) as s,count(1) as cnt from t) tt; or with ANY_VALUE: select CASE WHEN s>0 THEN 2*s ELSE a END from (select count(1) filter (where x=1) as s,any_value(0) as a from t) tt; {code} ** so we are exchanging 1 agg with a complex CASE into 2 simpler aggs + a CASE on top of it ** pro of this approach: *** it could be extended very easily to also cover the case when ELSE is not 0 *** possibly handle more branches for the same variable - at the cost of 1 more aggregation *** it could possibly extended to rewrite multi branch cases - but every branch will cost 1 more aggregation * consider only `SUM0` to optimize ** ProjectAggregateMergeRule could prepare SUM0 for this ** this fixes all above issues *** but also disables it from the more natural usages I'm still thinking about alternatives - let me know what you think about the above ; or if you have any other ideas! > AggregateCaseToFilterRule may make inaccurate SUM transformation > > > Key: CALCITE-5953 > URL: https://issues.apache.org/jira/browse/CALCITE-5953 > Project: Calcite > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > consider: {{sum(case when x = 1 then 2 else 0 end) as b}} > notice that this expression may only be null if there are no rows in the table > {{AggregateCaseToFilterRule}} rewrites the above expression to {{sum(1) > filter (where x=2)}} which broadens when it could be `null` to when there are > no matches to the filter > * *A* is *0* correctly in this case; but I think it will be still *0* in case > there are 0 input rows > * The result for *B* supposed to be *0* but since there are no matches by the > filter it becomes *null* > * *C* is not touched > {code} > # Convert CASE to FILTER without matches > select sum(case when x = 1 then 1 else 0 end) as a, > sum(case when x = 1 then 2 else 0 end) as b, > sum(case when x = 1 then 3 else -1 end) as c > from (values 0, null, 0, 2) as t(x); > +---+---++ > | A | B | C | > +---+---++ > | 0 | 0 | -4 | > +---+---++ > (1 row) > !ok > EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], > B=[$t1], C=[$t2]) > EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER > $3], C=[SUM($0)]) > EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], > expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], > expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) > EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) > !plan > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5953) AggregateCaseToFilterRule may make inaccurate SUM transformation
[ https://issues.apache.org/jira/browse/CALCITE-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758313#comment-17758313 ] Benchao Li commented on CALCITE-5953: - bq. I think the SQL designers made a mistake when they decided the SUM over 0 rows should return NULL [~julianhyde], I'm not sure about this, per my understanding, {{NULL}} could exactly mean that all input is {{NULL}} (not necessarily 0 rows), but 0 can only mean that the sum of all values (maybe include zero or negative values) are 0. So returning {{NULL}} could give us more information than 0. Please correct me if I'm wrong. > AggregateCaseToFilterRule may make inaccurate SUM transformation > > > Key: CALCITE-5953 > URL: https://issues.apache.org/jira/browse/CALCITE-5953 > Project: Calcite > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > consider: {{sum(case when x = 1 then 2 else 0 end) as b}} > notice that this expression may only be null if there are no rows in the table > {{AggregateCaseToFilterRule}} rewrites the above expression to {{sum(1) > filter (where x=2)}} which broadens when it could be `null` to when there are > no matches to the filter > * *A* is *0* correctly in this case; but I think it will be still *0* in case > there are 0 input rows > * The result for *B* supposed to be *0* but since there are no matches by the > filter it becomes *null* > * *C* is not touched > {code} > # Convert CASE to FILTER without matches > select sum(case when x = 1 then 1 else 0 end) as a, > sum(case when x = 1 then 2 else 0 end) as b, > sum(case when x = 1 then 3 else -1 end) as c > from (values 0, null, 0, 2) as t(x); > +---+---++ > | A | B | C | > +---+---++ > | 0 | 0 | -4 | > +---+---++ > (1 row) > !ok > EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], > B=[$t1], C=[$t2]) > EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER > $3], C=[SUM($0)]) > EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], > expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], > expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) > EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) > !plan > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5950) Default column constraint is erroneously processed.
[ https://issues.apache.org/jira/browse/CALCITE-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758261#comment-17758261 ] Julian Hyde commented on CALCITE-5950: -- Which issues do you think are false positives? We have a fairly low level of flakiness (and yes: error-prone, checker-framework, autostyle, checkstyle, and lint are pedantic but we like it that way). > Default column constraint is erroneously processed. > --- > > Key: CALCITE-5950 > URL: https://issues.apache.org/jira/browse/CALCITE-5950 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Evgeny Stanilovsky >Assignee: Evgeny Stanilovsky >Priority: Major > Labels: pull-request-available > > II change table.iq a bit and found a problem with processing default column > constraint: > {code:java} > create table tdef (i int not null, j int default 100); > (0 rows modified) > !update > insert into tdef values (1, DEFAULT); > (1 row modified) > !update > insert into tdef(i) values (2); > (1 row modified) > !update > select * from tdef order by i; > +---+-+ > | I | J | > +---+-+ > | 1 | 100 | > | 2 | 100 | > +---+-+ > (2 rows) > !ok > but obtain from calcite: > +---+-+ > | I | J | > +---+-+ > | 1 | | > | 2 | 100 | > +---+-+ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5914) Cache compiled regular expressions in SQL function runtime
[ https://issues.apache.org/jira/browse/CALCITE-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CALCITE-5914: Labels: pull-request-available (was: ) > Cache compiled regular expressions in SQL function runtime > -- > > Key: CALCITE-5914 > URL: https://issues.apache.org/jira/browse/CALCITE-5914 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > Labels: pull-request-available > > Cache compiled regular expressions (and other amortized work) in SQL function > runtime. Compiling a regular expression to a pattern is expensive (compared > to the cost of matching, given an existing pattern) and therefore caching the > compiled form will yield performance benefits if the regular expression is > constant or has a small number of values. > Consider the following query: > {code:java} > SELECT ename, job, RLIKE(ename, 'A.*'), RLIKE(ename, job || '.*') > FROM emp > {code} > The first regular expression, '{{A.\*}}', is constant and can be compiled at > prepare time or at the start of execution; the second regular expression, > {{job || '.\*'}}, might vary from one row to the next. However if the {{job}} > column has a small number of values it still might be beneficial to cache the > compiled regular expression. > If {{SqlFunctions.rlike}} could use a cache (mapping from {{String}} to > {{{}java.util.regex.Pattern{}}}) then it would achieve benefits in both the > constant and non-constant cases. > The cache needs to: > * be thread-safe (in case queries are executing using multiple threads), > * return thread-safe objects (as is {{{}Pattern{}}}), > * have bounded space (so that a query doesn't blow memory with 1 million > distinct regular expressions), > * disposed after the query has terminated, > * (ideally) share with regexes of the same language in the same query, > * not conflict with regexes of different languages in the same query. > One possible implementation is to add an {{interface FunctionState}}, with > subclasses including {{class RegexpCache}}, and if argument 1 of a function > is a subclass of {{FunctionState}} the compiler would initialize the state in > the generated code. The function can rely on the state argument being > initialized, and being the same object from one call to the next. Example: > {code:java} > interface FunctionState { > } > class RegexpCache implements FunctionState { > final Cache cache = ...; > } > {code} > This change should install the cache for all applicable functions, including > LIKE, ILIKE, RLIKE, SIMILAR, posix regex, REGEXP_REPLACE (MySQL, Oracle), > REGEXP_CONTAINS (BigQuery), other BigQuery REGEXP_ functions, PARSE_URL, > JSON_REPLACE, PARSE_TIMESTAMP. > It can also be used for functions that have mutable state, e.g. {{RANDOM}} > with and without a seed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5953) AggregateCaseToFilterRule may make inaccurate SUM transformation
[ https://issues.apache.org/jira/browse/CALCITE-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758220#comment-17758220 ] Julian Hyde commented on CALCITE-5953: -- Good one. (I think the SQL designers made a mistake when they decided the SUM over 0 rows should return NULL, but we have to live with their decision.) > AggregateCaseToFilterRule may make inaccurate SUM transformation > > > Key: CALCITE-5953 > URL: https://issues.apache.org/jira/browse/CALCITE-5953 > Project: Calcite > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > consider: {{sum(case when x = 1 then 2 else 0 end) as b}} > notice that this expression may only be null if there are no rows in the table > {{AggregateCaseToFilterRule}} rewrites the above expression to {{sum(1) > filter (where x=2)}} which broadens when it could be `null` to when there are > no matches to the filter > * *A* is *0* correctly in this case; but I think it will be still *0* in case > there are 0 input rows > * The result for *B* supposed to be *0* but since there are no matches by the > filter it becomes *null* > * *C* is not touched > {code} > # Convert CASE to FILTER without matches > select sum(case when x = 1 then 1 else 0 end) as a, > sum(case when x = 1 then 2 else 0 end) as b, > sum(case when x = 1 then 3 else -1 end) as c > from (values 0, null, 0, 2) as t(x); > +---+---++ > | A | B | C | > +---+---++ > | 0 | 0 | -4 | > +---+---++ > (1 row) > !ok > EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], > B=[$t1], C=[$t2]) > EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER > $3], C=[SUM($0)]) > EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], > expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], > expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) > EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) > !plan > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5949) RexExecutable correct handling of invalid constant expressions
[ https://issues.apache.org/jira/browse/CALCITE-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758147#comment-17758147 ] Claude Brisson commented on CALCITE-5949: - Summary updated. > RexExecutable correct handling of invalid constant expressions > -- > > Key: CALCITE-5949 > URL: https://issues.apache.org/jira/browse/CALCITE-5949 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Claude Brisson >Priority: Major > Labels: pull-request-available > > While reducing, when encountering an invalid expression in the list of > constant expressions, RexExecutor is meant to return all initial expressions > unchanged. > It fails to do so, because already handled correct expressions have already > been added to the returned list, which can be greater than the input list. > For instance, when given the list \{ LN(2), LN(-2) }, the RexExecutor will > output a list of length 3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5953) AggregateCaseToFilterRule may make inaccurate SUM transformation
[ https://issues.apache.org/jira/browse/CALCITE-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated CALCITE-5953: -- Description: consider: {{sum(case when x = 1 then 2 else 0 end) as b}} notice that this expression may only be null if there are no rows in the table {{AggregateCaseToFilterRule}} rewrites the above expression to {{sum(1) filter (where x=2)}} which broadens when it could be `null` to when there are no matches to the filter * *A* is *0* correctly in this case; but I think it will be still *0* in case there are 0 input rows * The result for *B* supposed to be *0* but since there are no matches by the filter it becomes *null* * *C* is not touched {code} # Convert CASE to FILTER without matches select sum(case when x = 1 then 1 else 0 end) as a, sum(case when x = 1 then 2 else 0 end) as b, sum(case when x = 1 then 3 else -1 end) as c from (values 0, null, 0, 2) as t(x); +---+---++ | A | B | C | +---+---++ | 0 | 0 | -4 | +---+---++ (1 row) !ok EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], B=[$t1], C=[$t2]) EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER $3], C=[SUM($0)]) EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) !plan {code} was: consider: `sum(case when x = 1 then 2 else 0 end) as b` notice that this expression may only be null if there are no rows in the table `AggregateCaseToFilterRule` rewrites the above expression to `sum(1) filter (where x=2)` which broadens when it could be `null` to when there are no matches to the filter * `A` is `0` correctly in this case; but I think it will be still `0` in case there are 0 input rows * The result for `B` supposed to be `0` but since there are no matches by the filter it becomes `null` * `C` is not touched ``` # Convert CASE to FILTER without matches select sum(case when x = 1 then 1 else 0 end) as a, sum(case when x = 1 then 2 else 0 end) as b, sum(case when x = 1 then 3 else -1 end) as c from (values 0, null, 0, 2) as t(x); +---+---++ | A | B | C | +---+---++ | 0 | 0 | -4 | +---+---++ (1 row) !ok EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], B=[$t1], C=[$t2]) EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER $3], C=[SUM($0)]) EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) !plan ``` > AggregateCaseToFilterRule may make inaccurate SUM transformation > > > Key: CALCITE-5953 > URL: https://issues.apache.org/jira/browse/CALCITE-5953 > Project: Calcite > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > consider: {{sum(case when x = 1 then 2 else 0 end) as b}} > notice that this expression may only be null if there are no rows in the table > {{AggregateCaseToFilterRule}} rewrites the above expression to {{sum(1) > filter (where x=2)}} which broadens when it could be `null` to when there are > no matches to the filter > * *A* is *0* correctly in this case; but I think it will be still *0* in case > there are 0 input rows > * The result for *B* supposed to be *0* but since there are no matches by the > filter it becomes *null* > * *C* is not touched > {code} > # Convert CASE to FILTER without matches > select sum(case when x = 1 then 1 else 0 end) as a, > sum(case when x = 1 then 2 else 0 end) as b, > sum(case when x = 1 then 3 else -1 end) as c > from (values 0, null, 0, 2) as t(x); > +---+---++ > | A | B | C | > +---+---++ > | 0 | 0 | -4 | > +---+---++ > (1 row) > !ok > EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], > B=[$t1], C=[$t2]) > EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER > $3], C=[SUM($0)]) > EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], > expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], > expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) > EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) > !plan > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5953) AggregateCaseToFilterRule may make inaccurate SUM transformation
Zoltan Haindrich created CALCITE-5953: - Summary: AggregateCaseToFilterRule may make inaccurate SUM transformation Key: CALCITE-5953 URL: https://issues.apache.org/jira/browse/CALCITE-5953 Project: Calcite Issue Type: Bug Reporter: Zoltan Haindrich Assignee: Zoltan Haindrich consider: `sum(case when x = 1 then 2 else 0 end) as b` notice that this expression may only be null if there are no rows in the table `AggregateCaseToFilterRule` rewrites the above expression to `sum(1) filter (where x=2)` which broadens when it could be `null` to when there are no matches to the filter * `A` is `0` correctly in this case; but I think it will be still `0` in case there are 0 input rows * The result for `B` supposed to be `0` but since there are no matches by the filter it becomes `null` * `C` is not touched ``` # Convert CASE to FILTER without matches select sum(case when x = 1 then 1 else 0 end) as a, sum(case when x = 1 then 2 else 0 end) as b, sum(case when x = 1 then 3 else -1 end) as c from (values 0, null, 0, 2) as t(x); +---+---++ | A | B | C | +---+---++ | 0 | 0 | -4 | +---+---++ (1 row) !ok EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3], B=[$t1], C=[$t2]) EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER $3], C=[SUM($0)]) EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)], expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)], expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6]) EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]]) !plan ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5950) Default column constraint is erroneously processed.
[ https://issues.apache.org/jira/browse/CALCITE-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758123#comment-17758123 ] Evgeny Stanilovsky commented on CALCITE-5950: - PR is ready for review, besides some CI / CheckerFramework issues, seems they are false positive, did i need to fix all of them ? > Default column constraint is erroneously processed. > --- > > Key: CALCITE-5950 > URL: https://issues.apache.org/jira/browse/CALCITE-5950 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Evgeny Stanilovsky >Assignee: Evgeny Stanilovsky >Priority: Major > Labels: pull-request-available > > II change table.iq a bit and found a problem with processing default column > constraint: > {code:java} > create table tdef (i int not null, j int default 100); > (0 rows modified) > !update > insert into tdef values (1, DEFAULT); > (1 row modified) > !update > insert into tdef(i) values (2); > (1 row modified) > !update > select * from tdef order by i; > +---+-+ > | I | J | > +---+-+ > | 1 | 100 | > | 2 | 100 | > +---+-+ > (2 rows) > !ok > but obtain from calcite: > +---+-+ > | I | J | > +---+-+ > | 1 | | > | 2 | 100 | > +---+-+ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5950) Default column constraint is erroneously processed.
[ https://issues.apache.org/jira/browse/CALCITE-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CALCITE-5950: Labels: pull-request-available (was: ) > Default column constraint is erroneously processed. > --- > > Key: CALCITE-5950 > URL: https://issues.apache.org/jira/browse/CALCITE-5950 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Evgeny Stanilovsky >Assignee: Evgeny Stanilovsky >Priority: Major > Labels: pull-request-available > > II change table.iq a bit and found a problem with processing default column > constraint: > {code:java} > create table tdef (i int not null, j int default 100); > (0 rows modified) > !update > insert into tdef values (1, DEFAULT); > (1 row modified) > !update > insert into tdef(i) values (2); > (1 row modified) > !update > select * from tdef order by i; > +---+-+ > | I | J | > +---+-+ > | 1 | 100 | > | 2 | 100 | > +---+-+ > (2 rows) > !ok > but obtain from calcite: > +---+-+ > | I | J | > +---+-+ > | 1 | | > | 2 | 100 | > +---+-+ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758083#comment-17758083 ] Ruben Q L commented on CALCITE-5952: The rule-based test that you proposed is totally relevant and should be included. My idea was to complement it with another "result-based" test. I don't know if you can "force" the bug with a quidem test, maybe you can try. Otherwise, perhaps it is easier to reproduce the bug in somewhere like {{EnumerableJoinTest.java}}? (it already contains other RelBuilder-based join tests, some of them adding/removing specific rules, where the resulted rows are checked). > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758074#comment-17758074 ] Leonid Chistov commented on CALCITE-5952: - [~rubenql] Can you suggest which Calcite infrastructure to use to write such test (or where to look for the analogous examples) that will do the planning and execute the plan? I have tried to look on Quidem tests, but they seem to be more frontend-oriented, I didn't find a way to make them run with specific optimization enabled. > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-5909) SqlParserTest.testNoUnintendedNewReservedKeywords fails in IDE while passes in command line
[ https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757960#comment-17757960 ] Ran Tao edited comment on CALCITE-5909 at 8/23/23 12:02 PM: [~libenchao] thanks. yes, this case always fail. agree with you. I have changed the commit name and Jira name. was (Author: lemonjing): [~libenchao] thanks. yes, this case always fail. agree with you. > SqlParserTest.testNoUnintendedNewReservedKeywords fails in IDE while passes > in command line > --- > > Key: CALCITE-5909 > URL: https://issues.apache.org/jira/browse/CALCITE-5909 > Project: Calcite > Issue Type: Bug > Components: tests >Reporter: LakeShen >Assignee: Ran Tao >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-08-08-23-32-55-466.png > > > When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method > failed,the exception like this: > {code:java} > java.lang.AssertionError: The parser has at least one new reserved keyword. > Are you sure it should be reserved? Difference: {code} > The picture like this: > !image-2023-08-08-23-32-55-466.png|width=1543,height=496! > I could fix this problem.More importantly, why is this method failing, but > the Calcite pipeline is passing? I think we should look at something we missed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5909) SqlParserTest.testNoUnintendedNewReservedKeywords fails in IDE while passes in command line
[ https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ran Tao updated CALCITE-5909: - Summary: SqlParserTest.testNoUnintendedNewReservedKeywords fails in IDE while passes in command line (was: Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line) > SqlParserTest.testNoUnintendedNewReservedKeywords fails in IDE while passes > in command line > --- > > Key: CALCITE-5909 > URL: https://issues.apache.org/jira/browse/CALCITE-5909 > Project: Calcite > Issue Type: Bug > Components: tests >Reporter: LakeShen >Assignee: Ran Tao >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-08-08-23-32-55-466.png > > > When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method > failed,the exception like this: > {code:java} > java.lang.AssertionError: The parser has at least one new reserved keyword. > Are you sure it should be reserved? Difference: {code} > The picture like this: > !image-2023-08-08-23-32-55-466.png|width=1543,height=496! > I could fix this problem.More importantly, why is this method failing, but > the Calcite pipeline is passing? I think we should look at something we missed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-5909) Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line
[ https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757960#comment-17757960 ] Ran Tao edited comment on CALCITE-5909 at 8/23/23 10:51 AM: [~libenchao] thanks. yes, this case always fail. agree with you. was (Author: lemonjing): [~libenchao] thanks. agree with you. this case always fail. > Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE > but passes when run from the command line > -- > > Key: CALCITE-5909 > URL: https://issues.apache.org/jira/browse/CALCITE-5909 > Project: Calcite > Issue Type: Bug > Components: tests >Reporter: LakeShen >Assignee: Ran Tao >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-08-08-23-32-55-466.png > > > When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method > failed,the exception like this: > {code:java} > java.lang.AssertionError: The parser has at least one new reserved keyword. > Are you sure it should be reserved? Difference: {code} > The picture like this: > !image-2023-08-08-23-32-55-466.png|width=1543,height=496! > I could fix this problem.More importantly, why is this method failing, but > the Calcite pipeline is passing? I think we should look at something we missed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5909) Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line
[ https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757960#comment-17757960 ] Ran Tao commented on CALCITE-5909: -- [~libenchao] thanks. agree with you. this case always fail. > Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE > but passes when run from the command line > -- > > Key: CALCITE-5909 > URL: https://issues.apache.org/jira/browse/CALCITE-5909 > Project: Calcite > Issue Type: Bug > Components: tests >Reporter: LakeShen >Assignee: Ran Tao >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-08-08-23-32-55-466.png > > > When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method > failed,the exception like this: > {code:java} > java.lang.AssertionError: The parser has at least one new reserved keyword. > Are you sure it should be reserved? Difference: {code} > The picture like this: > !image-2023-08-08-23-32-55-466.png|width=1543,height=496! > I could fix this problem.More importantly, why is this method failing, but > the Calcite pipeline is passing? I think we should look at something we missed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757957#comment-17757957 ] Ruben Q L commented on CALCITE-5952: I meant showing e.g. a query/plan with employee, department, bonus values, with the rows returned by the problematic plan vs the rows that should be returned. > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757955#comment-17757955 ] Leonid Chistov edited comment on CALCITE-5952 at 8/23/23 10:41 AM: --- [~rubenql] The expected result is mentioned in issue description (I have called it "original plan", meaning that plan that is constructed by RelBuilder is not supposed to be changed by SEMI_JOIN_JOIN_TRANSPOSE): {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} was (Author: JIRAUSER298393): [~rubenql] The expected result is mentioned in issue description (I have called it "original plan", meaning that plan that is constructed by RelBuilder is not supposed to be changed by SEMI_JOIN_JOIN_TRANSPOSE): {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757955#comment-17757955 ] Leonid Chistov edited comment on CALCITE-5952 at 8/23/23 10:41 AM: --- [~rubenql] The expected result is mentioned in issue description (I have called it "original plan", meaning that plan that is constructed by RelBuilder is not supposed to be changed by SEMI_JOIN_JOIN_TRANSPOSE): {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} was (Author: JIRAUSER298393): [~rubenql] The expected result is mentioned in issue description (I have called it "original plan", meaning that plan that is constructed by RelBuilder is not supposed to be changed by SEMI_JOIN_JOIN_TRANSPOSE): {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757955#comment-17757955 ] Leonid Chistov commented on CALCITE-5952: - [~rubenql] The expected result is mentioned in issue description (I have called it "original plan", meaning that plan that is constructed by RelBuilder is not supposed to be changed by SEMI_JOIN_JOIN_TRANSPOSE): {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757919#comment-17757919 ] Ruben Q L commented on CALCITE-5952: Would it be possible to have a test showing the expected results vs actual results? > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > This is not correct - in general case it is not correct to push semi-join to > right side of left-join. > The reason is the following: > Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows > will have *nulls* for *DEPT* columns in the result of left-join and they will > be rejected by the top semi-join. > But if we push semi-join to RHS of left-join, we are going to see rows from > *EMP* with *nulls* on the *DEPT* side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5732) EnumerableHashJoin and EnumerableMergeJoin on composite key return rows matching condition 'null = null'
[ https://issues.apache.org/jira/browse/CALCITE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757918#comment-17757918 ] Ruben Q L commented on CALCITE-5732: If someone could please help me out reviewing the PR, I'd really like to correct this issue for the next release. Thanks. > EnumerableHashJoin and EnumerableMergeJoin on composite key return rows > matching condition 'null = null' > > > Key: CALCITE-5732 > URL: https://issues.apache.org/jira/browse/CALCITE-5732 > Project: Calcite > Issue Type: Bug > Components: core, linq4j >Reporter: Viggo Chen >Assignee: Ruben Q L >Priority: Major > Labels: pull-request-available > Fix For: 1.36.0 > > > In `EnumerableJoinTest#testMergeJoinWithCompositeKeyAndNullValues`, the query > is like > {code:java} > select > emps.empid > from > emps a join emps b > on a.deptno = b.deptno > and a.commission = b.commission;{code} > and the data is like > {code:java} > INSERT INTO "emps" VALUES (100, 10, 'Bill', 1, 1000); > INSERT INTO "emps" VALUES (200, 20, 'Eric', 8000, 500); > INSERT INTO "emps" VALUES (150, 10, 'Sebastian', 7000, null); > INSERT INTO "emps" VALUES (110, 10, 'Theodore', 11500, 250); {code} > And row with empid = 150 is in expected result. Is this the expected result > of join with null condition. > Whats more hash join result with condition a.deptno = b.deptno and > a.commission = b.commission is same as merge join. And if there is just one > condition a.commission = b.commission, the result do not include empid = 150. > > Here is a unit test for it > {code:java} > @Test void testHashJoinWithCompositeKeyAndNullValues() { > // Both join side 'commission' a limited to null, so a.commission = > b.commission should always be false. > // So all columns in right table b are expected to be null, this sql should > result in 0 rows. > final String sql = "select * from\n" > + " (select empid, salary, commission from emps where commission is > null) as a\n" > + " left join\n" > + " (select empid, salary, commission from emps where commission is > null) as b\n" > + " on a.salary = b.salary and a.commission = b.commission\n" > + " where b.empid is not null"; > CalciteAssert.that() > .with(CalciteConnectionProperty.LEX, Lex.JAVA) > .with(CalciteConnectionProperty.FORCE_DECORRELATE, false) > .withSchema("s", new ReflectiveSchema(new HrSchemaBig())) > .query(sql) > .withHook(Hook.PLANNER, (Consumer) planner -> { > planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE); > planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE); > }) > .explainContains("EnumerableHashJoin") > .returnsCount(0) > ; > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonid Chistov updated CALCITE-5952: Description: The following test will fail if added to RelOptRulesTest.java {code:java} @Test void testCanNotPushSemiJoinToRightJoinBranch() { final Function relFn = b -> b .scan("EMP") .scan("DEPT") .join(JoinRelType.LEFT, b.equals( b.field(2, 0, "DEPTNO"), b.field(2, 1, "DEPTNO")) ) .scan("BONUS") // semi join only relates to RHS fields of left join .semiJoin(b.equals( b.field(2, 0, "DNAME"), b.field(2, 1, "JOB"))) .build(); relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); } {code} Produced plan will look like: {code:java} LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalJoin(condition=[=($1, $4)], joinType=[semi]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} Which is different from the original plan: {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} This is not correct - in general case it is not correct to push semi-join to right side of left-join. The reason is the following: Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows will have *nulls* for *DEPT* columns in the result of left-join and they will be rejected by the top semi-join. But if we push semi-join to RHS of left-join, we are going to see rows from *EMP* with *nulls* on the *DEPT* side in the final result. was: The following test will fail if added to RelOptRulesTest.java {code:java} @Test void testCanNotPushSemiJoinToRightJoinBranch() { final Function relFn = b -> b .scan("EMP") .scan("DEPT") .join(JoinRelType.LEFT, b.equals( b.field(2, 0, "DEPTNO"), b.field(2, 1, "DEPTNO")) ) .scan("BONUS") .semiJoin(b.equals( b.field(2, 0, "DNAME"), b.field(2, 1, "JOB"))) .build(); relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); } {code} Produced plan will look like: {code:java} LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalJoin(condition=[=($1, $4)], joinType=[semi]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} Which is different from the original plan: {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} This is not correct - in general case it is not correct to push semi-join to right side of left-join. The reason is the following: Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows will have *nulls* for *DEPT* columns in the result of left-join and they will be rejected by the top semi-join. But if we push semi-join to RHS of left-join, we are going to see rows from *EMP* with *nulls* on the *DEPT* side in the final result. > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > // semi join only relates to RHS fields of left join > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)]
[jira] [Updated] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
[ https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonid Chistov updated CALCITE-5952: Description: The following test will fail if added to RelOptRulesTest.java {code:java} @Test void testCanNotPushSemiJoinToRightJoinBranch() { final Function relFn = b -> b .scan("EMP") .scan("DEPT") .join(JoinRelType.LEFT, b.equals( b.field(2, 0, "DEPTNO"), b.field(2, 1, "DEPTNO")) ) .scan("BONUS") .semiJoin(b.equals( b.field(2, 0, "DNAME"), b.field(2, 1, "JOB"))) .build(); relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); } {code} Produced plan will look like: {code:java} LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalJoin(condition=[=($1, $4)], joinType=[semi]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} Which is different from the original plan: {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} This is not correct - in general case it is not correct to push semi-join to right side of left-join. The reason is the following: Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows will have *nulls* for *DEPT* columns in the result of left-join and they will be rejected by the top semi-join. But if we push semi-join to RHS of left-join, we are going to see rows from *EMP* with *nulls* on the *DEPT* side in the final result. was: The following test will fail if added to RelOptRulesTest.java {code:java} @Test void testCanNotPushSemiJoinToRightJoinBranch() { final Function relFn = b -> b .scan("EMP") .scan("DEPT") .join(JoinRelType.LEFT, b.equals( b.field(2, 0, "DEPTNO"), b.field(2, 1, "DEPTNO")) ) .scan("BONUS") .semiJoin(b.equals( b.field(2, 0, "DNAME"), b.field(2, 1, "JOB"))) .build(); relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); } {code} Produced plan will look like: {code:java} LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalJoin(condition=[=($1, $4)], joinType=[semi]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} Which is different from the original plan: {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} This is not correct - in general case it is not correct to push semi-join to right side of left-join. The reason is the following: Consider rows from `EMP` that have no matching rows in DEPT. These rows will have `nulls` for `DEPT` columns in the result of left-join and they will be rejected by the top semi-join. But if we push semi-join to RHS of left-join, we are going to see rows from `EMP` with `nulls` on the `DEPT` side in the final result. > Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule > --- > > Key: CALCITE-5952 > URL: https://issues.apache.org/jira/browse/CALCITE-5952 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.35.0 >Reporter: Leonid Chistov >Assignee: Leonid Chistov >Priority: Major > > The following test will fail if added to RelOptRulesTest.java > {code:java} > @Test void testCanNotPushSemiJoinToRightJoinBranch() { > final Function relFn = b -> b > .scan("EMP") > .scan("DEPT") > .join(JoinRelType.LEFT, > b.equals( > b.field(2, 0, "DEPTNO"), > b.field(2, 1, "DEPTNO")) > ) > .scan("BONUS") > .semiJoin(b.equals( > b.field(2, 0, "DNAME"), > b.field(2, 1, "JOB"))) > .build(); > relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); > } {code} > Produced plan will look like: > {code:java} > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) > LogicalJoin(condition=[=($1, $4)], joinType=[semi]) > LogicalTableScan(table=[[scott, DEPT]]) > LogicalTableScan(table=[[scott, BONUS]]) {code} > Which is different from the original plan: > {code:java} > LogicalJoin(condition=[=($9, $12)], joinType=[semi]) > LogicalJoin(condition=[=($7, $8)], joinType=[left]) > LogicalTableScan(table=[[scott, EMP]]) >
[jira] [Created] (CALCITE-5952) Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
Leonid Chistov created CALCITE-5952: --- Summary: Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule Key: CALCITE-5952 URL: https://issues.apache.org/jira/browse/CALCITE-5952 Project: Calcite Issue Type: Bug Affects Versions: 1.35.0 Reporter: Leonid Chistov Assignee: Leonid Chistov The following test will fail if added to RelOptRulesTest.java {code:java} @Test void testCanNotPushSemiJoinToRightJoinBranch() { final Function relFn = b -> b .scan("EMP") .scan("DEPT") .join(JoinRelType.LEFT, b.equals( b.field(2, 0, "DEPTNO"), b.field(2, 1, "DEPTNO")) ) .scan("BONUS") .semiJoin(b.equals( b.field(2, 0, "DNAME"), b.field(2, 1, "JOB"))) .build(); relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged(); } {code} Produced plan will look like: {code:java} LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalJoin(condition=[=($1, $4)], joinType=[semi]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} Which is different from the original plan: {code:java} LogicalJoin(condition=[=($9, $12)], joinType=[semi]) LogicalJoin(condition=[=($7, $8)], joinType=[left]) LogicalTableScan(table=[[scott, EMP]]) LogicalTableScan(table=[[scott, DEPT]]) LogicalTableScan(table=[[scott, BONUS]]) {code} This is not correct - in general case it is not correct to push semi-join to right side of left-join. The reason is the following: Consider rows from `EMP` that have no matching rows in DEPT. These rows will have `nulls` for `DEPT` columns in the result of left-join and they will be rejected by the top semi-join. But if we push semi-join to RHS of left-join, we are going to see rows from `EMP` with `nulls` on the `DEPT` side in the final result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5909) Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line
[ https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757903#comment-17757903 ] Benchao Li commented on CALCITE-5909: - [~taoran] I see used a different commit message from the Jira title, usually we keep them the same, concisely describe the symptoms or what should be the expected behavior. I presume that "Sometimes" is not needed, it should "always" fail if you run it in IDE directly, am I right? Then I propose to change the Jira title and commit message to "SqlParserTest.testNoUnintendedNewReservedKeywords fails in IDE while passes in command line", if you and others agree, I'll go on and merge it with this message. > Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE > but passes when run from the command line > -- > > Key: CALCITE-5909 > URL: https://issues.apache.org/jira/browse/CALCITE-5909 > Project: Calcite > Issue Type: Bug > Components: tests >Reporter: LakeShen >Assignee: Ran Tao >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-08-08-23-32-55-466.png > > > When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method > failed,the exception like this: > {code:java} > java.lang.AssertionError: The parser has at least one new reserved keyword. > Are you sure it should be reserved? Difference: {code} > The picture like this: > !image-2023-08-08-23-32-55-466.png|width=1543,height=496! > I could fix this problem.More importantly, why is this method failing, but > the Calcite pipeline is passing? I think we should look at something we missed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5732) EnumerableHashJoin and EnumerableMergeJoin on composite key return rows matching condition 'null = null'
[ https://issues.apache.org/jira/browse/CALCITE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruben Q L updated CALCITE-5732: --- Component/s: core > EnumerableHashJoin and EnumerableMergeJoin on composite key return rows > matching condition 'null = null' > > > Key: CALCITE-5732 > URL: https://issues.apache.org/jira/browse/CALCITE-5732 > Project: Calcite > Issue Type: Bug > Components: core, linq4j >Reporter: Viggo Chen >Assignee: Ruben Q L >Priority: Major > Labels: pull-request-available > Fix For: 1.36.0 > > > In `EnumerableJoinTest#testMergeJoinWithCompositeKeyAndNullValues`, the query > is like > {code:java} > select > emps.empid > from > emps a join emps b > on a.deptno = b.deptno > and a.commission = b.commission;{code} > and the data is like > {code:java} > INSERT INTO "emps" VALUES (100, 10, 'Bill', 1, 1000); > INSERT INTO "emps" VALUES (200, 20, 'Eric', 8000, 500); > INSERT INTO "emps" VALUES (150, 10, 'Sebastian', 7000, null); > INSERT INTO "emps" VALUES (110, 10, 'Theodore', 11500, 250); {code} > And row with empid = 150 is in expected result. Is this the expected result > of join with null condition. > Whats more hash join result with condition a.deptno = b.deptno and > a.commission = b.commission is same as merge join. And if there is just one > condition a.commission = b.commission, the result do not include empid = 150. > > Here is a unit test for it > {code:java} > @Test void testHashJoinWithCompositeKeyAndNullValues() { > // Both join side 'commission' a limited to null, so a.commission = > b.commission should always be false. > // So all columns in right table b are expected to be null, this sql should > result in 0 rows. > final String sql = "select * from\n" > + " (select empid, salary, commission from emps where commission is > null) as a\n" > + " left join\n" > + " (select empid, salary, commission from emps where commission is > null) as b\n" > + " on a.salary = b.salary and a.commission = b.commission\n" > + " where b.empid is not null"; > CalciteAssert.that() > .with(CalciteConnectionProperty.LEX, Lex.JAVA) > .with(CalciteConnectionProperty.FORCE_DECORRELATE, false) > .withSchema("s", new ReflectiveSchema(new HrSchemaBig())) > .query(sql) > .withHook(Hook.PLANNER, (Consumer) planner -> { > planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE); > planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE); > }) > .explainContains("EnumerableHashJoin") > .returnsCount(0) > ; > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)