[jira] [Commented] (CALCITE-5907) Unexpected boolean expression simplification for And expression
[ https://issues.apache.org/jira/browse/CALCITE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753026#comment-17753026 ] Yunhong Zheng commented on CALCITE-5907: Hi, [~kgyrtkirk] and [~jhyde]. Sorry for the late reply. Thanks for the explanation. Now, I agree with your answering. As I enable implicit type coercion in calcite, this example query will get a right plan. The current issue in Flink is that Flink doesn't support implicit type coercion and doesn't prevent type mismatches occurs during the validate phase, which will result in an error. I think I need to solve this problem from the perspective of Flink. This issue can be closed, Thanks a lot. > Unexpected boolean expression simplification for And expression > --- > > Key: CALCITE-5907 > URL: https://issues.apache.org/jira/browse/CALCITE-5907 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Yunhong Zheng >Priority: Major > Fix For: 1.36.0 > > > As FLINK-27402 shown. If we have a table MyTable(a Int, b Boolean, c > String). Calcite will not simplify this case ( c is Varchar type while > SqlLiteral is boolean): > {code:java} > SELECT * FROM MyTable WHERE c = true;{code} > As the logical plan is : > {code:java} > LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) > +- LogicalProject(inputs=[0..2]) > +- LogicalFilter(condition=[=($2, true)]) > +- LogicalTableScan(table=[[default_catalog, default_database, > MyTable]]){code} > However, Calcite will simplify this case while simplifyAnd : > {code:java} > SELECT * FROM MyTable WHERE b = true and c = true;{code} > As the logical plan is shown below: 'b = true' and 'c = true' both were > simplified to 'b' and 'c': > {code:java} > LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) > +- LogicalProject(inputs=[0..2]) > +- LogicalFilter(condition=[AND($1, $2)]) > +- LogicalTableScan(table=[[default_catalog, default_database, > MyTable]]){code} > This may cause error because of filter condition is a Varchar type literal. > > After reading Calcite's code, I found that. The logic of > RexSimplify.implify() and RexSimplify.implifyAnd() is different, where the > logic of RexSimplify.implifyAnd() is problematic: > {code:java} > // Simplify BOOLEAN expressions if possible > while (term.getKind() == SqlKind.EQUALS) { > RexCall call = (RexCall) term; > if (call.getOperands().get(0).isAlwaysTrue()) { > term = call.getOperands().get(1); > terms.set(i, term); > continue; > } else if (call.getOperands().get(1).isAlwaysTrue()) { > term = call.getOperands().get(0); > terms.set(i, term); > continue; > } > break; > } {code} > The above code cannot make such a simple judgment, as there may not be an > implicit conversion to ensure that the types on both sides of the condition > are consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-5907) Unexpected boolean expression simplification for And expression
[ https://issues.apache.org/jira/browse/CALCITE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yunhong Zheng updated CALCITE-5907: --- Description: As FLINK-27402 shown. If we have a table MyTable(a Int, b Boolean, c String). Calcite will not simplify this case ( c is Varchar type while SqlLiteral is boolean): {code:java} SELECT * FROM MyTable WHERE c = true;{code} As the logical plan is : {code:java} LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) +- LogicalProject(inputs=[0..2]) +- LogicalFilter(condition=[=($2, true)]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable]]){code} However, Calcite will simplify this case while simplifyAnd : {code:java} SELECT * FROM MyTable WHERE b = true and c = true;{code} As the logical plan is shown below: 'b = true' and 'c = true' both were simplified to 'b' and 'c': {code:java} LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) +- LogicalProject(inputs=[0..2]) +- LogicalFilter(condition=[AND($1, $2)]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable]]){code} This may cause error because of filter condition is a Varchar type literal. After reading Calcite's code, I found that. The logic of RexSimplify.implify() and RexSimplify.implifyAnd() is different, where the logic of RexSimplify.implifyAnd() is problematic: {code:java} // Simplify BOOLEAN expressions if possible while (term.getKind() == SqlKind.EQUALS) { RexCall call = (RexCall) term; if (call.getOperands().get(0).isAlwaysTrue()) { term = call.getOperands().get(1); terms.set(i, term); continue; } else if (call.getOperands().get(1).isAlwaysTrue()) { term = call.getOperands().get(0); terms.set(i, term); continue; } break; } {code} The above code cannot make such a simple judgment, as there may not be an implicit conversion to ensure that the types on both sides of the condition are consistent. was: As [FLINK-27402|https://issues.apache.org/jira/browse/FLINK-27402] shown. If we have a table MyTable(a Int, b Boolean, c String). Calcite will not simplify this case ( c is Varchar type while SqlLiteral is boolean): {code:java} SELECT * FROM MyTable WHERE c = true;{code} As the logical plan is : {code:java} LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) +- LogicalProject(inputs=[0..2]) +- LogicalFilter(condition=[=($2, true)]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable]]){code} However, Calcite will simplify this case while simplifyAnd : {code:java} SELECT * FROM MyTable WHERE b = true and c = true;{code} As the logical plan is shown below: 'b = true' and 'c = true' both were simplified to 'b' and 'c': {code:java} LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) +- LogicalProject(inputs=[0..2]) +- LogicalFilter(condition=[AND($1, $2)]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable]]){code} This may cause error because of filter condition is a Varchar type literal. After reading Calcite's code, I found that. The logic of RexSimplify.implify() and RexSimplify.implifyAnd() is different, where the logic of RexSimplify.implifyAnd() is problematic: {code:java} // Simplify BOOLEAN expressions if possible while (term.getKind() == SqlKind.EQUALS) { RexCall call = (RexCall) term; if (call.getOperands().get(0).isAlwaysTrue()) { term = call.getOperands().get(1); terms.set(i, term); continue; } else if (call.getOperands().get(1).isAlwaysTrue()) { term = call.getOperands().get(0); terms.set(i, term); continue; } break; } {code} The above code cannot make such a simple judgment, as there may not be an implicit conversion to ensure that the types on both sides of the condition are consistent. > Unexpected boolean expression simplification for And expression > --- > > Key: CALCITE-5907 > URL: https://issues.apache.org/jira/browse/CALCITE-5907 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Yunhong Zheng >Priority: Major > Fix For: 1.36.0 > > > As FLINK-27402 shown. If we have a table MyTable(a Int, b Boolean, c > String). Calcite will not simplify this case ( c is Varchar type while > SqlLiteral is boolean): > {code:java} > SELECT * FROM MyTable WHERE c = true;{code} > As the logical plan is : > {code:java} > LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) > +- LogicalProject(inputs=[0..2]) > +- LogicalFilter(condition=[=($2, true)]) > +- LogicalTableScan(table=[[default_catalog, default_database, > MyTable]]){code} > However, Calcite will simplify this case while simplifyAnd : > {code:java} > SELECT * FROM MyTable WHERE b = true
[jira] [Commented] (CALCITE-5907) Unexpected boolean expression simplification for And expression
[ https://issues.apache.org/jira/browse/CALCITE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752022#comment-17752022 ] Yunhong Zheng commented on CALCITE-5907: Hi, [~julianhyde] . This looks like the calcite bug. Can you take a look, thanks! > Unexpected boolean expression simplification for And expression > --- > > Key: CALCITE-5907 > URL: https://issues.apache.org/jira/browse/CALCITE-5907 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Yunhong Zheng >Priority: Major > Fix For: 1.36.0 > > > As [FLINK-27402|https://issues.apache.org/jira/browse/FLINK-27402] shown. If > we have a table MyTable(a Int, b Boolean, c String). Calcite will not > simplify this case ( c is Varchar type while SqlLiteral is boolean): > > {code:java} > SELECT * FROM MyTable WHERE c = true;{code} > > As the logical plan is : > > {code:java} > LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) > +- LogicalProject(inputs=[0..2]) > +- LogicalFilter(condition=[=($2, true)]) > +- LogicalTableScan(table=[[default_catalog, default_database, > MyTable]]){code} > > However, Calcite will simplify this case while simplifyAnd : > > {code:java} > SELECT * FROM MyTable WHERE b = true and c = true;{code} > > As the logical plan is shown below: 'b = true' and 'c = true' both were > simplified to 'b' and 'c': > > {code:java} > LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) > +- LogicalProject(inputs=[0..2]) > +- LogicalFilter(condition=[AND($1, $2)]) > +- LogicalTableScan(table=[[default_catalog, default_database, > MyTable]]){code} > > This may cause error because of filter condition is a Varchar type literal. > > After reading Calcite's code, I found that. The logic of > RexSimplify.implify() and RexSimplify.implifyAnd() is different, where the > logic of RexSimplify.implifyAnd() is problematic: > {code:java} > // Simplify BOOLEAN expressions if possible > while (term.getKind() == SqlKind.EQUALS) { > RexCall call = (RexCall) term; > if (call.getOperands().get(0).isAlwaysTrue()) { > term = call.getOperands().get(1); > terms.set(i, term); > continue; > } else if (call.getOperands().get(1).isAlwaysTrue()) { > term = call.getOperands().get(0); > terms.set(i, term); > continue; > } > break; > } {code} > The above code cannot make such a simple judgment, as there may not be an > implicit conversion to ensure that the types on both sides of the condition > are consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5907) Unexpected boolean expression simplification for And expression
Yunhong Zheng created CALCITE-5907: -- Summary: Unexpected boolean expression simplification for And expression Key: CALCITE-5907 URL: https://issues.apache.org/jira/browse/CALCITE-5907 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.35.0 Reporter: Yunhong Zheng Fix For: 1.36.0 As [FLINK-27402|https://issues.apache.org/jira/browse/FLINK-27402] shown. If we have a table MyTable(a Int, b Boolean, c String). Calcite will not simplify this case ( c is Varchar type while SqlLiteral is boolean): {code:java} SELECT * FROM MyTable WHERE c = true;{code} As the logical plan is : {code:java} LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) +- LogicalProject(inputs=[0..2]) +- LogicalFilter(condition=[=($2, true)]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable]]){code} However, Calcite will simplify this case while simplifyAnd : {code:java} SELECT * FROM MyTable WHERE b = true and c = true;{code} As the logical plan is shown below: 'b = true' and 'c = true' both were simplified to 'b' and 'c': {code:java} LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c]) +- LogicalProject(inputs=[0..2]) +- LogicalFilter(condition=[AND($1, $2)]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable]]){code} This may cause error because of filter condition is a Varchar type literal. After reading Calcite's code, I found that. The logic of RexSimplify.implify() and RexSimplify.implifyAnd() is different, where the logic of RexSimplify.implifyAnd() is problematic: {code:java} // Simplify BOOLEAN expressions if possible while (term.getKind() == SqlKind.EQUALS) { RexCall call = (RexCall) term; if (call.getOperands().get(0).isAlwaysTrue()) { term = call.getOperands().get(1); terms.set(i, term); continue; } else if (call.getOperands().get(1).isAlwaysTrue()) { term = call.getOperands().get(0); terms.set(i, term); continue; } break; } {code} The above code cannot make such a simple judgment, as there may not be an implicit conversion to ensure that the types on both sides of the condition are consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-4351) RelMdUtil#numDistinctVals always returns 0 for large inputs
[ https://issues.apache.org/jira/browse/CALCITE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649537#comment-17649537 ] Yunhong Zheng commented on CALCITE-4351: Hi, [~fan_li_ya], and [~julianhyde] , is there a solution to this problem? For RelMdUtil#numDistinctVals, Math.log(1.0 - 1.0 / dSize) will return zero easily for same large databases, like TPC-DS benchmark. Return zero will affect the selection of the join strategies (turn HashJoin to NestedLoopJoin) in some cases. So I think it need some protective measures for boundary cases. Like if 'res = (1.0 - Math.exp(expo)) * dSize' return zero, we need return dSize instead of zero ) > RelMdUtil#numDistinctVals always returns 0 for large inputs > --- > > Key: CALCITE-4351 > URL: https://issues.apache.org/jira/browse/CALCITE-4351 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.26.0 >Reporter: Caizhi Weng >Priority: Major > > Previous implementation of {{RelMdUtil#numDistinctVals}} uses the > approximation {{ln(1 + x) ~= x}} when {{x}} is small. > However CALCITE-4132 remove this approximation to make the result more > accurate. This causes the function to calculate an incorrect result for large > inputs (for example, when {{domainSize = 1e18}} and {{numSelected = 1e10}} > the result is 0) due to precision problems. > What I would suggest is to treat small and large inputs in different ways. > For small inputs we use the new, more precise function and for large inputs > we use the old, approximated function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5431) Method RelMdUtil$numDistinctVals() wrongly return zero if the method input domainSize is a very large value
Yunhong Zheng created CALCITE-5431: -- Summary: Method RelMdUtil$numDistinctVals() wrongly return zero if the method input domainSize is a very large value Key: CALCITE-5431 URL: https://issues.apache.org/jira/browse/CALCITE-5431 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.33.0 Reporter: Yunhong Zheng Fix For: 1.33.0 Method {{RelMdUtil$numDistinctVals()}} wrongly return zero if the method input {{domainSize}} is a very large value. If {{domainSize}} is a very large value (like 2 x 10^16), Math.log(1.0 - 1.0 / dSize) will return 0, and {{domainSize}} will be set to 0, which is unreasonable in this scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010)