[jira] [Commented] (CALCITE-5907) Unexpected boolean expression simplification for And expression

2023-08-10 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753026#comment-17753026
 ] 

Yunhong Zheng commented on CALCITE-5907:


Hi, [~kgyrtkirk] and [~jhyde]. Sorry for the late reply. Thanks for the 
explanation. Now, I agree with your answering. As I enable implicit type 
coercion in calcite, this example query will get a right plan.

The current issue in Flink is that Flink doesn't support implicit type coercion 
and doesn't prevent type mismatches occurs during the validate phase, which 
will result in an error. I think I need to solve this problem from the 
perspective of Flink. This issue can be closed, Thanks a lot.

> Unexpected boolean expression simplification for And expression
> ---
>
> Key: CALCITE-5907
> URL: https://issues.apache.org/jira/browse/CALCITE-5907
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Yunhong Zheng
>Priority: Major
> Fix For: 1.36.0
>
>
> As FLINK-27402  shown. If we have a table  MyTable(a Int, b Boolean, c 
> String).  Calcite will not simplify this case ( c is Varchar type while 
> SqlLiteral is boolean):
> {code:java}
> SELECT * FROM MyTable WHERE c = true;{code}
> As the logical plan is :
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
> +- LogicalProject(inputs=[0..2])
>    +- LogicalFilter(condition=[=($2, true)])
>       +- LogicalTableScan(table=[[default_catalog, default_database, 
> MyTable]]){code}
> However, Calcite will simplify this case while simplifyAnd :
> {code:java}
> SELECT * FROM MyTable WHERE b = true and c = true;{code}
> As the logical plan is shown below: 'b = true' and 'c = true' both were 
> simplified to 'b' and 'c':
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
> +- LogicalProject(inputs=[0..2])
>    +- LogicalFilter(condition=[AND($1, $2)])
>       +- LogicalTableScan(table=[[default_catalog, default_database, 
> MyTable]]){code}
> This may cause error because of filter condition is a Varchar type literal.
>  
> After reading Calcite's code, I found that. The logic of 
> RexSimplify.implify() and RexSimplify.implifyAnd() is different, where the 
> logic of RexSimplify.implifyAnd() is problematic:
> {code:java}
> // Simplify BOOLEAN expressions if possible
> while (term.getKind() == SqlKind.EQUALS) {
>   RexCall call = (RexCall) term;
>   if (call.getOperands().get(0).isAlwaysTrue()) {
> term = call.getOperands().get(1);
> terms.set(i, term);
> continue;
>   } else if (call.getOperands().get(1).isAlwaysTrue()) {
> term = call.getOperands().get(0);
> terms.set(i, term);
> continue;
>   }
>   break;
> } {code}
> The above code cannot make such a simple judgment, as there may not be an 
> implicit conversion to ensure that the types on both sides of the condition 
> are consistent.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5907) Unexpected boolean expression simplification for And expression

2023-08-08 Thread Yunhong Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yunhong Zheng updated CALCITE-5907:
---
Description: 
As FLINK-27402  shown. If we have a table  MyTable(a Int, b Boolean, c String). 
 Calcite will not simplify this case ( c is Varchar type while SqlLiteral is 
boolean):
{code:java}
SELECT * FROM MyTable WHERE c = true;{code}
As the logical plan is :
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
+- LogicalProject(inputs=[0..2])
   +- LogicalFilter(condition=[=($2, true)])
      +- LogicalTableScan(table=[[default_catalog, default_database, 
MyTable]]){code}
However, Calcite will simplify this case while simplifyAnd :
{code:java}
SELECT * FROM MyTable WHERE b = true and c = true;{code}
As the logical plan is shown below: 'b = true' and 'c = true' both were 
simplified to 'b' and 'c':
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
+- LogicalProject(inputs=[0..2])
   +- LogicalFilter(condition=[AND($1, $2)])
      +- LogicalTableScan(table=[[default_catalog, default_database, 
MyTable]]){code}
This may cause error because of filter condition is a Varchar type literal.

 

After reading Calcite's code, I found that. The logic of RexSimplify.implify() 
and RexSimplify.implifyAnd() is different, where the logic of 
RexSimplify.implifyAnd() is problematic:
{code:java}
// Simplify BOOLEAN expressions if possible
while (term.getKind() == SqlKind.EQUALS) {
  RexCall call = (RexCall) term;
  if (call.getOperands().get(0).isAlwaysTrue()) {
term = call.getOperands().get(1);
terms.set(i, term);
continue;
  } else if (call.getOperands().get(1).isAlwaysTrue()) {
term = call.getOperands().get(0);
terms.set(i, term);
continue;
  }
  break;
} {code}
The above code cannot make such a simple judgment, as there may not be an 
implicit conversion to ensure that the types on both sides of the condition are 
consistent.

  was:
As [FLINK-27402|https://issues.apache.org/jira/browse/FLINK-27402]  shown. If 
we have a table  MyTable(a Int, b Boolean, c String).  Calcite will not 
simplify this case ( c is Varchar type while SqlLiteral is boolean):

 
{code:java}
SELECT * FROM MyTable WHERE c = true;{code}
 

As the logical plan is :

 
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
+- LogicalProject(inputs=[0..2])
   +- LogicalFilter(condition=[=($2, true)])
      +- LogicalTableScan(table=[[default_catalog, default_database, 
MyTable]]){code}
 

However, Calcite will simplify this case while simplifyAnd :

 
{code:java}
SELECT * FROM MyTable WHERE b = true and c = true;{code}
 

As the logical plan is shown below: 'b = true' and 'c = true' both were 
simplified to 'b' and 'c':

 
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
+- LogicalProject(inputs=[0..2])
   +- LogicalFilter(condition=[AND($1, $2)])
      +- LogicalTableScan(table=[[default_catalog, default_database, 
MyTable]]){code}
 

This may cause error because of filter condition is a Varchar type literal.

 

After reading Calcite's code, I found that. The logic of RexSimplify.implify() 
and RexSimplify.implifyAnd() is different, where the logic of 
RexSimplify.implifyAnd() is problematic:
{code:java}
// Simplify BOOLEAN expressions if possible
while (term.getKind() == SqlKind.EQUALS) {
  RexCall call = (RexCall) term;
  if (call.getOperands().get(0).isAlwaysTrue()) {
term = call.getOperands().get(1);
terms.set(i, term);
continue;
  } else if (call.getOperands().get(1).isAlwaysTrue()) {
term = call.getOperands().get(0);
terms.set(i, term);
continue;
  }
  break;
} {code}
The above code cannot make such a simple judgment, as there may not be an 
implicit conversion to ensure that the types on both sides of the condition are 
consistent.


> Unexpected boolean expression simplification for And expression
> ---
>
> Key: CALCITE-5907
> URL: https://issues.apache.org/jira/browse/CALCITE-5907
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Yunhong Zheng
>Priority: Major
> Fix For: 1.36.0
>
>
> As FLINK-27402  shown. If we have a table  MyTable(a Int, b Boolean, c 
> String).  Calcite will not simplify this case ( c is Varchar type while 
> SqlLiteral is boolean):
> {code:java}
> SELECT * FROM MyTable WHERE c = true;{code}
> As the logical plan is :
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
> +- LogicalProject(inputs=[0..2])
>    +- LogicalFilter(condition=[=($2, true)])
>       +- LogicalTableScan(table=[[default_catalog, default_database, 
> MyTable]]){code}
> However, Calcite will simplify this case while simplifyAnd :
> {code:java}
> SELECT * FROM MyTable WHERE b = true 

[jira] [Commented] (CALCITE-5907) Unexpected boolean expression simplification for And expression

2023-08-08 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752022#comment-17752022
 ] 

Yunhong Zheng commented on CALCITE-5907:


Hi, [~julianhyde] . This looks like the calcite bug. Can you take a look, 
thanks!

> Unexpected boolean expression simplification for And expression
> ---
>
> Key: CALCITE-5907
> URL: https://issues.apache.org/jira/browse/CALCITE-5907
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Yunhong Zheng
>Priority: Major
> Fix For: 1.36.0
>
>
> As [FLINK-27402|https://issues.apache.org/jira/browse/FLINK-27402]  shown. If 
> we have a table  MyTable(a Int, b Boolean, c String).  Calcite will not 
> simplify this case ( c is Varchar type while SqlLiteral is boolean):
>  
> {code:java}
> SELECT * FROM MyTable WHERE c = true;{code}
>  
> As the logical plan is :
>  
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
> +- LogicalProject(inputs=[0..2])
>    +- LogicalFilter(condition=[=($2, true)])
>       +- LogicalTableScan(table=[[default_catalog, default_database, 
> MyTable]]){code}
>  
> However, Calcite will simplify this case while simplifyAnd :
>  
> {code:java}
> SELECT * FROM MyTable WHERE b = true and c = true;{code}
>  
> As the logical plan is shown below: 'b = true' and 'c = true' both were 
> simplified to 'b' and 'c':
>  
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
> +- LogicalProject(inputs=[0..2])
>    +- LogicalFilter(condition=[AND($1, $2)])
>       +- LogicalTableScan(table=[[default_catalog, default_database, 
> MyTable]]){code}
>  
> This may cause error because of filter condition is a Varchar type literal.
>  
> After reading Calcite's code, I found that. The logic of 
> RexSimplify.implify() and RexSimplify.implifyAnd() is different, where the 
> logic of RexSimplify.implifyAnd() is problematic:
> {code:java}
> // Simplify BOOLEAN expressions if possible
> while (term.getKind() == SqlKind.EQUALS) {
>   RexCall call = (RexCall) term;
>   if (call.getOperands().get(0).isAlwaysTrue()) {
> term = call.getOperands().get(1);
> terms.set(i, term);
> continue;
>   } else if (call.getOperands().get(1).isAlwaysTrue()) {
> term = call.getOperands().get(0);
> terms.set(i, term);
> continue;
>   }
>   break;
> } {code}
> The above code cannot make such a simple judgment, as there may not be an 
> implicit conversion to ensure that the types on both sides of the condition 
> are consistent.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5907) Unexpected boolean expression simplification for And expression

2023-08-08 Thread Yunhong Zheng (Jira)
Yunhong Zheng created CALCITE-5907:
--

 Summary: Unexpected boolean expression simplification for And 
expression
 Key: CALCITE-5907
 URL: https://issues.apache.org/jira/browse/CALCITE-5907
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.35.0
Reporter: Yunhong Zheng
 Fix For: 1.36.0


As [FLINK-27402|https://issues.apache.org/jira/browse/FLINK-27402]  shown. If 
we have a table  MyTable(a Int, b Boolean, c String).  Calcite will not 
simplify this case ( c is Varchar type while SqlLiteral is boolean):

 
{code:java}
SELECT * FROM MyTable WHERE c = true;{code}
 

As the logical plan is :

 
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
+- LogicalProject(inputs=[0..2])
   +- LogicalFilter(condition=[=($2, true)])
      +- LogicalTableScan(table=[[default_catalog, default_database, 
MyTable]]){code}
 

However, Calcite will simplify this case while simplifyAnd :

 
{code:java}
SELECT * FROM MyTable WHERE b = true and c = true;{code}
 

As the logical plan is shown below: 'b = true' and 'c = true' both were 
simplified to 'b' and 'c':

 
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[a, b, c])
+- LogicalProject(inputs=[0..2])
   +- LogicalFilter(condition=[AND($1, $2)])
      +- LogicalTableScan(table=[[default_catalog, default_database, 
MyTable]]){code}
 

This may cause error because of filter condition is a Varchar type literal.

 

After reading Calcite's code, I found that. The logic of RexSimplify.implify() 
and RexSimplify.implifyAnd() is different, where the logic of 
RexSimplify.implifyAnd() is problematic:
{code:java}
// Simplify BOOLEAN expressions if possible
while (term.getKind() == SqlKind.EQUALS) {
  RexCall call = (RexCall) term;
  if (call.getOperands().get(0).isAlwaysTrue()) {
term = call.getOperands().get(1);
terms.set(i, term);
continue;
  } else if (call.getOperands().get(1).isAlwaysTrue()) {
term = call.getOperands().get(0);
terms.set(i, term);
continue;
  }
  break;
} {code}
The above code cannot make such a simple judgment, as there may not be an 
implicit conversion to ensure that the types on both sides of the condition are 
consistent.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-4351) RelMdUtil#numDistinctVals always returns 0 for large inputs

2022-12-19 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649537#comment-17649537
 ] 

Yunhong Zheng commented on CALCITE-4351:


Hi, [~fan_li_ya], and [~julianhyde] ,  is there a solution to this problem? For 
RelMdUtil#numDistinctVals,  Math.log(1.0 - 1.0 / dSize) will return zero easily 
for same large databases, like TPC-DS benchmark. Return zero will affect the 
selection of the join strategies (turn HashJoin to NestedLoopJoin) in some 
cases. So I think it need some protective measures for boundary cases. Like if 
'res = (1.0 - Math.exp(expo)) * dSize' return zero, we need return dSize 
instead of zero

)

> RelMdUtil#numDistinctVals always returns 0 for large inputs
> ---
>
> Key: CALCITE-4351
> URL: https://issues.apache.org/jira/browse/CALCITE-4351
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.26.0
>Reporter: Caizhi Weng
>Priority: Major
>
> Previous implementation of {{RelMdUtil#numDistinctVals}} uses the 
> approximation {{ln(1 + x) ~= x}} when {{x}} is small.
> However CALCITE-4132 remove this approximation to make the result more 
> accurate. This causes the function to calculate an incorrect result for large 
> inputs (for example, when {{domainSize = 1e18}} and {{numSelected = 1e10}} 
> the result is 0) due to precision problems.
> What I would suggest is to treat small and large inputs in different ways. 
> For small inputs we use the new, more precise function and for large inputs 
> we use the old, approximated function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5431) Method RelMdUtil$numDistinctVals() wrongly return zero if the method input domainSize is a very large value

2022-12-12 Thread Yunhong Zheng (Jira)
Yunhong Zheng created CALCITE-5431:
--

 Summary: Method RelMdUtil$numDistinctVals() wrongly return zero if 
the method input domainSize is a very large value
 Key: CALCITE-5431
 URL: https://issues.apache.org/jira/browse/CALCITE-5431
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.33.0
Reporter: Yunhong Zheng
 Fix For: 1.33.0


Method {{RelMdUtil$numDistinctVals()}} wrongly return zero if the method input 
{{domainSize}} is a very large value. If {{domainSize}} is a very large value 
(like 2 x 10^16), Math.log(1.0 - 1.0 / dSize) will return 0, and {{domainSize}} 
will be set to 0, which is unreasonable in this scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)