date:20230814

[jira] [Commented] (CALCITE-5889) Add the RelRule that converts Minus into UNION ALL..GROUP BY...WHERE

2023-08-14 Thread LakeShen (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754442#comment-17754442
 ] 

LakeShen commented on CALCITE-5889:
---

The PR is ready,link:[https://github.com/apache/calcite/pull/3367]

if someone have time, please help me to review it, thank you very much:)

> Add the RelRule that converts Minus into UNION ALL..GROUP BY...WHERE
> 
>
> Key: CALCITE-5889
> URL: https://issues.apache.org/jira/browse/CALCITE-5889
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: LakeShen
>Assignee: LakeShen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> There are many projects that implement optimizers based on Calcite,our 
> optimizer is also based on Calcite.
> Calcite has a lot of good rules in CoreRules.It has UnionToDistinctRule and 
> IntersectToDistinctRule RelRule ,UnionToDistinctRule is that converts 
> Union(all = false) to Union(all=true) + Aggregate,IntersectToDistinctRule is 
> that converts Intersect(all=false) to Union(all=true) + Aggregate + 
> Filter.None of these rules translate Minus to other RelNode combinations.
> Normally, a computation engine does not have a Minus operator, so it is 
> common to convert Minus to some other relational algebra combination in the 
> optimizer.For example,in presto,it has the ImplementIntersectAndExceptAsUnion 
> PlanOptimizer that converts Minus to into UNION ALL..GROUP BY...WHERE. In 
> flink,it has the ReplaceMinusWithAntiJoinRule RelOptRule. In dremio-oss,it 
> has MinusToJoin RelOptRule.All of this rules,converts the Minus to other 
> composition of relational algebra.
> Since there are no optimization rules for dealing with minus in calcite, 
> users of calcite generally need to write their own optimization rules for 
> dealing with Minus.
> I think it makes sense to add the RelRule that converts Minus to other 
> composition of relational algebranto ,like UNION ALL.. GROUP BY... WHERE. So 
> for users of calcite,they don't have to go through the extra work of writing 
> rules for Minus to convert it to something else.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5922) The SQL generated for the POSITION function(with 3 input arguments) by the SparkSqlDialect is not recognized by Spark SQL

2023-08-14 Thread hongyu guo (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongyu guo updated CALCITE-5922:

Summary: The SQL generated for the POSITION function(with 3 input 
arguments) by the SparkSqlDialect is not recognized by Spark SQL  (was: The SQL 
generated for the POSITION function(with 3 input arguments) by the 
SparkSqlDialect is not recognized by Spark SQL.)

> The SQL generated for the POSITION function(with 3 input arguments) by the 
> SparkSqlDialect is not recognized by Spark SQL
> -
>
> Key: CALCITE-5922
> URL: https://issues.apache.org/jira/browse/CALCITE-5922
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: hongyu guo
>Assignee: hongyu guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> In SparkSQL, POSITION(substr, str[, pos]) function only accept 
> comma-separated when there are 3 arguments.
> For example:
> {code:java}
> // SparkSQL accepted SQL
> select POSITION('a', 'abc', 1);{code}
> Calcite will use the IN and FROM keyword to separate the input arguments when 
> unparsing.
> {code:java}
> // Calcite accepted and unparsed SQL
> select POSITION('a' IN 'abc' FROM 1){code}
> For 2 augument inputs, SparkSQL accept both syntaxes. So I think we should 
> write a rule in SparkSqlDialect to convert keyword-separated syntax to 
> comma-separted syntax for POSITION function.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5922) The SQL generated for the POSITION function(with 3 input arguments) by the SparkSqlDialect is not recognized by Spark SQL.

2023-08-14 Thread hongyu guo (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754436#comment-17754436
 ] 

hongyu guo edited comment on CALCITE-5922 at 8/15/23 6:30 AM:
--

Hi, Julian. Thank you for you replay.

I change the summary to 

_The SQL generated for the POSITION function(with 3 input arguments) by the 
SparkSqlDialect is not recognized by Spark SQL._


was (Author: JIRAUSER300840):
Hi Julian~

Thank you for you replay,

I change the summary to 

_The SQL generated for the POSITION function(with 3 input arguments) by the 
SparkSqlDialect is not recognized by Spark SQL._

> The SQL generated for the POSITION function(with 3 input arguments) by the 
> SparkSqlDialect is not recognized by Spark SQL.
> --
>
> Key: CALCITE-5922
> URL: https://issues.apache.org/jira/browse/CALCITE-5922
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: hongyu guo
>Assignee: hongyu guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> In SparkSQL, POSITION(substr, str[, pos]) function only accept 
> comma-separated when there are 3 arguments.
> For example:
> {code:java}
> // SparkSQL accepted SQL
> select POSITION('a', 'abc', 1);{code}
> Calcite will use the IN and FROM keyword to separate the input arguments when 
> unparsing.
> {code:java}
> // Calcite accepted and unparsed SQL
> select POSITION('a' IN 'abc' FROM 1){code}
> For 2 augument inputs, SparkSQL accept both syntaxes. So I think we should 
> write a rule in SparkSqlDialect to convert keyword-separated syntax to 
> comma-separted syntax for POSITION function.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5922) The SQL generated for the POSITION function(with 3 input arguments) by the SparkSqlDialect is not recognized by Spark SQL.

2023-08-14 Thread hongyu guo (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754436#comment-17754436
 ] 

hongyu guo commented on CALCITE-5922:
-

Hi Julian~

Thank you for you replay,

I change the summary to 

_The SQL generated for the POSITION function(with 3 input arguments) by the 
SparkSqlDialect is not recognized by Spark SQL._

> The SQL generated for the POSITION function(with 3 input arguments) by the 
> SparkSqlDialect is not recognized by Spark SQL.
> --
>
> Key: CALCITE-5922
> URL: https://issues.apache.org/jira/browse/CALCITE-5922
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: hongyu guo
>Assignee: hongyu guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> In SparkSQL, POSITION(substr, str[, pos]) function only accept 
> comma-separated when there are 3 arguments.
> For example:
> {code:java}
> // SparkSQL accepted SQL
> select POSITION('a', 'abc', 1);{code}
> Calcite will use the IN and FROM keyword to separate the input arguments when 
> unparsing.
> {code:java}
> // Calcite accepted and unparsed SQL
> select POSITION('a' IN 'abc' FROM 1){code}
> For 2 augument inputs, SparkSQL accept both syntaxes. So I think we should 
> write a rule in SparkSqlDialect to convert keyword-separated syntax to 
> comma-separted syntax for POSITION function.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (CALCITE-5922) The SQL generated for the POSITION function(with 3 input arguments) by the SparkSqlDialect is not recognized by Spark SQL.

2023-08-14 Thread hongyu guo (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongyu guo reassigned CALCITE-5922:
---

Assignee: hongyu guo

> The SQL generated for the POSITION function(with 3 input arguments) by the 
> SparkSqlDialect is not recognized by Spark SQL.
> --
>
> Key: CALCITE-5922
> URL: https://issues.apache.org/jira/browse/CALCITE-5922
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: hongyu guo
>Assignee: hongyu guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> In SparkSQL, POSITION(substr, str[, pos]) function only accept 
> comma-separated when there are 3 arguments.
> For example:
> {code:java}
> // SparkSQL accepted SQL
> select POSITION('a', 'abc', 1);{code}
> Calcite will use the IN and FROM keyword to separate the input arguments when 
> unparsing.
> {code:java}
> // Calcite accepted and unparsed SQL
> select POSITION('a' IN 'abc' FROM 1){code}
> For 2 augument inputs, SparkSQL accept both syntaxes. So I think we should 
> write a rule in SparkSqlDialect to convert keyword-separated syntax to 
> comma-separted syntax for POSITION function.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5922) The SQL generated for the POSITION function(with 3 input arguments) by the SparkSqlDialect is not recognized by Spark SQL.

2023-08-14 Thread hongyu guo (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongyu guo updated CALCITE-5922:

Summary: The SQL generated for the POSITION function(with 3 input 
arguments) by the SparkSqlDialect is not recognized by Spark SQL.  (was: 
POSITION signature incorrect for SparkSQL)

> The SQL generated for the POSITION function(with 3 input arguments) by the 
> SparkSqlDialect is not recognized by Spark SQL.
> --
>
> Key: CALCITE-5922
> URL: https://issues.apache.org/jira/browse/CALCITE-5922
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: hongyu guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> In SparkSQL, POSITION(substr, str[, pos]) function only accept 
> comma-separated when there are 3 arguments.
> For example:
> {code:java}
> // SparkSQL accepted SQL
> select POSITION('a', 'abc', 1);{code}
> Calcite will use the IN and FROM keyword to separate the input arguments when 
> unparsing.
> {code:java}
> // Calcite accepted and unparsed SQL
> select POSITION('a' IN 'abc' FROM 1){code}
> For 2 augument inputs, SparkSQL accept both syntaxes. So I think we should 
> write a rule in SparkSqlDialect to convert keyword-separated syntax to 
> comma-separted syntax for POSITION function.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5889) Add the RelRule that converts Minus into UNION ALL..GROUP BY...WHERE

2023-08-14 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-5889:

Labels: pull-request-available  (was: )

> Add the RelRule that converts Minus into UNION ALL..GROUP BY...WHERE
> 
>
> Key: CALCITE-5889
> URL: https://issues.apache.org/jira/browse/CALCITE-5889
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: LakeShen
>Assignee: LakeShen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> There are many projects that implement optimizers based on Calcite,our 
> optimizer is also based on Calcite.
> Calcite has a lot of good rules in CoreRules.It has UnionToDistinctRule and 
> IntersectToDistinctRule RelRule ,UnionToDistinctRule is that converts 
> Union(all = false) to Union(all=true) + Aggregate,IntersectToDistinctRule is 
> that converts Intersect(all=false) to Union(all=true) + Aggregate + 
> Filter.None of these rules translate Minus to other RelNode combinations.
> Normally, a computation engine does not have a Minus operator, so it is 
> common to convert Minus to some other relational algebra combination in the 
> optimizer.For example,in presto,it has the ImplementIntersectAndExceptAsUnion 
> PlanOptimizer that converts Minus to into UNION ALL..GROUP BY...WHERE. In 
> flink,it has the ReplaceMinusWithAntiJoinRule RelOptRule. In dremio-oss,it 
> has MinusToJoin RelOptRule.All of this rules,converts the Minus to other 
> composition of relational algebra.
> Since there are no optimization rules for dealing with minus in calcite, 
> users of calcite generally need to write their own optimization rules for 
> dealing with Minus.
> I think it makes sense to add the RelRule that converts Minus to other 
> composition of relational algebranto ,like UNION ALL.. GROUP BY... WHERE. So 
> for users of calcite,they don't have to go through the extra work of writing 
> rules for Minus to convert it to something else.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5880) When the association condition of the association subquery is 'is not distinct from', the join condition becomes '=' after decorrelation.

2023-08-14 Thread libopeng (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754397#comment-17754397
 ] 

libopeng commented on CALCITE-5880:
---

I tried to verify 'null=null' using Quidem and found that Quidem calculated 
'null=null' to be valid, which seems to be another issue.

> When the association condition of the association subquery is 'is not 
> distinct from', the join condition becomes '=' after decorrelation.
> -
>
> Key: CALCITE-5880
> URL: https://issues.apache.org/jira/browse/CALCITE-5880
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Assignee: LakeShen
>Priority: Major
> Attachments: image-2023-08-01-15-20-22-105.png, 
> image-2023-08-02-10-15-00-455.png, image-2023-08-13-21-36-00-494.png
>
>
> {code:java}
> select EMPNO 
> from emp
> where EXISTS (select DEPTNO from dept where emp.EMPNO is not DISTINCT from 
> dept.DEPTNO){code}
> before decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{}], agg#0=[MIN($0)])
>         LogicalProject($f0=[true])
>           LogicalFilter(condition=[OR(AND(IS NULL($cor0.EMPNO), IS NULL($0)), 
> IS TRUE(=($cor0.EMPNO, $0)))])<---   here is 't1.x is not distinct from 
> t2.x'
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> after decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$13])
>       LogicalJoin(condition=[AND(=($0, $11), =($9, $12))], joinType=[left])   
>  <---here is 't1.x=t2.x and (t1.x is null = t2.x is null)'
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[false], 
> EMPNO0=[$0])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)])
>           LogicalProject(EMPNO=[$2], $f9=[$3], $f0=[true])
>             LogicalJoin(condition=[=($2, $0)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalProject(EMPNO=[$0], $f9=[false])
>                 LogicalTableScan(table=[[CATALOG, SALES, EMP]])  {code}
>  
> {code:java}
>     EMP               | DEPT
> +-+   |  +-+
> | EMPNO  ||  | DEPTNO |
> +-+   |  +-+
> | null |  |  | null   |
>{code}
> expect result
> {code:java}
>   EMPNO
> +---+
> |  null |
> +---+{code}
> actual result
> {code:java}
>   EMPNO
> +---+
> +---+ {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5880) When the association condition of the association subquery is 'is not distinct from', the join condition becomes '=' after decorrelation.

2023-08-14 Thread libopeng (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754387#comment-17754387
 ] 

libopeng edited comment on CALCITE-5880 at 8/15/23 3:46 AM:


Sorry, I wasn't aware of the problem.
[~julianhyde] 
EnumerableHashJoin(condition=[AND(=($6, $11), =($8, $12))], joinType=[semi])
This join condition looks wrong, it translates to e1.comm=e2.comm and (e1.comm 
is nulll =e2.comm is null), which does not guarantee that null is equal to null;
 


was (Author: JIRAUSER298961):
Sorry, I wasn't aware of the problem.
[~julianhyde] 
EnumerableHashJoin(condition=[AND(=($6, $11), =($8, $12))], joinType=[semi])
This join condition looks wrong, it does not guarantee that null is equal to 
null;

> When the association condition of the association subquery is 'is not 
> distinct from', the join condition becomes '=' after decorrelation.
> -
>
> Key: CALCITE-5880
> URL: https://issues.apache.org/jira/browse/CALCITE-5880
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Assignee: LakeShen
>Priority: Major
> Attachments: image-2023-08-01-15-20-22-105.png, 
> image-2023-08-02-10-15-00-455.png, image-2023-08-13-21-36-00-494.png
>
>
> {code:java}
> select EMPNO 
> from emp
> where EXISTS (select DEPTNO from dept where emp.EMPNO is not DISTINCT from 
> dept.DEPTNO){code}
> before decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{}], agg#0=[MIN($0)])
>         LogicalProject($f0=[true])
>           LogicalFilter(condition=[OR(AND(IS NULL($cor0.EMPNO), IS NULL($0)), 
> IS TRUE(=($cor0.EMPNO, $0)))])<---   here is 't1.x is not distinct from 
> t2.x'
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> after decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$13])
>       LogicalJoin(condition=[AND(=($0, $11), =($9, $12))], joinType=[left])   
>  <---here is 't1.x=t2.x and (t1.x is null = t2.x is null)'
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[false], 
> EMPNO0=[$0])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)])
>           LogicalProject(EMPNO=[$2], $f9=[$3], $f0=[true])
>             LogicalJoin(condition=[=($2, $0)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalProject(EMPNO=[$0], $f9=[false])
>                 LogicalTableScan(table=[[CATALOG, SALES, EMP]])  {code}
>  
> {code:java}
>     EMP               | DEPT
> +-+   |  +-+
> | EMPNO  ||  | DEPTNO |
> +-+   |  +-+
> | null |  |  | null   |
>{code}
> expect result
> {code:java}
>   EMPNO
> +---+
> |  null |
> +---+{code}
> actual result
> {code:java}
>   EMPNO
> +---+
> +---+ {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5880) When the association condition of the association subquery is 'is not distinct from', the join condition becomes '=' after decorrelation.

2023-08-14 Thread libopeng (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754387#comment-17754387
 ] 

libopeng commented on CALCITE-5880:
---

Sorry, I wasn't aware of the problem.
[~julianhyde] 
EnumerableHashJoin(condition=[AND(=($6, $11), =($8, $12))], joinType=[semi])
This join condition looks wrong, it does not guarantee that null is equal to 
null;

> When the association condition of the association subquery is 'is not 
> distinct from', the join condition becomes '=' after decorrelation.
> -
>
> Key: CALCITE-5880
> URL: https://issues.apache.org/jira/browse/CALCITE-5880
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Assignee: LakeShen
>Priority: Major
> Attachments: image-2023-08-01-15-20-22-105.png, 
> image-2023-08-02-10-15-00-455.png, image-2023-08-13-21-36-00-494.png
>
>
> {code:java}
> select EMPNO 
> from emp
> where EXISTS (select DEPTNO from dept where emp.EMPNO is not DISTINCT from 
> dept.DEPTNO){code}
> before decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{}], agg#0=[MIN($0)])
>         LogicalProject($f0=[true])
>           LogicalFilter(condition=[OR(AND(IS NULL($cor0.EMPNO), IS NULL($0)), 
> IS TRUE(=($cor0.EMPNO, $0)))])<---   here is 't1.x is not distinct from 
> t2.x'
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> after decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$13])
>       LogicalJoin(condition=[AND(=($0, $11), =($9, $12))], joinType=[left])   
>  <---here is 't1.x=t2.x and (t1.x is null = t2.x is null)'
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[false], 
> EMPNO0=[$0])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)])
>           LogicalProject(EMPNO=[$2], $f9=[$3], $f0=[true])
>             LogicalJoin(condition=[=($2, $0)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalProject(EMPNO=[$0], $f9=[false])
>                 LogicalTableScan(table=[[CATALOG, SALES, EMP]])  {code}
>  
> {code:java}
>     EMP               | DEPT
> +-+   |  +-+
> | EMPNO  ||  | DEPTNO |
> +-+   |  +-+
> | null |  |  | null   |
>{code}
> expect result
> {code:java}
>   EMPNO
> +---+
> |  null |
> +---+{code}
> actual result
> {code:java}
>   EMPNO
> +---+
> +---+ {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5906) JDBC adapter should generate TABLESAMPLE

2023-08-14 Thread LakeShen (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754366#comment-17754366
 ] 

LakeShen commented on CALCITE-5906:
---

Hi [~julianhyde] ,thanks for your review and reply,I have followed with your 
advice.

> JDBC adapter should generate TABLESAMPLE
> 
>
> Key: CALCITE-5906
> URL: https://issues.apache.org/jira/browse/CALCITE-5906
> Project: Calcite
>  Issue Type: Bug
>Reporter: LakeShen
>Assignee: LakeShen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
> Attachments: image-2023-08-08-20-16-27-188.png, 
> image-2023-08-08-20-19-01-269.png
>
>
> Currently in the RelToSqlConverter class, there is no visit method to handle 
> the Sample RelNode.If a SQL plan has Sample RelNode, RelToSqlConverter will 
> throw a AssertionError like this:
> {code:java}
>  java.lang.AssertionError: Need to implement 
> org.apache.calcite.rel.core.Sample
> {code}
> This will cause the Sample RelNode plan to fail to translate to the 
> corresponding dialect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5906) JDBC adapter should generate TABLESAMPLE

2023-08-14 Thread LakeShen (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LakeShen updated CALCITE-5906:
--
Summary: JDBC adapter should generate TABLESAMPLE  (was: Add the 
visit(Sample e) implementation  in RelToSqlConverter )

> JDBC adapter should generate TABLESAMPLE
> 
>
> Key: CALCITE-5906
> URL: https://issues.apache.org/jira/browse/CALCITE-5906
> Project: Calcite
>  Issue Type: Bug
>Reporter: LakeShen
>Assignee: LakeShen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
> Attachments: image-2023-08-08-20-16-27-188.png, 
> image-2023-08-08-20-19-01-269.png
>
>
> Currently in the RelToSqlConverter class, there is no visit method to handle 
> the Sample RelNode.If a SQL plan has Sample RelNode, RelToSqlConverter will 
> throw a AssertionError like this:
> {code:java}
>  java.lang.AssertionError: Need to implement 
> org.apache.calcite.rel.core.Sample
> {code}
> This will cause the Sample RelNode plan to fail to translate to the 
> corresponding dialect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5830) Add ARRAY_INSERT function(enabled in Spark library)

2023-08-14 Thread Jiajun Xie (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754363#comment-17754363
 ] 

Jiajun Xie commented on CALCITE-5830:
-

[~taoran] , I got it.

Thank you for your detailed explanation.

> Add ARRAY_INSERT function(enabled in Spark library)
> ---
>
> Key: CALCITE-5830
> URL: https://issues.apache.org/jira/browse/CALCITE-5830
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.34.0
>Reporter: Ran Tao
>Assignee: Ran Tao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-08-14-11-17-33-205.png, 
> image-2023-08-14-11-20-46-189.png
>
>
> array_insert(x, pos, val) - Places val into index pos of array x. Array 
> indices start at 1, or start from the end if index is negative. Index above 
> array size appends the array, or prepends the array if index is negative, 
> with 'null' elements
> *Examples:*
> > SELECT array_insert(array(1, 2, 3, 4), 5, 5); [1,2,3,4,5]
> > SELECT array_insert(array(5, 3, 2, 1), -3, 4); [5,4,3,2,1] 
> https://spark.apache.org/docs/latest/api/sql/index.html#array_insert



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (CALCITE-5929) Improve LogicalWindow print plan to add the constant value

2023-08-14 Thread xiong duan (Jira)

xiong duan created CALCITE-5929:
---

 Summary: Improve LogicalWindow print plan to add the constant value
 Key: CALCITE-5929
 URL: https://issues.apache.org/jira/browse/CALCITE-5929
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.35.0
Reporter: xiong duan


In general, we can extract the original SQL from the plan. But when the SQL 
includes the LogicalWindow with a constant value, the Plan didn't print it.

The SQL:
{code:java}
select COUNT(*) over (
ORDER BY empno
ROWS BETWEEN 5 + 5 PRECEDING AND 1 PRECEDING) AS w_count from emp{code}
The Plan:
{code:java}
LogicalProject($0=[$1])
  LogicalWindow(window#0=[window(order by [0] rows between $1 PRECEDING and $2 
PRECEDING aggs [COUNT()])])    
  LogicalProject(EMPNO=[$0])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
In this plan, we don't know the $1 and $2 values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Resolved] (CALCITE-5878) Calcite's ci check process fails due to concurrency errors

2023-08-14 Thread Julian Hyde (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde resolved CALCITE-5878.
--
Resolution: Cannot Reproduce

It seems to be working now.

> Calcite's ci check process fails due to concurrency errors
> --
>
> Key: CALCITE-5878
> URL: https://issues.apache.org/jira/browse/CALCITE-5878
> Project: Calcite
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.35.0
>Reporter: Ran Tao
>Priority: Major
>
> I have created a PR from 1.35 calcite main. And the PR failed and cause by:
> [https://github.com/apache/calcite/pull/3340]
> [https://github.com/apache/calcite/actions/runs/5686803178/job/15414233954?pr=3340]
> I have tested with some empty commits, it's failed again. So it's maybe a ci 
> potential bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5880) When the association condition of the association subquery is 'is not distinct from', the join condition becomes '=' after decorrelation.

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754350#comment-17754350
 ] 

Julian Hyde commented on CALCITE-5880:
--

I tried a simple query:
{code}
> explain plan for
> select * from scott_clone.emp as e1
> where exists (
>   select 1 from scott_clone.emp as e2
> where e2.comm is not distinct from e1.comm);
EnumerableCalc(expr#0..8=[{inputs}], proj#0..7=[{exprs}])
  EnumerableHashJoin(condition=[AND(=($6, $11), =($8, $12))], joinType=[semi])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)], 
proj#0..8=[{exprs}])
  EnumerableTableScan(table=[[SCOTT_CLONE, EMP]])
EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL($1), $3), =($1, $2))], 
joinType=[inner])
  EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
EnumerableTableScan(table=[[SCOTT_CLONE, EMP]])
  EnumerableAggregate(group=[{0, 1}])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)], COMM=[$t6], 
$f8=[$t8])
  EnumerableTableScan(table=[[SCOTT_CLONE, EMP]])
{code}

I used the {{comm}} column because it is nullable in the standard {{EMP}} 
table. And sure enough the plan handles null values: {{OR(AND(IS NULL($1), $3), 
=($1, $2))}}.

Are you sure that the optimizer is not optimizing {{IS NOT DISTINCT FROM}} to 
{{=}} because one of your columns is {{NOT NULL}}?

> When the association condition of the association subquery is 'is not 
> distinct from', the join condition becomes '=' after decorrelation.
> -
>
> Key: CALCITE-5880
> URL: https://issues.apache.org/jira/browse/CALCITE-5880
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Assignee: LakeShen
>Priority: Major
> Attachments: image-2023-08-01-15-20-22-105.png, 
> image-2023-08-02-10-15-00-455.png, image-2023-08-13-21-36-00-494.png
>
>
> {code:java}
> select EMPNO 
> from emp
> where EXISTS (select DEPTNO from dept where emp.EMPNO is not DISTINCT from 
> dept.DEPTNO){code}
> before decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{}], agg#0=[MIN($0)])
>         LogicalProject($f0=[true])
>           LogicalFilter(condition=[OR(AND(IS NULL($cor0.EMPNO), IS NULL($0)), 
> IS TRUE(=($cor0.EMPNO, $0)))])<---   here is 't1.x is not distinct from 
> t2.x'
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> after decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$13])
>       LogicalJoin(condition=[AND(=($0, $11), =($9, $12))], joinType=[left])   
>  <---here is 't1.x=t2.x and (t1.x is null = t2.x is null)'
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[false], 
> EMPNO0=[$0])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)])
>           LogicalProject(EMPNO=[$2], $f9=[$3], $f0=[true])
>             LogicalJoin(condition=[=($2, $0)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalProject(EMPNO=[$0], $f9=[false])
>                 LogicalTableScan(table=[[CATALOG, SALES, EMP]])  {code}
>  
> {code:java}
>     EMP               | DEPT
> +-+   |  +-+
> | EMPNO  ||  | DEPTNO |
> +-+   |  +-+
> | null |  |  | null   |
>{code}
> expect result
> {code:java}
>   EMPNO
> +---+
> |  null |
> +---+{code}
> actual result
> {code:java}
>   EMPNO
> +---+
> +---+ {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5880) When the association condition of the association subquery is 'is not distinct from', the join condition becomes '=' after decorrelation.

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754348#comment-17754348
 ] 

Julian Hyde commented on CALCITE-5880:
--

[~leepb], In this case, let's not use example queries that have a COUNT in a 
subquery. We already know there is a bug with such queries (see CALCITE-5743). 
Everyone will get confused.

> When the association condition of the association subquery is 'is not 
> distinct from', the join condition becomes '=' after decorrelation.
> -
>
> Key: CALCITE-5880
> URL: https://issues.apache.org/jira/browse/CALCITE-5880
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Assignee: LakeShen
>Priority: Major
> Attachments: image-2023-08-01-15-20-22-105.png, 
> image-2023-08-02-10-15-00-455.png, image-2023-08-13-21-36-00-494.png
>
>
> {code:java}
> select EMPNO 
> from emp
> where EXISTS (select DEPTNO from dept where emp.EMPNO is not DISTINCT from 
> dept.DEPTNO){code}
> before decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0}])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{}], agg#0=[MIN($0)])
>         LogicalProject($f0=[true])
>           LogicalFilter(condition=[OR(AND(IS NULL($cor0.EMPNO), IS NULL($0)), 
> IS TRUE(=($cor0.EMPNO, $0)))])<---   here is 't1.x is not distinct from 
> t2.x'
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> after decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[IS NOT NULL($9)])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$13])
>       LogicalJoin(condition=[AND(=($0, $11), =($9, $12))], joinType=[left])   
>  <---here is 't1.x=t2.x and (t1.x is null = t2.x is null)'
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[false], 
> EMPNO0=[$0])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)])
>           LogicalProject(EMPNO=[$2], $f9=[$3], $f0=[true])
>             LogicalJoin(condition=[=($2, $0)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalProject(EMPNO=[$0], $f9=[false])
>                 LogicalTableScan(table=[[CATALOG, SALES, EMP]])  {code}
>  
> {code:java}
>     EMP               | DEPT
> +-+   |  +-+
> | EMPNO  ||  | DEPTNO |
> +-+   |  +-+
> | null |  |  | null   |
>{code}
> expect result
> {code:java}
>   EMPNO
> +---+
> |  null |
> +---+{code}
> actual result
> {code:java}
>   EMPNO
> +---+
> +---+ {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754347#comment-17754347
 ] 

Julian Hyde commented on CALCITE-5866:
--

Orthogonal to SQL semantics is {{RelNode}} semantics. In a tree of logical 
RelNodes, order is irrelevant. For example, you can perform a {{Sort}} and 
follow it by a {{Limit 10}} and the limit is under no obligation to take the 
first 10 rows from the sort. In fact there is no 'first 10' because the output 
is an unordered collection.

Given that the input SQL has to pass through RelNode before it becomes output 
SQL, you should expect all ordering to disappear.

> Not all dialects support sorting in sub-queries
> ---
>
> Key: CALCITE-5866
> URL: https://issues.apache.org/jira/browse/CALCITE-5866
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>
> The rel-to-sql converter inserts subqueries in certain situations, such as 
> when sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL 
> (SQL Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new 
> dialect-level setting like {{canSortInSubQuery}} that dictates whether 
> Calcite can safely insert sub-queries with {{ORDER BY}} clauses in them, or 
> whether it must do everything in it's power to avoid that (such as refusing 
> to sort by ordinal in cases where not all sort expressions are included in 
> the {{SELECT}} list).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5887) Trim function can’t work well with SqlDelegatingConformance

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754342#comment-17754342
 ] 

Julian Hyde edited comment on CALCITE-5887 at 8/14/23 11:59 PM:


Conformance determines what queries Calcite considers valid. It should only be 
used in the parser and validator. After that point, the meaning of a 
{{RexNode}} should be fully determined by the {{RexNode}}.

Rather than making conformance global, we should be looking at solutions like 
adding extra arguments to the {{RexCall}} to the {{TRIM}} function, or adding 
multiple {{TRIM}} functions with different behaviors (like how CALCITE-5831 
introduces a Spark-specific variant of {{SOUNDEX}}). 


was (Author: julianhyde):
Conformance determines what queries Calcite considers valid. It should only be 
used in the parser and validator. After that point, the meaning of a 
{{RexNode}} should be fully determined by the {{RexNode}}.

Rather than making conformance global, we should be looking at solutions like 
adding extra arguments to the {{RexCall}} to the {{TRIM}} function.

> Trim function can’t work well with SqlDelegatingConformance
> ---
>
> Key: CALCITE-5887
> URL: https://issues.apache.org/jira/browse/CALCITE-5887
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Guoliang Sun
>Priority: Major
>
> In Calcite, there are test cases that override this function, e.g.: 
> SqlFunctionsTest#testTrim.
> But the use case ignores the fact that the function is optimized in SQL with 
> rules, which I'll illustrate with an example below.
>  
> Take the following SQL as an example of the expected result: *calcite*
> {code:java}
> select trim(both 'io' from 'iocalciteio'){code}
>  
>  
> I wanted to use Calcite's SQLConformance by way of a custom extension, so I 
> inherited SqlDelegatingConformance, overrode the allowExtendedTrim method, 
> and set it when SqlParser was initialized.
>  
> The SQL is then optimized by a series of parsing transformations, and at this 
> point it comes to {*}CoreRules.PROJECT_REDUCE_EXPRESSIONS{*}, which calls 
> RexExecutorImpl to compute the RexNode expression, reducing the constants by 
> generating code for optimization purposes. In the 
> *{color:#172b4d}RexExecutorImpl#compile{color}* method, you can see that the 
> SqlConformance is written to Default, which will cause the custom 
> *SqlDelegatingConformance* set earlier to not take effect, see the code below.
> {code:java}
> private static String compile(RexBuilder rexBuilder, List constExps,
> RexToLixTranslator.InputGetter getter, RelDataType rowType) {
>   final RexProgramBuilder programBuilder =
>   new RexProgramBuilder(rowType, rexBuilder);
>   for (RexNode node : constExps) {
> programBuilder.addProject(
> node, "c" + programBuilder.getProjectList().size());
>   }
>   final RelDataTypeFactory typeFactory = rexBuilder.getTypeFactory();
>   final JavaTypeFactory javaTypeFactory = typeFactory instanceof 
> JavaTypeFactory
>   ? (JavaTypeFactory) typeFactory
>   : new JavaTypeFactoryImpl(typeFactory.getTypeSystem());
>   final BlockBuilder blockBuilder = new BlockBuilder();
>   final ParameterExpression root0_ =
>   Expressions.parameter(Object.class, "root0");
>   final ParameterExpression root_ = DataContext.ROOT;
>   blockBuilder.add(
>   Expressions.declare(
>   Modifier.FINAL, root_,
>   Expressions.convert_(root0_, DataContext.class)));
>   final SqlConformance conformance = SqlConformanceEnum.DEFAULT;
>   final RexProgram program = programBuilder.getProgram();
>   final List expressions =
>   RexToLixTranslator.translateProjects(program, javaTypeFactory,
>   conformance, blockBuilder, null, null, root_, getter, null);
>   blockBuilder.add(
>   Expressions.return_(null,
>   Expressions.newArrayInit(Object[].class, expressions)));
>   final MethodDeclaration methodDecl =
>   Expressions.methodDecl(Modifier.PUBLIC, Object[].class,
>   BuiltInMethod.FUNCTION1_APPLY.method.getName(),
>   ImmutableList.of(root0_), blockBuilder.toBlock());
>   String code = Expressions.toString(methodDecl);
>   if (CalciteSystemProperty.DEBUG.value()) {
> Util.debugCode(System.out, code);
>   }
>   return code;
> } {code}
>  
>  
> When you get {*}SqlConformanceEnum.DEFAULT{*}, you get a MethodCallExpression 
> in the *TrimImplementor#implementSafe* method that doesn't allow Trim to do 
> multiple character operations on SQL, so Calcite throws an exception in the 
> SqlFunctions#trim method.
> *{color:#de350b}Trim error: trim character must be exactly 1 character{color}*
>  
> Coincidentally, this exception is caught by the try catch in the 
> {color:#172b4d

[jira] [Commented] (CALCITE-5887) Trim function can’t work well with SqlDelegatingConformance

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754342#comment-17754342
 ] 

Julian Hyde commented on CALCITE-5887:
--

Conformance determines what queries Calcite considers valid. It should only be 
used in the parser and validator. After that point, the meaning of a 
{{RexNode}} should be fully determined by the {{RexNode}}.

Rather than making conformance global, we should be looking at solutions like 
adding extra arguments to the {{RexCall}} to the {{TRIM}} function.

> Trim function can’t work well with SqlDelegatingConformance
> ---
>
> Key: CALCITE-5887
> URL: https://issues.apache.org/jira/browse/CALCITE-5887
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Guoliang Sun
>Priority: Major
>
> In Calcite, there are test cases that override this function, e.g.: 
> SqlFunctionsTest#testTrim.
> But the use case ignores the fact that the function is optimized in SQL with 
> rules, which I'll illustrate with an example below.
>  
> Take the following SQL as an example of the expected result: *calcite*
> {code:java}
> select trim(both 'io' from 'iocalciteio'){code}
>  
>  
> I wanted to use Calcite's SQLConformance by way of a custom extension, so I 
> inherited SqlDelegatingConformance, overrode the allowExtendedTrim method, 
> and set it when SqlParser was initialized.
>  
> The SQL is then optimized by a series of parsing transformations, and at this 
> point it comes to {*}CoreRules.PROJECT_REDUCE_EXPRESSIONS{*}, which calls 
> RexExecutorImpl to compute the RexNode expression, reducing the constants by 
> generating code for optimization purposes. In the 
> *{color:#172b4d}RexExecutorImpl#compile{color}* method, you can see that the 
> SqlConformance is written to Default, which will cause the custom 
> *SqlDelegatingConformance* set earlier to not take effect, see the code below.
> {code:java}
> private static String compile(RexBuilder rexBuilder, List constExps,
> RexToLixTranslator.InputGetter getter, RelDataType rowType) {
>   final RexProgramBuilder programBuilder =
>   new RexProgramBuilder(rowType, rexBuilder);
>   for (RexNode node : constExps) {
> programBuilder.addProject(
> node, "c" + programBuilder.getProjectList().size());
>   }
>   final RelDataTypeFactory typeFactory = rexBuilder.getTypeFactory();
>   final JavaTypeFactory javaTypeFactory = typeFactory instanceof 
> JavaTypeFactory
>   ? (JavaTypeFactory) typeFactory
>   : new JavaTypeFactoryImpl(typeFactory.getTypeSystem());
>   final BlockBuilder blockBuilder = new BlockBuilder();
>   final ParameterExpression root0_ =
>   Expressions.parameter(Object.class, "root0");
>   final ParameterExpression root_ = DataContext.ROOT;
>   blockBuilder.add(
>   Expressions.declare(
>   Modifier.FINAL, root_,
>   Expressions.convert_(root0_, DataContext.class)));
>   final SqlConformance conformance = SqlConformanceEnum.DEFAULT;
>   final RexProgram program = programBuilder.getProgram();
>   final List expressions =
>   RexToLixTranslator.translateProjects(program, javaTypeFactory,
>   conformance, blockBuilder, null, null, root_, getter, null);
>   blockBuilder.add(
>   Expressions.return_(null,
>   Expressions.newArrayInit(Object[].class, expressions)));
>   final MethodDeclaration methodDecl =
>   Expressions.methodDecl(Modifier.PUBLIC, Object[].class,
>   BuiltInMethod.FUNCTION1_APPLY.method.getName(),
>   ImmutableList.of(root0_), blockBuilder.toBlock());
>   String code = Expressions.toString(methodDecl);
>   if (CalciteSystemProperty.DEBUG.value()) {
> Util.debugCode(System.out, code);
>   }
>   return code;
> } {code}
>  
>  
> When you get {*}SqlConformanceEnum.DEFAULT{*}, you get a MethodCallExpression 
> in the *TrimImplementor#implementSafe* method that doesn't allow Trim to do 
> multiple character operations on SQL, so Calcite throws an exception in the 
> SqlFunctions#trim method.
> *{color:#de350b}Trim error: trim character must be exactly 1 character{color}*
>  
> Coincidentally, this exception is caught by the try catch in the 
> {color:#172b4d}*RexExecutable#reduce*{color} method, which returns the 
> initial unoptimized expression, with no exception output, and after the final 
> execution is complete, you can only see that the result is not as expected.
> By the way, I don't think it's reasonable to output the exception message, if 
> the optimization fails, at least it should be logged for troubleshooting.
>  
> Regarding solution ideas, I can think of two ways at the moment and would 
> like to know what the community thinks, or if the community thinks this is a 
> problem?
>  * Change SQLConformance to gl

[jira] [Comment Edited] (CALCITE-5887) Trim function can’t work well with SqlDelegatingConformance

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750592#comment-17750592
 ] 

Julian Hyde edited comment on CALCITE-5887 at 8/14/23 11:56 PM:


The problem was introduced by CALCITE-2571.


was (Author: JIRAUSER298470):
The problem was introduced by this issue:

https://issues.apache.org/jira/browse/CALCITE-2571

> Trim function can’t work well with SqlDelegatingConformance
> ---
>
> Key: CALCITE-5887
> URL: https://issues.apache.org/jira/browse/CALCITE-5887
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Guoliang Sun
>Priority: Major
>
> In Calcite, there are test cases that override this function, e.g.: 
> SqlFunctionsTest#testTrim.
> But the use case ignores the fact that the function is optimized in SQL with 
> rules, which I'll illustrate with an example below.
>  
> Take the following SQL as an example of the expected result: *calcite*
> {code:java}
> select trim(both 'io' from 'iocalciteio'){code}
>  
>  
> I wanted to use Calcite's SQLConformance by way of a custom extension, so I 
> inherited SqlDelegatingConformance, overrode the allowExtendedTrim method, 
> and set it when SqlParser was initialized.
>  
> The SQL is then optimized by a series of parsing transformations, and at this 
> point it comes to {*}CoreRules.PROJECT_REDUCE_EXPRESSIONS{*}, which calls 
> RexExecutorImpl to compute the RexNode expression, reducing the constants by 
> generating code for optimization purposes. In the 
> *{color:#172b4d}RexExecutorImpl#compile{color}* method, you can see that the 
> SqlConformance is written to Default, which will cause the custom 
> *SqlDelegatingConformance* set earlier to not take effect, see the code below.
> {code:java}
> private static String compile(RexBuilder rexBuilder, List constExps,
> RexToLixTranslator.InputGetter getter, RelDataType rowType) {
>   final RexProgramBuilder programBuilder =
>   new RexProgramBuilder(rowType, rexBuilder);
>   for (RexNode node : constExps) {
> programBuilder.addProject(
> node, "c" + programBuilder.getProjectList().size());
>   }
>   final RelDataTypeFactory typeFactory = rexBuilder.getTypeFactory();
>   final JavaTypeFactory javaTypeFactory = typeFactory instanceof 
> JavaTypeFactory
>   ? (JavaTypeFactory) typeFactory
>   : new JavaTypeFactoryImpl(typeFactory.getTypeSystem());
>   final BlockBuilder blockBuilder = new BlockBuilder();
>   final ParameterExpression root0_ =
>   Expressions.parameter(Object.class, "root0");
>   final ParameterExpression root_ = DataContext.ROOT;
>   blockBuilder.add(
>   Expressions.declare(
>   Modifier.FINAL, root_,
>   Expressions.convert_(root0_, DataContext.class)));
>   final SqlConformance conformance = SqlConformanceEnum.DEFAULT;
>   final RexProgram program = programBuilder.getProgram();
>   final List expressions =
>   RexToLixTranslator.translateProjects(program, javaTypeFactory,
>   conformance, blockBuilder, null, null, root_, getter, null);
>   blockBuilder.add(
>   Expressions.return_(null,
>   Expressions.newArrayInit(Object[].class, expressions)));
>   final MethodDeclaration methodDecl =
>   Expressions.methodDecl(Modifier.PUBLIC, Object[].class,
>   BuiltInMethod.FUNCTION1_APPLY.method.getName(),
>   ImmutableList.of(root0_), blockBuilder.toBlock());
>   String code = Expressions.toString(methodDecl);
>   if (CalciteSystemProperty.DEBUG.value()) {
> Util.debugCode(System.out, code);
>   }
>   return code;
> } {code}
>  
>  
> When you get {*}SqlConformanceEnum.DEFAULT{*}, you get a MethodCallExpression 
> in the *TrimImplementor#implementSafe* method that doesn't allow Trim to do 
> multiple character operations on SQL, so Calcite throws an exception in the 
> SqlFunctions#trim method.
> *{color:#de350b}Trim error: trim character must be exactly 1 character{color}*
>  
> Coincidentally, this exception is caught by the try catch in the 
> {color:#172b4d}*RexExecutable#reduce*{color} method, which returns the 
> initial unoptimized expression, with no exception output, and after the final 
> execution is complete, you can only see that the result is not as expected.
> By the way, I don't think it's reasonable to output the exception message, if 
> the optimization fails, at least it should be logged for troubleshooting.
>  
> Regarding solution ideas, I can think of two ways at the moment and would 
> like to know what the community thinks, or if the community thinks this is a 
> problem?
>  * Change SQLConformance to global
>  * Change CalciteConnectionConfigImpl to global
> This would allow you to get the SQLConformance information that has been set 
>

[jira] [Commented] (CALCITE-5888) Assertion error in aggregate

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754340#comment-17754340
 ] 

Julian Hyde commented on CALCITE-5888:
--

This is the {{assert}} in question:
{code}
assert groupSet.length() <= input.getRowType().getFieldCount();
{code}

I agree with [~nobigo]. We need a minimal SQL repro query. With that, maybe 
someone can identify a change to (say) {{RelBuilder}}.

It's also possible that previously you were running with asserts disabled.

> Assertion error in aggregate
> 
>
> Key: CALCITE-5888
> URL: https://issues.apache.org/jira/browse/CALCITE-5888
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Petr Masopust
>Priority: Major
>
> We have {{relBuilder.aggregate(groupKey, aggregateCalls)}} in our code with 
> values {{[AS($31, 'a_label_d_opp_stage_id_foodmart_bd4c9c91212f9f'), 
> AS(CAST(FLOOR($34, FLAG(QUARTER))):TIMESTAMP(0), 
> 'a_label_snapshot_timestamp_quarter_foodmart_89fe710c8628d9')]}} and 
> {{[COUNT(CASE(AND(COALESCE($37, false), =($36, $13)), $15, null:NULL)), 
> SUM(CASE(AND(COALESCE($37, false), =($36, $13)), $2, null:NULL)), 
> MAX(CASE(AND(COALESCE($37, false), =($36, $13)), 0, null:NULL)), 
> MAX(CASE(AND(COALESCE($37, false), =($36, $13)), 0, null:NULL))].}}
> It works perfectly in version 1.34.0 but in 1.35.0 we got this exception:
> {{java.lang.AssertionError}}
> {{    at org.apache.calcite.rel.core.Aggregate.(Aggregate.java:175)}}
> {{    at 
> org.apache.calcite.rel.logical.LogicalAggregate.(LogicalAggregate.java:72)}}
> {{    at 
> org.apache.calcite.rel.logical.LogicalAggregate.create_(LogicalAggregate.java:144)}}
> {{    at 
> org.apache.calcite.rel.logical.LogicalAggregate.create(LogicalAggregate.java:116)}}
> {{    at 
> org.apache.calcite.rel.core.RelFactories$AggregateFactoryImpl.createAggregate(RelFactories.java:328)}}
> {{    at 
> org.apache.calcite.tools.RelBuilder.aggregate_(RelBuilder.java:2564)}}
> {{    at 
> org.apache.calcite.tools.RelBuilder.aggregate_(RelBuilder.java:2511)}}
> {{    at org.apache.calcite.tools.RelBuilder.aggregate(RelBuilder.java:2348)}}
>  
> I think it is either missing {{permute}} or assert should compare 
> {{cardinality}} instead of {{length.}} Because it compares field index? to 
> number of fields which looks like nonsense to me.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-5914) Cache compiled regular expressions in SQL function runtime

2023-08-14 Thread Julian Hyde (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated CALCITE-5914:
-
Description: 
Cache compiled regular expressions (and other amortized work) in SQL function 
runtime. Compiling a regular expression to a pattern is expensive (compared to 
the cost of matching, given an existing pattern) and therefore caching the 
compiled form will yield performance benefits if the regular expression is 
constant or has a small number of values.

Consider the following query:
{code:java}
SELECT ename, job, RLIKE(ename, 'A.*'), RLIKE(ename, job || '.*')
FROM emp
{code}
The first regular expression, '{{A.\*}}', is constant and can be compiled at 
prepare time or at the start of execution; the second regular expression, {{job 
|| '.\*'}}, might vary from one row to the next. However if the {{job}} column 
has a small number of values it still might be beneficial to cache the compiled 
regular expression.

If {{SqlFunctions.rlike}} could use a cache (mapping from {{String}} to 
{{{}java.util.regex.Pattern{}}}) then it would achieve benefits in both the 
constant and non-constant cases.

The cache needs to:
 * be thread-safe (in case queries are executing using multiple threads),
 * return thread-safe objects (as is {{{}Pattern{}}}),
 * have bounded space (so that a query doesn't blow memory with 1 million 
distinct regular expressions),
 * disposed after the query has terminated,
 * (ideally) share with regexes of the same language in the same query,
 * not conflict with regexes of different languages in the same query.

One possible implementation is to add an {{interface FunctionState}}, with 
subclasses including {{class RegexpCache}}, and if argument 1 of a function is 
a subclass of {{FunctionState}} the compiler would initialize the state in the 
generated code. The function can rely on the state argument being initialized, 
and being the same object from one call to the next. Example:
{code:java}
interface FunctionState {
}

class RegexpCache implements FunctionState {
  final Cache cache = ...;
}
{code}
This change should install the cache for all applicable functions, including 
LIKE, ILIKE, RLIKE, SIMILAR, posix regex, REGEXP_REPLACE (MySQL, Oracle), 
REGEXP_CONTAINS (BigQuery), other BigQuery REGEXP_ functions, PARSE_URL, 
JSON_REPLACE, PARSE_TIMESTAMP.

It can also be used for functions that have mutable state, e.g. {{RANDOM}} with 
and without a seed.

  was:
Cache compiled regular expressions (and other amortized work) in SQL function 
runtime. Compiling a regular expression to a pattern is expensive (compared to 
the cost of matching, given an existing pattern) and therefore caching the 
compiled form will yield performance benefits if the regular expression is 
constant or has a small number of values.

Consider the following query:
{code:java}
SELECT ename, job, RLIKE(ename, 'A.*'), RLIKE(ename, job || '.*')
FROM emp
{code}
The first regular expression, '{{A.\*}}', is constant and can be compiled at 
prepare time or at the start of execution; the second regular expression, {{job 
|| '.\*'}}, might vary from one row to the next. However if the {{job}} column 
has a small number of values it still might be beneficial to cache the compiled 
regular expression.

If {{SqlFunctions.rlike}} could use a cache (mapping from {{String}} to 
{{{}java.util.regex.Pattern{}}}) then it would achieve benefits in both the 
constant and non-constant cases.

The cache needs to:
 * be thread-safe (in case queries are executing using multiple threads),
 * return thread-safe objects (as is {{{}Pattern{}}}),
 * have bounded space (so that a query doesn't blow memory with 1 million 
distinct regular expressions),
 * disposed after the query has terminated,
 * (ideally) share with regexes of the same language in the same query,
 * not conflict with regexes of different languages in the same query.

One possible implementation is to add an {{interface FunctionState}}, with 
subclasses including {{class RegexpCache}}, and if argument 1 of a function is 
a subclass of {{FunctionState}} the compiler would initialize the state in the 
generated code. The function can rely on the state argument being initialized, 
and being the same object from one call to the next. Example:
{code:java}
interface FunctionState {
}

class RegexpCache implements FunctionState {
  final Cache cache = ...;
}
{code}
This change should install the cache for all applicable functions, including 
LIKE, ILIKE, RLIKE, SIMILAR, posix regex, REGEXP_REPLACE (MySQL, Oracle), 
REGEXP_CONTAINS (BigQuery), other BigQuery REGEXP_ functions, PARSE_URL, 
JSON_REPLACE, PARSE_TIMESTAMP.


> Cache compiled regular expressions in SQL function runtime
> --
>
> Key: CALCITE-5914
> URL: https://issues.apache.org/jira/browse/CALCITE-5914
> Project: Calci

[jira] [Commented] (CALCITE-5906) Add the visit(Sample e) implementation in RelToSqlConverter

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754338#comment-17754338
 ] 

Julian Hyde commented on CALCITE-5906:
--

The PR looks good. Can you rebase it onto main (which now contains CALCITE-5895 
and CALCITE-5916). Also change the summary (and commit message, and 
descriptions of test cases) so that it reads like a feature request, something 
like "JDBC adapter should generate TABLESAMPLE".

> Add the visit(Sample e) implementation  in RelToSqlConverter 
> -
>
> Key: CALCITE-5906
> URL: https://issues.apache.org/jira/browse/CALCITE-5906
> Project: Calcite
>  Issue Type: Bug
>Reporter: LakeShen
>Assignee: LakeShen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
> Attachments: image-2023-08-08-20-16-27-188.png, 
> image-2023-08-08-20-19-01-269.png
>
>
> Currently in the RelToSqlConverter class, there is no visit method to handle 
> the Sample RelNode.If a SQL plan has Sample RelNode, RelToSqlConverter will 
> throw a AssertionError like this:
> {code:java}
>  java.lang.AssertionError: Need to implement 
> org.apache.calcite.rel.core.Sample
> {code}
> This will cause the Sample RelNode plan to fail to translate to the 
> corresponding dialect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5917) Comments in TimeString class are incorrect

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754334#comment-17754334
 ] 

Julian Hyde commented on CALCITE-5917:
--

I agree with [~libenchao] that a PR is sufficient for minor changes such as 
fixing comments.

[~mbudiu], It's not obvious to me what you consider to be incorrect. Is it that 
we are don't 'set ... the field' but instead return a copy (because 
{{TimeString}} is immutable)? Or is it that we don't store as many significant 
digits as you hoped?

> Comments in TimeString class are incorrect
> --
>
> Key: CALCITE-5917
> URL: https://issues.apache.org/jira/browse/CALCITE-5917
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Trivial
>
> Here is an example of an incorrect comment:
> {code:java}
>   /** Sets the fraction field of a {@code TimeString} to a given number
>* of milliseconds. Nukes the value set via {@link #withNanos}.
>*
>* For example,
>* {@code new TimeString(1970, 1, 1, 2, 3, 4).withMillis(56)}
>* yields {@code TIME '1970-01-01 02:03:04.056'}. */
>   public TimeString withMillis(int millis) {
> {code}
> There are several such comments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5743) Query gives incorrect result when COUNT appears in the correlated subquery select list

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754333#comment-17754333
 ] 

Julian Hyde edited comment on CALCITE-5743 at 8/14/23 10:56 PM:


[~shenlang], Here is a query on the built-in {{EMP}} table. The correct answer 
(shown) is one row. Calcite currently returns zero rows:
{code}
select *
from dept
where deptno in (
  select count(*)
  from emp
  where comm is null);
+++--+
| DEPTNO |   DNAME|   LOC|
+++--+
| 10 | ACCOUNTING | NEW YORK |
+++--+
{code}

({{EMP}} has 14 rows, 4 of which have a not-null value for {{comm}}.)


was (Author: julianhyde):
[~shenlang], Here is a query on the built-in EMP table. The correct answer 
(shown) is one row. Calcite currently returns zero rows:
{code}
select *
from dept
where deptno in (
  select count(*)
  from emp
  where comm is null);
+++--+
| DEPTNO |   DNAME|   LOC|
+++--+
| 10 | ACCOUNTING | NEW YORK |
+++--+
{code}

> Query gives incorrect result when COUNT appears in the correlated subquery 
> select list
> --
>
> Key: CALCITE-5743
> URL: https://issues.apache.org/jira/browse/CALCITE-5743
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Priority: Major
>
> {code:java}
> SELECT a 
> FROM t1 t1 
> WHERE b IN (SELECT COUNT (*) FROM t2 WHERE t1.a=t2.a); {code}
> {code:java}
>   t1   | t2
> +--+   |  +-+
> | a | b |  |  | a |
> +--+   |  +-+
> | 3 | 6 |  |  | 3 |
> | 10 | 1 | |  | 3 |
> | 8 | 0 |  |  | 10 | 
>|
> {code}
> correct result
> {code:java}
> +--+
> | a |
> +--+
> | 10 |
> | 8 |{code}
> after decorrelate
> {code:java}
> LogicalProject(A=[$0])
>   LogicalJoin(condition=[AND(=($0, $3), =($1, $2))], joinType=[inner])
>     LogicalTableScan(table=[[t1]])
>     LogicalFilter(condition=[=($0, $0)])
>       LogicalProject(EXPR$0=[$1], a=[$0])
>         LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
>           LogicalProject(a=[$0])
>             LogicalFilter(condition=[=($0, $0)])
>               LogicalTableScan(table=[[t2]]) {code}
> error result
> {code:java}
> +--+
> | a |
> +--+
> | 10 | {code}
> Data with count=0 will be lost
> This issue was discovered in [this 
> issue|https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-5568]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5743) Query gives incorrect result when COUNT appears in the correlated subquery select list

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754333#comment-17754333
 ] 

Julian Hyde commented on CALCITE-5743:
--

[~shenlang], Here is a query on the built-in EMP table. The correct answer 
(shown) is one row. Calcite currently returns zero rows:
{code}
select *
from dept
where deptno in (
  select count(*)
  from emp
  where comm is null);
+++--+
| DEPTNO |   DNAME|   LOC|
+++--+
| 10 | ACCOUNTING | NEW YORK |
+++--+
{code}

> Query gives incorrect result when COUNT appears in the correlated subquery 
> select list
> --
>
> Key: CALCITE-5743
> URL: https://issues.apache.org/jira/browse/CALCITE-5743
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Priority: Major
>
> {code:java}
> SELECT a 
> FROM t1 t1 
> WHERE b IN (SELECT COUNT (*) FROM t2 WHERE t1.a=t2.a); {code}
> {code:java}
>   t1   | t2
> +--+   |  +-+
> | a | b |  |  | a |
> +--+   |  +-+
> | 3 | 6 |  |  | 3 |
> | 10 | 1 | |  | 3 |
> | 8 | 0 |  |  | 10 | 
>|
> {code}
> correct result
> {code:java}
> +--+
> | a |
> +--+
> | 10 |
> | 8 |{code}
> after decorrelate
> {code:java}
> LogicalProject(A=[$0])
>   LogicalJoin(condition=[AND(=($0, $3), =($1, $2))], joinType=[inner])
>     LogicalTableScan(table=[[t1]])
>     LogicalFilter(condition=[=($0, $0)])
>       LogicalProject(EXPR$0=[$1], a=[$0])
>         LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
>           LogicalProject(a=[$0])
>             LogicalFilter(condition=[=($0, $0)])
>               LogicalTableScan(table=[[t2]]) {code}
> error result
> {code:java}
> +--+
> | a |
> +--+
> | 10 | {code}
> Data with count=0 will be lost
> This issue was discovered in [this 
> issue|https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-5568]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5914) Cache compiled regular expressions in SQL function runtime

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754330#comment-17754330
 ] 

Julian Hyde commented on CALCITE-5914:
--

Suppose {{FunctionState}} has a method {{void initialize(List 
arguments)}}. The implementation can look at those arguments, identify which 
are constant, and initialize itself accordingly.

> Cache compiled regular expressions in SQL function runtime
> --
>
> Key: CALCITE-5914
> URL: https://issues.apache.org/jira/browse/CALCITE-5914
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> Cache compiled regular expressions (and other amortized work) in SQL function 
> runtime. Compiling a regular expression to a pattern is expensive (compared 
> to the cost of matching, given an existing pattern) and therefore caching the 
> compiled form will yield performance benefits if the regular expression is 
> constant or has a small number of values.
> Consider the following query:
> {code:java}
> SELECT ename, job, RLIKE(ename, 'A.*'), RLIKE(ename, job || '.*')
> FROM emp
> {code}
> The first regular expression, '{{A.\*}}', is constant and can be compiled at 
> prepare time or at the start of execution; the second regular expression, 
> {{job || '.\*'}}, might vary from one row to the next. However if the {{job}} 
> column has a small number of values it still might be beneficial to cache the 
> compiled regular expression.
> If {{SqlFunctions.rlike}} could use a cache (mapping from {{String}} to 
> {{{}java.util.regex.Pattern{}}}) then it would achieve benefits in both the 
> constant and non-constant cases.
> The cache needs to:
>  * be thread-safe (in case queries are executing using multiple threads),
>  * return thread-safe objects (as is {{{}Pattern{}}}),
>  * have bounded space (so that a query doesn't blow memory with 1 million 
> distinct regular expressions),
>  * disposed after the query has terminated,
>  * (ideally) share with regexes of the same language in the same query,
>  * not conflict with regexes of different languages in the same query.
> One possible implementation is to add an {{interface FunctionState}}, with 
> subclasses including {{class RegexpCache}}, and if argument 1 of a function 
> is a subclass of {{FunctionState}} the compiler would initialize the state in 
> the generated code. The function can rely on the state argument being 
> initialized, and being the same object from one call to the next. Example:
> {code:java}
> interface FunctionState {
> }
> class RegexpCache implements FunctionState {
>   final Cache cache = ...;
> }
> {code}
> This change should install the cache for all applicable functions, including 
> LIKE, ILIKE, RLIKE, SIMILAR, posix regex, REGEXP_REPLACE (MySQL, Oracle), 
> REGEXP_CONTAINS (BigQuery), other BigQuery REGEXP_ functions, PARSE_URL, 
> JSON_REPLACE, PARSE_TIMESTAMP.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5909) Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line

2023-08-14 Thread Ran Tao (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754270#comment-17754270
 ] 

Ran Tao edited comment on CALCITE-5909 at 8/14/23 9:53 PM:
---

got it. we should follow by this. anyone help to review it will be appreciated.


was (Author: lemonjing):
got it.

> Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE 
> but passes when run from the command line
> --
>
> Key: CALCITE-5909
> URL: https://issues.apache.org/jira/browse/CALCITE-5909
> Project: Calcite
>  Issue Type: Bug
>  Components: tests
>Reporter: LakeShen
>Assignee: Ran Tao
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-08-08-23-32-55-466.png
>
>
> When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method 
> failed,the exception like this:
> {code:java}
> java.lang.AssertionError: The parser has at least one new reserved keyword. 
> Are you sure it should be reserved? Difference: {code}
> The picture like this:
> !image-2023-08-08-23-32-55-466.png|width=1543,height=496!
> I could fix this problem.More importantly, why is this method failing, but 
> the Calcite pipeline is passing? I think we should look at something we missed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5909) Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line

2023-08-14 Thread Ran Tao (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754270#comment-17754270
 ] 

Ran Tao commented on CALCITE-5909:
--

got it.

> Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE 
> but passes when run from the command line
> --
>
> Key: CALCITE-5909
> URL: https://issues.apache.org/jira/browse/CALCITE-5909
> Project: Calcite
>  Issue Type: Bug
>  Components: tests
>Reporter: LakeShen
>Assignee: Ran Tao
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-08-08-23-32-55-466.png
>
>
> When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method 
> failed,the exception like this:
> {code:java}
> java.lang.AssertionError: The parser has at least one new reserved keyword. 
> Are you sure it should be reserved? Difference: {code}
> The picture like this:
> !image-2023-08-08-23-32-55-466.png|width=1543,height=496!
> I could fix this problem.More importantly, why is this method failing, but 
> the Calcite pipeline is passing? I think we should look at something we missed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-985) Validate MERGE

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754252#comment-17754252
 ] 

Julian Hyde commented on CALCITE-985:
-

[~jiajunbernoulli], I saw you requested that I re-review the PR again. I rarely 
have time to review PRs. As I said above, if you just made the changes I 
suggested, go ahead and merge to main.

> Validate MERGE
> --
>
> Key: CALCITE-985
> URL: https://issues.apache.org/jira/browse/CALCITE-985
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jiajun Xie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> Now CALCITE-974 is fixed, we can validate INSERT, UPDATE and DELETE; 
> sql-to-rel conversion also succeeds, although I'm not sure we can execute. 
> MERGE does not yet pass validation.
> I have added SqlToRelConverterTest.testMerge; we need one or two validator 
> fixes to make it pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5921) SqlOperatorFixture.checkFails and checkAggFails don't check runtime failure

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754228#comment-17754228
 ] 

Julian Hyde commented on CALCITE-5921:
--

Is this case related to CALCITE-2535 and CALCITE-525?

> SqlOperatorFixture.checkFails and checkAggFails don't check runtime failure
> ---
>
> Key: CALCITE-5921
> URL: https://issues.apache.org/jira/browse/CALCITE-5921
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Runkang He
>Assignee: Runkang He
>Priority: Major
>
> SqlOperatorFixture.checkFails and checkAggFails don't check runtime failure. 
> See more in [code 
> line|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/testkit/src/main/java/org/apache/calcite/test/SqlOperatorFixtureImpl.java#L160].
>  When the parameter `runtime` of SqlOperatorFixture.checkFails is true, it 
> should execute the query and check runtime failure, but currently it ignores 
> this, and only checks the parse and validation failure.
> When fix this, there are 4 failed test cases in CalciteSqlOperatorTest.
> At last, this issue was found when to implement `BIT_GET` function in 
> CALCITE-5848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5922) POSITION signature incorrect for SparkSQL

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754226#comment-17754226
 ] 

Julian Hyde commented on CALCITE-5922:
--

Does this bug concern parsing SQL, validating SQL, or generating SQL? You need 
to make it clear in the summary.

> POSITION signature incorrect for SparkSQL
> -
>
> Key: CALCITE-5922
> URL: https://issues.apache.org/jira/browse/CALCITE-5922
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: hongyu guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> In SparkSQL, POSITION(substr, str[, pos]) function only accept 
> comma-separated when there are 3 arguments.
> For example:
> {code:java}
> // SparkSQL accepted SQL
> select POSITION('a', 'abc', 1);{code}
> Calcite will use the IN and FROM keyword to separate the input arguments when 
> unparsing.
> {code:java}
> // Calcite accepted and unparsed SQL
> select POSITION('a' IN 'abc' FROM 1){code}
> For 2 augument inputs, SparkSQL accept both syntaxes. So I think we should 
> write a rule in SparkSqlDialect to convert keyword-separated syntax to 
> comma-separted syntax for POSITION function.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5923) Some test cases in `SqlOperatorTest` violates the test fixture's design principle

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754224#comment-17754224
 ] 

Julian Hyde commented on CALCITE-5923:
--

Thanks for this. This is a useful change.

The commit message (and jira description) are a little misleading. There aren't 
widespread violations of the design principle. The summary should talk about 
using junit lifecycle to simplify parameterized tests.

Are there other places in Calcite that would benefit from Lifecycle.PER_CLASS?

> Some test cases in `SqlOperatorTest` violates the test fixture's design 
> principle
> -
>
> Key: CALCITE-5923
> URL: https://issues.apache.org/jira/browse/CALCITE-5923
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Runkang He
>Assignee: Runkang He
>Priority: Minor
>  Labels: pull-request-available
>
> There are some test cases in `SqlOperatorTest` directly use the 
> `SqlOperatorFixtureImpl.DEFAULT` to get the `SqlOperatorFixture`, including 
> `SqlOperatorTest.testCast` and many other test cases related with `CAST` 
> operator. This causes that the result check is missing when execute 
> `CalciteSqlOperatorTest`, which should has result check.
> This violates the design principle introduced by CALCITE-4885, which we 
> should alway use `SqlOperatorTest.fixture()` to get the `SqlOperatorFixture`. 
> This principle allows us to override`fixture()` method in subclasses to run 
> tests in a different environment.
> So I think we should fix these related test cases to keep consistent with the 
> principle.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5909) Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754219#comment-17754219
 ] 

Julian Hyde commented on CALCITE-5909:
--

Sorry, but please don't ask me personally to review stuff. When others see that 
I have been asked, they don't step forward to review instead. I have already 
made my contribution to this case. My contribution has been to get the design 
right.

> Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE 
> but passes when run from the command line
> --
>
> Key: CALCITE-5909
> URL: https://issues.apache.org/jira/browse/CALCITE-5909
> Project: Calcite
>  Issue Type: Bug
>  Components: tests
>Reporter: LakeShen
>Assignee: Ran Tao
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-08-08-23-32-55-466.png
>
>
> When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method 
> failed,the exception like this:
> {code:java}
> java.lang.AssertionError: The parser has at least one new reserved keyword. 
> Are you sure it should be reserved? Difference: {code}
> The picture like this:
> !image-2023-08-08-23-32-55-466.png|width=1543,height=496!
> I could fix this problem.More importantly, why is this method failing, but 
> the Calcite pipeline is passing? I think we should look at something we missed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5928) Add SortHomogenizeRule for homogenization sort fields for futher optimize

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754214#comment-17754214
 ] 

Julian Hyde commented on CALCITE-5928:
--

Thanks for logging this case. Complex cases may be difficult to solve with 
transformation rules because the goal (the interesting order required by, say, 
a join) may be several relational operators above the source of the ordering. 
But let's see how far we can go with stepwise transformation rules.

> Add SortHomogenizeRule for homogenization sort fields for futher optimize
> -
>
> Key: CALCITE-5928
> URL: https://issues.apache.org/jira/browse/CALCITE-5928
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Minor
>
> When an interesting order is pushed down, some columns may have to be 
> substituted with equivalent columns in the new context. This is referred to 
> as homogenuzatzon.
> For example, consider the following query:
>  
> {code:java}
> select * from sales.emp e
> left join (
>   select * from sales.dept d) d on e.deptno = d.deptno
> order by e.sal, d.deptno{code}
> The filed `e.sal`, `d.deptno` will not push down through join, because 
> d.deptno is unavalilable until after the join.
>  
> We can homogenize the sort fields from `e.sal`, `d.deptno` to `e.sal`, 
> `e.deptno`  since 
> e.deptno = d.deptno. Then sort fields can be pushed down by 
> `SortJoinTransposeRule`
> The theoretical basis comes from paper [Fundamental Techniques for Order 
> Optimization|https://dl.acm.org/doi/pdf/10.1145/233269.233320]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5894) Add SortRemoveRedundantRule to remove redundant sort fields if they are functionally dependent on other sort fields

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754212#comment-17754212
 ] 

Julian Hyde commented on CALCITE-5894:
--

I've also added some comments to CALCITE-5913. Let's start with a simple 
interface, and a simple implementation (FD = key dependency), and iterate.

> Add SortRemoveRedundantRule to remove redundant sort fields if they are 
> functionally dependent on other sort fields
> ---
>
> Key: CALCITE-5894
> URL: https://issues.apache.org/jira/browse/CALCITE-5894
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Minor
>  Labels: pull-request-available
>
> In some scene, Sort fields can be reduct, if sort fields contain unique key
> For example
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> where `empno` is a key,  `ename` is redundant since `empno` alone is 
> sufficient to determine the order of any two records.
> So the SQL can be optimized as following:
> {code:java}
> SELECT name, Emp.salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job, e_agg.c {code}
> Although `e_agg.ename` is not a key but field `ename` is unique and not null, 
> it can be optimized as following:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job{code}
> Sorting is an expensive operation, however. Therefore, it is imperative that 
> sorting
> is optimized to avoid unnecessary sort field.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5913) Support to get functional dependency metadata in RelMetadataQuery

2023-08-14 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754211#comment-17754211
 ] 

Julian Hyde commented on CALCITE-5913:
--

[~thomas.rebele], I don't know the right interface. We need to balance ease of 
use for consumers of the metadata with ease of implementation for producers, 
and efficiency when caching and re-computing metadata.

The implementation of your proposed interface {{ImmutableBitSet 
getFunctionalDependentColumns(ImmutableBitSet columns)}} would in most cases be 
nothing more than calling seems to be no more than calling {{boolean 
isFunctionallyDetermined(ImmutableBitSet columns, int column)}} for each 
column, so there doesn't seem to be a clear performance win.

We will also want to ask which columns determine an expression ({{RexNode}}).

The hardest thing will be taking predicates into account. Those predicates are 
expressions, and those expressions are made up of columns, so we may have to 
deal with cycles. We may end up with an algorithm that iterates until it 
reaches a fixed point. Or it may be simpler to compute pointwise (my original 
proposed interface). When we have a corpus of tests in {{RelMetadataTest}}  - 
including some tough cases including predicates - we should reconsider the 
interface.

> Support to get functional dependency metadata in RelMetadataQuery
> -
>
> Key: CALCITE-5913
> URL: https://issues.apache.org/jira/browse/CALCITE-5913
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Major
>
> Functional dependency analysis can be applied to various problems in query 
> optimization:
> selectivity estimation, estimation of (intermediate) result sizes, *order 
> optimization*
> *(in particular sort avoidance),* cost estimation, and various problems in 
> the area of semantic query optimization, as said in the book《[Exploiting 
> Functional Dependence in Query 
> Optimization》|https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf]
>  
> In calcite, it may be metadata that something like 'FunctionalDependency' 
> BuiltInMetadata as following:
> {code:java}
> public abstract class BuiltInMetadata {
>// ...
>public interface FunctionalDependency extends Metadata {
>/** Returns whether column is functionally dependent on columns. */
>Boolean functionallyDetermine(ImmutableBitSet columns, int column);
>}
> } {code}
>  
> As the above book said, functional dependency analysis is a valuable and 
> challenging work. I think support order optimization
> (in particular sort avoidance) by the relevant functional dependency metadata 
> firstly, and then get complete functional dependency function step by step.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5894) Add SortRemoveRedundantRule to remove redundant sort fields if they are functionally dependent on other sort fields

2023-08-14 Thread JingDas (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753862#comment-17753862
 ] 

JingDas edited comment on CALCITE-5894 at 8/14/23 3:09 PM:
---

[~jhyde] I re-read this part of the paper again. I aggre with you, after “order 
homogenization”, interesting order can be push down. If the interesting order 
is above join, after push down,this give a chance to use `

EnumerableMergeJoin` instead of `EnumerableHashJoin` in the optimize because 
one input of join is sorted. In another scene after “order homogenization” sort 
maybe removed, which can not happen without “order homogenization”. I will log 
a new Jira case for “order homogenization” which is 
[CALCITE-5928|https://issues.apache.org/jira/browse/CALCITE-5928]


was (Author: JIRAUSER292370):
[~jhyde] I re-read this part of the paper again. I aggre with you, after “order 
homogenization”, interesting order can be push down. If the interesting order 
is above join, after push down,this give a chance to use `

EnumerableMergeJoin` instead of `EnumerableHashJoin` in the optimize because 
one input of join is sorted. In another scene after “order homogenization” sort 
maybe removed, which can not happen without “order homogenization”. I will log 
a new Jira case for “order homogenization”.

> Add SortRemoveRedundantRule to remove redundant sort fields if they are 
> functionally dependent on other sort fields
> ---
>
> Key: CALCITE-5894
> URL: https://issues.apache.org/jira/browse/CALCITE-5894
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Minor
>  Labels: pull-request-available
>
> In some scene, Sort fields can be reduct, if sort fields contain unique key
> For example
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> where `empno` is a key,  `ename` is redundant since `empno` alone is 
> sufficient to determine the order of any two records.
> So the SQL can be optimized as following:
> {code:java}
> SELECT name, Emp.salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job, e_agg.c {code}
> Although `e_agg.ename` is not a key but field `ename` is unique and not null, 
> it can be optimized as following:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job{code}
> Sorting is an expensive operation, however. Therefore, it is imperative that 
> sorting
> is optimized to avoid unnecessary sort field.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (CALCITE-5928) Add SortHomogenizeRule for homogenization sort fields for futher optimize

2023-08-14 Thread JingDas (Jira)

JingDas created CALCITE-5928:


 Summary: Add SortHomogenizeRule for homogenization sort fields for 
futher optimize
 Key: CALCITE-5928
 URL: https://issues.apache.org/jira/browse/CALCITE-5928
 Project: Calcite
  Issue Type: New Feature
Reporter: JingDas
Assignee: JingDas


When an interesting order is pushed down, some columns may have to be 
substituted with equivalent columns in the new context. This is referred to as 
homogenuzatzon.

For example, consider the following query:

 
{code:java}
select * from sales.emp e
left join (
  select * from sales.dept d) d on e.deptno = d.deptno
order by e.sal, d.deptno{code}
The filed `e.sal`, `d.deptno` will not push down through join, because d.deptno 
is unavalilable until after the join.

 

We can homogenize the sort fields from `e.sal`, `d.deptno` to `e.sal`, 
`e.deptno`  since 

e.deptno = d.deptno. Then sort fields can be pushed down by 
`SortJoinTransposeRule`

The theoretical basis comes from paper [Fundamental Techniques for Order 
Optimization|https://dl.acm.org/doi/pdf/10.1145/233269.233320]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5881) Support to get foreign keys metadata in RelMetadataQuery

2023-08-14 Thread LakeShen (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754074#comment-17754074
 ] 

LakeShen commented on CALCITE-5881:
---

Mostly I think foreign keys are just table constraints. If someone else wants 
to know the unique key information, I think they can use 
RelMetadataQuery#getUniqueKeys or functional dependencies.

But that's just my idea,if I am wrong,pls correct me,thanks.

> Support to get foreign keys metadata in RelMetadataQuery
> 
>
> Key: CALCITE-5881
> URL: https://issues.apache.org/jira/browse/CALCITE-5881
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Major
>  Labels: pull-request-available
>
> We can get constraints by RelOptTable#getReferentialConstraints method, but 
> maybe can't get appropriate constraints at top relNode.
> For example:
> SQL:
> {code:java}
> SELECT DEPT.name, emp_agg.deptno, emp_agg.ename, DEPT.deptno
> FROM DEPT
> RIGHT JOIN
> (SELECT COUNT(sal), deptno, ename FROM EMP GROUP BY deptno, ename) emp_agg
> ON DEPT.deptno = emp_agg.deptno
> WHERE emp_agg.ename = 'job'{code}
> The relNode is:
> {code:java}
> LogicalProject(NAME=[$1], DEPTNO=[$3], ENAME=[$4], DEPTNO0=[$0])
>   LogicalFilter(condition=[=($4, 'job')])
>     LogicalJoin(condition=[=($0, $3)], joinType=[right])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>       LogicalProject(EXPR$0=[$2], DEPTNO=[$0], ENAME=[$1])
>         LogicalAggregate(group=[{0, 1}], EXPR$0=[COUNT($2)])
>           LogicalProject(DEPTNO=[$7], ENAME=[$1], SAL=[$5])
>             LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]]){code}
> where the foreign key is the DEPTNO column of CATALOG.SALES.EMP table,
> reference the DEPTNO unique column of CATALOG.SALES.DEPT table.
> When we want to get foreign keys metadata on `LogicalJoin` or top 
> `LogicalProject`, There is no such method currently, it seems that we should 
> trace the field column origin to get the foreign key and corresponding unique 
> key.
>  
> The final result of this feature is something likely as following:
> When we want to get foreign keys metadata on `LogicalJoin`, the `LogicalJoin` 
> rowType is
> {code:java}
> RecordType(INTEGER DEPTNO, VARCHAR(10) NAME, BIGINT EXPR$0, INTEGER DEPTNO0, 
> VARCHAR(20) ENAME).{code}
> We expect the foreign keys metadata:
> {code:java}
> foreignColumns bitset is {3}
> uniqueColumns bitset is {0}{code}
> When we want to get foreign keys metadata on top `LogicalProject`, the 
> `LogicalProject` rowType is
> {code:java}
> RecordType(VARCHAR(10) NAME, INTEGER DEPTNO, VARCHAR(20) ENAME, INTEGER 
> DEPTNO0).{code}
> We expect the foreigns key metadata:
> {code:java}
> foreignColumns bitset is {1}
> uniqueColumns bitset is {3}{code}
> All the foreign or unique columns is 0 based index.
> Foreign keys metadata is very useful in many optimize scenes. Such as it can 
> be used in join eliminate when join type is inner join and some other star 
> schema query optimize.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5881) Support to get foreign keys metadata in RelMetadataQuery

2023-08-14 Thread JingDas (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754069#comment-17754069
 ] 

JingDas edited comment on CALCITE-5881 at 8/14/23 12:19 PM:


[~shenlang] Foreign key is a special constrait, it contains  not only 
foreignKey but also the uniqueKey that it reference, and foreignKey can be 
combined. In some scene, such as we want to know the join condition is 
foreign-unique key or not. Maybe `RelMetadataQuery #getForeignKeys` is better, 
just like `RelMetadataQuery#getUniqueKeys`.


was (Author: JIRAUSER292370):
[~shenlang] Foreign key is a special constrait, it contains  not only 
foreignKey but also the uniqueKey that if reference, and foreignKey can be 
combined. In some scene, such as we want to know the join condition is 
foreign-unique key or not. Maybe `RelMetadataQuery #getForeignKeys` is better, 
just like `RelMetadataQuery#getUniqueKeys`.

> Support to get foreign keys metadata in RelMetadataQuery
> 
>
> Key: CALCITE-5881
> URL: https://issues.apache.org/jira/browse/CALCITE-5881
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Major
>  Labels: pull-request-available
>
> We can get constraints by RelOptTable#getReferentialConstraints method, but 
> maybe can't get appropriate constraints at top relNode.
> For example:
> SQL:
> {code:java}
> SELECT DEPT.name, emp_agg.deptno, emp_agg.ename, DEPT.deptno
> FROM DEPT
> RIGHT JOIN
> (SELECT COUNT(sal), deptno, ename FROM EMP GROUP BY deptno, ename) emp_agg
> ON DEPT.deptno = emp_agg.deptno
> WHERE emp_agg.ename = 'job'{code}
> The relNode is:
> {code:java}
> LogicalProject(NAME=[$1], DEPTNO=[$3], ENAME=[$4], DEPTNO0=[$0])
>   LogicalFilter(condition=[=($4, 'job')])
>     LogicalJoin(condition=[=($0, $3)], joinType=[right])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>       LogicalProject(EXPR$0=[$2], DEPTNO=[$0], ENAME=[$1])
>         LogicalAggregate(group=[{0, 1}], EXPR$0=[COUNT($2)])
>           LogicalProject(DEPTNO=[$7], ENAME=[$1], SAL=[$5])
>             LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]]){code}
> where the foreign key is the DEPTNO column of CATALOG.SALES.EMP table,
> reference the DEPTNO unique column of CATALOG.SALES.DEPT table.
> When we want to get foreign keys metadata on `LogicalJoin` or top 
> `LogicalProject`, There is no such method currently, it seems that we should 
> trace the field column origin to get the foreign key and corresponding unique 
> key.
>  
> The final result of this feature is something likely as following:
> When we want to get foreign keys metadata on `LogicalJoin`, the `LogicalJoin` 
> rowType is
> {code:java}
> RecordType(INTEGER DEPTNO, VARCHAR(10) NAME, BIGINT EXPR$0, INTEGER DEPTNO0, 
> VARCHAR(20) ENAME).{code}
> We expect the foreign keys metadata:
> {code:java}
> foreignColumns bitset is {3}
> uniqueColumns bitset is {0}{code}
> When we want to get foreign keys metadata on top `LogicalProject`, the 
> `LogicalProject` rowType is
> {code:java}
> RecordType(VARCHAR(10) NAME, INTEGER DEPTNO, VARCHAR(20) ENAME, INTEGER 
> DEPTNO0).{code}
> We expect the foreigns key metadata:
> {code:java}
> foreignColumns bitset is {1}
> uniqueColumns bitset is {3}{code}
> All the foreign or unique columns is 0 based index.
> Foreign keys metadata is very useful in many optimize scenes. Such as it can 
> be used in join eliminate when join type is inner join and some other star 
> schema query optimize.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5881) Support to get foreign keys metadata in RelMetadataQuery

2023-08-14 Thread JingDas (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754069#comment-17754069
 ] 

JingDas commented on CALCITE-5881:
--

[~shenlang] Foreign key is a special constrait, it contains  not only 
foreignKey but also the uniqueKey that if reference, and foreignKey can be 
combined. In some scene, such as we want to know the join condition is 
foreign-unique key or not. Maybe `RelMetadataQuery #getForeignKeys` is better, 
just like `RelMetadataQuery#getUniqueKeys`.

> Support to get foreign keys metadata in RelMetadataQuery
> 
>
> Key: CALCITE-5881
> URL: https://issues.apache.org/jira/browse/CALCITE-5881
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Major
>  Labels: pull-request-available
>
> We can get constraints by RelOptTable#getReferentialConstraints method, but 
> maybe can't get appropriate constraints at top relNode.
> For example:
> SQL:
> {code:java}
> SELECT DEPT.name, emp_agg.deptno, emp_agg.ename, DEPT.deptno
> FROM DEPT
> RIGHT JOIN
> (SELECT COUNT(sal), deptno, ename FROM EMP GROUP BY deptno, ename) emp_agg
> ON DEPT.deptno = emp_agg.deptno
> WHERE emp_agg.ename = 'job'{code}
> The relNode is:
> {code:java}
> LogicalProject(NAME=[$1], DEPTNO=[$3], ENAME=[$4], DEPTNO0=[$0])
>   LogicalFilter(condition=[=($4, 'job')])
>     LogicalJoin(condition=[=($0, $3)], joinType=[right])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>       LogicalProject(EXPR$0=[$2], DEPTNO=[$0], ENAME=[$1])
>         LogicalAggregate(group=[{0, 1}], EXPR$0=[COUNT($2)])
>           LogicalProject(DEPTNO=[$7], ENAME=[$1], SAL=[$5])
>             LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]]){code}
> where the foreign key is the DEPTNO column of CATALOG.SALES.EMP table,
> reference the DEPTNO unique column of CATALOG.SALES.DEPT table.
> When we want to get foreign keys metadata on `LogicalJoin` or top 
> `LogicalProject`, There is no such method currently, it seems that we should 
> trace the field column origin to get the foreign key and corresponding unique 
> key.
>  
> The final result of this feature is something likely as following:
> When we want to get foreign keys metadata on `LogicalJoin`, the `LogicalJoin` 
> rowType is
> {code:java}
> RecordType(INTEGER DEPTNO, VARCHAR(10) NAME, BIGINT EXPR$0, INTEGER DEPTNO0, 
> VARCHAR(20) ENAME).{code}
> We expect the foreign keys metadata:
> {code:java}
> foreignColumns bitset is {3}
> uniqueColumns bitset is {0}{code}
> When we want to get foreign keys metadata on top `LogicalProject`, the 
> `LogicalProject` rowType is
> {code:java}
> RecordType(VARCHAR(10) NAME, INTEGER DEPTNO, VARCHAR(20) ENAME, INTEGER 
> DEPTNO0).{code}
> We expect the foreigns key metadata:
> {code:java}
> foreignColumns bitset is {1}
> uniqueColumns bitset is {3}{code}
> All the foreign or unique columns is 0 based index.
> Foreign keys metadata is very useful in many optimize scenes. Such as it can 
> be used in join eliminate when join type is inner join and some other star 
> schema query optimize.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5915) Missing SQL hints in not-expanded subqueries.

2023-08-14 Thread Vladimir Steshin (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754067#comment-17754067
 ] 

Vladimir Steshin edited comment on CALCITE-5915 at 8/14/23 12:15 PM:
-

[~shenlang], thank you. I've realized what's going on. We use 
{code:java}
CoreRules.FILTER_SUB_QUERY_TO_CORRELATE,
CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE,
CoreRules.JOIN_SUB_QUERY_TO_CORRELATE
{code}
but we have no call of _RelDecorrelator#decorrelateQuery_. And the sub-query 
transformation omits hints propagation:
{code:java}
public static RelNode RelOptUtil#propagateRelHints(RelNode originalRel, RelNode 
equiv) {
if (!(originalRel instanceof Hintable)
|| ((Hintable) originalRel).getHints().size() == 0) {
// Exits here.
  return equiv;
}
 ...
  }
{code}
This happens because after the sub-queries change, hints appear below this 
code's _RelNode originalRel_. It has no hints, the propagation stops. Example:
{code:java}
LogicalProject
   LogicalFilter (had no hints, *propagateRelHints(RelNode, RelNode) exists*)
  LogicalJoin
 left: TableScan
 right: LogicalAggregate
LogicalProject (has *hints to propagate* down)
   LogicalFilter
  LogicalTableScan (*waits for the hints*)
{code}
To me, it is a bit weird that the propagation checks current node's hints and 
exists. Hints might be extracted further. Well, I just call 
_propagateRelHints(RelNode, boolean)_ after that rule set. Works.




was (Author: vladsz83):
[~shenlang], thank you. I've realized what's going on. We use 
{code:java}
CoreRules.FILTER_SUB_QUERY_TO_CORRELATE,
CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE,
CoreRules.JOIN_SUB_QUERY_TO_CORRELATE
{code}
but we have no call of _RelDecorrelator#decorrelateQuery_. And the subquery 
transformation omits hints propagation:
{code:java}
public static RelNode RelOptUtil#propagateRelHints(RelNode originalRel, RelNode 
equiv) {
if (!(originalRel instanceof Hintable)
|| ((Hintable) originalRel).getHints().size() == 0) {
// Exists here.
  return equiv;
}
 ...
  }
{code}
This happens because after the subqueries change hints appear below this code's 
_RelNode originalRel_. It has no hints, the propagation stops. Example:
{code:java}
LogicalProject
   LogicalFilter (had no hints, *propagateRelHints(RelNode, RelNode) exists*)
  LogicalJoin
 left: TableScan
 right: LogicalAggregate
LogicalProject (has *hints to propagate* down)
   LogicalFilter
  LogicalTableScan (*waits for the hints*)
{code}
To me it is a bit weird that the propagation checks current node's hints and 
exists. Hints might be extracted further. Well, I just call 
_propagateRelHints(RelNode, boolean)_ after that rule set. Works.



> Missing SQL hints in not-expanded subqueries. 
> --
>
> Key: CALCITE-5915
> URL: https://issues.apache.org/jira/browse/CALCITE-5915
> Project: Calcite
>  Issue Type: Bug
>Reporter: Vladimir Steshin
>Priority: Minor
> Attachments: SqlHintsInSubqueriesWithDisabledExpanding.java, 
> image-2023-08-13-23-11-56-589.png
>
>
> Not sure if it is a bug becuase the query plans in this case might not be 
> final and require further expanding, but the SQL hints might not be 
> propagated to subqueries when SqlToRelConverter#withExpand==false. This 
> happens because hints are pushed down with 
> _RelOptUtil#RelHintPropagateShuttle_ and 
> _RelOptUtil#SubTreeHintPropagateShuttle_ which travese through 
> _RelNode#getInputs()_. But what if node is not accessible as other node’s 
> input like keeping in _LogicalFilter#condition_? Then the shuttles skip such 
> nodes.
> Test attached.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5915) Missing SQL hints in not-expanded subqueries.

2023-08-14 Thread Vladimir Steshin (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754067#comment-17754067
 ] 

Vladimir Steshin commented on CALCITE-5915:
---

[~shenlang], thank you. I've realized what's going on. We use 
{code:java}
CoreRules.FILTER_SUB_QUERY_TO_CORRELATE,
CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE,
CoreRules.JOIN_SUB_QUERY_TO_CORRELATE
{code}
but we have no call of _RelDecorrelator#decorrelateQuery_. And the subquery 
transformation omits hints propagation:
{code:java}
public static RelNode RelOptUtil#propagateRelHints(RelNode originalRel, RelNode 
equiv) {
if (!(originalRel instanceof Hintable)
|| ((Hintable) originalRel).getHints().size() == 0) {
// Exists here.
  return equiv;
}
 ...
  }
{code}
This happens because after the subqueries change hints appear below this code's 
_RelNode originalRel_. It has no hints, the propagation stops. Example:
{code:java}
LogicalProject
   LogicalFilter (had no hints, *propagateRelHints(RelNode, RelNode) exists*)
  LogicalJoin
 left: TableScan
 right: LogicalAggregate
LogicalProject (has *hints to propagate* down)
   LogicalFilter
  LogicalTableScan (*waits for the hints*)
{code}
To me it is a bit weird that the propagation checks current node's hints and 
exists. Hints might be extracted further. Well, I just call 
_propagateRelHints(RelNode, boolean)_ after that rule set. Works.



> Missing SQL hints in not-expanded subqueries. 
> --
>
> Key: CALCITE-5915
> URL: https://issues.apache.org/jira/browse/CALCITE-5915
> Project: Calcite
>  Issue Type: Bug
>Reporter: Vladimir Steshin
>Priority: Minor
> Attachments: SqlHintsInSubqueriesWithDisabledExpanding.java, 
> image-2023-08-13-23-11-56-589.png
>
>
> Not sure if it is a bug becuase the query plans in this case might not be 
> final and require further expanding, but the SQL hints might not be 
> propagated to subqueries when SqlToRelConverter#withExpand==false. This 
> happens because hints are pushed down with 
> _RelOptUtil#RelHintPropagateShuttle_ and 
> _RelOptUtil#SubTreeHintPropagateShuttle_ which travese through 
> _RelNode#getInputs()_. But what if node is not accessible as other node’s 
> input like keeping in _LogicalFilter#condition_? Then the shuttles skip such 
> nodes.
> Test attached.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5913) Support to get functional dependency metadata in RelMetadataQuery

2023-08-14 Thread JingDas (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754065#comment-17754065
 ] 

JingDas commented on CALCITE-5913:
--

[~thomas.rebele] I got your point, it sounds reasonable. CALCITE-5894 is a 
minimal implementation and will not contain the method as following:
{code:java}
ImmutableBitSet getFunctionalDependentColumns(ImmutableBitSet columns); 
List getNontrivialFunctionalDependencies();  {code}
This is a umbrella jira, as functional dependency may be complex, We can work 
on it together. If you have a certain idea and code implementation. you may log 
another Jira case for specific design and work on it.

> Support to get functional dependency metadata in RelMetadataQuery
> -
>
> Key: CALCITE-5913
> URL: https://issues.apache.org/jira/browse/CALCITE-5913
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Major
>
> Functional dependency analysis can be applied to various problems in query 
> optimization:
> selectivity estimation, estimation of (intermediate) result sizes, *order 
> optimization*
> *(in particular sort avoidance),* cost estimation, and various problems in 
> the area of semantic query optimization, as said in the book《[Exploiting 
> Functional Dependence in Query 
> Optimization》|https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf]
>  
> In calcite, it may be metadata that something like 'FunctionalDependency' 
> BuiltInMetadata as following:
> {code:java}
> public abstract class BuiltInMetadata {
>// ...
>public interface FunctionalDependency extends Metadata {
>/** Returns whether column is functionally dependent on columns. */
>Boolean functionallyDetermine(ImmutableBitSet columns, int column);
>}
> } {code}
>  
> As the above book said, functional dependency analysis is a valuable and 
> challenging work. I think support order optimization
> (in particular sort avoidance) by the relevant functional dependency metadata 
> firstly, and then get complete functional dependency function step by step.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5894) Add SortRemoveRedundantRule to remove redundant sort fields if they are functionally dependent on other sort fields

2023-08-14 Thread JingDas (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754054#comment-17754054
 ] 

JingDas commented on CALCITE-5894:
--

[~thomas.rebele] As discussed in  CALCITE-5913 the first method as following:
{code:java}
/** Returns whether column is functionally dependent on columns. */
   Boolean functionallyDetermine(ImmutableBitSet columns, int column); 
{code}
will be added in this Jira case, and give it the minimal implementation to 
satisfy the need of `SortRemoveRedundantRule`. 

> Add SortRemoveRedundantRule to remove redundant sort fields if they are 
> functionally dependent on other sort fields
> ---
>
> Key: CALCITE-5894
> URL: https://issues.apache.org/jira/browse/CALCITE-5894
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Minor
>  Labels: pull-request-available
>
> In some scene, Sort fields can be reduct, if sort fields contain unique key
> For example
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> where `empno` is a key,  `ename` is redundant since `empno` alone is 
> sufficient to determine the order of any two records.
> So the SQL can be optimized as following:
> {code:java}
> SELECT name, Emp.salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job, e_agg.c {code}
> Although `e_agg.ename` is not a key but field `ename` is unique and not null, 
> it can be optimized as following:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job{code}
> Sorting is an expensive operation, however. Therefore, it is imperative that 
> sorting
> is optimized to avoid unnecessary sort field.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5894) Add SortRemoveRedundantRule to remove redundant sort fields if they are functionally dependent on other sort fields

2023-08-14 Thread Thomas Rebele (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754042#comment-17754042
 ] 

Thomas Rebele commented on CALCITE-5894:


I've added some comments on the design of the FunctionalDependency interface, 
as I thought it would be discussed there. I would appreciate if the interface 
could be made a bit more general, so that I can use it to align an already 
existing optimization algorithm to use the new FD interface. Will CALCITE-5913 
be merged before this ticket?

> Add SortRemoveRedundantRule to remove redundant sort fields if they are 
> functionally dependent on other sort fields
> ---
>
> Key: CALCITE-5894
> URL: https://issues.apache.org/jira/browse/CALCITE-5894
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Minor
>  Labels: pull-request-available
>
> In some scene, Sort fields can be reduct, if sort fields contain unique key
> For example
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> where `empno` is a key,  `ename` is redundant since `empno` alone is 
> sufficient to determine the order of any two records.
> So the SQL can be optimized as following:
> {code:java}
> SELECT name, Emp.salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job, e_agg.c {code}
> Although `e_agg.ename` is not a key but field `ename` is unique and not null, 
> it can be optimized as following:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.job{code}
> Sorting is an expensive operation, however. Therefore, it is imperative that 
> sorting
> is optimized to avoid unnecessary sort field.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5913) Support to get functional dependency metadata in RelMetadataQuery

2023-08-14 Thread Thomas Rebele (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754038#comment-17754038
 ] 

Thomas Rebele commented on CALCITE-5913:


[~jingda], could you give an example what you mean with returning 
{{Set}} for {{getFunctionalDependentColumns(ImmutableBitSet 
columns)}} being helpful? Just returning an ImmutableBitSet should be enough, 
as the following functional dependencies are equivalent, because of 
[https://en.wikipedia.org/wiki/Armstrong%27s_axioms#Decomposition] and 
[https://en.wikipedia.org/wiki/Armstrong%27s_axioms#Union_(Notation)]:
 * X -> Y and X -> Z (represented as a {{{}Set{}}})
 * X -> YZ (represented as a {{{}ImmutableBitSet{}}})

About the usefulness of {{{}List 
getNontrivialFunctionalDependencies(){}}}: I actually have an algorithm that 
needs all the FDs to do some advanced optimizations. Reconstructing them from 
{{boolean isFunctionallyDetermined(ImmutableBitSet columns, int column)}} would 
be possible, but it does not have the best performance. Therefore I propose to 
provide three methods in {{{}FunctionalDependencies{}}}. To make things easier 
for the libraries that use Calcite, a class that implements the interface could 
be provided. Here a pseudo-code example:
{code:java}
FDBuilder b = new FDBuilder();
b.addFD({empno}, {ssn, ename, sal, comm});
b.addFD({ssn}, {empno, ename, sal, comm});
FunctionalDependenciesImpl fds = b.build();{code}
The object is cached by the class generated by RelMetadataHandlerGeneratorUtil. 
It could be cached at the schema level as well.

> Support to get functional dependency metadata in RelMetadataQuery
> -
>
> Key: CALCITE-5913
> URL: https://issues.apache.org/jira/browse/CALCITE-5913
> Project: Calcite
>  Issue Type: New Feature
>Reporter: JingDas
>Assignee: JingDas
>Priority: Major
>
> Functional dependency analysis can be applied to various problems in query 
> optimization:
> selectivity estimation, estimation of (intermediate) result sizes, *order 
> optimization*
> *(in particular sort avoidance),* cost estimation, and various problems in 
> the area of semantic query optimization, as said in the book《[Exploiting 
> Functional Dependence in Query 
> Optimization》|https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf]
>  
> In calcite, it may be metadata that something like 'FunctionalDependency' 
> BuiltInMetadata as following:
> {code:java}
> public abstract class BuiltInMetadata {
>// ...
>public interface FunctionalDependency extends Metadata {
>/** Returns whether column is functionally dependent on columns. */
>Boolean functionallyDetermine(ImmutableBitSet columns, int column);
>}
> } {code}
>  
> As the above book said, functional dependency analysis is a valuable and 
> challenging work. I think support order optimization
> (in particular sort avoidance) by the relevant functional dependency metadata 
> firstly, and then get complete functional dependency function step by step.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5909) Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE but passes when run from the command line

2023-08-14 Thread Ran Tao (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753909#comment-17753909
 ] 

Ran Tao edited comment on CALCITE-5909 at 8/14/23 11:10 AM:


[~julianhyde] yes. sure. [https://github.com/apache/calcite/pull/3366] If you 
have time, pls help to review it? also welcome other developers to review it, 
thanks.


was (Author: lemonjing):
[~julianhyde] yes. sure. when the PR is ready, i will ping you. thanks

> Sometimes SqlParserTest.testNoUnintendedNewReservedKeywords fails in the IDE 
> but passes when run from the command line
> --
>
> Key: CALCITE-5909
> URL: https://issues.apache.org/jira/browse/CALCITE-5909
> Project: Calcite
>  Issue Type: Bug
>  Components: tests
>Reporter: LakeShen
>Assignee: Ran Tao
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-08-08-23-32-55-466.png
>
>
> When I run the SqlParserTest,the testNoUnintendedNewReservedKeywords method 
> failed,the exception like this:
> {code:java}
> java.lang.AssertionError: The parser has at least one new reserved keyword. 
> Are you sure it should be reserved? Difference: {code}
> The picture like this:
> !image-2023-08-08-23-32-55-466.png|width=1543,height=496!
> I could fix this problem.More importantly, why is this method failing, but 
> the Calcite pipeline is passing? I think we should look at something we missed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5920) Reset PERCENTILE_CONT/PERCENTILE_DISC to ReservedKeyWords

2023-08-14 Thread Ran Tao (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754029#comment-17754029
 ] 

Ran Tao edited comment on CALCITE-5920 at 8/14/23 10:53 AM:


PR link: [https://github.com/apache/calcite/pull/3366]
(one PR contains two commits by following the commiter's suggestion)


was (Author: lemonjing):
PR link: [https://github.com/apache/calcite/pull/3366]

> Reset PERCENTILE_CONT/PERCENTILE_DISC to ReservedKeyWords
> -
>
> Key: CALCITE-5920
> URL: https://issues.apache.org/jira/browse/CALCITE-5920
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Ran Tao
>Assignee: Ran Tao
>Priority: Major
> Fix For: 1.36.0
>
>
> In https://issues.apache.org/jira/browse/CALCITE-5564, it changed 
> PERCENTILE_CONT/PERCENTILE_DISC to NonReservedKeyWords, however they are 
> ReservedKeywords from sql-2008.
> we should correct it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5920) Reset PERCENTILE_CONT/PERCENTILE_DISC to ReservedKeyWords

2023-08-14 Thread Ran Tao (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754029#comment-17754029
 ] 

Ran Tao commented on CALCITE-5920:
--

PR link: [https://github.com/apache/calcite/pull/3366]

> Reset PERCENTILE_CONT/PERCENTILE_DISC to ReservedKeyWords
> -
>
> Key: CALCITE-5920
> URL: https://issues.apache.org/jira/browse/CALCITE-5920
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Ran Tao
>Assignee: Ran Tao
>Priority: Major
> Fix For: 1.36.0
>
>
> In https://issues.apache.org/jira/browse/CALCITE-5564, it changed 
> PERCENTILE_CONT/PERCENTILE_DISC to NonReservedKeyWords, however they are 
> ReservedKeywords from sql-2008.
> we should correct it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5830) Add ARRAY_INSERT function(enabled in Spark library)

2023-08-14 Thread Ran Tao (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753883#comment-17753883
 ] 

Ran Tao edited comment on CALCITE-5830 at 8/14/23 8:17 AM:
---

[~jiajunbernoulli] hi, jiajun. I get your point. however, the result and 
behavior of yours is wrong. IMHO, firstly, i think in calcite we should respect 
Apache Spark not Databricks Spark. Of cause, in your case I have got some 
details: 

apache-spark-3.4.0:
{code:java}
spark-sql (default)> SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
["a","b","z","c"]
Time taken: 4.478 seconds, Fetched 1 row(s) {code}
apache-spark-3.4.1
{code:java}
spark-sql (default)> SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
["z",null,null,"a","b","c"]
Time taken: 3.587 seconds, Fetched 1 row(s)
 {code}
 

1.this function in databricks's doc is a old version(it's wrong and fixed 
immediately, however the doc is not updated), I opened a discussion in spark 
devs: [https://lists.apache.org/thread/1p5hkql96k5qc5ww6wkd7mq6qdbgyz1n] it was 
doc error, they will update it.

>> Forward to dev

Yes, the databricks runtime 13.0 and 13.1 and 13.2 are all ok and have the same 
behavior with open source Apache Spark 3.4.x.
But I think the docs of databricks need to be updated[1]. It's confusing. - Ran 
Tao

>> Oh I get it. Let me report to the docs people (I'm at databricks) – Sean Owen

 

2.here is the some stable and latest databricks's spark behavior, it's same as 
Open Source Apache Spark behavior, and also same as the implementation in this 
PR. And this is correct result. you can test in databricks either.

!image-2023-08-14-11-20-46-189.png|width=583,height=506!

Finally, spark has this behavior because it distinguish the prepends(negative 
index) and appends(positive index). I have replied in the PR for you.


was (Author: lemonjing):
[~jiajunbernoulli] hi, jiajun. I get your point. however, the result and 
behavior of yours is wrong. IMHO, firstly, i think in calcite we should respect 
Apache Spark not Databricks Spark. Of cause, in your case I have got some 
details: 

apache-spark-3.4.0:
{code:java}
spark-sql (default)> SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
["a","b","z","c"]
Time taken: 4.478 seconds, Fetched 1 row(s) {code}
apache-spark-3.4.1
{code:java}
spark-sql (default)> SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
["z",null,null,"a","b","c"]
Time taken: 3.587 seconds, Fetched 1 row(s)
 {code}
 

1.this function in databricks's doc is a old version(it's wrong and fixed 
immediately, however the doc is not updated), I opened a discussion in spark 
devs: [https://lists.apache.org/thread/1p5hkql96k5qc5ww6wkd7mq6qdbgyz1n] it was 
doc error, they will update it.

2.here is the some stable and latest databricks's spark behavior, it's same as 
Open Source Apache Spark behavior, and also same as the implementation in this 
PR. And this is correct result. you can test in databricks either.

!image-2023-08-14-11-20-46-189.png|width=583,height=506!

Finally, spark has this behavior because it distinguish the prepends(negative 
index) and appends(positive index). I have replied in the PR for you.

> Add ARRAY_INSERT function(enabled in Spark library)
> ---
>
> Key: CALCITE-5830
> URL: https://issues.apache.org/jira/browse/CALCITE-5830
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.34.0
>Reporter: Ran Tao
>Assignee: Ran Tao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-08-14-11-17-33-205.png, 
> image-2023-08-14-11-20-46-189.png
>
>
> array_insert(x, pos, val) - Places val into index pos of array x. Array 
> indices start at 1, or start from the end if index is negative. Index above 
> array size appends the array, or prepends the array if index is negative, 
> with 'null' elements
> *Examples:*
> > SELECT array_insert(array(1, 2, 3, 4), 5, 5); [1,2,3,4,5]
> > SELECT array_insert(array(5, 3, 2, 1), -3, 4); [5,4,3,2,1] 
> https://spark.apache.org/docs/latest/api/sql/index.html#array_insert



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

53 matches

Mail list logo