[jira] [Commented] (CALCITE-3079) Successive dependent windows cannot be implemented in same expression level
[ https://issues.apache.org/jira/browse/CALCITE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843691#comment-16843691 ]

Feng Zhu commented on CALCITE-3079:
---

I opened a PR (https://github.com/apache/calcite/pull/1220) for this issue.

> Successive dependent windows cannot be implemented in same expression level
> ---
>
> Key: CALCITE-3079
> URL: https://issues.apache.org/jira/browse/CALCITE-3079
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.20.0
> Reporter: Feng Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Recently, we encountered an IndexOutOfBoundsException when running a
> complicated query containing successive dependent windows. The issue can be
> reproduced by the following simple query on table *t1(a, b, c)*.
> {code:java}
> Q1:
> select sum(s) over (partition by aa) as ss
> from (
>   select a as aa, sum(b) over (partition by a, c) as s
>   from t1
> ) t2{code}
> The exception is:
> {code:java}
> Exception in thread "main" java.sql.SQLException: Error while executing SQL "select sum(s) over (partition by aa) as ss from (select a as aa, sum(b) over (partition by a, c) as s from t1) t2": index (0) must be less than size (0)
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>   at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
>   at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227)
>   at org.apache.calcite.JDBCDemo.main(JDBCDemo.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
>   at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310)
>   at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293)
>   at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:67)
>   at org.apache.calcite.adapter.enumerable.EnumUtils$2.get(EnumUtils.java:115)
>   at org.apache.calcite.adapter.enumerable.EnumUtils$2.get(EnumUtils.java:110)
>   at org.apache.calcite.adapter.enumerable.EnumerableWindow.lambda$implement$0(EnumerableWindow.java:442)
>   at org.apache.calcite.adapter.enumerable.EnumerableWindow$3.rexArguments(EnumerableWindow.java:854)
>   .
> {code}
>
> However, the modified query below executes correctly.
> {code:java}
> Q2:
> select sum(s) over (partition by aa) as ss
> from (
>   select a as aa, sum(b) over (partition by a, c) + 0 as s
>   from t1
> ) t2{code}
> This issue is caused by *_ProjectToWindowRule_* (CalcRelSplitter). When
> splitting window expressions in a Project node, the rule does not check
> whether a window and its input window are in the same level. Due to this
> behavior, two successive window expressions are implemented in the same level,
> and the RelNode after transformation is:
> {code:java}
> LogicalProject($0=[$4])
>   LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])], window#1=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($3)])])
>     EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code}
> As for *Q2*, the two window expressions are not "successive": an _*Add(+)*_
> operation causes them to be implemented in different levels. The RelNode after
> transformation is:
> {code:java}
> LogicalProject($0=[$2])
>   LogicalWindow(window#0=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
>     LogicalProject(a=[$0], $1=[+($3, 0)])
>       LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
>         EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code}
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
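The level constraint discussed in CALCITE-3079 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Calcite's CalcRelSplitter and not the change in the PR: the `WindowLevels` class, the `Expr` node, and the `windowDepth` and `canShareLevel` methods are hypothetical names invented for illustration. The idea is that a window reading another window's output sits deeper in the dependency chain, so the two should not be placed in the same level.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of window dependencies; this is NOT Calcite's CalcRelSplitter. */
final class WindowLevels {
  /** Hypothetical expression node: a column, a plain call, or a window aggregate. */
  static final class Expr {
    final String name;
    final boolean isWindow;
    final List<Expr> inputs;

    Expr(String name, boolean isWindow, Expr... inputs) {
      this.name = name;
      this.isWindow = isWindow;
      this.inputs = Collections.unmodifiableList(Arrays.asList(inputs));
    }
  }

  /** Counts the window aggregates on the deepest input path, including expr itself. */
  static int windowDepth(Expr expr) {
    int deepestInput = 0;
    for (Expr input : expr.inputs) {
      deepestInput = Math.max(deepestInput, windowDepth(input));
    }
    return deepestInput + (expr.isWindow ? 1 : 0);
  }

  /**
   * In this model, windows with equal depth cannot depend on each other,
   * so only those are allowed to share a level.
   */
  static boolean canShareLevel(Expr w1, Expr w2) {
    return windowDepth(w1) == windowDepth(w2);
  }

  public static void main(String[] args) {
    // Q1 from the issue: sum(s) over (...) where s = sum(b) over (...).
    Expr b = new Expr("b", false);
    Expr inner = new Expr("SUM(b) OVER (PARTITION BY a, c)", true, b);
    Expr outer = new Expr("SUM(s) OVER (PARTITION BY aa)", true, inner);
    System.out.println("inner depth = " + windowDepth(inner));
    System.out.println("outer depth = " + windowDepth(outer));
    System.out.println("same level allowed = " + canShareLevel(inner, outer));
  }
}
```

In this model the Q1 windows have different depths, so they may not share a level. The Q2 variant has the same depths; the extra `+ 0` only adds a non-window node between the two windows.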
[jira] [Comment Edited] (CALCITE-3077) Rewrite CUBE&ROLLUP&CUBE queries in SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843688#comment-16843688 ]

Feng Zhu edited comment on CALCITE-3077 at 5/20/19 6:34 AM:

Thanks, [~zabetak]. By the way, could you assign this task to me?

was (Author: donnyzone):
Thanks, [~zabetak].

> Rewrite CUBE&ROLLUP&CUBE queries in SparkSqlDialect
> ---
>
> Key: CALCITE-3077
> URL: https://issues.apache.org/jira/browse/CALCITE-3077
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.20.0
> Reporter: Feng Zhu
> Priority: Major
>
> *Background:* we are building a platform that adopts Calcite to process
> (i.e., parse&validate&convert&optimize) SQL queries and then regenerate the
> final SQL. To handle large volumes of data, we use the popular
> SparkSQL engine to execute the generated SQL query.
> However, we found that a large share of real-world test cases failed due to
> syntax differences in *_CUBE/ROLLUP/GROUPING SETS_* clauses. The Spark SQL
> dialect supports only "WITH ROLLUP&CUBE" in the "GROUP BY" clause. The
> corresponding grammar [1] is defined as below.
> {code:java}
> aggregation
>     : GROUP BY groupingExpressions+=expression (',' groupingExpressions+=expression)* (
>       WITH kind=ROLLUP
>     | WITH kind=CUBE
>     | kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')')?
>     | GROUP BY kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')'
>     ;
> {code}
> To fill this gap, I think we need to rewrite CUBE/ROLLUP/GROUPING SETS
> clauses in SparkSqlDialect, especially for some complex cases.
> {code:java}
> group by cube ((a, b), (c, d))
> group by cube(a,b), cube(c,d)
> {code}
> [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
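One possible direction for the complex cases mentioned in CALCITE-3077 is to expand a CUBE into explicit GROUPING SETS, since CUBE over n items is equivalent to the 2^n grouping sets formed from all subsets of those items. The sketch below is only an illustration of that expansion, not the approach taken in Calcite; the `CubeExpansion` class and its method names are hypothetical, and whether the target engine accepts the resulting clause (including the empty grouping set) would still need to be checked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that expands CUBE into explicit grouping sets. */
final class CubeExpansion {
  /**
   * Expands CUBE(items) into its 2^n grouping sets.
   * Each item is a list of column names, so composite columns such as (a, b) stay together.
   */
  static List<List<String>> expandCube(List<List<String>> items) {
    List<List<String>> sets = new ArrayList<>();
    int n = items.size();
    // Walk the bit masks from "all items" down to "no items" so the full set comes first.
    for (int mask = (1 << n) - 1; mask >= 0; mask--) {
      List<String> set = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        // Bit (n - 1 - i) selects item i, which keeps columns in declaration order.
        if ((mask & (1 << (n - 1 - i))) != 0) {
          set.addAll(items.get(i));
        }
      }
      sets.add(set);
    }
    return sets;
  }

  /** Renders the grouping sets as a GROUPING SETS clause. */
  static String toGroupingSets(List<List<String>> sets) {
    StringBuilder sb = new StringBuilder("GROUPING SETS (");
    for (int i = 0; i < sets.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append('(').append(String.join(", ", sets.get(i))).append(')');
    }
    return sb.append(')').toString();
  }
}
```

For `cube ((a, b), (c, d))` this produces `GROUPING SETS ((a, b, c, d), (a, b), (c, d), ())`. The second case in the issue, `cube(a,b), cube(c,d)`, would additionally need the cross product of the two expansions, which this sketch does not cover.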
[jira] [Commented] (CALCITE-3077) Rewrite CUBE&ROLLUP&CUBE queries in SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843688#comment-16843688 ] Feng Zhu commented on CALCITE-3077: --- Thanks, [~zabetak].
[jira] [Comment Edited] (CALCITE-3077) Rewrite CUBE&ROLLUP&CUBE queries in SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843081#comment-16843081 ] Feng Zhu edited comment on CALCITE-3077 at 5/20/19 6:32 AM: I am working on this issue now, and looking forward to hear some suggestions. was (Author: donnyzone): I am working on this issue now, and look forward to hear some suggestions.
[jira] [Updated] (CALCITE-3079) Successive dependent windows cannot be implemented in same expression level
[ https://issues.apache.org/jira/browse/CALCITE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Zhu updated CALCITE-3079: -- Description: Recently, we encountered an IndexOutOfBoundsException when running a complicated query containing successive dependent windows.The issue can be reproduced by the following simple query on table *t1(a, b, c)*. {code:java} Q1: select sum(s) over (partition by aa) as ss " + from ( select a as aa, sum(b) over (partition by a, c) as s from t1 ) t2";{code} The exception is: {code:java} Exception in thread "main" java.sql.SQLException: Error while executing SQL "select sum(s) over (partition by aa) as ss from (select a as aa, sum(b) over (partition by a, c) as s from t1) t2": index (0) must be less than size (0) at org.apache.calcite.avatica.Helper.createException(Helper.java:56) at org.apache.calcite.avatica.Helper.createException(Helper.java:41) at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163) at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227) at org.apache.calcite.JDBCDemo.main(JDBCDemo.java:70) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less than size (0) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293) at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:67) at org.apache.calcite.adapter.enumerable.EnumUtils$2.get(EnumUtils.java:115) at org.apache.calcite.adapter.enumerable.EnumUtils$2.get(EnumUtils.java:110) at 
org.apache.calcite.adapter.enumerable.EnumerableWindow.lambda$implement$0(EnumerableWindow.java:442) at org.apache.calcite.adapter.enumerable.EnumerableWindow$3.rexArguments(EnumerableWindow.java:854) . {code} However, the modified query below can be executed in a right way. {code:java} Q2: select sum(s) over (partition by aa) as ss " + from ( select a as aa, sum(b) over (partition by a, c) + 0 as s from t1 ) t2{code} This issue is caused by *_ProjectToWindowRule_*({color:#ff}CalcRelSplitter{color}). When splitting window expressions in Project node, the rule ignores to check whether a window and its input window are in the same level.Due to such beheavior, two successive window expressions are implemented in same level and the RelNode after transformation is: {code:java} LogicalProject($0=[$4]) LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])], window#1=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($3)])]) EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code} As for *Q2*, two window expressions are not "successive", an _*Add(+)*_ operation results to implementing them in different levels. The RelNode after transformation is: {code:java} LogicalProject($0=[$2]) LogicalWindow(window#0=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])]) LogicalProject(a=[$0], $1=[+($3, 0)]) LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])]) EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code} was: Recently, we encountered an IndexOutOfBoundsException when running a complicated query containing successive dependent windows.The issue can be reproduced by the following simple query on table *t1(a, b, c)*. 
{code:java} Q1: select sum(s) over (partition by aa) as ss " + from ( select a as aa, sum(b) over (partition by a, c) as s from t1 ) t2";{code} The exception is: {code:java} Exception in thread "main" java.sql.SQLException: Error while executing SQL "select sum(s) over (partition by aa) as ss from (select a as aa, sum(b) over (partition by a, c) as s from ttt.test)": index (0) must be less than size (0) at org.apache.calcite.avatica.Helper.createException(Helper.java:56) at org.apache.calcite.avatica.Helper.createException(Helper.java:41) at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163) at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227) at org.apache.calcite.JDBCDemo.ma
[jira] [Updated] (CALCITE-3079) Successive dependent windows cannot be implemented in same expression level
[ https://issues.apache.org/jira/browse/CALCITE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Zhu updated CALCITE-3079: -- Description: Recently, we encountered an IndexOutOfBoundsException when running a complicated query containing successive dependent windows.The issue can be reproduced by the following simple query on table *t1(a, b, c)*. {code:java} Q1: select sum(s) over (partition by aa) as ss " + from ( select a as aa, sum(b) over (partition by a, c) as s from t1 ) t2";{code} The exception is: {code:java} Exception in thread "main" java.sql.SQLException: Error while executing SQL "select sum(s) over (partition by aa) as ss from (select a as aa, sum(b) over (partition by a, c) as s from ttt.test)": index (0) must be less than size (0) at org.apache.calcite.avatica.Helper.createException(Helper.java:56) at org.apache.calcite.avatica.Helper.createException(Helper.java:41) at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163) at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227) at org.apache.calcite.JDBCDemo.main(JDBCDemo.java:70) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less than size (0) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293) at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:67) at org.apache.calcite.adapter.enumerable.EnumUtils$2.get(EnumUtils.java:115) at org.apache.calcite.adapter.enumerable.EnumUtils$2.get(EnumUtils.java:110) at 
org.apache.calcite.adapter.enumerable.EnumerableWindow.lambda$implement$0(EnumerableWindow.java:442) at org.apache.calcite.adapter.enumerable.EnumerableWindow$3.rexArguments(EnumerableWindow.java:854) . {code} However, the modified query below can be executed in a right way. {code:java} Q2: select sum(s) over (partition by aa) as ss " + from ( select a as aa, sum(b) over (partition by a, c) + 0 as s from t1 ) t2{code} This issue is caused by *_ProjectToWindowRule_*({color:#ff}CalcRelSplitter{color}). When splitting window expressions in Project node, the rule ignores to check whether a window and its input window are in the same level.Due to such beheavior, two successive window expressions are implemented in same level and the RelNode after transformation is: {code:java} LogicalProject($0=[$4]) LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])], window#1=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($3)])]) EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code} As for *Q2*, two window expressions are not "successive", an _*Add(+)*_ operation results to implementing them in different levels. The RelNode after transformation is: {code:java} LogicalProject($0=[$2]) LogicalWindow(window#0=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])]) LogicalProject(a=[$0], $1=[+($3, 0)]) LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])]) EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code} was: Recently, we encountered an IndexOutOfBoundsException when running a complicated query containing successive dependent windows.The issue can be reproduced by the following simple query on table *t1(a, b, c)*. 
{code:java} Q1: select sum(s) over (partition by aa) as ss " + from ( select a as aa, sum(b) over (partition by a, c) as s from t1 ) t2";{code} However, the modified query below can be executed in a right way. {code:java} Q2: select sum(s) over (partition by aa) as ss " + from ( select a as aa, sum(b) over (partition by a, c) + 0 as s from t1 ) t2{code} This issue is caused by *_ProjectToWindowRule_*({color:#FF}CalcRelSplitter{color}). When splitting window expressions in Project node, the rule ignores to check whether a window and its input window are in the same level.Due to such beheavior, two successive window expressions are implemented in same level and the RelNode after transformation is: {code:java} LogicalProject($0=[$4]) LogicalW
[jira] [Updated] (CALCITE-3079) Successive dependent windows cannot be implemented in same expression level
[ https://issues.apache.org/jira/browse/CALCITE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CALCITE-3079: Labels: pull-request-available (was: )
[jira] [Created] (CALCITE-3079) Successive dependent windows cannot be implemented in same expression level
Feng Zhu created CALCITE-3079:
---

Summary: Successive dependent windows cannot be implemented in same expression level
Key: CALCITE-3079
URL: https://issues.apache.org/jira/browse/CALCITE-3079
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.20.0
Reporter: Feng Zhu

Recently, we encountered an IndexOutOfBoundsException when running a complicated query containing successive dependent windows. The issue can be reproduced by the following simple query on table *t1(a, b, c)*.
{code:java}
Q1:
select sum(s) over (partition by aa) as ss
from (
  select a as aa, sum(b) over (partition by a, c) as s
  from t1
) t2{code}
However, the modified query below executes correctly.
{code:java}
Q2:
select sum(s) over (partition by aa) as ss
from (
  select a as aa, sum(b) over (partition by a, c) + 0 as s
  from t1
) t2{code}
This issue is caused by *_ProjectToWindowRule_* (CalcRelSplitter). When splitting window expressions in a Project node, the rule does not check whether a window and its input window are in the same level. Due to this behavior, two successive window expressions are implemented in the same level, and the RelNode after transformation is:
{code:java}
LogicalProject($0=[$4])
  LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])], window#1=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($3)])])
    EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code}
As for *Q2*, the two window expressions are not "successive": an _*Add(+)*_ operation causes them to be implemented in different levels. The RelNode after transformation is:
{code:java}
LogicalProject($0=[$2])
  LogicalWindow(window#0=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
    LogicalProject(a=[$0], $1=[+($3, 0)])
      LogicalWindow(window#0=[window(partition {0, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
        EnumerableTableScan(subset=[rel#7:Subset#0.ENUMERABLE.[]], table=[[ttt, test]]){code}
[jira] [Commented] (CALCITE-3078) Duplicate code lastDay in calcite-avatica and calcite
[ https://issues.apache.org/jira/browse/CALCITE-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843614#comment-16843614 ]

pingle wang commented on CALCITE-3078:
---

The calcite-avatica PR is [PR #98|https://github.com/apache/calcite-avatica/pull/98].

> Duplicate code lastDay in calcite-avatica and calcite
> ---
>
> Key: CALCITE-3078
> URL: https://issues.apache.org/jira/browse/CALCITE-3078
> Project: Calcite
> Issue Type: Bug
> Components: avatica, core
> Affects Versions: 1.19.0
> Reporter: pingle wang
> Priority: Trivial
>
> The lastDay code appears both in calcite
> ([lastDay|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L2322])
> and in calcite-avatica. I think the duplication can be removed by making the
> calcite-avatica function public and then using it from calcite.
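As a rough illustration of the deduplication suggested in CALCITE-3078, a single public helper that both projects could call might look like the sketch below. It is written with `java.time` and is not the code in `SqlFunctions` or in calcite-avatica; the class name `LastDaySketch` and the days-since-epoch `int` signature are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

/** Hypothetical shared helper; not the actual Calcite or Avatica implementation. */
final class LastDaySketch {
  /**
   * Returns the last day of the month that contains the given date.
   * Both the argument and the result are days since 1970-01-01.
   */
  public static int lastDay(int epochDay) {
    LocalDate date = LocalDate.ofEpochDay(epochDay);
    return (int) date.with(TemporalAdjusters.lastDayOfMonth()).toEpochDay();
  }
}
```

With one public method of this kind in a shared module, the caller in the other project would delegate to it instead of carrying its own copy.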
[jira] [Updated] (CALCITE-3078) Duplicate code lastDay in calcite-avatica and calcite
[ https://issues.apache.org/jira/browse/CALCITE-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pingle wang updated CALCITE-3078: - Summary: Duplicate code lastDay in calcite-avatica and calcite (was: [Duplicate code] lastDay in calcite-avatica and calcite)
[jira] [Created] (CALCITE-3078) [Duplicate code] lastDay in calcite-avatica and calcite
pingle wang created CALCITE-3078:
---

Summary: [Duplicate code] lastDay in calcite-avatica and calcite
Key: CALCITE-3078
URL: https://issues.apache.org/jira/browse/CALCITE-3078
Project: Calcite
Issue Type: Bug
Components: avatica, core
Affects Versions: 1.19.0
Reporter: pingle wang

The lastDay code appears both in calcite ([lastDay|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L2322]) and in calcite-avatica. I think the duplication can be removed by making the calcite-avatica function public and then using it from calcite.
[jira] [Commented] (CALCITE-2593) Sometimes fails to plan when a RelNode transform multiple collations to single collation
[ https://issues.apache.org/jira/browse/CALCITE-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843613#comment-16843613 ] Hongze Zhang commented on CALCITE-2593: --- Thanks [~zabetak] for looking into this again. I support the idea of forcing EnumerableSort's input collation to be empty. We should also look into cases where no sort is involved, such as the case in the issue description, which does not seem fixable by tweaking sort rules alone (correct me if I am wrong :)). I think we may end up changing something related to the basics of composite traits. > Sometimes fails to plan when a RelNode transform multiple collations to > single collation > > > Key: CALCITE-2593 > URL: https://issues.apache.org/jira/browse/CALCITE-2593 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Hongze Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Sample SQL: > {code:java} > select sum(X + 1) filter (where Y) as "SET" from (values (1, TRUE), (2, > TRUE)) AS t(X, Y) limit 10{code} > Error log: > {code:java} > java.lang.RuntimeException: exception while executing [select sum(X + 1) > filter (where Y) as "SET" from (values (1, TRUE), (2, TRUE)) AS t(X, Y) limit > 10] at > org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1366) > at > org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1339) > at > org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1302) > at > org.apache.calcite.test.JdbcTest.testWithinGroupClause5(JdbcTest.java:6736) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) at > org.junit.runner.JUnitCore.run(JUnitCore.java:137) at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > Caused by: java.lang.RuntimeException: With materializationsEnabled=false, > limit=0 at > org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:573) at > org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1362) > ... 
25 more Caused by: java.sql.SQLException: Error while executing SQL > "select sum(X + 1) filter (where Y) as "SET" from (values (1, TRUE), (2, > TRUE)) AS t(X, Y) limit 10": Node [rel#22:Subset#3.ENUMERABLE.[]] could not > be implemented; planner state: Root: rel#22:Subset#3.ENUMERABLE.[] Original > rel: LogicalSort(subset=[rel#22:Subset#3.ENUMERABLE.[]], fetch=[10]): > rowcount = 1.0, cumulative cost = {1.0 rows, 4.0 cpu, 0.0 io}, id = 17 > LogicalAggregate(subset=[rel#16:Subset#2.NONE.[]], group=[{}], SET=[SUM($0) > FILTER $1]): rowcount = 1.0, cumulative cost = {1.1375000476837158 rows, 0.0 > cpu, 0.0 io}, id = 15 LogicalProject(subset=[rel#14:Subset#1.NONE.[1]], > $f0=[+($0, 1)], Y=[$1]): rowcount = 2.0, cumulative cost = {2.0 rows, 4.0 > cpu, 0.0 io}, id = 13 LogicalValues(subset=[rel#12:Subset#0.NONE.[]], > tuples=[[{ 1, true }, { 2, true }]]): rowcount = 2.0, cumulative cost = {2.0 > rows, 1.0 cpu, 0.0 io}, id = 1 Sets: Set#0, type: RecordType(INTEGER X, > BOOLEAN Y) rel#12:Subset#0.NONE.[], best=null, importance=0.6561
[jira] [Commented] (CALCITE-3065) RexLiteral#getValueAs should consider primitive type
[ https://issues.apache.org/jira/browse/CALCITE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843588#comment-16843588 ]

Danny Chan commented on CALCITE-3065:
-------------------------------------

Thanks [~Aron.tao]. I am not strongly in favor of Calcite supporting primitive numeric types in the method SqlLiteral#getValueAs, because our code base has no use case like yours:
{code:java}
val tp = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT)
literal.getValueAs(tp.getJavaClass(literal.getType).asInstanceOf[java.lang.Class[_]])
{code}
So what I am curious and confused about is why you need the "true" typed value of the literal. I think you should give us a persuasive reason. After all, Calcite serves all kinds of engines and use cases, and we should not add logic that is uncommon and not used even by Calcite itself.

> RexLiteral#getValueAs should consider primitive type
> ----------------------------------------------------
>
> Key: CALCITE-3065
> URL: https://issues.apache.org/jira/browse/CALCITE-3065
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Jiatao Tao
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2019-05-13-12-04-36-365.png, image-2019-05-17-08-23-52-735.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> !image-2019-05-13-12-04-36-365.png!

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
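The point under discussion can be illustrated without Calcite. The sketch below uses only the JDK and assumes that the class handed to a `getValueAs`-style method is a primitive class such as `int.class`. It shows why such a class never matches a boxed value, and how a lookup could box the class first. The names `box` and `castBoxed` are hypothetical helpers, not part of Calcite's API, and this is not the change proposed in the pull request.

```java
import java.util.Map;

final class PrimitiveClassDemo {
  // Hypothetical lookup table from primitive classes to their wrapper classes.
  private static final Map<Class<?>, Class<?>> WRAPPERS = Map.of(
      boolean.class, Boolean.class,
      byte.class, Byte.class,
      char.class, Character.class,
      short.class, Short.class,
      int.class, Integer.class,
      long.class, Long.class,
      float.class, Float.class,
      double.class, Double.class);

  /** Returns the wrapper class for a primitive class, or the class itself otherwise. */
  static Class<?> box(Class<?> clazz) {
    return WRAPPERS.getOrDefault(clazz, clazz);
  }

  /** Casts a value after boxing the target class, so int.class behaves like Integer.class. */
  @SuppressWarnings("unchecked")
  static <T> T castBoxed(Object value, Class<T> clazz) {
    return (T) box(clazz).cast(value);
  }

  public static void main(String[] args) {
    Object value = 42; // autoboxed to java.lang.Integer
    System.out.println(Integer.class.isInstance(value)); // true: wrapper class matches
    System.out.println(int.class.isInstance(value));     // false: a primitive class never matches an object
    int unboxed = castBoxed(value, int.class);           // succeeds because the class is boxed first
    System.out.println(unboxed);                         // 42
  }
}
```

Calling `int.class.cast(value)` directly would throw a `ClassCastException`, which is the kind of mismatch a caller hits when passing a primitive class to a method that checks `clazz.isInstance(value)`.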
[jira] [Resolved] (CALCITE-3068) testSubprogram() does't test whether subprogram gets re-executed
[ https://issues.apache.org/jira/browse/CALCITE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chunwei Lei resolved CALCITE-3068.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 1.20.0

> testSubprogram() does't test whether subprogram gets re-executed
> ----------------------------------------------------------------
>
> Key: CALCITE-3068
> URL: https://issues.apache.org/jira/browse/CALCITE-3068
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.19.0
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.20.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The goal of {{HepPlannerTest#testSubprogram}} is to test whether a subprogram gets re-executed. Unfortunately, it cannot test this, because the plan contains only one project and the rule fires only once.
[jira] [Commented] (CALCITE-3068) testSubprogram() does't test whether subprogram gets re-executed
[ https://issues.apache.org/jira/browse/CALCITE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843581#comment-16843581 ]

Chunwei Lei commented on CALCITE-3068:
--------------------------------------

Fixed in [https://github.com/apache/calcite/commit/1a5fca0feab124f33c16d31b65f0ba0806231731].

> testSubprogram() does't test whether subprogram gets re-executed
> ----------------------------------------------------------------
>
> Key: CALCITE-3068
> URL: https://issues.apache.org/jira/browse/CALCITE-3068
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.19.0
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The goal of {{HepPlannerTest#testSubprogram}} is to test whether a subprogram gets re-executed. Unfortunately, it cannot test this, because the plan contains only one project and the rule fires only once.
[jira] [Resolved] (CALCITE-3072) Generate right SQL for FLOOR&SUBSTRING functions in SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chan resolved CALCITE-3072.
--------------------------------
    Resolution: Fixed
    Assignee: Danny Chan
    Fix Version/s: 1.20.0

Fixed in [https://github.com/apache/calcite/commit/7fa13bf164f9354aed492b03f753c2b9b30b0164]. Thanks for your PR, [~donnyzone]!

> Generate right SQL for FLOOR&SUBSTRING functions in SparkSqlDialect
> -------------------------------------------------------------------
>
> Key: CALCITE-3072
> URL: https://issues.apache.org/jira/browse/CALCITE-3072
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.20.0
> Reporter: Feng Zhu
> Assignee: Danny Chan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.20.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The FLOOR and SUBSTRING functions are currently broken when generating queries for Spark SQL [1].
> {code:java}
> FLOOR -> DATE_TRUNC
> SUBSTRING('Hello World' FROM 5 FOR 1) -> SUBSTRING('Hello World', 5, 1)
> {code}
> [1] https://spark.apache.org/docs/latest/api/sql/index.html
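To make the SUBSTRING mapping above concrete, here is a minimal sketch in plain Java that renders the same call in both syntaxes. It only builds strings. The class and method names are hypothetical and do not reflect how SparkSqlDialect unparses calls in Calcite.

```java
final class SparkSubstringDemo {
  /** Renders SUBSTRING in the standard "FROM ... FOR ..." form. */
  static String standardSubstring(String operand, int from, int length) {
    return "SUBSTRING(" + operand + " FROM " + from + " FOR " + length + ")";
  }

  /** Renders SUBSTRING in the comma-separated form shown in the issue for Spark SQL. */
  static String commaSubstring(String operand, int from, int length) {
    return "SUBSTRING(" + operand + ", " + from + ", " + length + ")";
  }

  public static void main(String[] args) {
    String operand = "'Hello World'";
    // Same logical call, two surface syntaxes.
    System.out.println(standardSubstring(operand, 5, 1)); // SUBSTRING('Hello World' FROM 5 FOR 1)
    System.out.println(commaSubstring(operand, 5, 1));    // SUBSTRING('Hello World', 5, 1)
  }
}
```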
[jira] [Commented] (CALCITE-2593) Sometimes fails to plan when a RelNode transform multiple collations to single collation
[ https://issues.apache.org/jira/browse/CALCITE-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843456#comment-16843456 ]

Stamatis Zampetakis commented on CALCITE-2593:
----------------------------------------------

I had another look at this. It seems that the following change in EnumerableSortRule also resolves the CannotPlanException for this query.
{code:java}
return EnumerableSort.create(
    convert(
        input,
        input.getTraitSet()
            .replace(EnumerableConvention.INSTANCE)
            .replace(RelCollations.EMPTY)),
    sort.getCollation(),
    null,
    null);
{code}
Since this rule is going to perform a sort and thus fulfil some physical properties (RelCollation), I was wondering whether it makes sense to ask for the input to be sorted. In the textbook VolcanoPlanner, when an enforcer is applied (in our case EnumerableSort), the physical properties it satisfies are removed from the requirements used to optimize the subplan. Removing all collations (as I did above) may be wrong, but I think the rule should at least remove the collations that are going to be enforced by the EnumerableSort operator.

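The enforcer argument can be sketched as a set difference: whatever the enforcer guarantees no longer needs to be requested from its input. The toy model below represents physical properties as plain strings. It is an illustration under that assumption only, and `requiredFromInput` is a hypothetical helper, not part of Calcite's RelTraitSet API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class EnforcerTraitsDemo {
  /** Returns the properties still required from the input once the enforcer supplies some of them. */
  static Set<String> requiredFromInput(Set<String> required, Set<String> enforced) {
    Set<String> remaining = new LinkedHashSet<>(required);
    remaining.removeAll(enforced); // properties the enforcer provides need not come from below
    return remaining;
  }

  public static void main(String[] args) {
    Set<String> required = new LinkedHashSet<>();
    required.add("convention=ENUMERABLE");
    required.add("collation=[0]");

    // A sort enforcer establishes the collation itself.
    Set<String> enforcedBySort = Set.of("collation=[0]");

    System.out.println(requiredFromInput(required, enforcedBySort)); // [convention=ENUMERABLE]
  }
}
```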
> Sometimes fails to plan when a RelNode transform multiple collations to single collation
> ----------------------------------------------------------------------------------------
>
> Key: CALCITE-2593
> URL: https://issues.apache.org/jira/browse/CALCITE-2593
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Hongze Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Sample SQL:
> {code:java}
> select sum(X + 1) filter (where Y) as "SET" from (values (1, TRUE), (2, TRUE)) AS t(X, Y) limit 10{code}
> Error log:
> {code:java}
> java.lang.RuntimeException: exception while executing [select sum(X + 1) filter (where Y) as "SET" from (values (1, TRUE), (2, TRUE)) AS t(X, Y) limit 10]
>     at org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1366)
>     at org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1339)
>     at org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1302)
>     at org.apache.calcite.test.JdbcTest.testWithinGroupClause5(JdbcTest.java:6736)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>     at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>     at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>     at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: java.lang.RuntimeException: With materializationsEnabled=false, limit=0
>     at org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:573)
>     at org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1362)
>     ... 25 more
> Caused by: java.sql.SQLException: Error while executing SQL "select sum(X + 1) filter (where Y) as "SET" from (values (1, TRUE), (2, TRUE)) AS t(X, Y) limit 10": Node [rel#22:Subset#3.ENUMERABLE.[]] could not be implemented; planner state:
> Root: rel#22:Subset#3.ENUMERABLE.[]
> Original rel:
> LogicalSort(subset=[rel#22:Subset#3.ENUMERABLE.[]], fetch=[10]): rowcount = 1.0, cumulative cost = {1.0 rows, 4.0 cpu, 0.0 io}, id = 17
>   LogicalAggregate(subset=[rel#16:Subset#2.NONE.[]], group=[{}], SET=[SUM($0) FILTER $1]): rowcount = 1.0
[jira] [Commented] (CALCITE-2624) Add a rule to copy a sort below a join operator
[ https://issues.apache.org/jira/browse/CALCITE-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843454#comment-16843454 ]

Stamatis Zampetakis commented on CALCITE-2624:
----------------------------------------------

Both currently (see CALCITE-2970) and in the past, we have had many performance problems due to AbstractConverters. I am wondering if the current rule also suffers from the same problem. [~khawlamhb], have you tried incorporating the rule in the default rule set of Calcite? Do all the tests pass?

> Add a rule to copy a sort below a join operator
> -----------------------------------------------
>
> Key: CALCITE-2624
> URL: https://issues.apache.org/jira/browse/CALCITE-2624
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.17.0
> Reporter: Stamatis Zampetakis
> Assignee: Khawla Mouhoubi
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, the only rule that allows a sort to traverse a binary operator is the SortJoinTransposeRule. The rule was introduced mainly to push limits in the case of left and right outer joins (see CALCITE-831).
> I assume that the main reason we don't have more rules is that sorts with limits and offsets cannot be pushed safely below many types of join operators. However, in many cases, it is possible and beneficial for optimization purposes to push just the sort, without the limit and offset. Since we do not know in advance whether the join operator preserves the order, we cannot remove the sort operator on top of the join (that is why I say copy and not transpose). This is not really a problem, since the SortRemoveRule can detect such cases and remove the sort if it is redundant.
> A few concrete examples where this optimization makes sense are outlined below:
> * allow the sort to be later absorbed by an index scan and disappear from the plan (Sort + Tablescan => IndexScan with RelCollation);
> * allow operators that require sorted inputs to be exploited more easily (e.g., merge join);
> * allow the sort to be performed on a possibly smaller result (assuming that the physical binary operator that is going to be used preserves the order of left/right input and the top sort operator can be removed entirely).
> I propose to add a new rule (e.g., SortCopyBelowJoinRule, SortJoinCopyBelowRule) which allows a sort to be copied to the left or right (or to both, if it is rather easy to decompose the sort) of a join operator (excluding the limit and offset attributes) if the respective inputs are not already sorted.
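The claim that a copied sort can make the top sort redundant depends on the physical join preserving the order of its input. The toy sketch below uses plain Java and no Calcite classes. It assumes a nested-loop inner join on integer keys, which emits rows in the order of the left input, so sorting the left input first yields output already sorted on the left key. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortBelowJoinDemo {
  /** Nested-loop inner join on equal keys; output rows follow the order of the left input. */
  static List<int[]> nestedLoopJoin(List<Integer> left, List<Integer> right) {
    List<int[]> out = new ArrayList<>();
    for (int l : left) {
      for (int r : right) {
        if (l == r) {
          out.add(new int[] {l, r});
        }
      }
    }
    return out;
  }

  /** Checks whether the joined rows are in ascending order of the left key. */
  static boolean isSortedByLeft(List<int[]> rows) {
    for (int i = 1; i < rows.size(); i++) {
      if (rows.get(i - 1)[0] > rows.get(i)[0]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Integer> left = List.of(3, 1, 2);
    List<Integer> right = List.of(2, 3, 1, 3);

    // Copy the sort below the join: sort only the left input.
    List<Integer> sortedLeft = new ArrayList<>(left);
    sortedLeft.sort(Comparator.naturalOrder());

    System.out.println(isSortedByLeft(nestedLoopJoin(left, right)));       // false: unsorted input
    System.out.println(isSortedByLeft(nestedLoopJoin(sortedLeft, right))); // true: top sort is redundant
  }
}
```

With a join algorithm that does not preserve input order, such as a hash join that iterates over the other side, the second check could fail, which is why the proposal keeps the sort on top of the join and leaves its removal to SortRemoveRule.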