[jira] [Created] (CALCITE-3733) MySQL JDBC rewrite should rewrite CAST as TIMESTAMP to CAST as DATETIME
Vineet Garg created CALCITE-3733: Summary: MySQL JDBC rewrite should rewrite CAST as TIMESTAMP to CAST as DATETIME Key: CALCITE-3733 URL: https://issues.apache.org/jira/browse/CALCITE-3733 Project: Calcite Issue Type: Bug Components: jdbc-adapter Reporter: Vineet Garg Assignee: Vineet Garg Queries containing cast timestamp literal is rewritten into cast as TIMESTAMP. This syntax is not supported with MySql. Since TIMESTAMP is equivalent to DATETIME it should be rewritten into CAST as DATETIME. I will provide a test case later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3734) MySQL JDBC rewrite is producing queries with CHAR with range beyond 255
Vineet Garg created CALCITE-3734: Summary: MySQL JDBC rewrite is producing queries with CHAR with range beyond 255 Key: CALCITE-3734 URL: https://issues.apache.org/jira/browse/CALCITE-3734 Project: Calcite Issue Type: Bug Components: jdbc-adapter Reporter: Vineet Garg Assignee: Vineet Garg Queries containing cast to varchar/string is rewritten into cast to CHAR with range beyond 255 causing query failure. This range/precision should be limited to 255. I will provide a test case later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3824) JoinProjectTransposeRule should skip Projects containing windowing expression
Vineet Garg created CALCITE-3824: Summary: JoinProjectTransposeRule should skip Projects containing windowing expression Key: CALCITE-3824 URL: https://issues.apache.org/jira/browse/CALCITE-3824 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Vineet Garg This rule could push windowing expressions within join condition which doesn't make sense. For example {code:sql} select * from dept a join (select rank() over (order by name) as r, 1 + 1 from dept) as b on a.name = b.r {code} Above query produces following plan after the rule {code} LogicalProject(DEPTNO=[$0], NAME=[$1], R=[$3], EXPR$1=[$4]) LogicalProject(DEPTNO=[$0], NAME=[$1], NAME0=[CAST($1):BIGINT NOT NULL], R=[RANK() OVER (ORDER BY $3 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)], EXPR$1=[+(1, 1)]) LogicalJoin(condition=[=(CAST($1):BIGINT NOT NULL, RANK() OVER (ORDER BY $3 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3848) Materialized view rewriting fails for mv consisting of group by on join keys
Vineet Garg created CALCITE-3848: Summary: Materialized view rewriting fails for mv consisting of group by on join keys Key: CALCITE-3848 URL: https://issues.apache.org/jira/browse/CALCITE-3848 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Vineet Garg Test case {code:java} + @Test public void testAggregateOnJoinKeys() { +checkMaterialize( +"select \"deptno\", \"empid\", \"salary\"\n" ++ "from \"emps\"\n" ++ "group by \"deptno\", \"empid\", \"salary\"", + "select \"empid\", \"depts\".\"deptno\" \n" ++ "from \"emps\"\n" ++ "join \"depts\" on \"depts\".\"deptno\" = \"empid\" group by \"empid\", \"depts\".\"deptno\"", +HR_FKUK_MODEL, +CalciteAssert.checkResultContains( +"EnumerableCalc(expr#0=[{inputs}], empid=[$t0], empid0=[$t0])\n" + + " EnumerableAggregate(group=[{1}])\n" ++ "EnumerableHashJoin(condition=[=($1, $3)], joinType=[inner])\n" ++ " EnumerableTableScan(table=[[hr, m0]])")); + } + {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3862) Rewriting for materialized view consisting of group by on join keys with aggregate fails
Vineet Garg created CALCITE-3862: Summary: Rewriting for materialized view consisting of group by on join keys with aggregate fails Key: CALCITE-3862 URL: https://issues.apache.org/jira/browse/CALCITE-3862 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Vineet Garg *Repro* {code:sql} +sql("select \"deptno\", \"empid\", \"salary\", sum(1) " ++ "from \"emps\"\n" ++ "group by \"deptno\", \"empid\", \"salary\"", +"select sum(1) " ++ "from \"emps\"\n" ++ "join \"depts\" on \"depts\".\"deptno\" = \"empid\" group by \"empid\", \"depts\".\"deptno\"") +.withResultContains( +"EnumerableCalc(expr#0..1=[{inputs}], EXPR$0=[$t1])\n" ++ " EnumerableAggregate(group=[{1}], EXPR$0=[$SUM0($3)])\n" ++ "EnumerableHashJoin(condition=[=($1, $4)], joinType=[inner])\n" ++ " EnumerableTableScan(table=[[hr, m0]])") +.ok(); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3914) Improve SubsitutionVisitor to consider RexCall of type PLUS and TIMES for canonicalization
Vineet Garg created CALCITE-3914: Summary: Improve SubsitutionVisitor to consider RexCall of type PLUS and TIMES for canonicalization Key: CALCITE-3914 URL: https://issues.apache.org/jira/browse/CALCITE-3914 Project: Calcite Issue Type: Improvement Components: core Reporter: Vineet Garg Assignee: Vineet Garg -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3952) Improve SortRemoveRule to remove Sort based on rowcount
Vineet Garg created CALCITE-3952: Summary: Improve SortRemoveRule to remove Sort based on rowcount Key: CALCITE-3952 URL: https://issues.apache.org/jira/browse/CALCITE-3952 Project: Calcite Issue Type: Improvement Components: core Reporter: Vineet Garg Assignee: Vineet Garg If a query is guaranteed to produce maximum one row it is safe to remove Sort (along with limit). Example: {code:sql} select count(*) cs from store_sales where ss_ext_sales_price > 100.00 order by cs limit 100 {code} Although logically equivalent this can greatly benefit physical plans by removing extra operator and avoiding unnecessary data transfer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-2874) NOT IN correlated subquery has un-necessary join
Vineet Garg created CALCITE-2874: Summary: NOT IN correlated subquery has un-necessary join Key: CALCITE-2874 URL: https://issues.apache.org/jira/browse/CALCITE-2874 Project: Calcite Issue Type: Improvement Reporter: Vineet Garg Query: {code:sql} select count(*) as c from "scott".emp as e where sal + 100 not in ( select deptno from dept where dname = e.ename); {code} Plan: {code} EnumerableAggregate(group=[{}], C=[COUNT()]) EnumerableCalc(expr#0..9=[{inputs}], expr#10=[IS NOT NULL($t7)], expr#11=[<($t5, $t4)], expr#12=[OR($t10, $t11)], expr#13=[IS NOT TRUE($t12)], expr#14=[0], expr#15=[=($t4, $t14)], expr#16=[IS TRUE($t15)], expr#17=[IS NULL($t2)], expr#18=[OR($t13, $t16, $t17)], proj#0..9=[{exprs}], $condition=[$t18]) EnumerableJoin(condition=[AND(=($1, $8), =($2, $9))], joinType=[left]) EnumerableCalc(expr#0..5=[{inputs}], expr#6=[IS NOT NULL($t2)], expr#7=[0], expr#8=[=($t4, $t7)], expr#9=[IS TRUE($t8)], expr#10=[OR($t6, $t9)], proj#0..5=[{exprs}], $condition=[$t10]) EnumerableJoin(condition=[=($1, $3)], joinType=[left]) EnumerableCalc(expr#0..7=[{inputs}], proj#0..1=[{exprs}], SAL=[$t5]) EnumerableTableScan(table=[[scott, EMP]]) EnumerableAggregate(group=[{1}], c=[COUNT()], ck=[COUNT($0)]) EnumerableCalc(expr#0..2=[{inputs}], expr#3=[IS NOT NULL($t1)], proj#0..2=[{exprs}], $condition=[$t3]) EnumerableTableScan(table=[[scott, DEPT]]) EnumerableCalc(expr#0..4=[{inputs}], DEPTNO=[$t2], i=[$t3], DNAME=[$t4], SAL=[$t0]) EnumerableJoin(condition=[=($1, $2)], joinType=[inner]) EnumerableCalc(expr#0=[{inputs}], expr#1=[100], expr#2=[+($t0, $t1)], SAL=[$t0], $f1=[$t2]) EnumerableAggregate(group=[{5}]) EnumerableTableScan(table=[[scott, EMP]]) EnumerableCalc(expr#0..2=[{inputs}], expr#3=[true], expr#4=[IS NOT NULL($t1)], DEPTNO=[$t0], i=[$t3], DNAME=[$t1], $condition=[$t4]) EnumerableTableScan(table=[[scott, DEPT]]) {code} Bottom inner join b/w EMP and DEPT could be removed (only DEPT scan + project is required) and top join can have sal+100 = dept.deptno condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2954) SubQueryJoinRemoveRule and SubQueryProjectRemoveRule passing on empty set instead of set of correlation id
Vineet Garg created CALCITE-2954: Summary: SubQueryJoinRemoveRule and SubQueryProjectRemoveRule passing on empty set instead of set of correlation id Key: CALCITE-2954 URL: https://issues.apache.org/jira/browse/CALCITE-2954 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Vineet Garg Call to {{apply}} takes set of {{CorrelationId}} but both {{SubQueryJoinRemoveRule}} and {{SubQueryProjectRemoveRule}} are passing empty set for whatever reason. I believe the correct thing to do here is to pass on the actual set of {{CorrelationId}} like {{SubQueryFilterRemoveRule}} Ref: {code:java} final RexNode target = apply(e, ImmutableSet.of(), logic, builder, 2, fieldCount); {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2971) Correlated subquery in JOIN is failing
Vineet Garg created CALCITE-2971: Summary: Correlated subquery in JOIN is failing Key: CALCITE-2971 URL: https://issues.apache.org/jira/browse/CALCITE-2971 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Following query if added in RelOptRulesTest tests throws an exception: {code:sql} select empno from sales.emp as r left join sales.dept as s on exists (select deptno from sales.emp where empno = s.deptno+20) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2986) Wrong results with =ANY subquery
Vineet Garg created CALCITE-2986: Summary: Wrong results with =ANY subquery Key: CALCITE-2986 URL: https://issues.apache.org/jira/browse/CALCITE-2986 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Vineet Garg ANY/SOME subqueries are rewritten using MAX/MIN and cross-join. This is wrong transformation for {{=ANY}} and {{<>ANY}} (and therefore {{=ALL}} and {{<>ALL}}). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2991) getMaxRowCount should return rowcount 1 for an aggregate with constant keys
Vineet Garg created CALCITE-2991: Summary: getMaxRowCount should return rowcount 1 for an aggregate with constant keys Key: CALCITE-2991 URL: https://issues.apache.org/jira/browse/CALCITE-2991 Project: Calcite Issue Type: Improvement Components: core Reporter: Vineet Garg Assignee: Vineet Garg Aggregate with constant keys are guaranteed to produce at most one row -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3011) Support for outer joins with AggregateJoinTransposeRule
Vineet Garg created CALCITE-3011: Summary: Support for outer joins with AggregateJoinTransposeRule Key: CALCITE-3011 URL: https://issues.apache.org/jira/browse/CALCITE-3011 Project: Calcite Issue Type: Improvement Components: core Reporter: Vineet Garg Assignee: Vineet Garg Currently {{AggregateJoinTransposeRule}} only support INNER join. Aggregates (at least the ones without aggregate functions) could be pushed through OUTER joins with current logic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3012) areColumnsUnique for FULL OUTER JOIN could return wrong answer when ignoreNulls is false
Vineet Garg created CALCITE-3012: Summary: areColumnsUnique for FULL OUTER JOIN could return wrong answer when ignoreNulls is false Key: CALCITE-3012 URL: https://issues.apache.org/jira/browse/CALCITE-3012 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Vineet Garg Let's say set of columns passed to this API are join keys and there is one NULL key coming from both the inputs. Following code will return true which is wrong because the result of FULL. OUTER JOIN with NULL key on both side will produce two rows with NULL. Even though this value in respective input is unique the result of join may not be unique. {code:java} Boolean leftUnique = mq.areColumnsUnique(left, leftColumns, ignoreNulls); Boolean rightUnique = mq.areColumnsUnique(right, rightColumns, ignoreNulls); if ((leftColumns.cardinality() > 0) && (rightColumns.cardinality() > 0)) { if ((leftUnique == null) || (rightUnique == null)) { return null; } else { return leftUnique && rightUnique; } } {code} {code:sql} create table trepro(i int); insert into trepro values(null); select * from trepro t1 full outer join trepro t2 on t1.i=t2.i; null, null null, null {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3028) Support FULL OUTER JOIN with AggregateJoinTransposeRule
Vineet Garg created CALCITE-3028: Summary: Support FULL OUTER JOIN with AggregateJoinTransposeRule Key: CALCITE-3028 URL: https://issues.apache.org/jira/browse/CALCITE-3028 Project: Calcite Issue Type: Improvement Reporter: Vineet Garg Assignee: Vineet Garg This is continuation of CALCITE-3011, which supported LEFT OUTER and RIGHT OUTER joins without aggregate functions. FULL OUTER JOIN was not supported at the time due to CALCITE-3012 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3132) Simply expressions in LogicalFilter generated for subqueries with quantified predicate
Vineet Garg created CALCITE-3132: Summary: Simply expressions in LogicalFilter generated for subqueries with quantified predicate Key: CALCITE-3132 URL: https://issues.apache.org/jira/browse/CALCITE-3132 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Vineet Garg Following test in RelOptRulesTest generate plan with LogicalFilter {code} @Test public void testSome() { final String sql = "select * from emp e1\n" + " where e1.empno > SOME (select deptno from dept)"; checkSubQuery(sql).withLateDecorrelation(true).check(); } {code} {code} LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[OR(AND(IS TRUE(>($0, $9)), <>($10, 0)), AND(>($10, $11), null, <>($10, 0), IS NOT TRUE(>($0, $9))), AND(>($0, $9), <>($10, 0), IS NOT TRUE(>($0, $9)), <=($10, $11)))]) LogicalJoin(condition=[true], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{}], m=[MIN($0)], c=[COUNT()], d=[COUNT($0)]) LogicalProject(DEPTNO=[$0]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} Note that LogicalFilter has condition which should be further simplified {code} (condition=[OR(AND(IS TRUE(>($0, $9)), <>($10, 0)), AND(>($10, $11), null, <>($10, 0), IS NOT TRUE(>($0, $9))), AND(>($0, $9), <>($10, 0), IS NOT TRUE(>($0, $9)), <=($10, $11)))]) {code} If {{AND(IS TRUE(>($0, $9)), <>($10, 0))}} is true so will be {{AND(IS TRUE(>($0, $9)), <>($10, 0))}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3222) Fix code style issues introduced by CALCITE-3031
Vineet Garg created CALCITE-3222: Summary: Fix code style issues introduced by CALCITE-3031 Key: CALCITE-3222 URL: https://issues.apache.org/jira/browse/CALCITE-3222 Project: Calcite Issue Type: Task Reporter: Vineet Garg Assignee: Vineet Garg -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (CALCITE-1483) Suboptimal plan for NOT IN query
Vineet Garg created CALCITE-1483: Summary: Suboptimal plan for NOT IN query Key: CALCITE-1483 URL: https://issues.apache.org/jira/browse/CALCITE-1483 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde Following query generates sub-optimal plan {code} explain plan for select * from scott.emp where deptno not in (select deptno from scott.dept where deptno = 20); {code} Following is the plan {code} EnumerableCalc(expr#0..11=[{inputs}], expr#12=[0], expr#13=[=($t8, $t12)], expr#14=[false], expr#15=[IS NOT NULL($t11)], expr#16=[true], expr#17=[IS NULL($t7)], expr#18=[null], expr#19=[<($t9, $t8)], expr#20=[CASE($t13, $t14, $t15, $t16, $t17, $t18, $t19, $t16, $t14)], expr#21=[NOT($t20)], proj#0..7=[{exprs}], $condition=[$t21]) EnumerableJoin(condition=[=($7, $10)], joinType=[left]) EnumerableCalc(expr#0..9=[{inputs}], EMPNO=[$t2], ENAME=[$t3], JOB=[$t4], MGR=[$t5], HIREDATE=[$t6], SAL=[$t7], COMM=[$t8], DEPTNO=[$t9], c=[$t0], ck=[$t1]) EnumerableJoin(condition=[true], joinType=[inner]) JdbcToEnumerableConverter JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)]) JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) JdbcTableScan(table=[[SCOTT, DEPT]]) JdbcToEnumerableConverter JdbcTableScan(table=[[SCOTT, EMP]]) JdbcToEnumerableConverter JdbcAggregate(group=[{0, 1}]) JdbcProject(DEPTNO=[$0], i=[true]) JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) JdbcTableScan(table=[[SCOTT, DEPT]]) {code} As Julian pointed out in discussion on mailing list instead of two scans for DEPT one is sufficient as clearly DEPTNO is never null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1493) Wrong plan for NOT IN correlated queries
Vineet Garg created CALCITE-1493: Summary: Wrong plan for NOT IN correlated queries Key: CALCITE-1493 URL: https://issues.apache.org/jira/browse/CALCITE-1493 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde Plan generated by calcite with SubqueryRemoveRule followed by de-correlation for the following query: {noformat} select sal from emp where empno NOT IN (select deptno from dept where emp.job = dept.name) {noformat} is {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[IS NULL($11)]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], JOB0=[CAST($10):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN]) LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1]) LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} As you can notice instead of doing {noformat} Left Outer Join {noformat} Calcite is doing {noformat} Inner Join {noformat}. This will produce wrong results. Plan for same query just before SubqueryRemove Rule is: {noformat} LogicalProject(SAL=[$5]) LogicalFilter(condition=[NOT(IN($0, { LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) }))], variablesSet=[[$cor0]]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} Plan just after SubqueryRemove Rule: {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[IS NULL($10)]) LogicalFilter(condition=[=($0, $9)]) LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], i=[true]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {noformat} Looking at above it seems RelDecorrelator have some issue where it is coming up with Inner Join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1494) Inefficient plan for co-related subqueries
Vineet Garg created CALCITE-1494: Summary: Inefficient plan for co-related subqueries Key: CALCITE-1494 URL: https://issues.apache.org/jira/browse/CALCITE-1494 Project: Calcite Issue Type: Improvement Components: core Reporter: Vineet Garg Assignee: Julian Hyde For co-related queries such as {noformat} select sal from emp where empno IN (select deptno from dept where emp.job = dept.name) {noformat} Calcite generates following plan (SubqueryRemove Rule + Decorrelation) {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], JOB=[$1]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} As you can notice there is a outer table scan (EMP in this case) to retrieve all distinct values for co-related column (EMP.JOB here), which is then joined with inner table (DEPT). I am not sure why is this step required. After this join Calcite is anyway doing group by to generate all distinct values for correlated and result column (DEPTNO, JOB) which is then joined with outer table. I think the scan + join of outer table with inner table to generate co-rrelated values is un-necessary and is not required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1495) Add a rule to convert INNER JOIN preceded by GROUP BY to appropriate SEMI-JOIN
Vineet Garg created CALCITE-1495: Summary: Add a rule to convert INNER JOIN preceded by GROUP BY to appropriate SEMI-JOIN Key: CALCITE-1495 URL: https://issues.apache.org/jira/browse/CALCITE-1495 Project: Calcite Issue Type: New Feature Components: core Reporter: Vineet Garg Assignee: Julian Hyde For IN and EXISTS subqueries Calcite currently generates plan consisting of GROUP BY on inner table followed by INNER JOIN with outer table. e.g. for following query: {noformat} :select sal from emp where empno IN (select deptno from dept) {noformat} Generated plan is: {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalJoin(condition=[=($0, $9)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(DEPTNO=[$0]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {noformat} Such cases could be converted using this rule to use SEMI-JOIN to make it more efficient -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1511) RelDecorrelator's allLessThan failing for NOT EXISTS subquery
Vineet Garg created CALCITE-1511: Summary: RelDecorrelator's allLessThan failing for NOT EXISTS subquery Key: CALCITE-1511 URL: https://issues.apache.org/jira/browse/CALCITE-1511 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde Calcite hits an assertion for following query (planning is done by calling SubQueryRemoveRule followed by decorrelateQuery ) {code} select * from emp where EXISTS (select * from emp e where emp.deptno = e.deptno) AND NOT EXISTS (select * from emp ee where ee.job = emp.job AND ee.sal=34) {code} Assertion {noformat} Caused by: java.lang.AssertionError: out of range; value: 3, limit: 3 {noformat} This assertion is hit in {noformat} RelDecorrelator's allLessThan {noformat} which is called while registering newly de-correlated LogicalAggregate. Plan just before SubqueryRemoveRule {noformat} LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[AND(EXISTS({ LogicalFilter(condition=[=($cor0.DEPTNO, $7)]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) }), NOT(EXISTS({ LogicalFilter(condition=[AND(=($2, $cor0.JOB), =($5, 34))]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) })))], variablesSet=[[$cor0]]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} Plan just after SubqueryRemoveRule: {noformat} LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], i=[$9]) LogicalFilter(condition=[AND(OR(IS NULL($12), =($10, 0)), OR(>=($11, $10), =($10, 0)))]) LogicalJoin(condition=[true], joinType=[left]) LogicalJoin(condition=[true], joinType=[inner]) LogicalCorrelate(correlation=[$cor0], joinType=[INNER], requiredColumns=[{7}]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(i=[true]) LogicalFilter(condition=[=($cor0.DEPTNO, $7)]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0, $1, $2, $3, $4, $5, $6, $7, $8)]) LogicalFilter(condition=[AND(=($2, $cor0.JOB), =($5, 34))]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(i=[true]) LogicalFilter(condition=[AND(=($2, $cor0.JOB), =($5, 34))]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1537) Un-necessary project expression in multi-subquery plan
Vineet Garg created CALCITE-1537: Summary: Un-necessary project expression in multi-subquery plan Key: CALCITE-1537 URL: https://issues.apache.org/jira/browse/CALCITE-1537 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde Query {code} select sal from emp where empno IN (select deptno from dept where emp.job = dept.name) AND empno IN (select empno from emp e where emp.ename = e.ename) {code} Plan just before calling *SubqueryRemoveRule* {code} LogicalProject(SAL=[$5]) LogicalFilter(condition=[AND(IN($0, { LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) }), IN($0, { LogicalProject(EMPNO=[$0]) LogicalFilter(condition=[=($cor0.ENAME, $1)]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) }))], variablesSet=[[$cor0]]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code} Plan just after *SubqueryRemoveRule* {code} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9]) LogicalJoin(condition=[=($0, $10)], joinType=[inner]) LogicalFilter(condition=[=($0, $9)]) LogicalCorrelate(correlation=[$cor0], joinType=[INNER], requiredColumns=[{2}]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(EMPNO=[$0]) LogicalFilter(condition=[=($cor0.ENAME, $1)]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code} Plan just after *decorrelation* {code} LogicalProject(SAL=[$5], ENAME0=[$9]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$10]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9], ENAME0=[$12]) LogicalJoin(condition=[=($0, $11)], joinType=[inner]) LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], JOB=[$1]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0, 1}]) LogicalProject(EMPNO=[$0], ENAME0=[$1]) LogicalProject(EMPNO=[$0], ENAME0=[$9]) LogicalJoin(condition=[=($9, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(ENAME=[$1]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code} As you can notice the top *LogicalProject* has unnecessary expression *ENAME0* after decorrelation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1541) Update RelDecorrelator to work on Filter, Join, Project etc
Vineet Garg created CALCITE-1541: Summary: Update RelDecorrelator to work on Filter, Join, Project etc Key: CALCITE-1541 URL: https://issues.apache.org/jira/browse/CALCITE-1541 Project: Calcite Issue Type: Task Components: core Reporter: Vineet Garg Assignee: Julian Hyde Reldecorrelator’s logic including all rules implemented within it are written to take LogicalJoin, LogicalFilter, LogicalProject etc Logical rel nodes. Since Logical operators are final that makes extending RelDecorrelator very difficult. It makes more sense to have RelDecorrelator operate on Join, Filter etc As Julian pointed out with this change RelDecorrelator will need to use RelNode factories (ideally a RelBuilder) so that it can create RelNodes of the appropriate sub-type -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1542) Update RelShuttle interface to work on Filter, Join etc instead of Logical rel nodes
Vineet Garg created CALCITE-1542: Summary: Update RelShuttle interface to work on Filter, Join etc instead of Logical rel nodes Key: CALCITE-1542 URL: https://issues.apache.org/jira/browse/CALCITE-1542 Project: Calcite Issue Type: Task Reporter: Vineet Garg Assignee: Julian Hyde Currently RelShuttle interface works on Logical rel nodes (e.g. LogicalFilter etc). With CALCITE-1541's change on RelDecorrelator RelShuttle will need to be updated as well. Logging in a separate JIRA for it since Julian pointed out that this will be a breaking change. If projects have sub-classed RelShuttle and have overridden methods without using @Override they will break with no warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1546) Wrong result/plan for NOT IN subqueries with disjunction
Vineet Garg created CALCITE-1546: Summary: Wrong result/plan for NOT IN subqueries with disjunction Key: CALCITE-1546 URL: https://issues.apache.org/jira/browse/CALCITE-1546 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde Query: {code} select * from emp where sal = 4 OR empno NOT IN (select deptno from dept){code} Plan: {code} LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[OR(=($5, 4), NOT(CASE(IS NOT NULL($10), true, false)))]) LogicalJoin(condition=[=($0, $9)], joinType=[left]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], i=[true]) LogicalProject(DEPTNO=[$0]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} There is no null check branch i.e. with count(\*), count(c) in the plan. This produces wrong results if deptno is null in dept. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1583) Wrong results for query with correlated subqueries with aggregate subquery expression
Vineet Garg created CALCITE-1583: Summary: Wrong results for query with correlated subqueries with aggregate subquery expression Key: CALCITE-1583 URL: https://issues.apache.org/jira/browse/CALCITE-1583 Project: Calcite Issue Type: Bug Reporter: Vineet Garg Assignee: Julian Hyde Following query produces wrong result: {code} select * from depts where exists (select sum(empno) from emps where depts.deptno = emps.deptno and 1=2) {code} Expected Result: {noformat} deptno | name +--- 10 | Sales 20 | Marketing 30 | Accounts (3 rows) {noformat} Actual results {noformat} zero rows {noformat} Calcite rewrites such queries into JOIN which ignores the fact that aggregate functions such as {{sum}} always produce one row, effectively making {{EXISTS}} predicate always true. Same is the case with {{Scalar}} and {{IN}} sub-queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1605) Wrong result/plan for correlated subquery with windowing
Vineet Garg created CALCITE-1605: Summary: Wrong result/plan for correlated subquery with windowing Key: CALCITE-1605 URL: https://issues.apache.org/jira/browse/CALCITE-1605 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde Calcite produces wrong plan, hence wrong results for correlated subquery which contains windowing function. Query {code}select * from emps where empid-1 IN (select min(e.empid) over () from emps e where e.empid =emps.empid); {code} Expected {code} 0 rows {code} Actual {noformat} +---+--+++--+---+-+-+-+--+ | EMPNO | NAME | DEPTNO | GENDER | CITY | EMPID | AGE | SLACKER | MANAGER | JOINEDAT | +---+--+++--+---+-+-+-+--+ | 110 | John | 40 | M | Vancouver | 2 | null | false | true | 2002-05-03 | | 130 | Alice | 40 | F | Vancouver | 2 | null | false | true | 2007-01-01 | +---+--+++--+---+-+-+-+--+ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CALCITE-1624) Inefficient plan for NOT IN correlated subqueries
Vineet Garg created CALCITE-1624: Summary: Inefficient plan for NOT IN correlated subqueries Key: CALCITE-1624 URL: https://issues.apache.org/jira/browse/CALCITE-1624 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde I just noticed that {{NOT IN}} correlated subqueries produces an extra un-neccessary join after de-correlation (this is an addition to un-necessary joins reported in CALCITE-1494) Query {code:SQL} select sal from emp where empno NOT IN ( select deptno from dept where emp.job = dept.name) {code} Plan after subquery remove rule: {code} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[NOT(CASE(=($9, 0), false, IS NOT NULL($12), true, <($10, $9), true, false))]) LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}]) LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalFilter(condition=[=($cor0.EMPNO, $0)]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], i=[true]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} Plan after de-correlation {code} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[NOT(CASE(=($10, 0), false, IS NOT NULL($14), true, <($11, $10), true, false))]) LogicalJoin(condition=[AND(=($0, $15), =($2, $13))], joinType=[left]) LogicalJoin(condition=[=($2, $9)], joinType=[left]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}], c=[COUNT()], ck=[COUNT($1)]) LogicalProject(JOB=[$1], DEPTNO=[$0]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalJoin(condition=[=($3, $0)], joinType=[inner]) // <== Un-necessary join LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1]) LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(EMPNO=[$0]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code} As you can see in plan after de-correlation there is an un-necessary inner join. This is not reproducible on CALCITE-1494's branch. But since this is a separate issue from CALCITE-1494 I decided to open a separate JIRA. Feel free to mark is duplicate or close it if you think otherwise. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CALCITE-1726) Subquery in FILTER is left untransformed
Vineet Garg created CALCITE-1726: Summary: Subquery in FILTER is left untransformed Key: CALCITE-1726 URL: https://issues.apache.org/jira/browse/CALCITE-1726 Project: Calcite Issue Type: Bug Reporter: Vineet Garg Assignee: Julian Hyde Reproducer Query: {code:SQL} select * from emp where empno IN (select (select max(sal) from emp) from dept) {code} Plan after {{{SubqueryRemoveRule}}} {code:SQL} LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalJoin(condition=[=($0, $9)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(EXPR$0=[$SCALAR_QUERY({ LogicalAggregate(group=[{}], EXPR$0=[MAX($0)]) LogicalProject(SAL=[$5]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) })]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} As you can notice scalar query in LogicalProject is left as it is -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CALCITE-1851) Decorrelation should get rid of correlated predicates while decorrelation filter condition
Vineet Garg created CALCITE-1851: Summary: Decorrelation should get rid of correlated predicates while decorrelation filter condition Key: CALCITE-1851 URL: https://issues.apache.org/jira/browse/CALCITE-1851 Project: Calcite Issue Type: Improvement Components: core Reporter: Vineet Garg Assignee: Julian Hyde For queries such as {code:sql} select sal from emp where empno IN ( select deptno from dept where emp.job = dept.name) {code} Filter condition for correlated predicate e.g. {{=($cor0.JOB, $1)}} is transformed into equality predicate on same column e.g. {{$1=$1}} after calling {{decorrelateExpr}} (if value generator is not generated). This is further simplified into {{$1 is not null}}. We do not need to generate and simplify such predicate expression, instead it could be completely removed. This will help support cases other than equality correlated predicates for which value generated is not required. e.g. {code:sql} -- This message was sent by Atlassian JIRA (v6.4.14#64029)