Re: Problems with abstract syntax tree
I logged in jira CALCITE-1483 for the query in question. I’ll try to find more examples. On 11/7/16, 9:17 PM, "Vineet Garg" wrote: >Hi Julian, > >Apologies for not responding earlier. > >I understand that planner rules sometime produces a plan that is sub-optimal. >My concern was about planner rules producing a plan consisting of an >expression (literal null constant in this case) with null type i.e. >SqlTypeName.NULL. I was wondering if this might be a bug on Calcite side. But >it looks like Calcite has a concept of null data type and this seems to be >expected. > >Vineet > > > >On 11/3/16, 12:14 PM, "Julian Hyde" wrote: > >>Vineet, >> >>In case you forgot, can you please log that JIRA case? If we have a lengthy >>design discussion without creating an action item, we are wasting everyone’s >>time. >> >>Julian >> >>> On Nov 1, 2016, at 11:00 AM, Julian Hyde wrote: >>> >>> Alexander & Vineet, >>> >>> One further comment about “NOT IN”. SQL in general is fairly close to >>> relational algebra, but “NOT IN” is one of the places where the gap is >>> widest. “NOT IN” is difficult in general to execute efficiently, because of >>> the problem of NULL values (at Oracle, we always recommended to users to >>> rewrite as NOT EXISTS if possible). The gap between SQL and relational >>> algebra is apparent when a short SQL query becomes a complex RelNode tree. >>> >>> There is a silver lining: the RelNode tree, being relational algebra, has >>> well-behaved semantics. Once you’re in RelNode land, you can freely apply >>> transformation rules to make it efficient. >>> >>> Vineet, >>> >>> If the planner rules produce a plan that is sub-optimal I wouldn’t call it >>> a bug but a missing feature. (It would be a bug if the planner over-reached >>> and created a plan that gave wrong results, so I always tend to be >>> conservative about adding rules.) >>> >>> Usually it’s OK if we make a mess in SQL-to-RelNode conversion. A few >>> redundant projects and filters are no problem, and can be easily removed >>> later with rules that reduce constants and propagate predicates throughout >>> the tree. But for the general case of NOT IN, we have to add a self-join to >>> deal with the possibility that the key has NULL values. After constant >>> reduction has kicked in and we have realized that NULL key values are not >>> possible, it is not easy to remove that self-join. >>> >>> Here is a very simple query where this happens: >>> >>> sqlline> !connect >>> jdbc:calcite:model=core/src/test/resources/hsqldb-model.json sa "" >>> sqlline> !set outputformat csv >>> sqlline> explain plan for select * from scott.emp where deptno not in ( select deptno from scott.dept where deptno = 20); >>> 'PLAN' >>> 'EnumerableCalc(expr#0..11=[{inputs}], expr#12=[0], expr#13=[=($t8, $t12)], >>> expr#14=[false], expr#15=[IS NOT NULL($t11)], expr#16=[true], expr#17=[IS >>> NULL($t7)], expr#18=[null], expr#19=[<($t9, $t8)], expr#20=[CASE($t13, >>> $t14, $t15, $t16, $t17, $t18, $t19, $t16, $t14)], expr#21=[NOT($t20)], >>> proj#0..7=[{exprs}], $condition=[$t21]) >>> EnumerableJoin(condition=[=($7, $10)], joinType=[left]) >>>EnumerableCalc(expr#0..9=[{inputs}], EMPNO=[$t2], ENAME=[$t3], >>> JOB=[$t4], MGR=[$t5], HIREDATE=[$t6], SAL=[$t7], COMM=[$t8], DEPTNO=[$t9], >>> c=[$t0], ck=[$t1]) >>> EnumerableJoin(condition=[true], joinType=[inner]) >>>JdbcToEnumerableConverter >>> JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)]) >>>JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) >>> JdbcTableScan(table=[[SCOTT, DEPT]]) >>>JdbcToEnumerableConverter >>> JdbcTableScan(table=[[SCOTT, EMP]]) >>>JdbcToEnumerableConverter >>> JdbcAggregate(group=[{0, 1}]) >>>JdbcProject(DEPTNO=[$0], i=[true]) >>> JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) >>>JdbcTableScan(table=[[SCOTT, DEPT]]) >>> ' >>> 1 row selected (0.067 seconds) >>> >>> Note that there are two scans of DEPT, but one is sufficient because DEPTNO >>> is never null. In the JdbcAggregate, c always equals ck, and therefore the >>> CASE can be simplified, and therefore the scan of DEPT that produces c and >>> ck can be dropped, but Calcite rules cannot deduce that fact. >>> >>> Can you please log a JIRA case for this? See if you can find some other >>> queries (maybe using IN rather than NOT IN, or whose key columns are not so >>> obviously NOT NULL) and include these in the JIRA case also. >>> >>> I doubt we can fix using a planner rule. The best solution may be to use >>> RelMetadataQuery.getPulledUpPredicates() to simplify the CASE before we add >>> the join. >>> >>> Julian >>> >>> On Nov 1, 2016, at 8:49 AM, Alexander Shoshin wrote: Julian, thank you for help. I had a wrong picture of NULL values processing. So, it looks like there is some problem in my planner rules. As for the A
[jira] [Created] (CALCITE-1483) Suboptimal plan for NOT IN query
Vineet Garg created CALCITE-1483: Summary: Suboptimal plan for NOT IN query Key: CALCITE-1483 URL: https://issues.apache.org/jira/browse/CALCITE-1483 Project: Calcite Issue Type: Bug Components: core Reporter: Vineet Garg Assignee: Julian Hyde Following query generates sub-optimal plan {code} explain plan for select * from scott.emp where deptno not in (select deptno from scott.dept where deptno = 20); {code} Following is the plan {code} EnumerableCalc(expr#0..11=[{inputs}], expr#12=[0], expr#13=[=($t8, $t12)], expr#14=[false], expr#15=[IS NOT NULL($t11)], expr#16=[true], expr#17=[IS NULL($t7)], expr#18=[null], expr#19=[<($t9, $t8)], expr#20=[CASE($t13, $t14, $t15, $t16, $t17, $t18, $t19, $t16, $t14)], expr#21=[NOT($t20)], proj#0..7=[{exprs}], $condition=[$t21]) EnumerableJoin(condition=[=($7, $10)], joinType=[left]) EnumerableCalc(expr#0..9=[{inputs}], EMPNO=[$t2], ENAME=[$t3], JOB=[$t4], MGR=[$t5], HIREDATE=[$t6], SAL=[$t7], COMM=[$t8], DEPTNO=[$t9], c=[$t0], ck=[$t1]) EnumerableJoin(condition=[true], joinType=[inner]) JdbcToEnumerableConverter JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)]) JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) JdbcTableScan(table=[[SCOTT, DEPT]]) JdbcToEnumerableConverter JdbcTableScan(table=[[SCOTT, EMP]]) JdbcToEnumerableConverter JdbcAggregate(group=[{0, 1}]) JdbcProject(DEPTNO=[$0], i=[true]) JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) JdbcTableScan(table=[[SCOTT, DEPT]]) {code} As Julian pointed out in discussion on mailing list instead of two scans for DEPT one is sufficient as clearly DEPTNO is never null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Problems with abstract syntax tree
Hi Julian, Apologies for not responding earlier. I understand that planner rules sometime produces a plan that is sub-optimal. My concern was about planner rules producing a plan consisting of an expression (literal null constant in this case) with null type i.e. SqlTypeName.NULL. I was wondering if this might be a bug on Calcite side. But it looks like Calcite has a concept of null data type and this seems to be expected. Vineet On 11/3/16, 12:14 PM, "Julian Hyde" wrote: >Vineet, > >In case you forgot, can you please log that JIRA case? If we have a lengthy >design discussion without creating an action item, we are wasting everyone’s >time. > >Julian > >> On Nov 1, 2016, at 11:00 AM, Julian Hyde wrote: >> >> Alexander & Vineet, >> >> One further comment about “NOT IN”. SQL in general is fairly close to >> relational algebra, but “NOT IN” is one of the places where the gap is >> widest. “NOT IN” is difficult in general to execute efficiently, because of >> the problem of NULL values (at Oracle, we always recommended to users to >> rewrite as NOT EXISTS if possible). The gap between SQL and relational >> algebra is apparent when a short SQL query becomes a complex RelNode tree. >> >> There is a silver lining: the RelNode tree, being relational algebra, has >> well-behaved semantics. Once you’re in RelNode land, you can freely apply >> transformation rules to make it efficient. >> >> Vineet, >> >> If the planner rules produce a plan that is sub-optimal I wouldn’t call it a >> bug but a missing feature. (It would be a bug if the planner over-reached >> and created a plan that gave wrong results, so I always tend to be >> conservative about adding rules.) >> >> Usually it’s OK if we make a mess in SQL-to-RelNode conversion. A few >> redundant projects and filters are no problem, and can be easily removed >> later with rules that reduce constants and propagate predicates throughout >> the tree. But for the general case of NOT IN, we have to add a self-join to >> deal with the possibility that the key has NULL values. After constant >> reduction has kicked in and we have realized that NULL key values are not >> possible, it is not easy to remove that self-join. >> >> Here is a very simple query where this happens: >> >> sqlline> !connect >> jdbc:calcite:model=core/src/test/resources/hsqldb-model.json sa "" >> sqlline> !set outputformat csv >> sqlline> explain plan for select * from scott.emp where deptno not in ( >>> select deptno from scott.dept where deptno = 20); >> 'PLAN' >> 'EnumerableCalc(expr#0..11=[{inputs}], expr#12=[0], expr#13=[=($t8, $t12)], >> expr#14=[false], expr#15=[IS NOT NULL($t11)], expr#16=[true], expr#17=[IS >> NULL($t7)], expr#18=[null], expr#19=[<($t9, $t8)], expr#20=[CASE($t13, $t14, >> $t15, $t16, $t17, $t18, $t19, $t16, $t14)], expr#21=[NOT($t20)], >> proj#0..7=[{exprs}], $condition=[$t21]) >> EnumerableJoin(condition=[=($7, $10)], joinType=[left]) >>EnumerableCalc(expr#0..9=[{inputs}], EMPNO=[$t2], ENAME=[$t3], JOB=[$t4], >> MGR=[$t5], HIREDATE=[$t6], SAL=[$t7], COMM=[$t8], DEPTNO=[$t9], c=[$t0], >> ck=[$t1]) >> EnumerableJoin(condition=[true], joinType=[inner]) >>JdbcToEnumerableConverter >> JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)]) >>JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) >> JdbcTableScan(table=[[SCOTT, DEPT]]) >>JdbcToEnumerableConverter >> JdbcTableScan(table=[[SCOTT, EMP]]) >>JdbcToEnumerableConverter >> JdbcAggregate(group=[{0, 1}]) >>JdbcProject(DEPTNO=[$0], i=[true]) >> JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)]) >>JdbcTableScan(table=[[SCOTT, DEPT]]) >> ' >> 1 row selected (0.067 seconds) >> >> Note that there are two scans of DEPT, but one is sufficient because DEPTNO >> is never null. In the JdbcAggregate, c always equals ck, and therefore the >> CASE can be simplified, and therefore the scan of DEPT that produces c and >> ck can be dropped, but Calcite rules cannot deduce that fact. >> >> Can you please log a JIRA case for this? See if you can find some other >> queries (maybe using IN rather than NOT IN, or whose key columns are not so >> obviously NOT NULL) and include these in the JIRA case also. >> >> I doubt we can fix using a planner rule. The best solution may be to use >> RelMetadataQuery.getPulledUpPredicates() to simplify the CASE before we add >> the join. >> >> Julian >> >> >>> On Nov 1, 2016, at 8:49 AM, Alexander Shoshin >>> wrote: >>> >>> Julian, thank you for help. >>> >>> I had a wrong picture of NULL values processing. So, it looks like there is >>> some problem in my planner rules. >>> As for the AST, I was confused by the wrong Flink "explain()" function >>> description :) >>> >>> >>> Regards, >>> Alexander >>> >>> -Original Message- >>> From: Julian Hyde [mailto:jh...@apache.org] >>> Sent: Monday, October 31, 2016 10:43 PM >>> T