Re: Problems with abstract syntax tree

2016-11-07 Thread Vineet Garg
I logged in jira CALCITE-1483 for the query in question. I’ll try to find more 
examples.




On 11/7/16, 9:17 PM, "Vineet Garg"  wrote:

>Hi Julian,
>
>Apologies for not responding earlier.
>
>I understand that planner rules sometime produces a plan that is sub-optimal. 
>My concern was about planner rules producing a plan consisting of an 
>expression (literal null constant in this case) with null type i.e. 
>SqlTypeName.NULL. I was wondering if this might be a bug on Calcite side. But 
>it looks like Calcite has a concept of null data type and this seems to be 
>expected. 
>
>Vineet
>
>
>
>On 11/3/16, 12:14 PM, "Julian Hyde"  wrote:
>
>>Vineet,
>>
>>In case you forgot, can you please log that JIRA case? If we have a lengthy 
>>design discussion without creating an action item, we are wasting everyone’s 
>>time.
>>
>>Julian
>>
>>> On Nov 1, 2016, at 11:00 AM, Julian Hyde  wrote:
>>> 
>>> Alexander & Vineet,
>>> 
>>> One further comment about “NOT IN”. SQL in general is fairly close to 
>>> relational algebra, but “NOT IN” is one of the places where the gap is 
>>> widest. “NOT IN” is difficult in general to execute efficiently, because of 
>>> the problem of NULL values (at Oracle, we always recommended to users to 
>>> rewrite as NOT EXISTS if possible). The gap between SQL and relational 
>>> algebra is apparent when a short SQL query becomes a complex RelNode tree.
>>> 
>>> There is a silver lining: the RelNode tree, being relational algebra, has 
>>> well-behaved semantics. Once you’re in RelNode land, you can freely apply 
>>> transformation rules to make it efficient.
>>> 
>>> Vineet,
>>> 
>>> If the planner rules produce a plan that is sub-optimal I wouldn’t call it 
>>> a bug but a missing feature. (It would be a bug if the planner over-reached 
>>> and created a plan that gave wrong results, so I always tend to be 
>>> conservative about adding rules.)
>>> 
>>> Usually it’s OK if we make a mess in SQL-to-RelNode conversion. A few 
>>> redundant projects and filters are no problem, and can be easily removed 
>>> later with rules that reduce constants and propagate predicates throughout 
>>> the tree. But for the general case of NOT IN, we have to add a self-join to 
>>> deal with the possibility that the key has NULL values. After constant 
>>> reduction has kicked in and we have realized that NULL key values are not 
>>> possible, it is not easy to remove that self-join.
>>> 
>>> Here is a very simple query where this happens:
>>> 
>>> sqlline> !connect 
>>> jdbc:calcite:model=core/src/test/resources/hsqldb-model.json sa ""
>>> sqlline> !set outputformat csv
>>> sqlline> explain plan for select * from scott.emp where deptno not in (
 select deptno from scott.dept where deptno = 20);
>>> 'PLAN'
>>> 'EnumerableCalc(expr#0..11=[{inputs}], expr#12=[0], expr#13=[=($t8, $t12)], 
>>> expr#14=[false], expr#15=[IS NOT NULL($t11)], expr#16=[true], expr#17=[IS 
>>> NULL($t7)], expr#18=[null], expr#19=[<($t9, $t8)], expr#20=[CASE($t13, 
>>> $t14, $t15, $t16, $t17, $t18, $t19, $t16, $t14)], expr#21=[NOT($t20)], 
>>> proj#0..7=[{exprs}], $condition=[$t21])
>>>  EnumerableJoin(condition=[=($7, $10)], joinType=[left])
>>>EnumerableCalc(expr#0..9=[{inputs}], EMPNO=[$t2], ENAME=[$t3], 
>>> JOB=[$t4], MGR=[$t5], HIREDATE=[$t6], SAL=[$t7], COMM=[$t8], DEPTNO=[$t9], 
>>> c=[$t0], ck=[$t1])
>>>  EnumerableJoin(condition=[true], joinType=[inner])
>>>JdbcToEnumerableConverter
>>>  JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
>>>JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)])
>>>  JdbcTableScan(table=[[SCOTT, DEPT]])
>>>JdbcToEnumerableConverter
>>>  JdbcTableScan(table=[[SCOTT, EMP]])
>>>JdbcToEnumerableConverter
>>>  JdbcAggregate(group=[{0, 1}])
>>>JdbcProject(DEPTNO=[$0], i=[true])
>>>  JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)])
>>>JdbcTableScan(table=[[SCOTT, DEPT]])
>>> '
>>> 1 row selected (0.067 seconds)
>>> 
>>> Note that there are two scans of DEPT, but one is sufficient because DEPTNO 
>>> is never null. In the JdbcAggregate, c always equals ck, and therefore the 
>>> CASE can be simplified, and therefore the scan of DEPT that produces c and 
>>> ck can be dropped, but Calcite rules cannot deduce that fact.
>>> 
>>> Can you please log a JIRA case for this? See if you can find some other 
>>> queries (maybe using IN rather than NOT IN, or whose key columns are not so 
>>> obviously NOT NULL) and include these in the JIRA case also.
>>> 
>>> I doubt we can fix using a planner rule. The best solution may be to use 
>>> RelMetadataQuery.getPulledUpPredicates() to simplify the CASE before we add 
>>> the join.
>>> 
>>> Julian
>>> 
>>> 
 On Nov 1, 2016, at 8:49 AM, Alexander Shoshin  
 wrote:
 
 Julian, thank you for help.
 
 I had a wrong picture of NULL values processing. So, it looks like there 
 is some problem in my planner rules.
 As for the A

[jira] [Created] (CALCITE-1483) Suboptimal plan for NOT IN query

2016-11-07 Thread Vineet Garg (JIRA)
Vineet Garg created CALCITE-1483:


 Summary: Suboptimal plan for NOT IN query
 Key: CALCITE-1483
 URL: https://issues.apache.org/jira/browse/CALCITE-1483
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde


Following query generates sub-optimal plan

{code} explain plan for select * from scott.emp where deptno not in (select 
deptno from scott.dept where deptno = 20); {code}

Following is the plan
{code}
EnumerableCalc(expr#0..11=[{inputs}], expr#12=[0], expr#13=[=($t8, $t12)], 
expr#14=[false], expr#15=[IS NOT NULL($t11)], expr#16=[true], expr#17=[IS 
NULL($t7)], expr#18=[null], expr#19=[<($t9, $t8)], expr#20=[CASE($t13, $t14, 
$t15, $t16, $t17, $t18, $t19, $t16, $t14)], expr#21=[NOT($t20)], 
proj#0..7=[{exprs}], $condition=[$t21])
  EnumerableJoin(condition=[=($7, $10)], joinType=[left])
EnumerableCalc(expr#0..9=[{inputs}], EMPNO=[$t2], ENAME=[$t3], JOB=[$t4], 
MGR=[$t5], HIREDATE=[$t6], SAL=[$t7], COMM=[$t8], DEPTNO=[$t9], c=[$t0], 
ck=[$t1])
  EnumerableJoin(condition=[true], joinType=[inner])
JdbcToEnumerableConverter
  JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)])
  JdbcTableScan(table=[[SCOTT, DEPT]])
JdbcToEnumerableConverter
  JdbcTableScan(table=[[SCOTT, EMP]])
JdbcToEnumerableConverter
  JdbcAggregate(group=[{0, 1}])
JdbcProject(DEPTNO=[$0], i=[true])
  JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)])
JdbcTableScan(table=[[SCOTT, DEPT]])
{code}

As Julian pointed out in discussion on mailing list instead of two scans for 
DEPT one is sufficient as clearly DEPTNO is never null.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Problems with abstract syntax tree

2016-11-07 Thread Vineet Garg
Hi Julian,

Apologies for not responding earlier.

I understand that planner rules sometime produces a plan that is sub-optimal. 
My concern was about planner rules producing a plan consisting of an expression 
(literal null constant in this case) with null type i.e. SqlTypeName.NULL. I 
was wondering if this might be a bug on Calcite side. But it looks like Calcite 
has a concept of null data type and this seems to be expected. 

Vineet



On 11/3/16, 12:14 PM, "Julian Hyde"  wrote:

>Vineet,
>
>In case you forgot, can you please log that JIRA case? If we have a lengthy 
>design discussion without creating an action item, we are wasting everyone’s 
>time.
>
>Julian
>
>> On Nov 1, 2016, at 11:00 AM, Julian Hyde  wrote:
>> 
>> Alexander & Vineet,
>> 
>> One further comment about “NOT IN”. SQL in general is fairly close to 
>> relational algebra, but “NOT IN” is one of the places where the gap is 
>> widest. “NOT IN” is difficult in general to execute efficiently, because of 
>> the problem of NULL values (at Oracle, we always recommended to users to 
>> rewrite as NOT EXISTS if possible). The gap between SQL and relational 
>> algebra is apparent when a short SQL query becomes a complex RelNode tree.
>> 
>> There is a silver lining: the RelNode tree, being relational algebra, has 
>> well-behaved semantics. Once you’re in RelNode land, you can freely apply 
>> transformation rules to make it efficient.
>> 
>> Vineet,
>> 
>> If the planner rules produce a plan that is sub-optimal I wouldn’t call it a 
>> bug but a missing feature. (It would be a bug if the planner over-reached 
>> and created a plan that gave wrong results, so I always tend to be 
>> conservative about adding rules.)
>> 
>> Usually it’s OK if we make a mess in SQL-to-RelNode conversion. A few 
>> redundant projects and filters are no problem, and can be easily removed 
>> later with rules that reduce constants and propagate predicates throughout 
>> the tree. But for the general case of NOT IN, we have to add a self-join to 
>> deal with the possibility that the key has NULL values. After constant 
>> reduction has kicked in and we have realized that NULL key values are not 
>> possible, it is not easy to remove that self-join.
>> 
>> Here is a very simple query where this happens:
>> 
>> sqlline> !connect 
>> jdbc:calcite:model=core/src/test/resources/hsqldb-model.json sa ""
>> sqlline> !set outputformat csv
>> sqlline> explain plan for select * from scott.emp where deptno not in (
>>> select deptno from scott.dept where deptno = 20);
>> 'PLAN'
>> 'EnumerableCalc(expr#0..11=[{inputs}], expr#12=[0], expr#13=[=($t8, $t12)], 
>> expr#14=[false], expr#15=[IS NOT NULL($t11)], expr#16=[true], expr#17=[IS 
>> NULL($t7)], expr#18=[null], expr#19=[<($t9, $t8)], expr#20=[CASE($t13, $t14, 
>> $t15, $t16, $t17, $t18, $t19, $t16, $t14)], expr#21=[NOT($t20)], 
>> proj#0..7=[{exprs}], $condition=[$t21])
>>  EnumerableJoin(condition=[=($7, $10)], joinType=[left])
>>EnumerableCalc(expr#0..9=[{inputs}], EMPNO=[$t2], ENAME=[$t3], JOB=[$t4], 
>> MGR=[$t5], HIREDATE=[$t6], SAL=[$t7], COMM=[$t8], DEPTNO=[$t9], c=[$t0], 
>> ck=[$t1])
>>  EnumerableJoin(condition=[true], joinType=[inner])
>>JdbcToEnumerableConverter
>>  JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
>>JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)])
>>  JdbcTableScan(table=[[SCOTT, DEPT]])
>>JdbcToEnumerableConverter
>>  JdbcTableScan(table=[[SCOTT, EMP]])
>>JdbcToEnumerableConverter
>>  JdbcAggregate(group=[{0, 1}])
>>JdbcProject(DEPTNO=[$0], i=[true])
>>  JdbcFilter(condition=[=(CAST($0):INTEGER NOT NULL, 20)])
>>JdbcTableScan(table=[[SCOTT, DEPT]])
>> '
>> 1 row selected (0.067 seconds)
>> 
>> Note that there are two scans of DEPT, but one is sufficient because DEPTNO 
>> is never null. In the JdbcAggregate, c always equals ck, and therefore the 
>> CASE can be simplified, and therefore the scan of DEPT that produces c and 
>> ck can be dropped, but Calcite rules cannot deduce that fact.
>> 
>> Can you please log a JIRA case for this? See if you can find some other 
>> queries (maybe using IN rather than NOT IN, or whose key columns are not so 
>> obviously NOT NULL) and include these in the JIRA case also.
>> 
>> I doubt we can fix using a planner rule. The best solution may be to use 
>> RelMetadataQuery.getPulledUpPredicates() to simplify the CASE before we add 
>> the join.
>> 
>> Julian
>> 
>> 
>>> On Nov 1, 2016, at 8:49 AM, Alexander Shoshin  
>>> wrote:
>>> 
>>> Julian, thank you for help.
>>> 
>>> I had a wrong picture of NULL values processing. So, it looks like there is 
>>> some problem in my planner rules.
>>> As for the AST, I was confused by the wrong Flink "explain()" function 
>>> description :)
>>> 
>>> 
>>> Regards,
>>> Alexander
>>> 
>>> -Original Message-
>>> From: Julian Hyde [mailto:jh...@apache.org] 
>>> Sent: Monday, October 31, 2016 10:43 PM
>>> T