[jira] [Created] (CALCITE-1495) Add a rule to convert INNER JOIN preceded by GROUP BY to appropriate SEMI-JOIN

2016-11-15 Thread Vineet Garg (JIRA)
Vineet Garg created CALCITE-1495:


 Summary: Add a rule to convert INNER JOIN preceded by GROUP BY to 
appropriate SEMI-JOIN
 Key: CALCITE-1495
 URL: https://issues.apache.org/jira/browse/CALCITE-1495
 Project: Calcite
  Issue Type: New Feature
  Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde


For IN and EXISTS subqueries, Calcite currently generates a plan consisting of a 
GROUP BY on the inner table followed by an INNER JOIN with the outer table.
E.g. for the following query:
{noformat}
select sal from emp where empno IN (select deptno from dept)
{noformat}

The generated plan is:
{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
LogicalJoin(condition=[=($0, $9)], joinType=[inner])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalAggregate(group=[{0}])
LogicalProject(DEPTNO=[$0])
  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}

This rule would convert such cases to use a SEMI-JOIN, making the plan more 
efficient.
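
A minimal sketch of what such a rule could look like against the 1.x RelOptRule API follows. The rule name and the guard conditions are illustrative assumptions, not an existing Calcite rule; it keeps the now-redundant DISTINCT as the semi-join's input so that key and condition indexes stay valid:

{code}
// Illustrative sketch only -- not an existing Calcite rule.
// Matches Project(Join(left, Aggregate)) where the project uses no columns
// of the aggregate side and the aggregate is a pure DISTINCT on the join keys.
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.core.Join;
import org.apache.calcite.rel.core.JoinInfo;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.rel.core.Project;
import org.apache.calcite.rel.core.SemiJoin;
import org.apache.calcite.util.ImmutableBitSet;

public class JoinOnDistinctToSemiJoinRule extends RelOptRule {
  public static final JoinOnDistinctToSemiJoinRule INSTANCE =
      new JoinOnDistinctToSemiJoinRule();

  private JoinOnDistinctToSemiJoinRule() {
    super(
        operand(Project.class,
            operand(Join.class,
                operand(RelNode.class, any()),
                operand(Aggregate.class, any()))),
        "JoinOnDistinctToSemiJoinRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    final Project project = call.rel(0);
    final Join join = call.rel(1);
    final RelNode left = call.rel(2);
    final Aggregate aggregate = call.rel(3);
    // Only an INNER join over a pure DISTINCT (no aggregate calls) is safe.
    if (join.getJoinType() != JoinRelType.INNER
        || !aggregate.getAggCallList().isEmpty()) {
      return;
    }
    // The project above the join must not use any column of the inner side.
    final ImmutableBitSet used =
        RelOptUtil.InputFinder.bits(project.getProjects(), null);
    final ImmutableBitSet rightBits = ImmutableBitSet.range(
        left.getRowType().getFieldCount(),
        join.getRowType().getFieldCount());
    if (used.intersects(rightBits)) {
      return;
    }
    // Every group key of the DISTINCT must be a join key.
    final JoinInfo joinInfo = join.analyzeCondition();
    if (!joinInfo.isEqui()
        || !joinInfo.rightSet().equals(
            ImmutableBitSet.range(aggregate.getGroupCount()))) {
      return;
    }
    // Keep the Aggregate as the semi-join input so key and condition indexes
    // stay valid; the DISTINCT is now redundant and a follow-up rule could
    // drop it.
    final SemiJoin semiJoin = SemiJoin.create(left, aggregate,
        join.getCondition(), joinInfo.leftKeys, joinInfo.rightKeys);
    call.transformTo(project.copy(project.getTraitSet(), semiJoin,
        project.getProjects(), project.getRowType()));
  }
}
{code}

Note that in the plan above, a rule written this way would fire only after the two adjacent LogicalProjects are merged, since the inner identity project references the join's right-hand columns.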





Re: Create a FilterPushThroughFilterRule

2016-11-15 Thread Julian Hyde
Can you log a JIRA case, and include the full error stack? 
(VolcanoRuleCall.java:236 is just a “re-throw”).

> On Nov 15, 2016, at 3:59 PM, Γιώργος Θεοδωράκης  
> wrote:
> 
> Hello Julian,
> 
> No matter what I do, I get this exception:
> Exception in thread "main" java.lang.AssertionError: Internal error: Error
> while applying rule FilterPushThroughFilter, args
> [rel#15:LogicalFilter.NONE.[[0]](input=rel#12:Subset#1.NONE.[0],condition==($1,
> 5)), 
> rel#11:LogicalFilter.NONE.[[0]](input=rel#10:Subset#0.NONE.[0],condition=>($1,
> 5))]
> at org.apache.calcite.util.Util.newInternal(Util.java:792)
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall
> .java:236)
> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner
> .java:819)
> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
> at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:287)
> at calcite.Tester.main(Tester.java:68)
> ...
> 
> I tried to use something like this:
> 
> final Filter filter = call.rel(0);
> final Filter secFilter = call.rel(1);
> final LogicalFilter newFilter = LogicalFilter.create(filter.getInput(),
> secFilter.getCondition());
> final LogicalFilter newSecFilter = LogicalFilter.create(newFilter, filter.
> getCondition());
> call.transformTo(newSecFilter);
> 
> but nothing happens. Volcano does nothing and Hep crashes. Also I think
> RelBuilder doesn't try to merge them. Do you have any suggestions?
> 
> Thanks,
> George
> 
> 2016-11-14 21:17 GMT+02:00 Julian Hyde :
> 
>> You have two calls to “build”. That looks wrong, because “build” pops an
>> entry off the stack.
>> 
>> If your intention is to swap the filters, then you should push them in the
>> reverse order from the order in which you are currently pushing them.
>> 
>> Lastly, RelBuilder.filter might try to merge consecutive filter nodes. If
>> it does — and I don’t recall whether it does — your filter had better be
>> flagged non-deterministic (or something) to prevent the merge from
>> happening.
>> 
>> Julian
>> 
>>> On Nov 14, 2016, at 1:06 AM, Γιώργος Θεοδωράκης 
>> wrote:
>>> 
>>> Hello,
>>> 
>>> I want to create a rule that pushes a filter through another filter (I
>>> don't merge them) according to their selectivities, to optimize the final
>>> plan. I am using other rules as templates to create it, but I keep getting
>>> errors, as I haven't correctly understood the basics. I want to have
>>> something like this:
>>> 
>>> public void onMatch(RelOptRuleCall call) {
>>>   final Filter filter = call.rel(0);
>>>   final Filter secFilter = call.rel(1);
>>>   final RelBuilder relBuilder = call.builder();
>>>   relBuilder.push(filter)
>>>   .filter(secFilter.getCondition())
>>>   .build();
>>>   call.transformTo(relBuilder.build());
>>> }
>> 
>> 
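
Putting Julian's two points together, a working version of the rule might look like the following sketch (illustrative only: the selectivity test is a placeholder, and the filters are created directly through LogicalFilter.create rather than RelBuilder so that no merging can occur):

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.logical.LogicalFilter;

public class FilterPushThroughFilterRule extends RelOptRule {
  public static final FilterPushThroughFilterRule INSTANCE =
      new FilterPushThroughFilterRule();

  private FilterPushThroughFilterRule() {
    super(operand(Filter.class, operand(Filter.class, any())),
        "FilterPushThroughFilterRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    final Filter top = call.rel(0);
    final Filter bottom = call.rel(1);
    // A one-way guard is essential: without it the rule matches its own
    // output and fires forever (one plausible cause of the Volcano assertion
    // and the Hep crash). Replace the placeholder with a real selectivity
    // comparison.
    if (!shouldSwap(top, bottom)) {
      return;
    }
    // Rebuild from the *bottom* filter's input. (The snippet above used
    // filter.getInput(), which is the bottom filter itself, so it stacked
    // new filters on top of the old ones instead of swapping them.)
    final LogicalFilter newBottom =
        LogicalFilter.create(bottom.getInput(), top.getCondition());
    final LogicalFilter newTop =
        LogicalFilter.create(newBottom, bottom.getCondition());
    call.transformTo(newTop);
  }

  private boolean shouldSwap(Filter top, Filter bottom) {
    return false; // placeholder: compare estimated selectivities here
  }
}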



[jira] [Created] (CALCITE-1494) Inefficient plan for correlated subqueries

2016-11-15 Thread Vineet Garg (JIRA)
Vineet Garg created CALCITE-1494:


 Summary: Inefficient plan for correlated subqueries
 Key: CALCITE-1494
 URL: https://issues.apache.org/jira/browse/CALCITE-1494
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde


For correlated queries such as
{noformat}
select sal from emp where empno IN (select deptno from dept where emp.job = dept.name)
{noformat}

Calcite generates the following plan (SubQueryRemoveRule + decorrelation):

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalAggregate(group=[{0, 1}])
LogicalProject(DEPTNO=[$0], JOB=[$1])
  LogicalProject(DEPTNO=[$0], JOB=[$2])
LogicalJoin(condition=[=($2, $1)], joinType=[inner])
  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
  LogicalAggregate(group=[{0}])
LogicalProject(JOB=[$2])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}

As you can notice, there is an extra scan of the outer table (EMP in this case) 
to retrieve all distinct values of the correlated column (EMP.JOB here), which is 
then joined with the inner table (DEPT).
I am not sure why this step is required. After this join, Calcite does a GROUP BY 
anyway to generate all distinct values of the correlated and result columns 
(DEPTNO, JOB), which are then joined with the outer table.
I think the scan plus join of the outer table with the inner table to generate the 
correlated values is unnecessary.
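
In other words, a plan of roughly the following shape should be enough (a hand-written sketch, not actual Calcite output; column indexes are approximate):

{noformat}
LogicalProject(SAL=[$5])
  LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalAggregate(group=[{0, 1}])
      LogicalProject(DEPTNO=[$0], NAME=[$1])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}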





Re: Create a FilterPushThroughFilterRule

2016-11-15 Thread Γιώργος Θεοδωράκης
Hello Julian,

No matter what I do, I get this exception:
Exception in thread "main" java.lang.AssertionError: Internal error: Error
while applying rule FilterPushThroughFilter, args
[rel#15:LogicalFilter.NONE.[[0]](input=rel#12:Subset#1.NONE.[0],condition==($1,
5)), 
rel#11:LogicalFilter.NONE.[[0]](input=rel#10:Subset#0.NONE.[0],condition=>($1,
5))]
at org.apache.calcite.util.Util.newInternal(Util.java:792)
at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall
.java:236)
at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner
.java:819)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:287)
at calcite.Tester.main(Tester.java:68)
...

I tried to use something like this:

final Filter filter = call.rel(0);
final Filter secFilter = call.rel(1);
final LogicalFilter newFilter = LogicalFilter.create(filter.getInput(),
secFilter.getCondition());
final LogicalFilter newSecFilter = LogicalFilter.create(newFilter, filter.
getCondition());
call.transformTo(newSecFilter);

but nothing happens. Volcano does nothing and Hep crashes. Also I think
RelBuilder doesn't try to merge them. Do you have any suggestions?

Thanks,
George

2016-11-14 21:17 GMT+02:00 Julian Hyde :

> You have two calls to “build”. That looks wrong, because “build” pops an
> entry off the stack.
>
> If your intention is to swap the filters, then you should push them in the
> reverse order from the order in which you are currently pushing them.
>
> Lastly, RelBuilder.filter might try to merge consecutive filter nodes. If
> it does — and I don’t recall whether it does — your filter had better be
> flagged non-deterministic (or something) to prevent the merge from
> happening.
>
> Julian
>
> > On Nov 14, 2016, at 1:06 AM, Γιώργος Θεοδωράκης 
> wrote:
> >
> > Hello,
> >
> > I want to create a rule that pushes a filter through another filter (I
> > don't merge them) according to their selectivities, to optimize the final
> > plan. I am using other rules as templates to create it, but I keep getting
> > errors, as I haven't correctly understood the basics. I want to have
> > something like this:
> >
> >  public void onMatch(RelOptRuleCall call) {
> >final Filter filter = call.rel(0);
> >final Filter secFilter = call.rel(1);
> >final RelBuilder relBuilder = call.builder();
> >relBuilder.push(filter)
> >.filter(secFilter.getCondition())
> >.build();
> >call.transformTo(relBuilder.build());
> > }
>
>


[jira] [Created] (CALCITE-1493) Wrong plan for NOT IN correlated queries

2016-11-15 Thread Vineet Garg (JIRA)
Vineet Garg created CALCITE-1493:


 Summary: Wrong plan for NOT IN correlated queries
 Key: CALCITE-1493
 URL: https://issues.apache.org/jira/browse/CALCITE-1493
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde


The plan generated by Calcite with SubQueryRemoveRule followed by decorrelation 
for the following query:
{noformat}
select sal from emp where empno NOT IN (select deptno from dept where emp.job = dept.name)
{noformat}

is

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
LogicalFilter(condition=[IS NULL($11)])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], 
JOB0=[CAST($10):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN])
LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true])
LogicalAggregate(group=[{0, 1}])
  LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1])
LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1])
  LogicalProject(DEPTNO=[$0], JOB=[$2])
LogicalJoin(condition=[=($2, $1)], joinType=[inner])
  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
  LogicalAggregate(group=[{0}])
LogicalProject(JOB=[$2])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}

As you can notice, Calcite is doing an {noformat}Inner Join{noformat} instead of a 
{noformat}Left Outer Join{noformat}. This will produce wrong results.
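
For reference, the expected rewrite has roughly the shape of the following SQL (a sketch of the usual NOT IN rewrite using a left outer join plus an IS NULL check on a marker column; it glosses over the extra null-handling that full NOT IN semantics require):

{noformat}
select e.sal
from emp e
left join (select distinct deptno, name, true as i from dept) d
  on e.empno = d.deptno and e.job = d.name
where d.i is null
{noformat}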

The plan for the same query just before SubQueryRemoveRule is:
{noformat}
LogicalProject(SAL=[$5])
  LogicalFilter(condition=[NOT(IN($0, {
LogicalProject(DEPTNO=[$0])
  LogicalFilter(condition=[=($cor0.JOB, $1)])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
}))], variablesSet=[[$cor0]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}

The plan just after SubQueryRemoveRule:

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
LogicalFilter(condition=[IS NULL($10)])
  LogicalFilter(condition=[=($0, $9)])
LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], 
requiredColumns=[{2}])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalAggregate(group=[{0, 1}])
LogicalProject(DEPTNO=[$0], i=[true])
  LogicalProject(DEPTNO=[$0])
LogicalFilter(condition=[=($cor0.JOB, $1)])
  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}

Looking at the above, it seems RelDecorrelator has an issue that causes it to 
come up with an inner join.





Re: Spark with Calcite JDBC and Druid adapter

2016-11-15 Thread herman...@teeupdata.com
I played around with the Calcite JDBC settings, especially the lexical ones. Some 
settings return an empty result set (the default, with caseSensitive=false) when 
there is a filter/join, and some just fail at the parsing phase (e.g. lex=JAVA):

java.sql.SQLException: Error while preparing statement [SELECT 
"timestamp","commentorId","hashTagId","value","count" FROM yyy WHERE (hashTagId 
IS NOT NULL) AND (hashTagId = 'hashTag_01')]
at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at 
org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement_(CalciteConnectionImpl.java:204)
at 
org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:186)
at 
org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:87)
at 
org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:264)

With the default settings, after removing "(hashTagId IS NOT NULL) AND" from the 
WHERE clause, the correct result set is returned. So it does seem to me that this 
is a Calcite configuration issue. Does anybody have any experience using Calcite 
JDBC with Spark?
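
One way to isolate Calcite from Spark is to run the same statement through the Calcite JDBC driver directly, with the connection properties set explicitly (a sketch; the model path is a placeholder, and lex/caseSensitive are standard Calcite connection properties):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class CalciteDirectCheck {
  public static void main(String[] args) throws Exception {
    Properties info = new Properties();
    // Match the properties used from Spark; a "lex" property can also be
    // tried, but keep the identifier quoting in the query consistent with it.
    info.setProperty("caseSensitive", "false");
    try (Connection c = DriverManager.getConnection(
             "jdbc:calcite:model=/path/to/model.json", info); // placeholder path
         Statement s = c.createStatement();
         ResultSet r = s.executeQuery("select \"hashTagId\" from \"yyy\""
             + " where \"hashTagId\" = 'hashTag_01'")) {
      while (r.next()) {
        System.out.println(r.getString(1));
      }
    }
  }
}

If the filter returns rows here but not through Spark, the difference is likely in the SQL that Spark generates when it pushes the filter down, rather than in the Druid adapter itself.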

thanks
Herman.


> On Nov 15, 2016, at 10:45, herman...@teeupdata.com wrote:
> 
> Hi everyone,
> 
> When accessing Druid through Calcite JDBC and Druid adapter from Spark, I 
> have been experiencing strange results. 
> 
> Druid data schema is defined as:
> 
> {
>   "type": "custom",
>   "name": "xxx",
>   "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
>   "operand": {
>     "url": "http://:8082",
>     "coordinatorUrl": "http://:8081"
>   },
>   "tables": [
>     {
>       "name": "yyy",
>       "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
>       "operand": {
>         "dataSource": "t",
>         "interval": "2016-11-08T00:00:00.000Z/2016-12-31T00:00:00.000Z",
>         "timestampColumn": "timestamp",
>         "dimensions": [
>           "commentorId",
>           "hashTagId",
>           "value"
>         ],
>         "metrics": [
>           {
>             "type": "count",
>             "name": "count"
>           }
>         ]
>       }
>     }
>   ]
> }
> 
> With a JDBC client, queries like these work fine:
> 1. select * from yyy where hashTagId='hashTag_01'
> 2. select badcount.hashTagId, badcount.bad, totalcount.total,
>    badcount.bad/totalcount.total*100 as bad_pct
>    from (select hashTagId, cast(count(*) as double) as bad
>          from yyy where value='bad' group by hashTagId) as badcount
>    join (select hashTagId, cast(count(*) as double) as total
>          from yyy group by hashTagId) as totalcount
>    on (badcount.hashTagId = totalcount.hashTagId)
> 
> However, in Spark 2.0 it is strange:
> 1. df_yyy = spark.read.format("jdbc").option("url", "jdbc:calcite:model=<path 
> to schema json>;caseSensitive=false")…
> 2. df_yyy.show() —— works fine, returns all records
> 3. df_yyy.filter($"hashTagId" === "hashTag_01").count() — returns the correct 
> number of records
> 4. df_yyy.filter($"hashTagId" === "hashTag_01").show() — returns an empty result set
> 5. df_yyy.join(, ).show() —— 
> returns an empty result set (any join returns an empty result set)
> 
> I suspect there are conflicts between how Spark parses SQL and how 
> Calcite JDBC does. Are there special properties to set in the JDBC string to 
> make it work with Spark? Is there a Calcite JDBC log file that I can dig 
> through? I did some googling and don’t see similar usage of 
> Spark/Calcite/Druid; is this the right way to access Druid from Spark? (Maybe 
> this is a question better suited for the Spark/Druid community…)
> 
> Thanks.
> Herman.
> 



Spark with Calcite JDBC and Druid adapter

2016-11-15 Thread herman...@teeupdata.com
Hi everyone,

When accessing Druid through Calcite JDBC and Druid adapter from Spark, I have 
been experiencing strange results. 

Druid data schema is defined as:

{
  "type": "custom",
  "name": "xxx",
  "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
  "operand": {
    "url": "http://:8082",
    "coordinatorUrl": "http://:8081"
  },
  "tables": [
    {
      "name": "yyy",
      "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
      "operand": {
        "dataSource": "t",
        "interval": "2016-11-08T00:00:00.000Z/2016-12-31T00:00:00.000Z",
        "timestampColumn": "timestamp",
        "dimensions": [
          "commentorId",
          "hashTagId",
          "value"
        ],
        "metrics": [
          {
            "type": "count",
            "name": "count"
          }
        ]
      }
    }
  ]
}

With a JDBC client, queries like these work fine:
1. select * from yyy where hashTagId='hashTag_01'
2. select badcount.hashTagId, badcount.bad, totalcount.total,
   badcount.bad/totalcount.total*100 as bad_pct
   from (select hashTagId, cast(count(*) as double) as bad
         from yyy where value='bad' group by hashTagId) as badcount
   join (select hashTagId, cast(count(*) as double) as total
         from yyy group by hashTagId) as totalcount
   on (badcount.hashTagId = totalcount.hashTagId)

However, in Spark 2.0 it is strange:
1. df_yyy = spark.read.format("jdbc").option("url", "jdbc:calcite:model=<path 
to schema json>;caseSensitive=false")…
2. df_yyy.show() —— works fine, returns all records
3. df_yyy.filter($"hashTagId" === "hashTag_01").count() — returns the correct 
number of records
4. df_yyy.filter($"hashTagId" === "hashTag_01").show() — returns an empty result set
5. df_yyy.join(, ).show() —— returns 
an empty result set (any join returns an empty result set)

I suspect there are conflicts between how Spark parses SQL and how 
Calcite JDBC does. Are there special properties to set in the JDBC string to 
make it work with Spark? Is there a Calcite JDBC log file that I can dig 
through? I did some googling and don’t see similar usage of 
Spark/Calcite/Druid; is this the right way to access Druid from Spark? (Maybe 
this is a question better suited for the Spark/Druid community…)

Thanks.
Herman.



[jira] [Created] (CALCITE-1492) Table name with star is not supported.

2016-11-15 Thread Dongming Liu (JIRA)
Dongming Liu created CALCITE-1492:
-

 Summary: Table name with star is not supported.
 Key: CALCITE-1492
 URL: https://issues.apache.org/jira/browse/CALCITE-1492
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Dongming Liu
Assignee: Julian Hyde


sql1:

{code}
select * from t1;
{code}

This SQL is OK.

sql2:

{code}
select t1.* from t1;
{code}

This SQL gives the following error:

{noformat}
From line 1, column 8 to line 1, column 9: Column '"*"' not found in table 't1'
{noformat}

Does Calcite not support it?


