[jira] [Created] (CALCITE-1495) Add a rule to convert INNER JOIN preceded by GROUP BY to appropriate SEMI-JOIN
Vineet Garg created CALCITE-1495:
------------------------------------

Summary: Add a rule to convert INNER JOIN preceded by GROUP BY to appropriate SEMI-JOIN
Key: CALCITE-1495
URL: https://issues.apache.org/jira/browse/CALCITE-1495
Project: Calcite
Issue Type: New Feature
Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde

For IN and EXISTS subqueries Calcite currently generates a plan consisting of a GROUP BY on the inner table followed by an INNER JOIN with the outer table. For example, for the following query:

{noformat}
select sal from emp where empno IN (select deptno from dept)
{noformat}

the generated plan is:

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalJoin(condition=[=($0, $9)], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalAggregate(group=[{0}])
        LogicalProject(DEPTNO=[$0])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}

Such cases could be converted by this rule to use a SEMI-JOIN, making the plan more efficient.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
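For context, the semi-join the issue asks for is semantically equivalent to the aggregate-plus-inner-join plan above: each outer row is emitted at most once if any inner row matches. A minimal sketch of the two evaluation strategies, in Python with toy EMP/DEPT rows for illustration (not Calcite code):

```python
# Toy rows: emp(empno, sal) and dept(deptno). Illustrative data only.
emp = [(7369, 800), (7499, 1600), (10, 5000)]
dept = [(10,), (20,), (10,)]  # note the duplicate deptno

# What Calcite generates today: GROUP BY (deduplicate) the inner side,
# then INNER JOIN with the outer side.
distinct_deptnos = {d for (d,) in dept}
via_agg_join = [sal for (empno, sal) in emp
                for d in distinct_deptnos if empno == d]

# Semi-join: probe the inner side for existence only; no aggregate is
# needed because inner-side duplicates cannot duplicate output rows.
inner_keys = {d for (d,) in dept}
via_semi_join = [sal for (empno, sal) in emp if empno in inner_keys]

assert sorted(via_agg_join) == sorted(via_semi_join)
```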
Re: Create a FilterPushThroughFilterRule
Can you log a JIRA case, and include the full error stack? (VolcanoRuleCall.java:236 is just a "re-throw".)

> On Nov 15, 2016, at 3:59 PM, Γιώργος Θεοδωράκης wrote:
>
> Hello Julian,
>
> No matter what I do, I get this exception:
>
> Exception in thread "main" java.lang.AssertionError: Internal error: Error
> while applying rule FilterPushThroughFilter, args
> [rel#15:LogicalFilter.NONE.[[0]](input=rel#12:Subset#1.NONE.[0],condition==($1, 5)),
> rel#11:LogicalFilter.NONE.[[0]](input=rel#10:Subset#0.NONE.[0],condition=>($1, 5))]
> at org.apache.calcite.util.Util.newInternal(Util.java:792)
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
> at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:287)
> at calcite.Tester.main(Tester.java:68)
> ...
>
> I tried to use something like:
>
> final Filter filter = call.rel(0);
> final Filter secFilter = call.rel(1);
> final LogicalFilter newFilter = LogicalFilter.create(filter.getInput(), secFilter.getCondition());
> final LogicalFilter newSecFilter = LogicalFilter.create(newFilter, filter.getCondition());
> call.transformTo(newSecFilter);
>
> but nothing happens. Volcano does nothing and Hep crashes. Also, I think
> RelBuilder doesn't try to merge them. Do you have any suggestions?
>
> Thanks,
> George
>
> 2016-11-14 21:17 GMT+02:00 Julian Hyde:
>
>> You have two calls to "build". That looks wrong, because "build" pops an
>> entry off the stack.
>>
>> If your intention is to swap the filters then you should push them in the
>> reverse order than you are currently pushing them.
>>
>> Lastly, RelBuilder.filter might try to merge consecutive filter nodes. If
>> it does — and I don’t recall whether it does — your filter had better be
>> flagged non-deterministic (or something) to prevent the merge from
>> happening.
>>
>> Julian
>>
>>> On Nov 14, 2016, at 1:06 AM, Γιώργος Θεοδωράκης wrote:
>>>
>>> Hello,
>>>
>>> I want to create a rule that pushes a filter through another filter (I
>>> don't merge them) according to their selectivities, to optimize the final
>>> plan. I am using other rules as templates to create it, but I keep getting
>>> errors, as I haven't understood the basics correctly. I want to have
>>> something like this:
>>>
>>> public void onMatch(RelOptRuleCall call) {
>>>   final Filter filter = call.rel(0);
>>>   final Filter secFilter = call.rel(1);
>>>   final RelBuilder relBuilder = call.builder();
>>>   relBuilder.push(filter)
>>>       .filter(secFilter.getCondition())
>>>       .build();
>>>   call.transformTo(relBuilder.build());
>>> }
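Julian's suggestion can be sanity-checked outside Calcite: swapping two conjunctive filters never changes the result set, only how many rows flow between them, which is exactly what a selectivity-driven rule like this exploits. A minimal sketch in Python, with hypothetical predicates loosely modeled on the equality and comparison conditions in the stack trace:

```python
# Toy rows (rowid, value). The two predicates are hypothetical stand-ins
# for the outer and inner LogicalFilter conditions on column 1.
rows = [(0, 3), (1, 5), (2, 7), (3, 5), (4, 9)]
f_eq = lambda r: r[1] == 5   # outer filter's condition
f_gt = lambda r: r[1] > 4    # inner filter's condition

# Original order: inner filter first, then the outer filter on its output.
plan_a = [r for r in rows if f_gt(r)]
plan_a = [r for r in plan_a if f_eq(r)]

# Swapped order, as a FilterPushThroughFilter rule would produce: the
# result is identical; only the intermediate row count differs, which is
# why the more selective (or cheaper) predicate should run first.
plan_b = [r for r in rows if f_eq(r)]
plan_b = [r for r in plan_b if f_gt(r)]

assert plan_a == plan_b
```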
[jira] [Created] (CALCITE-1494) Inefficient plan for correlated subqueries
Vineet Garg created CALCITE-1494:
------------------------------------

Summary: Inefficient plan for correlated subqueries
Key: CALCITE-1494
URL: https://issues.apache.org/jira/browse/CALCITE-1494
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde

For correlated queries such as

{noformat}
select sal from emp where empno IN (select deptno from dept where emp.job = dept.name)
{noformat}

Calcite generates the following plan (SubqueryRemove rule + decorrelation):

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalAggregate(group=[{0, 1}])
        LogicalProject(DEPTNO=[$0], JOB=[$1])
          LogicalProject(DEPTNO=[$0], JOB=[$2])
            LogicalJoin(condition=[=($2, $1)], joinType=[inner])
              LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
              LogicalAggregate(group=[{0}])
                LogicalProject(JOB=[$2])
                  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}

As you can see, there is an extra scan of the outer table (EMP in this case) to retrieve all distinct values of the correlated column (EMP.JOB here), which is then joined with the inner table (DEPT). I am not sure why this step is required. After this join, Calcite does a GROUP BY anyway to produce all distinct values of the correlated and result columns (DEPTNO, JOB), which is then joined with the outer table. I think the scan plus join of the outer table with the inner table to generate the correlated values is unnecessary.
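The redundancy the issue describes can be checked with toy data: evaluating the correlated IN directly, and evaluating the simpler decorrelated plan it argues for (distinct (DEPTNO, NAME) pairs taken from DEPT alone, joined straight back to EMP, with no extra EMP scan) give the same rows. A Python sketch with hypothetical rows, for illustration only:

```python
# Toy rows: emp(empno, job, sal) and dept(deptno, name). Illustrative only.
emp = [(10, 'CLERK', 800), (20, 'SALES', 1600), (30, 'CLERK', 5000)]
dept = [(10, 'CLERK'), (20, 'MGR')]

# Reference semantics of
#   select sal from emp
#   where empno IN (select deptno from dept where emp.job = dept.name)
expected = [sal for (empno, job, sal) in emp
            if any(deptno == empno and name == job
                   for (deptno, name) in dept)]

# Simpler decorrelated plan: distinct (deptno, name) pairs computed from
# DEPT alone, joined back to EMP on both columns. No scan of EMP is
# needed to pre-compute distinct values of the correlated column.
pairs = set(dept)
simpler = [sal for (empno, job, sal) in emp if (empno, job) in pairs]

assert expected == simpler
```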
Re: Create a FilterPushThroughFilterRule
Hello Julian,

No matter what I do, I get this exception:

Exception in thread "main" java.lang.AssertionError: Internal error: Error
while applying rule FilterPushThroughFilter, args
[rel#15:LogicalFilter.NONE.[[0]](input=rel#12:Subset#1.NONE.[0],condition==($1, 5)),
rel#11:LogicalFilter.NONE.[[0]](input=rel#10:Subset#0.NONE.[0],condition=>($1, 5))]
at org.apache.calcite.util.Util.newInternal(Util.java:792)
at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:287)
at calcite.Tester.main(Tester.java:68)
...

I tried to use something like:

final Filter filter = call.rel(0);
final Filter secFilter = call.rel(1);
final LogicalFilter newFilter = LogicalFilter.create(filter.getInput(), secFilter.getCondition());
final LogicalFilter newSecFilter = LogicalFilter.create(newFilter, filter.getCondition());
call.transformTo(newSecFilter);

but nothing happens. Volcano does nothing and Hep crashes. Also, I think RelBuilder doesn't try to merge them. Do you have any suggestions?

Thanks,
George

2016-11-14 21:17 GMT+02:00 Julian Hyde:

> You have two calls to "build". That looks wrong, because "build" pops an
> entry off the stack.
>
> If your intention is to swap the filters then you should push them in the
> reverse order than you are currently pushing them.
>
> Lastly, RelBuilder.filter might try to merge consecutive filter nodes. If
> it does — and I don’t recall whether it does — your filter had better be
> flagged non-deterministic (or something) to prevent the merge from
> happening.
>
> Julian
>
> > On Nov 14, 2016, at 1:06 AM, Γιώργος Θεοδωράκης wrote:
> >
> > Hello,
> >
> > I want to create a rule that pushes a filter through another filter (I
> > don't merge them) according to their selectivities, to optimize the final
> > plan. I am using other rules as templates to create it, but I keep getting
> > errors, as I haven't understood the basics correctly. I want to have
> > something like this:
> >
> > public void onMatch(RelOptRuleCall call) {
> >   final Filter filter = call.rel(0);
> >   final Filter secFilter = call.rel(1);
> >   final RelBuilder relBuilder = call.builder();
> >   relBuilder.push(filter)
> >       .filter(secFilter.getCondition())
> >       .build();
> >   call.transformTo(relBuilder.build());
> > }
[jira] [Created] (CALCITE-1493) Wrong plan for NOT IN correlated queries
Vineet Garg created CALCITE-1493:
------------------------------------

Summary: Wrong plan for NOT IN correlated queries
Key: CALCITE-1493
URL: https://issues.apache.org/jira/browse/CALCITE-1493
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde

The plan generated by Calcite with SubqueryRemoveRule followed by de-correlation for the following query:

{noformat}
select sal from emp where empno NOT IN (select deptno from dept where emp.job = dept.name)
{noformat}

is

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalFilter(condition=[IS NULL($11)])
      LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], JOB0=[CAST($10):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN])
        LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true])
            LogicalAggregate(group=[{0, 1}])
              LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1])
                LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1])
                  LogicalProject(DEPTNO=[$0], JOB=[$2])
                    LogicalJoin(condition=[=($2, $1)], joinType=[inner])
                      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
                      LogicalAggregate(group=[{0}])
                        LogicalProject(JOB=[$2])
                          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}

As you can see, Calcite is doing an Inner Join where it should be doing a Left Outer Join. This will produce wrong results.

The plan for the same query just before the SubqueryRemove rule is:

{noformat}
LogicalProject(SAL=[$5])
  LogicalFilter(condition=[NOT(IN($0, {
    LogicalProject(DEPTNO=[$0])
      LogicalFilter(condition=[=($cor0.JOB, $1)])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
  }))], variablesSet=[[$cor0]])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}

and the plan just after the SubqueryRemove rule is:

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalFilter(condition=[IS NULL($10)])
      LogicalFilter(condition=[=($0, $9)])
        LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalAggregate(group=[{0, 1}])
            LogicalProject(DEPTNO=[$0], i=[true])
              LogicalProject(DEPTNO=[$0])
                LogicalFilter(condition=[=($cor0.JOB, $1)])
                  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}

Looking at the above, it seems RelDecorrelator has an issue where it comes up with an Inner Join.
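The effect of the wrong join type is easy to demonstrate with toy data. A Python sketch with hypothetical rows, ignoring NOT IN's three-valued NULL semantics for simplicity (no NULLs here, so NOT IN is plain negation):

```python
# Toy rows: emp(empno, job, sal) and dept(deptno, name). Illustrative only.
emp = [(10, 'CLERK', 800), (20, 'SALES', 1600)]
dept = [(10, 'CLERK'), (30, 'MGR')]

# Reference semantics of
#   select sal from emp
#   where empno NOT IN (select deptno from dept where emp.job = dept.name)
expected = [sal for (empno, job, sal) in emp
            if not any(d == empno and n == job for (d, n) in dept)]

# Correct decorrelated shape: LEFT OUTER JOIN plus IS NULL. Every EMP row
# survives the join; unmatched rows carry a NULL marker and pass the filter.
left_join = [(sal, True if any(d == empno and n == job for (d, n) in dept)
              else None)
             for (empno, job, sal) in emp]
left_plan = [sal for (sal, marker) in left_join if marker is None]

# The plan the issue reports: INNER JOIN plus IS NULL. Unmatched EMP rows
# are dropped by the join itself, so the IS NULL filter never sees them
# and the NOT IN rows are lost.
inner_join = [(sal, True) for (empno, job, sal) in emp
              if any(d == empno and n == job for (d, n) in dept)]
inner_plan = [sal for (sal, marker) in inner_join if marker is None]

assert left_plan == expected   # the LEFT JOIN plan is correct
assert inner_plan != expected  # the INNER JOIN plan loses rows
```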
Re: Spark with Calcite JDBC and Druid adapter
Played around with the Calcite JDBC settings, especially with lex. Some settings return an empty set (the default, with caseSensitive=false) when the WHERE clause has a filter/join; some just fail at the parsing phase (e.g. lex=JAVA):

java.sql.SQLException: Error while preparing statement [SELECT "timestamp","commentorId","hashTagId","value","count" FROM yyy WHERE (hashTagId IS NOT NULL) AND (hashTagId = 'hashTag_01')]
at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement_(CalciteConnectionImpl.java:204)
at org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:186)
at org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:87)
at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:264)

With the default settings, after removing "(hashTagId IS NOT NULL) AND" from the WHERE clause, the correct result set is returned. So it does seem to me that this is a Calcite configuration issue. Does anybody have any experience using Calcite JDBC with Spark?

thanks
Herman.

> On Nov 15, 2016, at 10:45, herman...@teeupdata.com wrote:
>
> Hi everyone,
>
> When accessing Druid through Calcite JDBC and the Druid adapter from Spark, I
> have been experiencing strange results.
>
> The Druid data schema is defined as:
>
> {
>   "type": "custom",
>   "name": "xxx",
>   "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
>   "operand": {
>     "url": "http://:8082",
>     "coordinatorUrl": "http://:8081"
>   },
>   "tables": [
>     {
>       "name": "yyy",
>       "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
>       "operand": {
>         "dataSource": "t",
>         "interval": "2016-11-08T00:00:00.000Z/2016-12-31T00:00:00.000Z",
>         "timestampColumn": "timestamp",
>         "dimensions": [
>           "commentorId",
>           "hashTagId",
>           "value"
>         ],
>         "metrics": [
>           { "type" : "count", "name" : "count" }
>         ]
>       }
>     }
>   ]
> }
>
> With a JDBC client, queries like these work fine:
>
> 1. select * from yyy where hashTagId='hashTag_01'
> 2. select badcount.hashTagId, badcount.bad, totalcount.total,
>    badcount.bad/totalcount.total*100 as bad_pct
>    from
>    (select hashTagId, cast(count(*) as double) as bad from yyy
>     where value='bad' group by hashTagId) as badcount
>    join
>    (select hashTagId, cast(count(*) as double) as total
>     from yyy group by hashTagId) as totalcount
>    on
>    (badcount.hashTagId=totalcount.hashTagId)
>
> However, in Spark 2.0, it is strange:
>
> 1. df_yyy = spark.read.format("jdbc").option("url", "jdbc:calcite:model=<path to schema json>;caseSensitive=false")…
> 2. df_yyy.show() —— works fine, returns all records
> 3. df_yyy.filter($"hashTagId" === "hashTag_01").count() — returns the correct
>    number of records
> 4. df_yyy.filter($"hashTagId" === "hashTag_01").show() — returns an empty result set
> 5. df_yyy.join(…, …).show() —— returns an empty result set (any join returns
>    an empty result set)
>
> I suspect there are conflicts between how Spark parses SQL and how
> Calcite JDBC does. Are there special properties to set in the JDBC string to
> make it work with Spark? Is there a Calcite JDBC log file that I can dig
> through? I did some googling and don't see similar usage with
> spark/calcite/druid; is this the right way to access Druid from Spark?
> (Maybe this is a question better suited for the Spark/Druid community…)
>
> Thanks.
> Herman.
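For anyone hitting the same mismatch: Calcite's JDBC connect string accepts lexical-convention properties that can be aligned with the SQL Spark generates (double-quoted identifiers, as in the failing statement above). The property names below (lex, quoting, unquotedCasing, caseSensitive) are real Calcite connection properties, but the particular values are only a suggested starting point to experiment with, not a verified fix, and the model path is a placeholder:

```
jdbc:calcite:model=/path/to/model.json;lex=MYSQL_ANSI

jdbc:calcite:model=/path/to/model.json;quoting=DOUBLE_QUOTE;unquotedCasing=UNCHANGED;caseSensitive=false
```

MYSQL_ANSI is the lex preset closest to what Spark emits (double-quoted identifiers, case-insensitive matching); the second form sets the underlying properties individually.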
Spark with Calcite JDBC and Druid adapter
Hi everyone,

When accessing Druid through Calcite JDBC and the Druid adapter from Spark, I have been experiencing strange results.

The Druid data schema is defined as:

{
  "type": "custom",
  "name": "xxx",
  "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
  "operand": {
    "url": "http://:8082",
    "coordinatorUrl": "http://:8081"
  },
  "tables": [
    {
      "name": "yyy",
      "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
      "operand": {
        "dataSource": "t",
        "interval": "2016-11-08T00:00:00.000Z/2016-12-31T00:00:00.000Z",
        "timestampColumn": "timestamp",
        "dimensions": [
          "commentorId",
          "hashTagId",
          "value"
        ],
        "metrics": [
          { "type" : "count", "name" : "count" }
        ]
      }
    }
  ]
}

With a JDBC client, queries like these work fine:

1. select * from yyy where hashTagId='hashTag_01'
2. select badcount.hashTagId, badcount.bad, totalcount.total,
   badcount.bad/totalcount.total*100 as bad_pct
   from
   (select hashTagId, cast(count(*) as double) as bad from yyy
    where value='bad' group by hashTagId) as badcount
   join
   (select hashTagId, cast(count(*) as double) as total
    from yyy group by hashTagId) as totalcount
   on
   (badcount.hashTagId=totalcount.hashTagId)

However, in Spark 2.0, it is strange:

1. df_yyy = spark.read.format("jdbc").option("url", "jdbc:calcite:model=<path to schema json>;caseSensitive=false")…
2. df_yyy.show() —— works fine, returns all records
3. df_yyy.filter($"hashTagId" === "hashTag_01").count() — returns the correct
   number of records
4. df_yyy.filter($"hashTagId" === "hashTag_01").show() — returns an empty result set
5. df_yyy.join(…, …).show() —— returns an empty result set (any join returns
   an empty result set)

I suspect there are conflicts between how Spark parses SQL and how Calcite JDBC does. Are there special properties to set in the JDBC string to make it work with Spark? Is there a Calcite JDBC log file that I can dig through? I did some googling and don't see similar usage with spark/calcite/druid; is this the right way to access Druid from Spark? (Maybe this is a question better suited for the Spark/Druid community…)

Thanks.
Herman.
[jira] [Created] (CALCITE-1492) Table name with star is not supported.
Dongming Liu created CALCITE-1492:
-------------------------------------

Summary: Table name with star is not supported.
Key: CALCITE-1492
URL: https://issues.apache.org/jira/browse/CALCITE-1492
Project: Calcite
Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Dongming Liu
Assignee: Julian Hyde

sql1:
{code}
select * from t1;
{code}
This SQL is OK.

sql2:
{code}
select t1.* from t1;
{code}
This SQL fails with the following error:
{noformat}
From line 1, column 8 to line 1, column 9: Column '"*"' not found in table 't1'
{noformat}

Does Calcite not support it?
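For reference, "t1.*" is standard SQL: it should expand to every column of table (or alias) t1, so sql2 ought to behave exactly like sql1. The intended expansion can be sketched against a hypothetical toy catalog (this is Python for illustration, not Calcite's validator logic):

```python
# Hypothetical toy catalog: table name -> column names. Not Calcite code.
catalog = {'t1': ['a', 'b', 'c'], 't2': ['x']}

def expand_select_item(item, catalog):
    """Expand 'alias.*' into that alias's columns; pass other items through."""
    if item.endswith('.*'):
        alias = item[:-2]
        return ['%s.%s' % (alias, col) for col in catalog[alias]]
    return [item]

# 'select t1.* from t1' should list the same columns as 'select * from t1'.
assert expand_select_item('t1.*', catalog) == ['t1.a', 't1.b', 't1.c']
assert expand_select_item('sal', catalog) == ['sal']
```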