[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670157#comment-16670157 ]
ASF GitHub Bot commented on DRILL-786: -------------------------------------- ihuzenko commented on issue #1488: DRILL-786: Allow CROSS JOIN syntax URL: https://github.com/apache/drill/pull/1488#issuecomment-434703183 Hi @amansinha100 . Thanks for your active involvement, I investigated the issue and found that: when option `planner.slice_target` is set to low value, then StreamAggPrule at the first match will try to create 2 phase plan, but method call.transformTo(newRel) won't be called (means that rule results are ignored), because the rule matches at the very early stage of physical planning, there aren't rels matched by new traitSet with physical convention. Currently for simple cross join query with aggregate function, physical rules are applied in order, which is wrong and results into PlanningException: Prel.ScreenPrule Prel.ScreenPrule Prel.ScanPrule Prel.ScanPrule ProjectPrule ProjectPrule Prel.NestedLoopJoinPrule ExpandConversionRule ExpandConversionRule .... The problem may be fixed either by returning false from StreamAggPrule.create2PhasePlan(call, aggregate) for cross join queries, so then StreamAggPrule will be applied successfully at early stage. Or another option is to change NestedLoopJoinPrule left and join node's distribution trait from ANY to SINGLETON, which then results in activation of ProjectPrule -> StreamAggPrule etc. So then StreamAggPrule is activated again, but now at the right stage when it can be applied successfully. @vvysotskyi suggests to proceed with second option, but in order to do this I need to investigate more about how changing of the distribution traits for NestedLoopJoinPrule left and join node impacts physical planning. And then, if possible, try to activate necessary rules without distribution trait change. @amansinha100 I'd appreciate if you could share your thoughts about the options. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement CROSS JOIN > -------------------- > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization > Affects Versions: 1.14.0 > Reporter: Krystal > Assignee: Igor Guzenko > Priority: Major > Labels: doc-impacting > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Sets: > Set#22, type: (DrillRecordRow[*, age, name, studentnum]) > rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, > importance=0.5904900000000001 > rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), > rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 > network} > rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#337:AbstractConverter.LOGICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#332:Subset#22.PHYSICAL.ANY([]).[], best=rel#335, importance=0.531441 > rel#334:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#338:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], > selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], > SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, > cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network} > rel#336:Subset#22.PHYSICAL.SINGLETON([]).[], best=rel#335, > importance=0.4782969000000001 > rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], > selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], > SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, > cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network} > Set#23, type: (DrillRecordRow[*, age, name, studentnum]) > rel#308:Subset#23.LOGICAL.ANY([]).[], best=rel#307, importance=0.6561 > rel#307:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER, > 20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 > network} > rel#343:AbstractConverter.LOGICAL.ANY([]).[](child=rel#342:Subset#23.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=150.0, cumulative cost={inf} > rel#342:Subset#23.PHYSICAL.SINGLETON([]).[], best=rel#341, > importance=0.5904900000000001 > rel#344:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#308:Subset#23.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=150.0, cumulative cost={inf} > rel#341:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER, > 20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 > network} > Set#24, type: (DrillRecordRow[*, age]) > rel#309:Subset#24.LOGICAL.ANY([]).[], best=rel#140, > importance=0.5904900000000001 > rel#140:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, voter]), rowcount=1000.0, > cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network} > rel#330:AbstractConverter.LOGICAL.ANY([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#349:AbstractConverter.LOGICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#329:Subset#24.PHYSICAL.ANY([]).[], best=rel#347, importance=0.531441 > rel#331:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#350:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], > selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), > rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 > network} > rel#348:Subset#24.PHYSICAL.SINGLETON([]).[], best=rel#347, > importance=0.4782969000000001 > rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=1000.0, cumulative cost={inf} > rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], > selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), > rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 > network} > Set#25, type: (DrillRecordRow[*, age]) > rel#311:Subset#25.LOGICAL.ANY([]).[], best=rel#310, importance=0.6561 > rel#310:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER, > 20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 > network} > rel#355:AbstractConverter.LOGICAL.ANY([]).[](child=rel#354:Subset#25.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=150.0, cumulative cost={inf} > rel#354:Subset#25.PHYSICAL.SINGLETON([]).[], best=rel#353, > importance=0.5904900000000001 > rel#356:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#311:Subset#25.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=150.0, cumulative cost={inf} > rel#353:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER, > 20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 > network} > Set#26, type: RecordType(ANY *, ANY age, ANY name, ANY studentnum, ANY *0, > ANY age0) > rel#313:Subset#26.LOGICAL.ANY([]).[], best=rel#312, > importance=0.7290000000000001 > rel#312:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#308:Subset#23.LOGICAL.ANY([]).[],right=rel#311:Subset#25.LOGICAL.ANY([]).[],condition=true,joinType=inner), > rowcount=22500.0, cumulative cost={4001.0 rows, 14001.0 cpu, 0.0 io, 0.0 > network} > rel#327:AbstractConverter.LOGICAL.ANY([]).[](child=rel#326:Subset#26.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > rel#326:Subset#26.PHYSICAL.ANY([]).[], best=null, importance=0.6561 > rel#328:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=22500.0, cumulative cost={inf} > Set#27, type: RecordType(ANY name, ANY age, ANY studentnum) > rel#315:Subset#27.LOGICAL.ANY([]).[], best=rel#314, importance=0.81 > rel#314:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],name=$2,age=$1,studentnum=$3), > rowcount=22500.0, cumulative cost={26501.0 rows, 14013.0 cpu, 0.0 io, 0.0 > network} > rel#322:AbstractConverter.LOGICAL.ANY([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > rel#321:Subset#27.PHYSICAL.SINGLETON([]).[], best=null, > importance=0.7290000000000001 > rel#323:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=22500.0, cumulative cost={inf} > Set#28, type: RecordType(ANY name, ANY age, ANY studentnum) > rel#317:Subset#28.LOGICAL.ANY([]).[], best=rel#316, importance=0.9 > rel#316:DrillScreenRel.LOGICAL.ANY([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[]), > rowcount=22500.0, cumulative cost={28751.0 rows, 16263.0 cpu, 0.0 io, 0.0 > network} > rel#319:AbstractConverter.LOGICAL.ANY([]).[](child=rel#318:Subset#28.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > rel#318:Subset#28.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0 > rel#320:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#317:Subset#28.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), > rowcount=22500.0, cumulative cost={inf} > rel#324:ScreenPrel.PHYSICAL.SINGLETON([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:445) > ~[optiq-core-0.7-20140513.013236-5.jar:na] > org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) > ~[optiq-core-0.7-20140513.013236-5.jar:na] > org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:669) > ~[optiq-core-0.7-20140513.013236-5.jar:na] > net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:271) > ~[optiq-core-0.7-20140513.013236-5.jar:na] > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:119) > > ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:89) > > ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:134) > > ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:338) > [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) > [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] -- This message was sent by Atlassian JIRA (v7.6.3#76005)