[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637491#comment-16637491 ] Gautam Kumar Parai edited comment on DRILL-786 at 10/3/18 8:51 PM: --- Yes, option 3 makes the most sense given the alternatives. The default value (TRUE) of the option serves as a defensive check. When the user sets it to FALSE they know what they are getting into rather than Drill surprising them. was (Author: gparai): Yes, option 3 makes the most sense. The default value (TRUE) of the option serves as a defensive check. When the user sets it to FALSE they know what they are getting into rather than Drill surprising them. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM: -- We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}. {color:#33}I have provided results of the investigation in the prior comments.{color}{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed. was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}. {color:#33} I have provided results of the investigation in the prior comments.{color}{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: >
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM: -- We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}. {color:#33} I have provided results of the investigation in the prior comments.{color}{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed. was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL],
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:50 AM: -- We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed. was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:45 AM: -- We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, {color:#d04437}it's really hard to implement this approach, it requires a lot of time and includes a lot of changes to Apache Calcite.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 >
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630719#comment-16630719 ] Ihor Huzenko edited comment on DRILL-786 at 9/27/18 4:55 PM: - As you may know cross joins are allowed when option {color:#59afe1}planner.enable_nljoin_for_scalar_only{color} is set to false and join performed using comma syntax. But main goal of this task is to allow explicit cross joins in queries when option is enabled and at the same time disallow other ways to execute cross joins (for example, list tables via comma in FROM section of query without condition) while option is enabled. The main idea about how we could implement this task is to allow usage of NestedLoopJoin in two places where mentioned above option is checked: # [DrillJoinRelBase.java|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java] method *computeSelfCost*, line: _if (PrelUtil.getPlannerSettings(planner).isNlJoinForScalarOnly())_ # [NestedLoopJoinPrule.java |https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java] method *checkPreconditions* line: _if (settings.isNlJoinForScalarOnly())._ But we should allow it only if our current join rel node is actually originated from explicit cross join in SQL query. Here is where the main challenge comes in, at this points we don't know that the rel node is explicit cross join. Because after invocation of Calcite's org.apache.calcite.sql2rel.SqlToRelConverter, converted SqlNode becomes LogicalJoin rel node with type INNER, see SqlToRelConverter: {code:java} private static JoinRelType convertJoinType(JoinType joinType) { switch (joinType) { case COMMA: case INNER: case CROSS: return JoinRelType.INNER; case FULL: return JoinRelType.FULL; case LEFT: return JoinRelType.LEFT; case RIGHT: return JoinRelType.RIGHT; default: throw Util.unexpected(joinType); } } {code} I tried to add custom RelTrait and with help of reflections magic I was even able to overcome HepPlanner's conversions of LogicalJoin nodes. But then I got an error from VolcanoPlanner's code: {code:java} if (traits.size() != traitDefs.size()) { throw new AssertionError("Relational expression " + rel + " does not have the correct number of traits: " + traits.size() + " != " + traitDefs.size()); } {code} So it's impossible to use traitSet for marking that rel node is came from explicit CROSS JOIN syntax. I see two options how we could overcome this problem and both of them include changes to Calcite's LogicalJoin class (just because it's final class :(): 1) Either add additional flag and preserve it between recreation of LogicalJoin instances, as it was done for field: {code:java} private final boolean semiJoinDone; {code} But major disadvantage of this approach is that updated constructors will break other clients code. 2) Add ability to register static callback function that will be called after creation of new instance inside copy method, and accept both oldRelNode and newRelNode. So then we could trace ids of LogicalJoin instances since creation of first such instance in org.apache.calcite.sql2rel.SqlToRelConverter. This are all ideas that I have now. I'm very new to Drill and Calcite and maybe I don't see other good alternatives. Dear drillers, could you please take a look and share your thoughts about possible options ? was (Author: ihorhuzenko): As you may know CROSS JOIN syntax works fine when option {color:#59afe1}planner.enable_nljoin_for_scalar_only{color} is set to false. But main goal of this task is to allow explicit cross joins in queries when option is enabled and at the same time disallow other ways to execute cross joins (for example, list tables via comma in FROM section of query without condition) while option is enabled. The main idea about how we could implement this task is to allow usage of NestedLoopJoin in two places where mentioned above option is checked: # [DrillJoinRelBase.java|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java] method *computeSelfCost*, line: _if (PrelUtil.getPlannerSettings(planner).isNlJoinForScalarOnly())_ # [NestedLoopJoinPrule.java |https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java] method *checkPreconditions* line: _if (settings.isNlJoinForScalarOnly())._ But we should allow it only if our current join rel node is actually originated from explicit cross join in SQL query. Here is where the main challenge comes in, at this points we don't know that the rel node is explicit cross join. Because after invocation of