[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-03 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637491#comment-16637491
 ] 

Gautam Kumar Parai edited comment on DRILL-786 at 10/3/18 8:51 PM:
---

Yes, option 3 makes the most sense given the alternatives. The default value 
(TRUE) of the option serves as a defensive check. When the user sets it to 
FALSE they know what they are getting into rather than Drill surprising them.


was (Author: gparai):
Yes, option 3 makes the most sense. The default value (TRUE) of the option 
serves as a defensive check. When the user sets it to FALSE they know what they 
are getting into rather than Drill surprising them.

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM:
--

We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, *{color:#d04437}I didn't 
find how this could be implemented{color}*{color:#d04437}.  {color:#33}I 
have provided results of the investigation in the prior comments.{color}{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 Also we can consider changing default value of the option to false thus 
queries producing Cartesian product would always succeed.

 


was (Author: ihorhuzenko):
We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, *{color:#d04437}I didn't 
find how this could be implemented{color}*{color:#d04437}. {color:#33} I 
have provided results of the investigation in the prior comments.{color}{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 Also we can consider changing default value of the option to false thus 
queries producing Cartesian product would always succeed.

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> 

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM:
--

We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, *{color:#d04437}I didn't 
find how this could be implemented{color}*{color:#d04437}. {color:#33} I 
have provided results of the investigation in the prior comments.{color}{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 Also we can consider changing default value of the option to false thus 
queries producing Cartesian product would always succeed.

 


was (Author: ihorhuzenko):
We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, *{color:#d04437}I didn't 
find how this could be implemented{color}*{color:#d04437}.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 Also we can consider changing default value of the option to false thus 
queries producing Cartesian product would always succeed.

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], 

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:50 AM:
--

We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, *{color:#d04437}I didn't 
find how this could be implemented{color}*{color:#d04437}.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 Also we can consider changing default value of the option to false thus 
queries producing Cartesian product would always succeed.

 


was (Author: ihorhuzenko):
We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, *{color:#d04437}I didn't 
find how this could be implemented{color}*{color:#d04437}.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:45 AM:
--

We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, *{color:#d04437}I didn't 
find how this could be implemented{color}*{color:#d04437}.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 

 


was (Author: ihorhuzenko):
We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, {color:#d04437}it's 
really hard to implement this approach, it requires a lot of time and includes 
a lot of changes to Apache Calcite.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> 

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-09-27 Thread Ihor Huzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630719#comment-16630719
 ] 

Ihor Huzenko edited comment on DRILL-786 at 9/27/18 4:55 PM:
-

As you may know cross joins are allowed when option 
{color:#59afe1}planner.enable_nljoin_for_scalar_only{color} is set to false and 
join performed using comma syntax. But main goal of this task is to allow 
explicit cross joins in queries when option is enabled and at the same time 
disallow other ways to execute cross joins (for example, list tables via comma 
in FROM section of query without condition) while option is enabled.  

The main idea about how we could implement this task is to allow usage of 
NestedLoopJoin in two places where mentioned above option is checked:  
 # 
[DrillJoinRelBase.java|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java]
 method *computeSelfCost*, line: _if 
(PrelUtil.getPlannerSettings(planner).isNlJoinForScalarOnly())_
 # [NestedLoopJoinPrule.java 
|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java]
 method *checkPreconditions* line:  _if (settings.isNlJoinForScalarOnly())._

But we should allow it only if our current join rel node is actually originated 
from explicit cross join in SQL query.  Here is where the main challenge comes 
in, at this points we don't know that the rel node is explicit cross join. 
Because after invocation of Calcite's 
org.apache.calcite.sql2rel.SqlToRelConverter, converted SqlNode becomes 
LogicalJoin rel node with type INNER, see SqlToRelConverter: 
{code:java}
private static JoinRelType convertJoinType(JoinType joinType) {
  switch (joinType) {
  case COMMA:
  case INNER:
  case CROSS:
return JoinRelType.INNER;
  case FULL:
return JoinRelType.FULL;
  case LEFT:
return JoinRelType.LEFT;
  case RIGHT:
return JoinRelType.RIGHT;
  default:
throw Util.unexpected(joinType);
  }
}
{code}
 I tried to add custom RelTrait and with help of reflections magic I was even 
able to overcome HepPlanner's conversions of LogicalJoin nodes. But then I got 
an error from VolcanoPlanner's code: 
{code:java}
if (traits.size() != traitDefs.size()) {
  throw new AssertionError("Relational expression " + rel
  + " does not have the correct number of traits: " + traits.size()
  + " != " + traitDefs.size());
}
{code}
 So it's impossible to use traitSet for marking that rel node is came from 
explicit CROSS JOIN syntax.  I see two options how we could overcome this 
problem and both of them include changes to Calcite's LogicalJoin class (just 
because it's final class :(): 

1) Either add additional flag and preserve it between recreation of LogicalJoin 
instances, as it was done for field:
{code:java}
private final boolean semiJoinDone;
{code}
But major disadvantage of this approach is that updated constructors will break 
other clients code.

 2) Add ability to register static callback function that will be called after 
creation of new instance inside copy method, and accept both oldRelNode and 
newRelNode. So then  we could trace ids of LogicalJoin instances since creation 
of first such instance in org.apache.calcite.sql2rel.SqlToRelConverter. 

This are all ideas that I have now. I'm very new to Drill and Calcite and maybe 
I don't see other good alternatives. Dear drillers, could you please take a 
look and share your thoughts about possible options ?  

 


was (Author: ihorhuzenko):
As you may know CROSS JOIN syntax works fine when option 
{color:#59afe1}planner.enable_nljoin_for_scalar_only{color} is set to false. 
But main goal of this task is to allow explicit cross joins in queries when 
option is enabled and at the same time disallow other ways to execute cross 
joins (for example, list tables via comma in FROM section of query without 
condition) while option is enabled.  

The main idea about how we could implement this task is to allow usage of 
NestedLoopJoin in two places where mentioned above option is checked:  
 # 
[DrillJoinRelBase.java|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java]
 method *computeSelfCost*, line: _if 
(PrelUtil.getPlannerSettings(planner).isNlJoinForScalarOnly())_
 # [NestedLoopJoinPrule.java 
|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java]
 method *checkPreconditions* line:  _if (settings.isNlJoinForScalarOnly())._

But we should allow it only if our current join rel node is actually originated 
from explicit cross join in SQL query.  Here is where the main challenge comes 
in, at this points we don't know that the rel node is explicit cross join. 
Because after invocation of