date:20181002

[jira] [Commented] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636496#comment-16636496
 ] 

ASF GitHub Bot commented on DRILL-6731:
---

sohami commented on issue #1459: DRILL-6731: Move the BFs aggregating work from 
the Foreman to the RuntimeFi…
URL: https://github.com/apache/drill/pull/1459#issuecomment-426521845
 
 
   @weijietong - Thanks! I ran the pre-commit tests, and a few JPPD tests are 
failing due to issues with the locking mechanism in RuntimeFilterSink. I 
will try to fix that and update the PR. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter
> --
>
> Key: DRILL-6731
> URL: https://issues.apache.org/jira/browse/DRILL-6731
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.15.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
> Fix For: 1.15.0
>
>
> This PR moves the BloomFilter aggregation work from the Foreman to the 
> RuntimeFilter. Through this change, the RuntimeFilter can apply an incoming 
> BF as soon as possible.
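
For readers unfamiliar with the aggregation step: partial Bloom filters that share the same size and hash functions are typically merged with a bitwise OR, and the merged filter can be probed right away. The following is only a minimal sketch of that general technique. `SimpleBloomFilter`, its two hash positions, and its method names are hypothetical and are not Drill's BloomFilter or RuntimeFilter code.

```java
import java.util.BitSet;

// Hypothetical, simplified Bloom filter used only to illustrate aggregation.
// This is NOT Drill's BloomFilter class.
final class SimpleBloomFilter {
  private final BitSet bits;
  private final int numBits;

  SimpleBloomFilter(int numBits) {
    this.numBits = numBits;
    this.bits = new BitSet(numBits);
  }

  // Two cheap bit positions derived from hashCode(); illustrative only.
  private int pos1(Object key) {
    return Math.floorMod(key.hashCode(), numBits);
  }

  private int pos2(Object key) {
    return Math.floorMod(key.hashCode() * 31 + 17, numBits);
  }

  void add(Object key) {
    bits.set(pos1(key));
    bits.set(pos2(key));
  }

  // May return a false positive, never a false negative.
  boolean mightContain(Object key) {
    return bits.get(pos1(key)) && bits.get(pos2(key));
  }

  // Aggregate another partial filter into this one (set union via bitwise OR).
  void or(SimpleBloomFilter other) {
    if (other.numBits != numBits) {
      throw new IllegalArgumentException("filters must have the same size");
    }
    bits.or(other.bits);
  }
}
```

In this sketch, a consumer that calls `or` as each partial filter arrives can start probing with `mightContain` immediately, rather than waiting for a central node to merge everything first.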



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (DRILL-6767) Simplify transfer of information from the planner to the operators

2018-10-02 Thread Boaz Ben-Zvi (JIRA)

Boaz Ben-Zvi created DRILL-6767:
---

 Summary: Simplify transfer of information from the planner to the 
operators
 Key: DRILL-6767
 URL: https://issues.apache.org/jira/browse/DRILL-6767
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning & 
Optimization
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.15.0


Currently, little of the specific information known to the planner is passed to 
the operators. For example, see the `joinType` parameter passed to the Join 
operators (specifying whether this is a LEFT, RIGHT, INNER or FULL join). 
 The relevant code passes this information explicitly via the constructors' 
signatures (e.g., see HashJoinPOP, AbstractJoinPop, etc.), uses specific 
fields for this information, and affects all the test code that uses it.
 In the near future many more such "pieces of information" will possibly be 
added to Drill, including:
 (1) Is this a Semi (or Anti-Semi) join.
 (2) `joinControl`
 (3) `isRowKeyJoin`
 (4) `isBroadcastJoin`
 (5) Which join columns are not needed (DRILL-6758)
 (6) Is this operator positioned between Lateral and UnNest.
 (7) For Hash-Agg: Which phase (already implemented).

Each addition of such information would require a significant code change, and 
add some code clutter.

*Suggestion*: Instead, pass a single object containing all the needed planner 
information. The next time another field is added, only that object needs to 
change. (Ideally the whole plan could be passed, and then each operator 
could pick out the fields it needs.)
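
A minimal sketch of what the suggestion could look like, assuming a builder-style container. Every name here (`PlannerInfoSketch`, `JoinOperatorSketch`, and their fields) is hypothetical and chosen for illustration; none of it is Drill's actual API.

```java
// Hypothetical container for planner-derived hints. The class and field
// names are illustrative assumptions, not Drill's actual API.
final class PlannerInfoSketch {
  enum JoinType { INNER, LEFT, RIGHT, FULL }

  private final JoinType joinType;
  private final boolean semiJoin;
  private final boolean broadcastJoin;

  private PlannerInfoSketch(Builder b) {
    this.joinType = b.joinType;
    this.semiJoin = b.semiJoin;
    this.broadcastJoin = b.broadcastJoin;
  }

  static Builder builder() { return new Builder(); }

  JoinType joinType() { return joinType; }
  boolean isSemiJoin() { return semiJoin; }
  boolean isBroadcastJoin() { return broadcastJoin; }

  // New planner fields get a default here, so existing callers keep compiling.
  static final class Builder {
    private JoinType joinType = JoinType.INNER;
    private boolean semiJoin;
    private boolean broadcastJoin;

    Builder joinType(JoinType t) { this.joinType = t; return this; }
    Builder semiJoin(boolean v) { this.semiJoin = v; return this; }
    Builder broadcastJoin(boolean v) { this.broadcastJoin = v; return this; }
    PlannerInfoSketch build() { return new PlannerInfoSketch(this); }
  }
}

// Hypothetical operator definition: its constructor takes only the info
// object, so adding a planner field does not change this signature.
final class JoinOperatorSketch {
  private final PlannerInfoSketch info;

  JoinOperatorSketch(PlannerInfoSketch info) { this.info = info; }

  String describe() {
    return info.joinType() + (info.isSemiJoin() ? " SEMI" : "") + " join";
  }
}
```

With this shape, adding a field such as `isRowKeyJoin` would touch only the container and its builder, while operator constructors and the tests that call them stay unchanged.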






[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636298#comment-16636298
 ] 

ASF GitHub Bot commented on DRILL-6473:
---

Agirish commented on issue #1307: DRILL-6473: Update MapR Hive, MapR release 
version and ojai version
URL: https://github.com/apache/drill/pull/1307#issuecomment-426476958
 
 
   @KazydubB, can we get this in soon? 




> Update MapR Hive, MapR release version and ojai version
> ---
>
> Key: DRILL-6473
> URL: https://issues.apache.org/jira/browse/DRILL-6473
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.15.0
>
>
> Update:
> * version of Hive to 2.3.3-mapr for mapr profile;
> *  MapR release version to 6.0.1-mapr;
> *  ojai version to 2.0.1-mapr-1804.




[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636294#comment-16636294
 ] 

ASF GitHub Bot commented on DRILL-6473:
---

Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR 
Hive, MapR release version and ojai version
URL: https://github.com/apache/drill/pull/1307#discussion_r222154293
 
 

 ##
 File path: pom.xml
 ##
 @@ -2439,7 +2439,7 @@
   
 mapr
 true
-2.1.1-mapr-1710
+2.3.3-mapr
 1.1.1-mapr-1602-m7-5.2.0
 2.7.0-mapr-1707
 
 Review comment:
   Update Hadoop version to 2.7.0-mapr-1808?



[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636296#comment-16636296
 ] 

ASF GitHub Bot commented on DRILL-6473:
---

Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR 
Hive, MapR release version and ojai version
URL: https://github.com/apache/drill/pull/1307#discussion_r222154373
 
 

 ##
 File path: pom.xml
 ##
 @@ -2439,7 +2439,7 @@
   
 mapr
 true
-2.1.1-mapr-1710
+2.3.3-mapr
 1.1.1-mapr-1602-m7-5.2.0
 2.7.0-mapr-1707
 3.4.5-mapr-1710
 
 Review comment:
   Update Zookeeper version to 3.4.11-mapr-1808?



[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636293#comment-16636293
 ] 

ASF GitHub Bot commented on DRILL-6473:
---

Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR 
Hive, MapR release version and ojai version
URL: https://github.com/apache/drill/pull/1307#discussion_r222154231
 
 

 ##
 File path: pom.xml
 ##
 @@ -2439,7 +2439,7 @@
   
 mapr
 true
-2.1.1-mapr-1710
+2.3.3-mapr
 
 Review comment:
   Update Hive version to 2.3.3-mapr-1808?



[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636290#comment-16636290
 ] 

ASF GitHub Bot commented on DRILL-6473:
---

Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR 
Hive, MapR release version and ojai version
URL: https://github.com/apache/drill/pull/1307#discussion_r222154084
 
 

 ##
 File path: pom.xml
 ##
 @@ -53,8 +53,8 @@
 2.9.5
 2.9.5
 3.4.12
-5.2.1-mapr
-1.1
+6.0.1-mapr
+2.0.1-mapr-1804
 
 Review comment:
   Also update OJAI version to 3.0-mapr-1808?



[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636286#comment-16636286
 ] 

ASF GitHub Bot commented on DRILL-6473:
---

Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR 
Hive, MapR release version and ojai version
URL: https://github.com/apache/drill/pull/1307#discussion_r222153910
 
 

 ##
 File path: pom.xml
 ##
 @@ -53,8 +53,8 @@
 2.9.5
 2.9.5
 3.4.12
-5.2.1-mapr
-1.1
+6.0.1-mapr
 
 Review comment:
   Can we update MapR version to the latest 6.1.0-mapr?



[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Hanumath Rao Maduri (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636073#comment-16636073
 ] 

Hanumath Rao Maduri commented on DRILL-786:
---

IMO, option 3 was the short-term solution for this problem, i.e., treat the 
explicit CROSS JOIN and the implicit cross join the same. The planner should 
generate the plan when the flag is enabled (which is true by default) for 
scalar query cases. Otherwise it should throw an error.

I am fine with option 3, but I am not sure that changing the default value is 
needed.

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], be

[jira] [Updated] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive

2018-10-02 Thread Vitalii Diravka (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6759:
---
Labels: ready-to-commit  (was: )

> CSV 'columns' array is incorrectly case sensitive
> -
>
> Key: DRILL-6759
> URL: https://issues.apache.org/jira/browse/DRILL-6759
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Perform the following query on a CSV file without column headers:
> {noformat}
> SELECT columns[0] FROM `yourFile.csv`
> {noformat}
> In Drill, column names are supposed to be case insensitive. So, let's try 
> upper case:
> {noformat}
> SELECT COLUMNS[0] FROM `yourFile.csv`;
> Error: DATA_READ ERROR: Selected column 'COLUMNS' must have name 'columns' or 
> must be plain '*'
> {noformat}
> Expected {{`columns`}} to be case insensitive like other CSV columns and SQL 
> keywords.
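
The expected behavior can be illustrated with a case-insensitive name check. This is only a sketch of the general idea; `ColumnsNameCheck` and `isColumnsArray` are hypothetical names, and this is not the code from the actual fix.

```java
// Hypothetical helper illustrating a case-insensitive match on the special
// `columns` array name. Not taken from Drill's text reader.
final class ColumnsNameCheck {
  static final String COLUMNS_ARRAY_NAME = "columns";

  // True for "columns", "COLUMNS", "Columns", and so on; false for null.
  static boolean isColumnsArray(String selectedName) {
    return COLUMNS_ARRAY_NAME.equalsIgnoreCase(selectedName);
  }
}
```

Under this check, both `columns[0]` and `COLUMNS[0]` would resolve to the same array, which matches the expectation stated in the report.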




[jira] [Commented] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635723#comment-16635723
 ] 

ASF GitHub Bot commented on DRILL-6759:
---

arina-ielchiieva opened a new pull request #1485: DRILL-6759: Make columns 
array name for csv data case insensitive
URL: https://github.com/apache/drill/pull/1485
 
 
   Details in [DRILL-6759](https://issues.apache.org/jira/browse/DRILL-6759).




> CSV 'columns' array is incorrectly case sensitive
> -
>
> Key: DRILL-6759
> URL: https://issues.apache.org/jira/browse/DRILL-6759
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.15.0
>
>
> Perform the following query on a CSV file without column headers:
> {noformat}
> SELECT columns[0] FROM `yourFile.csv`
> {noformat}
> In Drill, column names are supposed to be case insensitive. So, let's try 
> upper case:
> {noformat}
> SELECT COLUMNS[0] FROM `yourFile.csv`;
> Error: DATA_READ ERROR: Selected column 'COLUMNS' must have name 'columns' or 
> must be plain '*'
> {noformat}
> Expected {{`columns`}} to be case insensitive like other CSV columns and SQL 
> keywords.




[jira] [Updated] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive

2018-10-02 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6759:

Reviewer: Vitalii Diravka


[jira] [Updated] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive

2018-10-02 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6759:

Fix Version/s: 1.15.0


[jira] [Assigned] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive

2018-10-02 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6759:
---

Assignee: Arina Ielchiieva


[jira] [Commented] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive

2018-10-02 Thread Arina Ielchiieva (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635518#comment-16635518
 ] 

Arina Ielchiieva commented on DRILL-6759:
-

Agree, especially when {{select COLUMNS from `yourFile.csv`}} currently 
succeeds. 

> CSV 'columns' array is incorrectly case sensitive
> -
>
> Key: DRILL-6759
> URL: https://issues.apache.org/jira/browse/DRILL-6759
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Priority: Minor
> Fix For: 1.15.0
>
>
> Perform the following query on a CSV file without column headers:
> {noformat}
> SELECT columns[0] FROM `yourFile.csv`
> {noformat}
> In Drill, column names are supposed to be case insensitive. So, let's try 
> upper case:
> {noformat}
> SELECT COLUMNS[0] FROM `yourFile.csv`;
> Error: DATA_READ ERROR: Selected column 'COLUMNS' must have name 'columns' or 
> must be plain '*'
> {noformat}
> Expected {{`columns`}} to be case insensitive like other CSV columns and SQL 
> keywords.




[jira] [Commented] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter

2018-10-02 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635429#comment-16635429
 ] 

ASF GitHub Bot commented on DRILL-6731:
---

weijietong commented on issue #1459: DRILL-6731: Move the BFs aggregating work 
from the Foreman to the RuntimeFi…
URL: https://github.com/apache/drill/pull/1459#issuecomment-426259756
 
 
   @sohami done, really appreciate the effort you made.



[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Volodymyr Vysotskyi (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635371#comment-16635371
 ] 

Volodymyr Vysotskyi commented on DRILL-786:
---

Since option 1 cannot be implemented, I'm fine with option 3 and changing the 
default option value. Also, I would recommend adding more tests with the cross 
join in different cases and for {{CROSS APPLY}}.

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.59049001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 4000.

[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Arina Ielchiieva (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635351#comment-16635351
 ] 

Arina Ielchiieva commented on DRILL-786:


Ideally, option 1 is the best approach, but since there is no good way to 
implement it, I would go with option 3 and even consider changing the default 
option value. 
[~amansinha100] / [~vvysotskyi] / [~hanu.ncr] / [~gparai] what do you think?

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.59049001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM:
--

We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find a way to implement this{color}*. I have provided 
the results of the investigation in the prior comments.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

Also, we could consider changing the default value of the option to false, so 
that queries producing a Cartesian product would always succeed.
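
As a rough illustration of that last point (a sketch only, assuming a 
session-level override instead of a changed default): with the option set to 
false, an implicit cross join like the one shown under Option 2 should be 
planned with a nested loop join.
{code:sql}
-- Assumption: ALTER SESSION limits the change to the current session;
-- the default value of the option is left untouched.
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;

-- Implicit cross join (Cartesian product) of the nation table with itself.
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b;
{code}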

 


was (Author: ihorhuzenko):
We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find a way to implement this{color}*. I have provided 
the results of the investigation in the prior comments.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

Also, we could consider changing the default value of the option to false, so 
that queries producing a Cartesian product would always succeed.

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original re

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM:
--

We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find a way to implement this{color}*. I have provided 
the results of the investigation in the prior comments.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

Also, we could consider changing the default value of the option to false, so 
that queries producing a Cartesian product would always succeed.

 


was (Author: ihorhuzenko):
We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find a way to implement this{color}*.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

Also, we could consider changing the default value of the option to false, so 
that queries producing a Cartesian product would always succeed.

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:50 AM:
--

We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find a way to implement this{color}*.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

Also, we could consider changing the default value of the option to false, so 
that queries producing a Cartesian product would always succeed.

 


was (Author: ihorhuzenko):
We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find a way to implement this{color}*.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

 

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulativ

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:45 AM:
--

We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find a way to implement this{color}*.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

 

 


was (Author: ihorhuzenko):
We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
{color:#d04437}this approach is really hard to implement: it would take a lot 
of time and require many changes to Apache Calcite.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

 

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> Dri

[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306
 ] 

Igor Guzenko commented on DRILL-786:


We considered three possible ways to implement the feature. Note: in the text 
below, when I say the option is enabled or disabled, I mean the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
{color:#d04437}this approach is really hard to implement: it would take a lot 
of time and require many changes to Apache Calcite.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit cross join syntax will still depend on 
the option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled, and prohibit it when the option 
is disabled. 

 

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 225

[jira] [Commented] (DRILL-6465) Transitive closure is not working in Drill for Join with multiple local conditions

2018-10-02 Thread Denys Ordynskiy (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635185#comment-16635185
 ] 

Denys Ordynskiy commented on DRILL-6465:


CALCITE-2275 is in Drill Calcite now. I will retest this bug and write the 
results later.

> Transitive closure is not working in Drill for Join with multiple local 
> conditions
> --
>
> Key: DRILL-6465
> URL: https://issues.apache.org/jira/browse/DRILL-6465
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Denys Ordynskiy
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: drill.zip
>
>
> For several SQL operators, transitive closure does not work during partition 
> pruning and filter pushdown for the left table in a join.
>  If I use several local conditions, Drill scans the full left table in the join.
>  But if we move the additional conditions to the WHERE clause, transitive 
> closure works fine for all joined tables.
> *Query BETWEEN:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y BETWEEN 1987 AND 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
> Partition(values:[1988, 12, 2])]{code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 
> 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
> Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
> Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
> Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
> Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
> Partition(values:[1991, 3, 4])
> ]
> {code}
> *The same transitive closure behavior occurs for these logical operators:*
>  * NOT IN
>  * LIKE
>  * NOT LIKE
> Transitive closure also does not work during partition pruning and filter 
> pushdown for these comparison operators:
> *Query <*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y < 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2])]{code}
> *Actual result:*
> {code:java}
> 00-00 Screen
> 00-01 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], 
> category0=[$5], y0=[$6], m0=[$7])
> 00-02 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], 
> category0=[$5], y0=[$6], m0=[$7])
> 00-03 HashJoin(condition=[=($1, $6)], joinType=[inner])
> 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, 
> tableName:h_tab1), columns=[`**`], numPartitions=16, partitions= 
> [Partition(values:[1987, 5, 1]), Partition(values:[1987, 5, 2]), 
> Partition(values:[1987, 7, 1]), Partition(values:[1987, 7, 2]), 
> Partition(values:[1988, 11, 1]), Partition(values:[1988, 11, 2]), 
> Partition(values:[1988, 12, 1]), Partition(values:[1988, 12, 2]), 
> Partition(values:[1990, 4, 1]), Partition(values:[1990, 4, 2]), 
> Partition(values:[1990, 5, 1]), Partition(values:[1990, 5, 2]), 
> Partition(values:[1991, 3, 1]), Partition(values:[1991, 3, 2]), 
> Partition(values:[1991, 3, 3]), Partition(values:[1991, 3, 4])], 
> inputDirectories=[maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/1, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/2, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/3, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/4, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/5, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/6, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/7, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/8, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/9, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/10, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/11, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/12, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/13,
