[jira] [Commented] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter
[ https://issues.apache.org/jira/browse/DRILL-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636496#comment-16636496 ] ASF GitHub Bot commented on DRILL-6731: --- sohami commented on issue #1459: DRILL-6731: Move the BFs aggregating work from the Foreman to the RuntimeFi… URL: https://github.com/apache/drill/pull/1459#issuecomment-426521845 @weijietong - Thanks! I ran the pre-commit tests and a few JPPD tests are failing due to some issues related to the locking mechanism in RuntimeFilterSink. I will try to fix that and update the PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter > -- > > Key: DRILL-6731 > URL: https://issues.apache.org/jira/browse/DRILL-6731 > Project: Apache Drill > Issue Type: Improvement > Components: Server >Affects Versions: 1.15.0 >Reporter: weijie.tong >Assignee: weijie.tong >Priority: Major > Fix For: 1.15.0 > > > This PR is to move the BloomFilter aggregating work from the foreman to RuntimeFilter. Through this change, the RuntimeFilter can apply the incoming BF as soon as possible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
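The idea in the issue description, merging partial Bloom filters on the consuming side as they arrive instead of waiting for a central aggregate, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyBloomFilter` and `ToyFilterSink` are hypothetical classes, not Drill's `BloomFilter` or `RuntimeFilterSink`, and the hashing is deliberately simplistic. The only property relied on is that two Bloom filters of the same size can be merged with a bitwise OR.

```java
/** Toy Bloom filter over a long[] bitset. Illustrative only; not Drill's BloomFilter. */
final class ToyBloomFilter {
    private final int numBits;
    private final long[] words;

    ToyBloomFilter(int numBits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        this.numBits = numBits;
        this.words = new long[(numBits + 63) / 64];
    }

    int numBits() {
        return numBits;
    }

    // Two bit positions derived from hashCode(); a real filter would use stronger hashes.
    private int[] positions(Object key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15);
        return new int[] { Math.floorMod(h1, numBits), Math.floorMod(h2, numBits) };
    }

    void insert(Object key) {
        for (int p : positions(key)) {
            words[p >>> 6] |= 1L << (p & 63);
        }
    }

    boolean mightContain(Object key) {
        for (int p : positions(key)) {
            if ((words[p >>> 6] & (1L << (p & 63))) == 0) {
                return false;
            }
        }
        return true;
    }

    /** Merges another filter of the same size into this one with a bitwise OR. */
    void or(ToyBloomFilter other) {
        if (other.numBits != numBits) {
            throw new IllegalArgumentException("filter sizes differ");
        }
        for (int i = 0; i < words.length; i++) {
            words[i] |= other.words[i];
        }
    }
}

/** Hypothetical consumer-side sink: each partial filter is merged as soon as it arrives. */
final class ToyFilterSink {
    private ToyBloomFilter aggregated; // null until the first partial filter arrives

    synchronized void onPartialFilter(ToyBloomFilter incoming) {
        if (aggregated == null) {
            aggregated = new ToyBloomFilter(incoming.numBits());
        }
        aggregated.or(incoming);
    }

    /** With no filter yet, nothing can be pruned, so every key passes. */
    synchronized boolean mightContain(Object key) {
        return aggregated == null || aggregated.mightContain(key);
    }
}
```

Both sink methods are synchronized because producers and the probing thread may touch the aggregate concurrently; this is only the simplest possible choice and says nothing about how the locking in the actual PR is done.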
[jira] [Created] (DRILL-6767) Simplify transfer of information from the planner to the operators
Boaz Ben-Zvi created DRILL-6767:
---
Summary: Simplify transfer of information from the planner to the operators
Key: DRILL-6767
URL: https://issues.apache.org/jira/browse/DRILL-6767
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
Fix For: 1.15.0

Currently, little of the specific information known to the planner is passed to the operators. For example, see the `joinType` parameter passed to the Join operators (specifying whether this is a LEFT, RIGHT, INNER or FULL join). The relevant code passes this information explicitly via the constructors' signatures (e.g., see HashJoinPOP, AbstractJoinPop, etc.) and uses specific fields for it, which affects all the test code that uses those constructors.

In the near future, many more such "pieces of information" may be added to Drill, including:
(1) Is this a Semi (or Anti-Semi) join.
(2) `joinControl`
(3) `isRowKeyJoin`
(4) `isBroadcastJoin`
(5) Which join columns are not needed (DRILL-6758)
(6) Is this operator positioned between Lateral and UnNest.
(7) For Hash-Agg: Which phase (already implemented).

Each such addition would require a significant code change and add some code clutter.

*Suggestion*: Instead, pass a single object containing all the needed planner information, so that the next time a field is added, only that object needs to change. (Ideally the whole plan could be passed, and each operator could then pick out the fields it needs.)
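The suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions: `PlannerInfo`, its builder, and `ToyHashJoinConfig` are hypothetical names invented here, not existing Drill classes, and only a few of the fields listed in the issue are modeled. The point is that the operator's constructor takes one object, so adding a field later touches `PlannerInfo` alone.

```java
/** Join types named in the issue text. */
enum JoinType { INNER, LEFT, RIGHT, FULL }

/** Hypothetical single carrier for planner-derived information; not a Drill class. */
final class PlannerInfo {
    final JoinType joinType;
    final boolean semiJoin;
    final boolean rowKeyJoin;
    final boolean broadcastJoin;

    private PlannerInfo(Builder b) {
        this.joinType = b.joinType;
        this.semiJoin = b.semiJoin;
        this.rowKeyJoin = b.rowKeyJoin;
        this.broadcastJoin = b.broadcastJoin;
    }

    static Builder builder() {
        return new Builder();
    }

    /** New planner fields get a default here, so existing callers and tests keep compiling. */
    static final class Builder {
        private JoinType joinType = JoinType.INNER;
        private boolean semiJoin;
        private boolean rowKeyJoin;
        private boolean broadcastJoin;

        Builder joinType(JoinType value) { this.joinType = value; return this; }
        Builder semiJoin(boolean value) { this.semiJoin = value; return this; }
        Builder rowKeyJoin(boolean value) { this.rowKeyJoin = value; return this; }
        Builder broadcastJoin(boolean value) { this.broadcastJoin = value; return this; }

        PlannerInfo build() {
            return new PlannerInfo(this);
        }
    }
}

/** Hypothetical operator config whose constructor signature stays stable as PlannerInfo grows. */
final class ToyHashJoinConfig {
    private final PlannerInfo info;

    ToyHashJoinConfig(PlannerInfo info) {
        this.info = info;
    }

    /** Each operator reads only the fields it needs. */
    String describe() {
        return info.joinType
            + (info.semiJoin ? " SEMI" : "")
            + (info.broadcastJoin ? " BROADCAST" : "");
    }
}
```

A caller would write something like `new ToyHashJoinConfig(PlannerInfo.builder().joinType(JoinType.LEFT).broadcastJoin(true).build())`. A builder with defaults is one possible design; a plain mutable object or passing the whole plan, as the issue mentions, would serve the same goal.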
[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version
[ https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636298#comment-16636298 ] ASF GitHub Bot commented on DRILL-6473: --- Agirish commented on issue #1307: DRILL-6473: Update MapR Hive, MapR release version and ojai version URL: https://github.com/apache/drill/pull/1307#issuecomment-426476958 @KazydubB, can we get this in soon? > Update MapR Hive, MapR release version and ojai version > --- > > Key: DRILL-6473 > URL: https://issues.apache.org/jira/browse/DRILL-6473 > Project: Apache Drill > Issue Type: Task >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.15.0 > > > Update: > * version of Hive to 2.3.3-mapr for mapr profile; > * MapR release version to 6.0.1-mapr; > * ojai version to 2.0.1-mapr-1804.
[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version
[ https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636294#comment-16636294 ] ASF GitHub Bot commented on DRILL-6473: --- Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR Hive, MapR release version and ojai version URL: https://github.com/apache/drill/pull/1307#discussion_r222154293 ## File path: pom.xml ## @@ -2439,7 +2439,7 @@ mapr true -2.1.1-mapr-1710 +2.3.3-mapr 1.1.1-mapr-1602-m7-5.2.0 2.7.0-mapr-1707 Review comment: Update Hadoop version to 2.7.0-mapr-1808?
[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version
[ https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636296#comment-16636296 ] ASF GitHub Bot commented on DRILL-6473: --- Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR Hive, MapR release version and ojai version URL: https://github.com/apache/drill/pull/1307#discussion_r222154373 ## File path: pom.xml ## @@ -2439,7 +2439,7 @@ mapr true -2.1.1-mapr-1710 +2.3.3-mapr 1.1.1-mapr-1602-m7-5.2.0 2.7.0-mapr-1707 3.4.5-mapr-1710 Review comment: Update Zookeeper version to 3.4.11-mapr-1808?
[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version
[ https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636293#comment-16636293 ] ASF GitHub Bot commented on DRILL-6473: --- Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR Hive, MapR release version and ojai version URL: https://github.com/apache/drill/pull/1307#discussion_r222154231 ## File path: pom.xml ## @@ -2439,7 +2439,7 @@ mapr true -2.1.1-mapr-1710 +2.3.3-mapr Review comment: Update Hive version to 2.3.3-mapr-1808?
[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version
[ https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636290#comment-16636290 ] ASF GitHub Bot commented on DRILL-6473: --- Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR Hive, MapR release version and ojai version URL: https://github.com/apache/drill/pull/1307#discussion_r222154084 ## File path: pom.xml ## @@ -53,8 +53,8 @@ 2.9.5 2.9.5 3.4.12 -5.2.1-mapr -1.1 +6.0.1-mapr +2.0.1-mapr-1804 Review comment: Also update OJAI version to 3.0-mapr-1808?
[jira] [Commented] (DRILL-6473) Update MapR Hive, MapR release version and ojai version
[ https://issues.apache.org/jira/browse/DRILL-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636286#comment-16636286 ] ASF GitHub Bot commented on DRILL-6473: --- Agirish commented on a change in pull request #1307: DRILL-6473: Update MapR Hive, MapR release version and ojai version URL: https://github.com/apache/drill/pull/1307#discussion_r222153910 ## File path: pom.xml ## @@ -53,8 +53,8 @@ 2.9.5 2.9.5 3.4.12 -5.2.1-mapr -1.1 +6.0.1-mapr Review comment: Can we update MapR version to the latest 6.1.0-mapr?
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636073#comment-16636073 ] Hanumath Rao Maduri commented on DRILL-786: --- IMO, the option 3 is what the short term solution for this problem was. i.e Treat the explicit CROSS JOIN and implicit cross join same. Planner should generate the plan when the flag is enabled (which is true by default) for scalar query cases. Otherwise it should throw an error. I am fine with the option 3 but I am not sure if changing the default value is needed. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 
0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id 
= 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Sets: > Set#22, type: (DrillRecordRow[*, age, name, studentnum]) > rel#306:Subset#22.LOGICAL.ANY([]).[], be
[jira] [Updated] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive
[ https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6759: --- Labels: ready-to-commit (was: ) > CSV 'columns' array is incorrectly case sensitive > - > > Key: DRILL-6759 > URL: https://issues.apache.org/jira/browse/DRILL-6759 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Paul Rogers >Assignee: Arina Ielchiieva >Priority: Minor > Labels: ready-to-commit > Fix For: 1.15.0 > > > Perform the following query on a CSV file without column headers: > {noformat} > SELECT columns[0] FROM `yourFile.csv` > {noformat} > In Drill, column names are supposed to be case insensitive. So, let's try > upper case: > {noformat} > SELECT COLUMNS[0] FROM `yourFile.csv`; > Error: DATA_READ ERROR: Selected column 'COLUMNS' must have name 'columns' or > must be plain '*' > {noformat} > Expected {{`columns`}} to be case insensitive like other CSV columns and SQL > keywords.
[jira] [Commented] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive
[ https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635723#comment-16635723 ] ASF GitHub Bot commented on DRILL-6759: --- arina-ielchiieva opened a new pull request #1485: DRILL-6759: Make columns array name for csv data case insensitive URL: https://github.com/apache/drill/pull/1485 Details in [DRILL-6759](https://issues.apache.org/jira/browse/DRILL-6759).
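The behavior the issue asks for, treating `COLUMNS`, `Columns` and `columns` as the same special array name, comes down to a case-insensitive comparison. A minimal sketch follows; `ColumnsArrayName` and `isColumnsArray` are hypothetical names chosen for illustration and are not taken from the pull request, whose actual change is not shown in this thread.

```java
/** Hypothetical helper; the real fix in the pull request may look different. */
final class ColumnsArrayName {
    /** Name of the special array column exposed for CSV files without headers. */
    static final String COLUMNS = "columns";

    private ColumnsArrayName() {
    }

    /** Returns true when the selected name refers to the columns array, ignoring case. */
    static boolean isColumnsArray(String selectedName) {
        return selectedName != null && COLUMNS.equalsIgnoreCase(selectedName);
    }
}
```

With a check like this, `SELECT COLUMNS[0]` and `SELECT columns[0]` would resolve to the same array, which matches the expectation stated in the issue.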
[jira] [Updated] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive
[ https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6759: Reviewer: Vitalii Diravka
[jira] [Updated] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive
[ https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6759: Fix Version/s: 1.15.0
[jira] [Assigned] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive
[ https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-6759: --- Assignee: Arina Ielchiieva
[jira] [Commented] (DRILL-6759) CSV 'columns' array is incorrectly case sensitive
[ https://issues.apache.org/jira/browse/DRILL-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635518#comment-16635518 ] Arina Ielchiieva commented on DRILL-6759: - Agree, especially when {{select COLUMNS from `yourFile.csv`}} currently succeeds.
[jira] [Commented] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter
[ https://issues.apache.org/jira/browse/DRILL-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635429#comment-16635429 ] ASF GitHub Bot commented on DRILL-6731: --- weijietong commented on issue #1459: DRILL-6731: Move the BFs aggregating work from the Foreman to the RuntimeFi… URL: https://github.com/apache/drill/pull/1459#issuecomment-426259756 @sohami done, really appreciate the effort you made.
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635371#comment-16635371 ] Volodymyr Vysotskyi commented on DRILL-786: --- Since option 1 cannot be implemented, I'm fine with option 3 and changing the default option value. Also, I would recommend adding more tests with the cross join in different cases and for {{CROSS APPLY}}.
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635351#comment-16635351 ] Arina Ielchiieva commented on DRILL-786: Ideally option 1 is the best approach but since there is no good way to implement it I would go with option 3 and even consider changing default option value. [~amansinha100] / [~vvysotskyi] / [~hanu.ncr] / [~gparai] what do you think?
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306 ]

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM:
--
We considered three possible ways to implement the feature. Note: in the text below, when I say the option is enabled or disabled, I mean the *planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case):* Allow nested loop join only for nodes that originated from explicit cross join syntax, but prohibit implicit cross joins when the option is enabled. So the following query should fail when the option is true:
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is implicit and should depend on the option value. But based on my investigation, *I didn't find how this could be implemented*. I have provided the results of the investigation in the prior comments.

*Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax, regardless of the option value. For example, the following queries will work in that case:
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r
{code}
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c
{code}
But queries that don't contain the explicit syntax will still depend on the option. For example, the following query won't work when the option is enabled:
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}

*Option 3. (Allow cross join syntax only when option enabled)* This approach is just a narrower case of the previous one. We could allow explicit cross join when the option is enabled and prohibit it when the option is disabled.
Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed. was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}. {color:#33} I have provided results of the investigation in the prior comments.{color}{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. 
Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed.
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:50 AM: -- We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. 
Also we can consider changing default value of the option to false thus queries producing Cartesian product would always succeed. was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. 
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:45 AM: -- We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented{color}*{color:#d04437}.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. 
was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, {color:#d04437}it's really hard to implement this approach, it requires a lot of time and includes a lot of changes to Apache Calcite.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. 
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635306#comment-16635306 ]

Igor Guzenko commented on DRILL-786:

We considered three possible ways to implement the feature. Note: in the text below, when I say the option is enabled or disabled, I mean the *planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case):* Allow nested loop join only for nodes that originated from explicit cross join syntax, but prohibit implicit cross joins when the option is enabled. So the following query should fail when the option is true:
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is implicit and should depend on the option value. But based on my investigation, this approach is really hard to implement: it requires a lot of time and a lot of changes to Apache Calcite.

*Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax, regardless of the option value. For example, the following queries will work in that case:
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r
{code}
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c
{code}
But queries that don't contain the explicit syntax will still depend on the option. For example, the following query won't work when the option is enabled:
{code:java}
SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}

*Option 3. (Allow cross join syntax only when option enabled)* This approach is just a narrower case of the previous one. We could allow explicit cross join when the option is enabled and prohibit it when the option is disabled.
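To make the difference between Option 1 and Option 2 concrete, here is a rough sketch of the two rules applied to the three example queries above. It is only a toy model, not Drill planner code: the `Policy` enum, the `cartesian_allowed` function, and the "explicit"/"implicit" labels are hypothetical names invented for this illustration, and each query is reduced to the list of cartesian joins it contains. Option 3 is not modelled here.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical labels for the first two options discussed above.
    PER_JOIN = 1   # Option 1: every cartesian join must itself be an explicit CROSS JOIN
    PER_QUERY = 2  # Option 2: one explicit CROSS JOIN anywhere in the query is enough


def cartesian_allowed(policy, joins, scalar_only):
    """Return True if a query consisting of the given cartesian joins may be planned.

    joins       -- one entry per cartesian join: "explicit" (CROSS JOIN keyword)
                   or "implicit" (comma-separated FROM list)
    scalar_only -- value of planner.enable_nljoin_for_scalar_only
    """
    if not scalar_only:
        # Option disabled: nested loop join is not limited to scalar inputs,
        # so queries producing a Cartesian product always succeed.
        return True
    if policy is Policy.PER_JOIN:
        return all(kind == "explicit" for kind in joins)
    return any(kind == "explicit" for kind in joins)


# The three example queries, reduced to the kinds of join they contain.
L_CROSS_R = ["explicit"]                # l CROSS JOIN r
A_B_CROSS_C = ["implicit", "explicit"]  # a, b CROSS JOIN c
A_B = ["implicit"]                      # a, b
```

Under this model the mixed query `a, b CROSS JOIN c` is the only one on which the two options disagree when the option is enabled: Option 1 rejects it, Option 2 accepts it.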
[jira] [Commented] (DRILL-6465) Transitive closure is not working in Drill for Join with multiple local conditions
[ https://issues.apache.org/jira/browse/DRILL-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635185#comment-16635185 ] Denys Ordynskiy commented on DRILL-6465: CALCITE-2275 is in Drill Calcite now. I will retest this bug and write the results later. > Transitive closure is not working in Drill for Join with multiple local > conditions > -- > > Key: DRILL-6465 > URL: https://issues.apache.org/jira/browse/DRILL-6465 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Denys Ordynskiy >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.15.0 > > Attachments: drill.zip > > > For several SQL operators Transitive closure is not working during Partition > Pruning and Filter Pushdown for the left table in Join. > If I use several local conditions, then Drill scans full left table in Join. > But if we move additional conditions to the WHERE statement, then Transitive > closure works fine for all joined tables > *Query BETWEEN:* > {code:java} > EXPLAIN PLAN FOR > SELECT * FROM hive.`h_tab1` t1 > JOIN hive.`h_tab2` t2 > ON t1.y=t2.y > AND t2.y BETWEEN 1987 AND 1988; > {code} > *Expected result:* > {code:java} > Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), > columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]), > Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), > Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), > Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), > Partition(values:[1988, 12, 2])]{code} > *Actual result:* > {code:java} > Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), > columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, > 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), > Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), > Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), > 
Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), > Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), > Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), > Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), > Partition(values:[1991, 3, 4]) > ] > {code} > *The same Transitive closure behavior occurs for these logical operators:* > * NOT IN > * LIKE > * NOT LIKE > Also, Transitive closure is not working during Partition Pruning and Filter > Pushdown for these comparison operators: > *Query <* > {code:java} > EXPLAIN PLAN FOR > SELECT * FROM hive.`h_tab1` t1 > JOIN hive.`h_tab2` t2 > ON t1.y=t2.y > AND t2.y < 1988; > {code} > *Expected result:* > {code:java} > Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), > columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), > Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), > Partition(values:[1987, 7, 2])]{code} > *Actual result:* > {code:java} > 00-00 Screen > 00-01 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], > category0=[$5], y0=[$6], m0=[$7]) > 00-02 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], > category0=[$5], y0=[$6], m0=[$7]) > 00-03 HashJoin(condition=[=($1, $6)], joinType=[inner]) > 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, > tableName:h_tab1), columns=[`**`], numPartitions=16, partitions= > [Partition(values:[1987, 5, 1]), Partition(values:[1987, 5, 2]), > Partition(values:[1987, 7, 1]), Partition(values:[1987, 7, 2]), > Partition(values:[1988, 11, 1]), Partition(values:[1988, 11, 2]), > Partition(values:[1988, 12, 1]), Partition(values:[1988, 12, 2]), > Partition(values:[1990, 4, 1]), Partition(values:[1990, 4, 2]), > Partition(values:[1990, 5, 1]), Partition(values:[1990, 5, 2]), > Partition(values:[1991, 3, 1]), Partition(values:[1991, 3, 2]), > Partition(values:[1991, 3, 3]), Partition(values:[1991, 3, 4])], >
inputDirectories=[maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/1, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/2, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/3, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/4, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/5, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/6, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/7, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/8, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/9, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/10, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/11, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/12, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/13,