[jira] [Created] (DRILL-6708) Flatten operator executes twice on subquery resulting in cartesian of flatten columns, when final query has 2 columns using the flatten column in original query
Andries Engelbrecht created DRILL-6708:
------------------------------------------

             Summary: Flatten operator executes twice on subquery resulting in cartesian of flatten columns, when final query has 2 columns using the flatten column in original query
                 Key: DRILL-6708
                 URL: https://issues.apache.org/jira/browse/DRILL-6708
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.13.0
            Reporter: Andries Engelbrecht
         Attachments: campaignclicks_50.json

The following query, which uses nested subqueries and references the flatten column twice in the final result, returns 1137195 rows instead of the expected 140913 rows.

{code:java}
SELECT ( `Third`.`cust_id`),
       ( `Third`.`device`),
       ( `Third`.`prod_id`),
       ( `Third`.`prod_id`) AS `prod_id2`
FROM (
  SELECT ( `Second`.`cust_id`),
         ( `Second`.`device`),
         ( `Second`.`prod_id`)
  FROM (
    SELECT ( `First`.`cust_id`),
           ( `First`.`device`),
           ( `First`.`prod_id`)
    FROM `dfs.views`.`clicks_campaign_vw` AS `First`
  ) AS `Second`
) AS `Third`
{code}

The query was executed against the Drill view listed below.

{code:java}
CREATE or REPLACE VIEW dfs.views.clicks_campaign_vw AS
SELECT CAST(`t`.`trans_id` as BIGINT) as trans_id,
       CAST(`t`.`date` AS DATE) AS `thedate`,
       CAST(`t`.`user_info`['cust_id'] AS BIGINT) AS `cust_id`,
       CAST(`t`.`user_info`['device'] AS VARCHAR(20)) AS `device`,
       CAST(`t`.`user_info`['state'] AS VARCHAR(2)) AS `custstate`,
       CAST(FLATTEN(`t`.`trans_info`['prod_id']) AS BIGINT) AS `prod_id`,
       CAST(`t`.`trans_info`['purch_flag'] AS VARCHAR(6)) AS `purch_flag`
FROM `dfs`.`clicks`.`campaignclicks_50.json` AS `t`
WHERE `t`.`trans_info`['prod_id'][0] IS NOT NULL;
{code}

Below is the query plan, showing FLATTEN invoked twice (operators 00-03 and 00-05).

{code:java}
00-00 Screen : rowType = RecordType(BIGINT cust_id, VARCHAR(20) device, BIGINT prod_id, BIGINT prod_id2): rowcount = 7089.3, cumulative cost = {73965.03 rows, 337056.826 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901702
00-01 ComplexToJson : rowType = RecordType(BIGINT cust_id, VARCHAR(20) device, BIGINT prod_id, BIGINT prod_id2): rowcount = 7089.3, cumulative cost = {73256.1 rows, 336347.897 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901701
00-02 Project(cust_id=[CAST($0):BIGINT], device=[CAST($1):VARCHAR(20) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], prod_id=[CAST($3):BIGINT], prod_id2=[CAST($4):BIGINT]) : rowType = RecordType(BIGINT cust_id, VARCHAR(20) device, BIGINT prod_id, BIGINT prod_id2): rowcount = 7089.3, cumulative cost = {66166.8 rows, 329258.6 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901700
00-03 Flatten(flattenField=[$4]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4, ANY EXPR$5): rowcount = 7089.3, cumulative cost = {59077.501 rows, 215829.8 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901699
00-04 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$4=[$3], EXPR$5=[$2]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4, ANY EXPR$5): rowcount = 7089.3, cumulative cost = {51988.2004 rows, 208740.5 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901698
00-05 Flatten(flattenField=[$3]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4): rowcount = 7089.3, cumulative cost = {44898.9 rows, 173294.0 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901697
00-06 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$4=[$2]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4): rowcount = 7089.3, cumulative cost = {37809.6 rows, 166204.7 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901696
00-07 SelectionVectorRemover : rowType = RecordType(ANY ITEM, ANY ITEM1, ANY ITEM2, ANY ITEM3): rowcount = 7089.3, cumulative cost = {30720.3 rows, 137847.5 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901695
00-08 Filter(condition=[IS NOT NULL($3)]) : rowType = RecordType(ANY ITEM, ANY ITEM1, ANY ITEM2, ANY ITEM3): rowcount = 7089.3, cumulative cost = {23631.0 rows, 130758.2 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901694
00-09 Project(ITEM=[ITEM($0, 'cust_id')], ITEM1=[ITEM($0, 'device')], ITEM2=[ITEM($1, 'prod_id')], ITEM3=[ITEM(ITEM($1, 'prod_id'), 0)]) : rowType = RecordType(ANY ITEM, ANY ITEM1, ANY ITEM2, ANY ITEM3): rowcount = 7877.0, cumulative cost = {15754.0 rows, 70893.0 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901693
00-10 Scan(table=[[dfs, clicks, campaignclicks_50.json]], groupscan=[EasyGroupScan [selectionRoot=maprfs:/data/nested/clicks/campaignclicks_50.json, numFiles=1, columns=[`user_info`.`cust_id`, `user_info`.`device`, `trans_info`.`prod_id`, `trans_info`.`prod_id`[0]], files=[maprfs:///data/nested/clicks/campaignclicks_50.json]]]) : rowType = RecordType(ANY user_info, ANY trans_info): rowcount = 7877.0, cumulative cost = {7877.0 rows, 15754.0 cpu, 8067011.0 io, 0.0 netw
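
The row-count blow-up reported in DRILL-6708 can be illustrated with a small sketch. This is not Drill code: the `flatten` helper, the sample records, and the function names are all hypothetical, and the sketch only mimics the shape of the reported plan (the array column is copied and each copy is flattened separately) rather than the planner's actual behavior.

```python
def flatten(rows, field):
    """Emit one output row per element of the list stored in row[field]."""
    out = []
    for row in rows:
        for item in row[field]:
            new_row = dict(row)      # shallow copy; other columns are repeated
            new_row[field] = item
            out.append(new_row)
    return out


# Hypothetical sample data: two customers with 3 and 2 product ids.
records = [
    {"cust_id": 1, "prod_ids": [10, 20, 30]},
    {"cust_id": 2, "prod_ids": [40, 50]},
]


def flatten_once_then_copy(rows):
    """Expected behavior: flatten once, then project the result twice."""
    flat = flatten(rows, "prod_ids")
    return [
        {"cust_id": r["cust_id"], "prod_id": r["prod_ids"], "prod_id2": r["prod_ids"]}
        for r in flat
    ]


def flatten_twice(rows):
    """Shape of the reported plan: copy the array, flatten each copy."""
    widened = [
        {"cust_id": r["cust_id"], "prod_id": r["prod_ids"], "prod_id2": r["prod_ids"]}
        for r in rows
    ]
    return flatten(flatten(widened, "prod_id"), "prod_id2")


expected = flatten_once_then_copy(records)
observed = flatten_twice(records)
print(len(expected), len(observed))
```

With this sample, flattening once yields one row per array element, while flattening each copy yields the per-record cartesian product of the array with itself, which is the pattern that would explain a much larger row count than expected.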
[jira] [Commented] (DRILL-5617) Spill file name collisions when spill file is on a shared file system
[ https://issues.apache.org/jira/browse/DRILL-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068494#comment-16068494 ]

Andries Engelbrecht commented on DRILL-5617:

Perhaps proper configuration will avoid this issue. On most Hadoop distros with HDFS there is a local temp location for MapReduce that should be leveraged for Drill spill. Placing spill data on general HDFS will cause replication that can slow things down.

As an example, on MapR there are local volumes with replication 1 that can be used; in this case the spill locations won't overlap between nodes.

See this link for configuration.
https://community.mapr.com/community/exchange/blog/2017/05/03/top-5-items-to-configure-with-drill-on-mapr-5x

Similar best practices should be leveraged for other deployments.

> Spill file name collisions when spill file is on a shared file system
> ----------------------------------------------------------------------
>
>                 Key: DRILL-5617
>                 URL: https://issues.apache.org/jira/browse/DRILL-5617
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>            Reporter: Chun Chang
>            Assignee: Paul Rogers
>
> Spill location can be configured to be written on hdfs such as:
>   hashagg: {
>     # The partitions divide the work inside the hashagg, to ease
>     # handling spilling. This initial figure is tuned down when
>     # memory is limited.
>     # Setting this option to 1 disables spilling !
>     num_partitions: 32,
>     spill: {
>         # The 2 options below override the common ones
>         # they should be deprecated in the future
>         directories : [ "/tmp/drill/spill" ],
>         fs : "maprfs:///"
>     }
>   }
> However, this could cause spill filename conflict since name convention does
> not contain node name.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
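
The collision described in DRILL-5617 can be sketched as follows. The naming scheme here is purely hypothetical (it is not Drill's actual spill file convention); the sketch only shows why a name built without the node name collides on a shared file system, and how adding a node component would keep the paths distinct.

```python
import posixpath


def spill_file_path(spill_dir, query_id, partition, node_name=None):
    """Build a spill file path. The naming scheme is hypothetical."""
    name = f"{query_id}_part{partition}"
    if node_name is not None:
        # Assumption: prefixing the node name makes the path unique per node.
        name = f"{node_name}_{name}"
    return posixpath.join(spill_dir, name)


shared_dir = "/tmp/drill/spill"  # directory taken from the quoted config

# Two nodes spilling the same query partition to a shared file system.
without_node = [spill_file_path(shared_dir, "q1", 0) for _ in ("node-a", "node-b")]
with_node = [spill_file_path(shared_dir, "q1", 0, node_name=n) for n in ("node-a", "node-b")]

print(without_node[0] == without_node[1])  # identical paths: the files collide
print(with_node[0] == with_node[1])        # distinct paths per node
```

Pointing each node at a node-local spill directory, as the comment suggests, avoids the problem without changing the file names at all.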
[jira] [Created] (DRILL-2839) ODBC Driver Doc to point to latest available Driver, also provide compatibility matrix for Drill and ODBC version
Andries Engelbrecht created DRILL-2839: -- Summary: ODBC Driver Doc to point to latest available Driver, also provide compatibility matrix for Drill and ODBC version Key: DRILL-2839 URL: https://issues.apache.org/jira/browse/DRILL-2839 Project: Apache Drill Issue Type: Improvement Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens On the ODBC documentation page the links to the ODBC drivers are hard-linked to the .0618 version of the drivers. http://drill.apache.org/docs/step-1-install-the-mapr-drill-odbc-driver-on-windows/ It may be better to point to the latest drivers available in the MapR packages directory here. http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc/ The challenge is to match the version of the ODBC driver to the Drill version. It may be good to add a compatibility matrix to the doc webpage to identify the appropriate ODBC driver for each Drill version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2140) RPC Error querying JSON with empty nested maps
[ https://issues.apache.org/jira/browse/DRILL-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509191#comment-14509191 ]

Andries Engelbrecht commented on DRILL-2140:

Sudheesh,
The issue seems to be resolved when working with 0.8. I have not experienced it on the same dataset. It can be marked as resolved.

> RPC Error querying JSON with empty nested maps
> ----------------------------------------------
>
>                 Key: DRILL-2140
>                 URL: https://issues.apache.org/jira/browse/DRILL-2140
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 0.7.0
>         Environment: Centos 4 node MapR cluster
>            Reporter: Andries Engelbrecht
>            Assignee: Sudheesh Katkam
>             Fix For: 1.0.0
>
>         Attachments: drillbit.log
>
>
> When querying a large number of documents in multiple directories with multiple
> JSON files in each, and some documents have no top level map that is used for
> a predicate, Drill produces an RPC error in the log.
> Query
> {code}
> select t.retweeted_status.`user`.name as name,
> count(t.retweeted_status.favorited) as rt_count from `./nfl` t where
> t.retweeted_status.`user`.name is not null group by
> t.retweeted_status.`user`.name order by count(t.retweeted_status.favorited)
> desc limit 10;
> Query failed: Query failed: Failure while running fragment., index: 0,
> length: 1 (expected: range(0, 0)) [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on
> se-node13.se.lab:31010 ]
> [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]
> {code}
> Drillbit log attached

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-2141) Data type error in group by and order by for JSON
[ https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522386#comment-14522386 ]

Andries Engelbrecht commented on DRILL-2141:

Please mark as resolved as of 0.8.
I have not experienced the issue with 0.8 on the same data set.

> Data type error in group by and order by for JSON
> -------------------------------------------------
>
>                 Key: DRILL-2141
>                 URL: https://issues.apache.org/jira/browse/DRILL-2141
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 0.7.0
>            Reporter: Andries Engelbrecht
>            Assignee: Daniel Barclay (Drill)
>             Fix For: 1.0.0
>
>         Attachments: FlumeData.1422748800086, drillbit.log, new_drillbit.log
>
>
> When doing group by and order by on complex nested JSON, getting data type
> errors.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id)
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id)
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while
> reading vector. Expected vector class of
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class
> org.apache.drill.exec.vector.NullableVarCharVector. [
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-2105) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522388#comment-14522388 ]

Andries Engelbrecht commented on DRILL-2105:

Please mark as resolved as of Drill 0.8.
I have not experienced the issue with Drill 0.8.

> Query fails when using flatten on JSON data where some documents have an
> empty array
> ------------------------------------------------------------------------
>
>                 Key: DRILL-2105
>                 URL: https://issues.apache.org/jira/browse/DRILL-2105
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.7.0
>         Environment: MFS with JSON
>            Reporter: Andries Engelbrecht
>            Assignee: Deneche A. Hakim
>             Fix For: 1.0.0
>
>
> Drill query fails when using flatten on an array, where some records contain
> an empty array. Especially with larger data sets where the number of JSON
> documents is greater than 100k.
> Using twitter data as sample.
> select flatten (entities.hashtags) from dfs.foo.`file.json`;
> Empty array
>   "entities": {
>     "trends": [],
>     "symbols": [],
>     "urls": [
>       {
>         "expanded_url": "http://on.nfl.com/1BkThQF",
>         "indices": [
>           118,
>           140
>         ],
>         "display_url": "on.nfl.com/1BkThQF",
>         "url": "http://t.co/Unr5KFy6hG"
>       }
>     ],
>     "hashtags": [],
>     "user_mentions": [
>       {
>         "id": 19362299,
>         "name": "NFL Network",
>         "indices": [
>           3,
>           14
>         ],
>         "screen_name": "nflnetwork",
>         "id_str": "19362299"
>       }
>     ]
>   },
> Array with content
>   "entities": {
>     "trends": [],
>     "symbols": [],
>     "urls": [],
>     "hashtags": [
>       {
>         "text": "djpreps",
>         "indices": [
>           47,
>           55
>         ]
>       },
>       {
>         "text": "MSPreps",
>         "indices": [
>           56,
>           64
>         ]
>       }
>     ],
>     "user_mentions": []
>   },
> Log output
> 2015-01-27 02:26:13,478 [2b3908b9-cf08-3fd5-3bd8-ebb6bb5b70f1:foreman] INFO
> o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for
> Parquet metadata file.
> java.io.IOException: Open failed for file: /data/twitter/nfl/2015, error: > Invalid argument (22) > at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:191) > ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr] > at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:776) > ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr] > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) > ~[hadoop-common-2.4.1-mapr-1408.jar:na] > at > org.apache.drill.exec.store.dfs.shim.fallback.FallbackFileSystem.open(FallbackFileSystem.java:94) > ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:138) > ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isReadable(BasicFormatMatcher.java:107) > ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:232) > [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:212) > [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:141) > [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:58) > [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) > [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) > [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:273) > [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] > at > net.hydromatic.optiq.jdbc.SimpleOptiqSchema.getTable(SimpleOptiqSchema.java:75) > [optiq-core-0.9-drill-r12.jar:na] > at > net.hydromatic.optiq.prepare.OptiqCatalogReader.getTableFrom(OptiqCatalogReader.java:87) > [optiq-core-0.9-drill-r12.jar:na] > at > net.hy
[jira] [Created] (DRILL-2946) Tableau 9.0 Desktop Enablement Document
Andries Engelbrecht created DRILL-2946: -- Summary: Tableau 9.0 Desktop Enablement Document Key: DRILL-2946 URL: https://issues.apache.org/jira/browse/DRILL-2946 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 0.9.0 Reporter: Andries Engelbrecht Assignee: Bridget Bevens Attachments: Tableau 9 Desktop Drill Configuration.docx Documentation for Tableau 9.0 Desktop enablement. Includes authentication with Drill 0.9 and later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2946) Tableau 9.0 Desktop Enablement Document
[ https://issues.apache.org/jira/browse/DRILL-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2946: --- Attachment: Tableau 9 Desktop Drill Configuration.docx > Tableau 9.0 Desktop Enablement Document > --- > > Key: DRILL-2946 > URL: https://issues.apache.org/jira/browse/DRILL-2946 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 0.9.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Tableau 9 Desktop Drill Configuration.docx > > > Documentation for Tableau 9.0 Desktop enablement. > Includes authentication with Drill 0.9 and later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2948) Drill MicroStrategy Document has an incorrect image
Andries Engelbrecht created DRILL-2948: -- Summary: Drill MicroStrategy Document has an incorrect image Key: DRILL-2948 URL: https://issues.apache.org/jira/browse/DRILL-2948 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens Priority: Minor The first image is showing a table of a report instead of showing what the 32 bit version of the ODBC driver should look like. Please refer to the original document for references to the pictures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2982) Tableau 9.0 Server Enablement Documentation
Andries Engelbrecht created DRILL-2982: -- Summary: Tableau 9.0 Server Enablement Documentation Key: DRILL-2982 URL: https://issues.apache.org/jira/browse/DRILL-2982 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 0.9.0, 1.0.0 Reporter: Andries Engelbrecht Assignee: Bridget Bevens Tableau 9.0 Server Enablement document -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2982) Tableau 9.0 Server Enablement Documentation
[ https://issues.apache.org/jira/browse/DRILL-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2982: --- Attachment: Tableau 9 Server Drill Configuration.docx > Tableau 9.0 Server Enablement Documentation > --- > > Key: DRILL-2982 > URL: https://issues.apache.org/jira/browse/DRILL-2982 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 0.9.0, 1.0.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Tableau 9 Server Drill Configuration.docx > > > Tableau 9.0 Server Enablement document -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document
Andries Engelbrecht created DRILL-3025: -- Summary: Tibco Spotfire Server - JDBC - Configuration Document Key: DRILL-3025 URL: https://issues.apache.org/jira/browse/DRILL-3025 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Andries Engelbrecht Assignee: Bridget Bevens TSS Configuration document - JDBC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document
[ https://issues.apache.org/jira/browse/DRILL-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-3025: --- Attachment: Tibco Spotfire Server 6.0 Drill Configuration.docx > Tibco Spotfire Server - JDBC - Configuration Document > - > > Key: DRILL-3025 > URL: https://issues.apache.org/jira/browse/DRILL-3025 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Tibco Spotfire Server 6.0 Drill Configuration.docx > > > TSS Configuration document - JDBC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3148) JReport enablement document for Drill
Andries Engelbrecht created DRILL-3148: -- Summary: JReport enablement document for Drill Key: DRILL-3148 URL: https://issues.apache.org/jira/browse/DRILL-3148 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0, Future Reporter: Andries Engelbrecht Assignee: Bridget Bevens Enablement document for JReport to work with Drill using JDBC. In support of JReport certification of Drill 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3148) JReport enablement document for Drill
[ https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-3148: --- Attachment: JReport with Apache Drill v0.3.doc > JReport enablement document for Drill > - > > Key: DRILL-3148 > URL: https://issues.apache.org/jira/browse/DRILL-3148 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0, Future >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: JReport with Apache Drill v0.3.doc > > > Enablement document for JReport to work with Drill using JDBC. > In support of JReport certification of Drill 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3148) JReport enablement document for Drill
[ https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-3148: --- Attachment: (was: JReport with Apache Drill v0.3.doc) > JReport enablement document for Drill > - > > Key: DRILL-3148 > URL: https://issues.apache.org/jira/browse/DRILL-3148 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0, Future >Reporter: Andries Engelbrecht >Assignee: Bob Rumsby > Attachments: JReport with Apache Drill Final-AE.doc > > > Enablement document for JReport to work with Drill using JDBC. > In support of JReport certification of Drill 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3148) JReport enablement document for Drill
[ https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-3148: --- Attachment: JReport with Apache Drill Final-AE.doc > JReport enablement document for Drill > - > > Key: DRILL-3148 > URL: https://issues.apache.org/jira/browse/DRILL-3148 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0, Future >Reporter: Andries Engelbrecht >Assignee: Bob Rumsby > Attachments: JReport with Apache Drill Final-AE.doc > > > Enablement document for JReport to work with Drill using JDBC. > In support of JReport certification of Drill 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3276) Broken links on Apache Drill documentation
Andries Engelbrecht created DRILL-3276: -- Summary: Broken links on Apache Drill documentation Key: DRILL-3276 URL: https://issues.apache.org/jira/browse/DRILL-3276 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens Priority: Minor The following links result in a page not found error: http://drill.apache.org/docs/supported-data-types-for-casting http://drill.apache.org/docs/explicit-type-casting-maps The links are found on this page: http://drill.apache.org/docs/data-type-conversion/#cast -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3148) JReport enablement document for Drill
[ https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593480#comment-14593480 ] Andries Engelbrecht commented on DRILL-3148: Looks good, thank you. > JReport enablement document for Drill > - > > Key: DRILL-3148 > URL: https://issues.apache.org/jira/browse/DRILL-3148 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0, Future >Reporter: Andries Engelbrecht >Assignee: Bob Rumsby > Fix For: 1.1.0 > > Attachments: JReport with Apache Drill Final-AE.doc > > > Enablement document for JReport to work with Drill using JDBC. > In support of JReport certification of Drill 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document
[ https://issues.apache.org/jira/browse/DRILL-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593482#comment-14593482 ] Andries Engelbrecht commented on DRILL-3025: Do you know when this will be published? > Tibco Spotfire Server - JDBC - Configuration Document > - > > Key: DRILL-3025 > URL: https://issues.apache.org/jira/browse/DRILL-3025 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0 >Reporter: Andries Engelbrecht >Assignee: Bob Rumsby > Attachments: Tibco Spotfire Server 6.0 Drill Configuration.docx > > > TSS Configuration document - JDBC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (DRILL-2272) Tibco Spotfire Desktop configuration for Drill documentation
[ https://issues.apache.org/jira/browse/DRILL-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht reopened DRILL-2272: Can we please change the heading to "Using Tibco Spotfire Desktop with Drill"? Once the Spotfire Server document is published, the current heading will create confusion when looking at the index. > Tibco Spotfire Desktop configuration for Drill documentation > > > Key: DRILL-2272 > URL: https://issues.apache.org/jira/browse/DRILL-2272 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Spotfire Desktop Drill Config.docx > > > Instructions to configure Tibco Spotfire Desktop with Drill using ODBC to be > added to the wiki. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document
[ https://issues.apache.org/jira/browse/DRILL-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601555#comment-14601555 ] Andries Engelbrecht commented on DRILL-3025: Thank you, please let me know when it is published. Andries > Tibco Spotfire Server - JDBC - Configuration Document > - > > Key: DRILL-3025 > URL: https://issues.apache.org/jira/browse/DRILL-3025 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0 >Reporter: Andries Engelbrecht >Assignee: Bob Rumsby > Attachments: Tibco Spotfire Server 6.0 Drill Configuration.docx > > > TSS Configuration document - JDBC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3416) QlikSense Drill Configuration document
[ https://issues.apache.org/jira/browse/DRILL-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-3416: --- Attachment: Qlik Sense and Drill Configuration v3.0.docx > QlikSense Drill Configuration document > -- > > Key: DRILL-3416 > URL: https://issues.apache.org/jira/browse/DRILL-3416 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Qlik Sense and Drill Configuration v3.0.docx > > > QlikSense - Drill 1.0 configuration -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3416) QlikSense Drill Configuration document
Andries Engelbrecht created DRILL-3416: -- Summary: QlikSense Drill Configuration document Key: DRILL-3416 URL: https://issues.apache.org/jira/browse/DRILL-3416 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Andries Engelbrecht Assignee: Bridget Bevens Attachments: Qlik Sense and Drill Configuration v3.0.docx QlikSense - Drill 1.0 configuration -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3436) MicroStrategy configuration documentation
Andries Engelbrecht created DRILL-3436: -- Summary: MicroStrategy configuration documentation Key: DRILL-3436 URL: https://issues.apache.org/jira/browse/DRILL-3436 Project: Apache Drill Issue Type: Bug Components: Documentation Affects Versions: 1.0.0 Reporter: Andries Engelbrecht Assignee: Bridget Bevens Broken links that require updates in documentation on MicroStrategy configuration document https://drill.apache.org/docs/using-microstrategy-analytics-with-apache-drill/ Step 1.2 should point to link below (not old wiki link) https://drill.apache.org/docs/installing-the-driver-on-windows/ Step 1.3 should point to link below (not old wiki link) https://drill.apache.org/docs/configuring-odbc-on-windows/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3437) 2 Listings of Tibco Spotfire Desktop in drop down list
Andries Engelbrecht created DRILL-3437: -- Summary: 2 Listings of Tibco Spotfire Desktop in drop down list Key: DRILL-3437 URL: https://issues.apache.org/jira/browse/DRILL-3437 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens There are 2 listings for Tibco Spotfire Desktop in the left-hand pane when expanding Using Drill with BI Tools. https://drill.apache.org/docs/using-tibco-spotfire-desktop-with-drill/ One needs to be removed; they appear to be duplicates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3438) Broken Links in Tibco Spotfire Desktop documentation
Andries Engelbrecht created DRILL-3438: -- Summary: Broken Links in Tibco Spotfire Desktop documentation Key: DRILL-3438 URL: https://issues.apache.org/jira/browse/DRILL-3438 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens On the Tibco Spotfire Documentation page the ODBC links are broken. https://drill.apache.org/docs/using-tibco-spotfire-desktop-with-drill/ Step 1.2 should point to https://drill.apache.org/docs/installing-the-driver-on-windows/ Step 1.3 should point to https://drill.apache.org/docs/configuring-odbc-on-windows/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation
Andries Engelbrecht created DRILL-5403: -- Summary: Tableau 10.2 with Drill 1.10 integration documentation Key: DRILL-5403 URL: https://issues.apache.org/jira/browse/DRILL-5403 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 1.10.0 Reporter: Andries Engelbrecht Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See attached document with details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation
[ https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-5403: --- Attachment: Tableau 10.2 Drill Configuration.docx > Tableau 10.2 with Drill 1.10 integration documentation > -- > > Key: DRILL-5403 > URL: https://issues.apache.org/jira/browse/DRILL-5403 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Andries Engelbrecht > Attachments: Tableau 10.2 Drill Configuration.docx > > > Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See > attached document with details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation
[ https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949863#comment-15949863 ] Andries Engelbrecht commented on DRILL-5403: [~bbevens]Thank you! > Tableau 10.2 with Drill 1.10 integration documentation > -- > > Key: DRILL-5403 > URL: https://issues.apache.org/jira/browse/DRILL-5403 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Tableau 10.2 Drill Configuration.docx > > > Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See > attached document with details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation
[ https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-5403: --- Attachment: Edit-2-Tableau-10.2-Drill-Configuration.docx > Tableau 10.2 with Drill 1.10 integration documentation > -- > > Key: DRILL-5403 > URL: https://issues.apache.org/jira/browse/DRILL-5403 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Fix For: 1.10.0 > > Attachments: BB-Edit-Tableau-10.2-Drill-Configuration.docx, > Edit-2-Tableau-10.2-Drill-Configuration.docx, Tableau 10.2 Drill > Configuration.docx > > > Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See > attached document with details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation
[ https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1594#comment-1594 ] Andries Engelbrecht commented on DRILL-5403: [~bbevens] Thanks for the quick turnaround. I made a few edits to clarify that Drill is now available as a data source for Tableau on Mac (where it wasn't before). Also a few small changes as the basic steps are used with Tableau Server as well. See the Edit-2 attached document for the updates. > Tableau 10.2 with Drill 1.10 integration documentation > -- > > Key: DRILL-5403 > URL: https://issues.apache.org/jira/browse/DRILL-5403 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Fix For: 1.10.0 > > Attachments: BB-Edit-Tableau-10.2-Drill-Configuration.docx, > Edit-2-Tableau-10.2-Drill-Configuration.docx, Tableau 10.2 Drill > Configuration.docx > > > Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See > attached document with details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-4866) Provide TABLE and PARTITION information in INFORMATION_SCHEMA for parquet tables created by Drill
Andries Engelbrecht created DRILL-4866: -- Summary: Provide TABLE and PARTITION information in INFORMATION_SCHEMA for parquet tables created by Drill Key: DRILL-4866 URL: https://issues.apache.org/jira/browse/DRILL-4866 Project: Apache Drill Issue Type: Improvement Components: Metadata, Storage - Parquet Reporter: Andries Engelbrecht Provide the Table and Partition information on parquet tables created by Drill in INFORMATION_SCHEMA. This can be utilized by tools and users looking to optimize Drill queries by referencing the table and partition metadata from within Drill, as opposed to querying the parquet metadata underneath. Potentially extend INFORMATION_SCHEMA with an additional PARTITIONS table similar to MySQL to provide information on column(s) used for partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2744) Provide error message when trying to query MapR-DB or HBase tables with insufficient priviliges
[ https://issues.apache.org/jira/browse/DRILL-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andries Engelbrecht updated DRILL-2744:
---
Labels: security (was: )

> Provide error message when trying to query MapR-DB or HBase tables with insufficient priviliges
> ---
>
> Key: DRILL-2744
> URL: https://issues.apache.org/jira/browse/DRILL-2744
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HBase
> Affects Versions: 0.8.0
> Reporter: Andries Engelbrecht
> Labels: security
> Fix For: Future
>
> When creating MapR-DB tables with different privileges Drill will return no results for tables with insufficient privileges. Propose an error is returned so the user is aware of the issue, instead of simply no data being returned. This can be a serious issue with complex queries when joining data across multiple data sources.
> Creating 2 tables - one with mapr user and the other as root.
> lr. 1 mapr mapr 2 Apr 9 17:51 customers -> mapr::table::2057.45.1574734
> lr. 1 root root 2 Apr 10 00:21 test -> mapr::table::2057.48.1574740
> hbase(main):005:0> get "test", "r1"
> COLUMN CELL
> col1:timestamp=1428625497000, value=a
> col2:timestamp=1428625506268, value=b
> 2 row(s) in 0.0380 seconds
> 0: jdbc:drill:zk=drilldemo:5181> show tables;
> +--++
> | TABLE_SCHEMA | TABLE_NAME |
> +--++
> | maprdb | test |
> | maprdb | customers |
> +--++
> 2 rows selected (0.098 seconds)
> querying test tables simply returns no results instead of an error.
> 0: jdbc:drill:zk=drilldemo:5181> select * from test;
> +--+
> | |
> +--+
> +--+
> No rows selected (0.059 seconds)
> Customers does return data due to sufficient privileges.
> 0: jdbc:drill:zk=drilldemo:5181> select * from customers limit 1;
> +++++
> | row_key | address | loyalty | personal |
> +++++
> | [B@6e22c013 | {"state":"InZhIg=="} | {"agg_rev":"MTk3","membership":"InNpbHZlciI="} | {"age":"IjE1LTIwIg==","gender":"IkZFTUFMRSI=","name":"IkNvcnJpbmUgTWVjaGFtIg=="} |
> +++++
> 1 row selected (0.236 seconds)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4899) Hive Plugin goes to disabled status with restart of Drill and ZK
Andries Engelbrecht created DRILL-4899: -- Summary: Hive Plugin goes to disabled status with restart of Drill and ZK Key: DRILL-4899 URL: https://issues.apache.org/jira/browse/DRILL-4899 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 1.8.0 Reporter: Andries Engelbrecht When restarting ZK and Drill the Hive storage plugin is disabled by default and requires manual steps to enable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4899) Hive Plugin goes to disabled status with restart of Drill and ZK
[ https://issues.apache.org/jira/browse/DRILL-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511572#comment-15511572 ] Andries Engelbrecht commented on DRILL-4899: In this case the Hive Plugin config details are retained, but the plugin itself is disabled on startup although it was enabled before shutdown. > Hive Plugin goes to disabled status with restart of Drill and ZK > > > Key: DRILL-4899 > URL: https://issues.apache.org/jira/browse/DRILL-4899 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.8.0 >Reporter: Andries Engelbrecht > > When restarting ZK and Drill the Hive storage plugin is disabled by default > and requires manual steps to enable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4973) Sqlline history
Andries Engelbrecht created DRILL-4973: -- Summary: Sqlline history Key: DRILL-4973 URL: https://issues.apache.org/jira/browse/DRILL-4973 Project: Apache Drill Issue Type: Improvement Components: Client - CLI Reporter: Andries Engelbrecht Priority: Minor Currently, history in sqlline stops working once 500 queries have been logged in the user's .sqlline/history file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2848) Disable decimal data type by default
[ https://issues.apache.org/jira/browse/DRILL-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062508#comment-15062508 ] Andries Engelbrecht commented on DRILL-2848: Is it feasible to enable decimal by default in future versions? A number of BI and analytical software tools that work with Drill have requested this. > Disable decimal data type by default > > > Key: DRILL-2848 > URL: https://issues.apache.org/jira/browse/DRILL-2848 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Reporter: Mehant Baid >Assignee: Jinfeng Ni >Priority: Critical > Fix For: 1.0.0 > > Attachments: DRILL-2848-part1.patch, DRILL-2848-part2.patch > > > Due to the difference in the storage format of decimal data type in parquet > versus the in-memory format within Drill using the decimal data type is not > performant. Also some of the rules for calculating the scale and precision > need to be changed. These two concerns will be addressed post 1.0.0 release > and to prevent users from running into this we are disabling decimal data > type by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4239) Update documentation to reflect 64bit requirement to run Drill on Windows.
[ https://issues.apache.org/jira/browse/DRILL-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096483#comment-15096483 ] Andries Engelbrecht commented on DRILL-4239: Kristine Hahn, the issue here relates only to actually running Drill on a 32-bit Windows machine, which is a poor platform choice with likely minimal user adoption (compared to 64-bit Windows). However, the ODBC drivers are a different topic, as they are for client systems: both the 32-bit and 64-bit ODBC drivers work on a 64-bit Windows machine, since certain client software may need a 32-bit ODBC driver. > Update documentation to reflect 64bit requirement to run Drill on Windows. > --- > > Key: DRILL-4239 > URL: https://issues.apache.org/jira/browse/DRILL-4239 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.0, 1.4.0 >Reporter: Peder Jakobsen >Assignee: Kristine Hahn > Labels: newbie > Fix For: 1.5.0 > > > Winutils.exe has been compiled to run on the 64 bit version of Windows. For > this reason, any part of the documentation that suggests that Drill can run > on 32 bit Windows must be fixed. Furthermore, although few users run 32 bit > Windows these days, it would be helpful to make this requirement more > explicit. In particular, rapid installation of Windows on VirtualBox will > often result in the 32 bit version being installed by default, since it's the > preselected default during the installation process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4312) JDBC PlugIN - MySQL Causes errors in Drill INFORMATION_SCHEMA
Andries Engelbrecht created DRILL-4312:
--
Summary: JDBC PlugIN - MySQL Causes errors in Drill INFORMATION_SCHEMA
Key: DRILL-4312
URL: https://issues.apache.org/jira/browse/DRILL-4312
Project: Apache Drill
Issue Type: Bug
Components: Storage - Other
Affects Versions: 1.4.0
Reporter: Andries Engelbrecht

When connecting MySQL with the JDBC PlugIn, queries to INFORMATION_SCHEMA fail, specifically for COLUMNS and on mysql.performance_schema.

{query}
SELECT DISTINCT TABLE_SCHEMA as NAME_SPACE, TABLE_NAME as TAB_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA <>'INFORMATION_SCHEMA' and TABLE_SCHEMA <> 'sys';
{/query}
{result}
Error: SYSTEM ERROR: MySQLSyntaxErrorException: Unknown table engine 'PERFORMANCE_SCHEMA'
Fragment 0:0
{/result}
{query}
0: jdbc:drill:> select * from INFORMATION_SCHEMA.`COLUMNS` where TABLE_SCHEMA = 'mysql.performance_schema';
{/query}
{result}
Error: SYSTEM ERROR: MySQLSyntaxErrorException: Unknown table engine 'PERFORMANCE_SCHEMA'
Fragment 0:0
{/result}
{drillbit.log}
[Error Id: 45d23eb8-0bcf-41e2-84e2-4626e7fb0d33 on drilldemo:31010]
  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) [drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
  at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: java.lang.RuntimeException: Exception while reading definition of table 'cond_instances'
  at org.apache.calcite.adapter.jdbc.JdbcTable.getRowType(JdbcTable.java:103) ~[calcite-core-1.4.0-drill-1.4.0-mapr-r1.jar:1.4.0-drill-1.4.0-mapr-r1]
  at org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:140) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:120) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:120) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:108) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.store.ischema.SelectedTable.getRecordReader(SelectedTable.java:57) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch(InfoSchemaBatchCreator.java:36) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch(InfoSchemaBatchCreator.java:30) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:147) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:127) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:101) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) ~[drill-java-exec-1.4.0.jar:1.4.0]
  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230) [drill-java-exec-1.4.0.jar:1.4.0]
  ... 4 common frames omitted
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown table engine 'PERFORMANCE_SCHEMA'
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_51]
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_51]
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_51]
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[na:1.8.0_51]
  at com.mysql.jdbc.Util.handleNewInstance(Util.java:404) ~[mysql-connector-java-5.1.38
[jira] [Created] (DRILL-4440) Host file location for Windows incorrect in doc
Andries Engelbrecht created DRILL-4440: -- Summary: Host file location for Windows incorrect in doc Key: DRILL-4440 URL: https://issues.apache.org/jira/browse/DRILL-4440 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Andries Engelbrecht Priority: Minor The hosts file location on the page https://drill.apache.org/docs/installing-the-driver-on-windows/ shows /etc/hosts, which is the location on Linux/Mac. It should point to \Windows\system32\drivers\etc\hosts for Windows systems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4526) WebFOCUS Configuration Document
[ https://issues.apache.org/jira/browse/DRILL-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-4526: --- Attachment: WebFocus 8.2 Configuration with Drill-v1.1.doc > WebFOCUS Configuration Document > --- > > Key: DRILL-4526 > URL: https://issues.apache.org/jira/browse/DRILL-4526 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.4.0 >Reporter: Andries Engelbrecht > Attachments: WebFocus 8.2 Configuration with Drill-v1.1.doc > > > Please add attached configuration document for Information Builders WebFOCUS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4526) WebFOCUS Configuration Document
Andries Engelbrecht created DRILL-4526: -- Summary: WebFOCUS Configuration Document Key: DRILL-4526 URL: https://issues.apache.org/jira/browse/DRILL-4526 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 1.4.0 Reporter: Andries Engelbrecht Attachments: WebFocus 8.2 Configuration with Drill-v1.1.doc Please add attached configuration document for Information Builders WebFOCUS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4526) WebFOCUS Configuration Document
[ https://issues.apache.org/jira/browse/DRILL-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205582#comment-15205582 ] Andries Engelbrecht commented on DRILL-4526: Medium priority, will be good to get it done over the next few weeks. > WebFOCUS Configuration Document > --- > > Key: DRILL-4526 > URL: https://issues.apache.org/jira/browse/DRILL-4526 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.4.0 >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: WebFocus 8.2 Configuration with Drill-v1.1.doc > > > Please add attached configuration document for Information Builders WebFOCUS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4533) Error should be more informative when creating/updating storage plugin definition
[ https://issues.apache.org/jira/browse/DRILL-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210435#comment-15210435 ] Andries Engelbrecht commented on DRILL-4533: Also for plugins that require credentials to clearly specify that the credentials are invalid. > Error should be more informative when creating/updating storage plugin > definition > - > > Key: DRILL-4533 > URL: https://issues.apache.org/jira/browse/DRILL-4533 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Information Schema >Reporter: Chris Matta >Priority: Minor > > When updating or creating the definition of a storage plugin fails the error > {code}error (unable to create/ update storage)"{code} > isn't descriptive enough. It should provide a hint of what's actually wrong: > * Is the JSON invalid? If so maybe a hint to where the linter encountered the > problem > * Unexpected Field? > * Deeper issues could maybe include the stack trace? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4626) WebUI https and http not lsited correct
Andries Engelbrecht created DRILL-4626: -- Summary: WebUI https and http not lsited correct Key: DRILL-4626 URL: https://issues.apache.org/jira/browse/DRILL-4626 Project: Apache Drill Issue Type: Bug Components: Documentation Affects Versions: 1.6.0, 1.5.0, 1.4.0, 1.3.0, 1.2.0 Reporter: Andries Engelbrecht Priority: Minor The documentation states that, for Drill 1.2 and later, the Web UI is reached via https:// by default. However, https:// is only used if Drill is configured to do so; the default is still http:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers
[ https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281619#comment-15281619 ] Andries Engelbrecht commented on DRILL-3510: What is the latest status on this? We still see a lot of tools using double quotes, and many either cannot change the quote character or make doing so cumbersome for users/partners. > Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL > identifiers > -- > > Key: DRILL-3510 > URL: https://issues.apache.org/jira/browse/DRILL-3510 > Project: Apache Drill > Issue Type: Improvement > Components: SQL Parser >Reporter: Jinfeng Ni > Fix For: Future > > Attachments: DRILL-3510.patch, DRILL-3510.patch > > > Currently Drill's SQL parser uses backtick as identifier quotes, the same as > what MySQL does. However, this is different from ANSI SQL specification, > where double quote is used as identifier quotes. > MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. > Drill should follow the same way, so that Drill users do not have to rewrite > their existing queries, if their queries use double quotes. > {code} > SET sql_mode='ANSI_QUOTES'; > {code} > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281633#comment-15281633 ] Andries Engelbrecht commented on DRILL-786: --- Any movement on this? Multiple tools (Tableau, MicroStrategy as examples) generate cross joins with dimension tables when building dashboards/analytics. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization >Reporter: Krystal > Fix For: Future > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > 
DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > 
DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Sets: > Set#22, type: (DrillRecordRow[*, age, name, studentnum]) > rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, > importance=0.59049001 > rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), > rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 > network} > rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).
[jira] [Created] (DRILL-4682) Allow full schema identifier in SELECT clause
Andries Engelbrecht created DRILL-4682: -- Summary: Allow full schema identifier in SELECT clause Key: DRILL-4682 URL: https://issues.apache.org/jira/browse/DRILL-4682 Project: Apache Drill Issue Type: Improvement Components: SQL Parser Reporter: Andries Engelbrecht Currently Drill requires aliases to identify columns in the SELECT clause when working with multiple tables/workspaces. Many BI/Analytical and other tools by default will use the full schema identifier in the select clause when generating SQL statements for execution for generic JDBC or ODBC sources. Not supporting this feature causes issues and a slower adoption of utilizing Drill as an execution engine within the larger Analytical SQL community. Propose to support SELECT ... FROM .. Also see DRILL-3510 for double quote support as per ANSI_QUOTES SELECT ""."".""."" FROM "".""."" Which is very common generic SQL being generated by most tools when dealing with a generic SQL data source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
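To make the request above concrete, the following sketch builds a fully qualified column reference in both quoting styles: the backtick style that Drill's parser accepts today, and the double-quote style discussed in DRILL-3510. The helper name `qualify` and the sample identifiers (`dfs`, `tmp`, `orders`, `order_id`) are hypothetical and only illustrate the shape of the SQL that tools tend to generate; none of this is Drill API.

```python
BACKTICK = "`"
DOUBLE_QUOTE = '"'


def qualify(parts, quote=BACKTICK):
    """Join identifier parts into one dotted, quoted identifier.

    Hypothetical helper for illustration only; it is not part of Drill.
    Parts that contain the quote character are rejected rather than
    escaped, because escaping rules differ between SQL dialects.
    """
    for part in parts:
        if quote in part:
            raise ValueError(f"identifier part contains quote char: {part!r}")
    return ".".join(f"{quote}{part}{quote}" for part in parts)


# Hypothetical storage plugin, workspace, table and column names.
table = ["dfs", "tmp", "orders"]
column = table + ["order_id"]

# Backtick style, as Drill's parser expects.
backtick_sql = f"SELECT {qualify(column)} FROM {qualify(table)}"

# Double-quote style, as many tools emit for generic JDBC/ODBC sources.
ansi_sql = (
    f"SELECT {qualify(column, DOUBLE_QUOTE)} "
    f"FROM {qualify(table, DOUBLE_QUOTE)}"
)

print(backtick_sql)
print(ansi_sql)
```

The two generated statements differ only in the quote character, which is why the quoting option in DRILL-3510 and the full-identifier support requested here are closely related.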
[jira] [Commented] (DRILL-6186) Document support for delegationUID client side ODBC/JDBC property in open source JDBC driver
[ https://issues.apache.org/jira/browse/DRILL-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382781#comment-16382781 ] Andries Engelbrecht commented on DRILL-6186: [~veeran] Can you please file a Jira for support for delegationUID parameter for the Open source JDBC driver, then link back to have the documentation updated for delegationUID. ODBC doesn't have open source driver, so it may be good to update the Jira name to just specify JDBC. > Document support for delegationUID client side ODBC/JDBC property in open > source JDBC driver > > > Key: DRILL-6186 > URL: https://issues.apache.org/jira/browse/DRILL-6186 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.12.0 >Reporter: Veera Naranammalpuram >Priority: Major > Labels: security > > There is no documentation around the "delegationUID' property in the open > source documentation. We at MapR ask our customers to use this property as > one form of impersonation. Because sqlline ships with the open source JDBC > driver, if users want to use delegationUID from sqlline because they use it > from ODBC/JDBC/ BI tools, they should be able to but there's no documentation > on drill.apache.org on how to do so. There is documentation on ODBC driver > but not on JDBC. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6221) Decimal aggregations for NULL values result in 0.0 value
Andries Engelbrecht created DRILL-6221:
--
Summary: Decimal aggregations for NULL values result in 0.0 value
Key: DRILL-6221
URL: https://issues.apache.org/jira/browse/DRILL-6221
Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
Affects Versions: 1.12.0
Reporter: Andries Engelbrecht

If you sum a packed decimal field that contains a null value, you get 0.0 instead of null.

select id, amt from hive.`default`.`packtest`
1 2.3
2 null
3 4.5

select sum(amt) from hive.`default`.`packtest` group by id
1 2.3
2 0.0
3 4.5

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
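For comparison, the sketch below reproduces the three rows from the report in an in-memory SQLite database and runs the same aggregation. SQLite is used here only as a convenient stand-in to show the usual SQL behavior, where SUM over a group containing only NULL yields NULL; it is not Drill or Hive, and the column types are an assumption since the report does not give the table definition.

```python
import sqlite3

# In-memory database standing in for the Hive table in the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packtest (id INTEGER, amt NUMERIC)")
conn.executemany(
    "INSERT INTO packtest VALUES (?, ?)",
    [(1, 2.3), (2, None), (3, 4.5)],  # None is stored as SQL NULL
)

# Same aggregation as in the report, with id selected for readability.
rows = conn.execute(
    "SELECT id, SUM(amt) FROM packtest GROUP BY id ORDER BY id"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Here the group for id 2 comes back as NULL (Python `None`) rather than 0.0, which is the result the report expects from Drill.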
[jira] [Created] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions
Andries Engelbrecht created DRILL-3610:
--
Summary: TimestampAdd/Diff (SQL_TSI_) functions
Key: DRILL-3610
URL: https://issues.apache.org/jira/browse/DRILL-3610
Project: Apache Drill
Issue Type: Improvement
Components: Functions - Drill
Reporter: Andries Engelbrecht
Assignee: Mehant Baid

Add TimestampAdd and TimestampDiff (SQL_TSI) functions for year, quarter, month, week, day, hour, minute, second.

Examples

SELECT CAST(TIMESTAMPADD(SQL_TSI_QUARTER,1,Date('2013-03-31'), SQL_DATE) AS `column_quarter`
FROM `table_in`
HAVING (COUNT(1) > 0)

SELECT `table_in`.`datetime` AS `column1`,
`table`.`Key` AS `column_Key`,
TIMESTAMPDIFF(SQL_TSI_MINUTE,to_timestamp('2004-07-04', '-MM-dd'),`table_in`.`datetime`) AS `sum_datediff_minute`
FROM `calcs`

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions
[ https://issues.apache.org/jira/browse/DRILL-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658856#comment-14658856 ] Andries Engelbrecht commented on DRILL-3610: TIMESTAMPDIFF(SQL_TSI_) returns, as an integer, the difference between the 2 supplied timestamps in the specified interval, where the interval can be second, minute, hour, day, week, month, quarter or year. EXTRACT/Date_Part and Timestamp functions can be used as substitutes, but require more extensive SQL to achieve the same operation. This can be very cumbersome in queries with several of these operations, and also in machine-generated queries. Date_ADD can be substituted for TIMESTAMPADD, but lacks the QUARTER interval commonly used in financial analysis. > TimestampAdd/Diff (SQL_TSI_) functions > -- > > Key: DRILL-3610 > URL: https://issues.apache.org/jira/browse/DRILL-3610 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Andries Engelbrecht >Assignee: Mehant Baid > > Add TimestampAdd and TimestampDiff (SQL_TSI) functions for year, quarter, > month, week, day, hour, minute, second. > Examples > SELECT CAST(TIMESTAMPADD(SQL_TSI_QUARTER,1,Date('2013-03-31'), SQL_DATE) AS > `column_quarter` > FROM `table_in` > HAVING (COUNT(1) > 0) > SELECT `table_in`.`datetime` AS `column1`, > `table`.`Key` AS `column_Key`, > TIMESTAMPDIFF(SQL_TSI_MINUTE,to_timestamp('2004-07-04', > '-MM-dd'),`table_in`.`datetime`) AS `sum_datediff_minute` > FROM `calcs` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
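As a rough illustration of the semantics described in this comment, the sketch below computes a minute difference and a quarter addition in plain Python. The function names are hypothetical, and two behaviors are assumptions rather than anything stated in the issue: the difference is truncated toward zero, and adding quarters clamps the day to the last day of the target month.

```python
import calendar
from datetime import date, datetime


def timestampdiff_minutes(start, end):
    """Whole minutes from start to end.

    Assumption: the fractional part is truncated toward zero.
    """
    seconds = (end - start).total_seconds()
    return int(seconds / 60)


def add_quarters(d, quarters):
    """Add a number of quarters (3 months each) to a date.

    Assumption: if the target month is shorter, the day is clamped
    to that month's last day.
    """
    month_index = d.year * 12 + (d.month - 1) + 3 * quarters
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# Dates taken from the examples in the issue; the end timestamp is made up.
diff = timestampdiff_minutes(
    datetime(2004, 7, 4), datetime(2004, 7, 4, 1, 30, 45)
)
next_quarter = add_quarters(date(2013, 3, 31), 1)

print(diff)
print(next_quarter)
```

Even this small sketch needs explicit month arithmetic for the quarter case, which is the kind of extra work the comment describes when only EXTRACT/Date_Part or Date_ADD are available.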
[jira] [Commented] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions
[ https://issues.apache.org/jira/browse/DRILL-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660188#comment-14660188 ]

Andries Engelbrecht commented on DRILL-3610:

It can potentially be used, but that still leaves a gap in the syntax for a common DATETIME function, since ADD and DIFF would look very different.

> TimestampAdd/Diff (SQL_TSI_) functions
> --
>
> Key: DRILL-3610
> URL: https://issues.apache.org/jira/browse/DRILL-3610
> Project: Apache Drill
> Issue Type: Improvement
> Components: Functions - Drill
> Reporter: Andries Engelbrecht
> Assignee: Mehant Baid
>
> Add TimestampAdd and TimestampDiff (SQL_TSI) functions for year, quarter, month, week, day, hour, minute, second.
> Examples
> SELECT CAST(TIMESTAMPADD(SQL_TSI_QUARTER,1,Date('2013-03-31'), SQL_DATE) AS `column_quarter`
> FROM `table_in`
> HAVING (COUNT(1) > 0)
> SELECT `table_in`.`datetime` AS `column1`,
> `table`.`Key` AS `column_Key`,
> TIMESTAMPDIFF(SQL_TSI_MINUTE,to_timestamp('2004-07-04', 'yyyy-MM-dd'),`table_in`.`datetime`) AS `sum_datediff_minute`
> FROM `calcs`

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3623) Hive query hangs with limit 0 clause
Andries Engelbrecht created DRILL-3623:
--

Summary: Hive query hangs with limit 0 clause
Key: DRILL-3623
URL: https://issues.apache.org/jira/browse/DRILL-3623
Project: Apache Drill
Issue Type: Bug
Components: Storage - Hive
Affects Versions: 1.1.0
Environment: MapR cluster
Reporter: Andries Engelbrecht
Assignee: Venki Korukanti

Running select * from hive.table limit 0 does not return (it hangs).
select * from hive.table limit 1 works fine.

The Hive table is about 6GB, stored as 330 Parquet files with snappy compression. Data types are int, bigint, string and double.

Querying the directory with the Parquet files through the DFS plugin works fine:

select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3721) Regarding drill with big file
[ https://issues.apache.org/jira/browse/DRILL-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719296#comment-14719296 ]

Andries Engelbrecht commented on DRILL-3721:

Check what the query memory per node is set to and increase it to see whether that resolves your problem. The parameter is planner.memory.max_query_memory_per_node. Query sys.options to see its current value and use ALTER SYSTEM to modify it.

https://drill.apache.org/docs/configuring-drill-memory/
https://drill.apache.org/docs/alter-system/
https://drill.apache.org/docs/configuration-options-introduction/

> Regarding drill with big file
> -
>
> Key: DRILL-3721
> URL: https://issues.apache.org/jira/browse/DRILL-3721
> Project: Apache Drill
> Issue Type: Bug
> Reporter: kunal
> Attachments: sample.json, sqlline.log
>
> I am new to Apache Drill. I have configured Apache Drill on a machine with CentOS.
> "DRILL_MAX_DIRECT_MEMORY" = 25g
> "DRILL_HEAP" = 4g
> I have a 600 MB and a 3 GB JSON file [sample file attached]. When I fire the query on a relatively small file everything works fine, but when I fire the same query on the 600 MB and 3 GB files it gives the following error [stack trace attached].
> Query -
> select tbl5.product_id product_id,tbl5.gender gender,tbl5.item_number item_number,tbl5.price price,tbl5.description description,tbl5.color_swatch.image image,tbl5.color_swatch.color color from
> (select tbl4.product_id product_id,tbl4.gender gender,tbl4.item_number item_number,tbl4.price price,tbl4.size.description description,FLATTEN(tbl4.size.color_swatch) color_swatch from
> (select tbl3.product_id product_id,tbl3.catalog_item.gender gender,tbl3.catalog_item.item_number item_number,tbl3.catalog_item.price price,FLATTEN(tbl3.catalog_item.size) size from
> (select tbl2.product.product_id as product_id,FLATTEN(tbl2.product.catalog_item) as catalog_item from
> (select FLATTEN(tbl1.catalog.product) product from dfs.root.`demo.json` tbl1) tbl2) tbl3) tbl4) tbl5
> --
> Error -
> SYSTEM ERROR: IllegalArgumentException: initialCapacity: -2147483648 (expectd: 0+)
> Fragment 0:0
> [Error Id: 60cf1b95-762d-4a0d-8cae-a2db418d4ea9 on sinhagad:31010]
> --
> 1) Am I doing something wrong or missing something (probably because I am not using a cluster?).
> Please guide me through this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
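The two steps suggested in the comment on DRILL-3721 (inspect the option, then change it) might look like the following. This is a hedged sketch: the option name comes from the comment itself, while the value 4294967296 (4 GB, in bytes) is only an illustrative assumption and not a recommendation for this workload.

```sql
-- Inspect the current per-node query memory limit.
SELECT *
FROM sys.options
WHERE name = 'planner.memory.max_query_memory_per_node';

-- Raise the limit for the whole system.
-- 4294967296 is a hypothetical value chosen for illustration only.
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 4294967296;
```

Rerunning the failing query after the change shows whether memory per node was the limiting factor, as the comment proposes.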
[jira] [Commented] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers
[ https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943769#comment-14943769 ]

Andries Engelbrecht commented on DRILL-3510:

Will this be part of Drill 1.2? Also, if the ANSI_QUOTES flag is set, will the backtick (`) identifier still work?

The challenge is that in environments where multiple users access a Drill cluster, they may be using different tools and queries. Some may use backtick and some double-quote identifiers. Would it be feasible for the SQL parser to support both at the same time? That would then allow JDBC to return the ANSI standard quote as the default identifier quote.

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
> Issue Type: Improvement
> Components: SQL Parser
> Reporter: Jinfeng Ni
> Fix For: Future
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
> Currently Drill's SQL parser uses backtick as identifier quotes, the same as what MySQL does. However, this is different from ANSI SQL specification, where double quote is used as identifier quotes.
> MySQL has an option "ANSI_QUOTES", which could be switched on/off by user.
> Drill should follow the same way, so that Drill users do not have to rewrite their existing queries, if their queries use double quotes.
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
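To make the question in the DRILL-3510 comment concrete, the same query is sketched here in both quoting styles. The table dfs.tmp.orders and the column cust_id are hypothetical names used only for illustration. The session option in the last statement is the name later Drill releases are believed to use for this feature; treat it as an assumption to verify against the documentation for the version in use, since it did not exist when the comment was written.

```sql
-- Backtick quoting, which the issue describes as Drill's current behavior.
SELECT `t`.`cust_id` FROM `dfs`.`tmp`.`orders` AS `t`;

-- ANSI double-quote quoting, which the issue asks the parser to accept.
SELECT "t"."cust_id" FROM "dfs"."tmp"."orders" AS "t";

-- Assumed option name from later releases; verify before relying on it.
ALTER SESSION SET `planner.parser.quoting_identifiers` = '"';
```

A per-session switch of this kind addresses the mixed-tool scenario only partly, because each session still accepts one quoting style at a time.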
[jira] [Commented] (DRILL-3979) Add support for "replace" in CTAS similar to views
[ https://issues.apache.org/jira/browse/DRILL-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974984#comment-14974984 ]

Andries Engelbrecht commented on DRILL-3979:

Most databases do not support the REPLACE clause for CTAS. With a VIEW there is no inherent data loss, while with CTAS there can be data loss if the existing table is dropped. That is part of why most databases require an explicit DROP TABLE command.

Perhaps a better option would be a TEMP TABLE clause, which implies the data is transient.

> Add support for "replace" in CTAS similar to views
> --
>
> Key: DRILL-3979
> URL: https://issues.apache.org/jira/browse/DRILL-3979
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization, Storage - Writer
> Affects Versions: 1.2.0
> Reporter: Abhishek Girish
>
> Drill could support "create or replace table" syntax, similar to the existing "create or replace view" syntax.
> Given that "drop table" is now supported, I think it might be possible to support this. This could be helpful in automating tests and in SQL scripting.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4036) logs/sqlline_queries.json can not be accessed by user mapr
[ https://issues.apache.org/jira/browse/DRILL-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991914#comment-14991914 ]

Andries Engelbrecht commented on DRILL-4036:

This can happen if sqlline on the node was first executed as root. Delete the file and the next time it will work fine. It will also be created again with the rw-rw-r-- permissions.

The root cause is starting sqlline as root the first time on the node. If mapr is used first and root uses sqlline later, it is not an issue.

> logs/sqlline_queries.json can not be accessed by user mapr
> ---
>
> Key: DRILL-4036
> URL: https://issues.apache.org/jira/browse/DRILL-4036
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Writer
> Affects Versions: 1.3.0
> Reporter: Khurram Faraaz
> Priority: Minor
>
> Drill was installed using RPM and when I try to connect to Drill from sqlline as mapr user it results in permission denied error. That file sqlline_queries.json is always empty, it has no content in it, and it is owned by root and others can not write to it.
> The change was made using the below commit
> https://github.com/apache/drill/commit/42d5f818a5501dbd05808c53959db86e66202792
> {code}
> I logged in as root
> [root@centos-01 bin]# id
> uid=0(root) gid=0(root) groups=0(root)
> Note that the file is owned by root, and non-root users can not write to that file.
> [root@centos-01 bin]# ls -lrt > /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json > -rw-r--r-- 1 root root 0 Nov 2 20:56 > /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json > and then I connect to Drill as mapr user > > [root@centos-01 bin]# su - mapr > -bash-4.1$ pwd > /home/mapr > -bash-4.1$ cd /opt/mapr/drill/drill-1.3.0/bin/ > -bash-4.1$ ./sqlline -u "jdbc:drill:schema=dfs.tmp -n mapr -p mapr" > 23:30:38,366 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could > NOT find resource [logback.groovy] > 23:30:38,366 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could > NOT find resource [logback-test.xml] > 23:30:38,367 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found > resource [logback.xml] at [file:/opt/mapr/drill/drill-1.3.0/conf/logback.xml] > 23:30:38,565 |-INFO in > ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not > set > 23:30:38,571 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender] > 23:30:38,583 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > Naming appender as [STDOUT] > 23:30:38,613 |-INFO in > ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default > type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] > property > 23:30:38,693 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > About to instantiate appender of type > [ch.qos.logback.core.rolling.RollingFileAppender] > 23:30:38,696 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > Naming appender as [QUERY] > 23:30:38,722 |-INFO in > ch.qos.logback.core.rolling.FixedWindowRollingPolicy@69663655 - No > compression will be used > 23:30:38,736 |-INFO in > ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default > type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] > property > 23:30:38,737 |-INFO in 
ch.qos.logback.core.rolling.RollingFileAppender[QUERY] > - Active log file name: /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json > 23:30:38,737 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[QUERY] > - File property is set to > [/opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json] > 23:30:38,739 |-ERROR in > ch.qos.logback.core.rolling.RollingFileAppender[QUERY] - > openFile(/opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json,true) call > failed. java.io.FileNotFoundException: > /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json (Permission denied) > at java.io.FileNotFoundException: > /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json (Permission denied) > at at java.io.FileOutputStream.open(Native Method) > at at java.io.FileOutputStream.(FileOutputStream.java:221) > at at > ch.qos.logback.core.recovery.ResilientFileOutputStream.(ResilientFileOutputStream.java:28) > at at > ch.qos.logback.core.FileAppender.openFile(FileAppender.java:149) > at at ch.qos.logback.core.FileAppender.start(FileAppender.java:108) > at at > ch.qos.logback.core.rolling.RollingFileAppender.start(RollingFileAppender.java:86) > at at > ch.qos.logback.core.joran.action.AppenderAction.end(AppenderAction.java:96) > at at > ch.qos.logback.core.joran.spi.Interpreter.ca
[jira] [Commented] (DRILL-4114) drill not support limit 1,100
[ https://issues.apache.org/jira/browse/DRILL-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013746#comment-15013746 ] Andries Engelbrecht commented on DRILL-4114: You should use limit 1100, don't add a comma in the number. This is not a bug. > drill not support limit 1,100 > - > > Key: DRILL-4114 > URL: https://issues.apache.org/jira/browse/DRILL-4114 > Project: Apache Drill > Issue Type: Bug >Affects Versions: Future >Reporter: david_hudavy > > when > select * from tab order by tab.column1 limit 1,100 > throws exception: > [Client-1] INFO o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#31] Query > failed: > org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: > Encountered "," at line 1, column 48. > Was expecting one of: > > "OFFSET" ... > "FETCH" ... > [Error Id: 68dde852-13df-4c39-bfd3-5d970dbc2549 on vm1-4:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32) > [drill-java-exec-1.2.0.jar:1.2.0] > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) > [drill-java-exec-1.2.0.jar:1.2.0] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > [netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) > [netty-handler-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > [netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) > [netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > 
[netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) > [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] > at > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) > [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] > at
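The parse error quoted in DRILL-4114 lists OFFSET and FETCH as the tokens the parser expected after the row count. If the reporter meant the MySQL-style `LIMIT offset, row_count` form rather than the number 1100 (an assumption, since the report does not say), both readings can be written without a comma. The table and column names are the ones from the report.

```sql
-- Reading 1 (assumed MySQL intent): skip 1 row, then return at most 100 rows.
SELECT * FROM tab ORDER BY tab.column1 LIMIT 100 OFFSET 1;

-- Reading 2 (as suggested in the comment): return the first 1100 rows.
SELECT * FROM tab ORDER BY tab.column1 LIMIT 1100;
```

Either form avoids the comma that the parser rejects at column 48.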
[jira] [Commented] (DRILL-6383) View column types, modes are plan-time guesses, not actual types
[ https://issues.apache.org/jira/browse/DRILL-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464094#comment-16464094 ]

Andries Engelbrecht commented on DRILL-6383:

Most BI/Viz tools were developed with RDBMS data sources in mind. Since Drill is not an RDBMS and doesn't own the data, we rely on Views to make metadata available to these tools in a usable form. Many tools request the data source metadata upon connection, which runs counter to Drill's default behavior of "Let's discover the data". For this reason we use Views as a crutch to make the metadata available in a columnar format.

However, a poorly defined View (i.e. select *) is not very helpful for these tools, and we have published best practices in this regard. Such a View can also be very expensive for the numerous metadata operations and SQL prepare statements converted by the client drivers. As an example, see the metadata available in INFORMATION_SCHEMA for the columns in the View, as this is what many tools interrogate to get metadata from the source. That leads to the question of whether Drill should do some work at View creation time to actually define the underlying data of the View, rather than lazily creating a logical View and waiting for it to be used.

We have had discussions with tool vendors about utilizing the data discovery capabilities in Drill, but that is a major development effort for most of them, and only large market demand will get them to move more quickly in this regard.

> View column types, modes are plan-time guesses, not actual types
> 
>
> Key: DRILL-6383
> URL: https://issues.apache.org/jira/browse/DRILL-6383
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Priority: Minor
>
> Create a view and look at the list of columns within the view. You'll see that they are often wrong in name, type and mode.
> Consider a very simple CSV file with headers:
> {noformat}
> custId,name,balance,status
> 123,Fred,456.78
> 125,Betty,98.76,VIP
> 128,Barney,1.23,PAST DUE,30
> {noformat}
> Define the simplest possible view:
> {noformat}
> CREATE VIEW myView2 AS SELECT * FROM `csvh/cust.csvh`;
> {noformat}
> Then look at the view file:
> {noformat}
> {
> "name" : "myView2",
> "sql" : "SELECT *\nFROM `csvh/cust.csvh`",
> "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
> } ],
> "workspaceSchemaPath" : [ "local", "data" ]
> }
> {noformat}
> It is clear that the view simply captured the plan-time list of the new double-star for the wildcard. Since this is not a true type, it should not have an `isNullable` attribute.
> OK, we have to spell out the columns:
> {noformat}
> CREATE VIEW myView3 AS SELECT custId FROM `csvh/cust.csvh`;
> {noformat}
> Let's look at the view file:
> {noformat}
> {
> "name" : "myView3",
> "sql" : "SELECT `custId`\nFROM `csvh/cust.csvh`",
> "fields" : [ {
> "name" : "custId",
> "type" : "ANY",
> "isNullable" : true
> } ],
> "workspaceSchemaPath" : [ "local", "data" ]
> }
> {noformat}
> The name is correct. The type is `ANY`, which is wrong. Since this is a CSV file, the column type is `VARCHAR`. Further, because this is a CSV file with headers, the mode is REQUIRED, but is listed as nullable. To verify:
> {noformat}
> SELECT sqlTypeOf(custId), modeOf(custId) FROM myView3 LIMIT 1;
> +--------------------+-----------+
> | EXPR$0             | EXPR$1    |
> +--------------------+-----------+
> | CHARACTER VARYING  | NOT NULL  |
> +--------------------+-----------+
> {noformat}
> Now, let's try a CSV file without headers:
> {noformat}
> 123,Fred,456.78
> 125,Betty,98.76,VIP
> {noformat}
> {noformat}
> CREATE VIEW myView4 AS SELECT columns FROM `csv/cust.csv`;
> SELECT * FROM myView4;
> +--------------------------------+
> | columns                        |
> +--------------------------------+
> | ["123","Fred","456.78"]        |
> | ["125","Betty","98.76","VIP"]  |
> +--------------------------------+
> {noformat}
> Let's look at the view file:
> {noformat}
> {
> "name" : "myView4",
> "sql" : "SELECT `columns`\nFROM `csv/cust.csv`",
> "fields" : [ {
> "name" : "columns",
> "type" : "ANY",
> "isNullable" : true
> } ],
> "workspaceSchemaPath" : [ "local", "data" ]
> }
> {noformat}
> This is almost nonsensical. `columns` is reported as type `ANY` and nullable. But, `columns` is Repeated `VARCHAR` and repeated types cannot be nullable.
> The conclusion is that the type information is virtually worthless and the `isNullable` information is worse than worthless: it is plain wrong.
> The type information is valid only if the planner can infer types:
> {noformat}
> CREATE VIEW myView5 AS
> SELECT CAST(cus
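Separately from the truncated example in the quoted description, the "well-defined View" practice mentioned in the DRILL-6383 comment can be sketched as below. Everything here is an illustrative assumption: the view name cust_typed_vw, the dfs.tmp workspace, and the chosen target types are hypothetical, and only the file path and column names are borrowed from the quoted CSV example.

```sql
-- Hypothetical view with explicit casts, so that column types are
-- declared in the view definition instead of being left as ANY.
CREATE OR REPLACE VIEW dfs.tmp.cust_typed_vw AS
SELECT CAST(`custId` AS INT)          AS `custId`,
       CAST(`name` AS VARCHAR(50))    AS `name`,
       CAST(`balance` AS DOUBLE)      AS `balance`,
       CAST(`status` AS VARCHAR(20))  AS `status`
FROM `csvh/cust.csvh`;

-- The metadata lookup that many BI tools issue on connection.
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.`COLUMNS`
WHERE TABLE_NAME = 'cust_typed_vw';
```

Comparing the second query's output for a `select *` view and for a view like this one shows what a tool actually sees, which is the comparison the comment suggests making.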
[jira] [Commented] (DRILL-4973) Sqlline history
[ https://issues.apache.org/jira/browse/DRILL-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497310#comment-16497310 ]

Andries Engelbrecht commented on DRILL-4973:

[~kkhatua] in old versions of sqlline the history feature stopped working after 500 commands. Newer versions (not sure when this was resolved) no longer seem to have this issue. I tested Drill 1.13, which keeps a rolling history of the last 500 commands; that is sufficient for most use cases.

I think this can be marked as fixed now.

> Sqlline history
> ---
>
> Key: DRILL-4973
> URL: https://issues.apache.org/jira/browse/DRILL-4973
> Project: Apache Drill
> Issue Type: Improvement
> Components: Client - CLI
> Reporter: Andries Engelbrecht
> Priority: Minor
>
> Currently the history on sqlline stops working after 500 queries have been logged in the user's .sqlline/history file.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-2049) NoClassDefFoundError: org/apache/commons/lang/StringEscapeUtils in JDBC Driver
[ https://issues.apache.org/jira/browse/DRILL-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286285#comment-14286285 ] Andries Engelbrecht commented on DRILL-2049: I tested the updated jdbc driver and it no longer requires the jar with v2 classes. Thanks! > NoClassDefFoundError: org/apache/commons/lang/StringEscapeUtils in JDBC Driver > -- > > Key: DRILL-2049 > URL: https://issues.apache.org/jira/browse/DRILL-2049 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Reporter: Patrick Wong >Assignee: Aditya Kishore > Attachments: > DRILL-2049-NoClassDefFoundError-org-apache-commons-l.patch, > DRILL-2049.1.patch.txt > > > Original request by Andries Engelbrecht (aengelbre...@maprtech.com) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2105) Query fails when using flatten on JSON data where some documents have an empty array
Andries Engelbrecht created DRILL-2105:
--

Summary: Query fails when using flatten on JSON data where some documents have an empty array
Key: DRILL-2105
URL: https://issues.apache.org/jira/browse/DRILL-2105
Project: Apache Drill
Issue Type: Bug
Components: Execution - Operators
Affects Versions: 0.7.0
Environment: MFS with JSON
Reporter: Andries Engelbrecht
Assignee: Chris Westin

Drill query fails when using flatten on an array where some records contain an empty array, especially with larger data sets where the number of JSON documents is greater than 100k.

Using twitter data as sample.

select flatten (entities.hashtags) from dfs.foo.`file.json`;

Empty array

"entities": {
  "trends": [],
  "symbols": [],
  "urls": [ { "expanded_url": "http://on.nfl.com/1BkThQF", "indices": [ 118, 140 ], "display_url": "on.nfl.com/1BkThQF", "url": "http://t.co/Unr5KFy6hG" } ],
  "hashtags": [],
  "user_mentions": [ { "id": 19362299, "name": "NFL Network", "indices": [ 3, 14 ], "screen_name": "nflnetwork", "id_str": "19362299" } ]
},

Array with content

"entities": {
  "trends": [],
  "symbols": [],
  "urls": [],
  "hashtags": [ { "text": "djpreps", "indices": [ 47, 55 ] }, { "text": "MSPreps", "indices": [ 56, 64 ] } ],
  "user_mentions": []
},

Log output

2015-01-27 02:26:13,478 [2b3908b9-cf08-3fd5-3bd8-ebb6bb5b70f1:foreman] INFO o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for Parquet metadata file.
java.io.IOException: Open failed for file: /data/twitter/nfl/2015, error: Invalid argument (22) at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:191) ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr] at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:776) ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr] at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) ~[hadoop-common-2.4.1-mapr-1408.jar:na] at org.apache.drill.exec.store.dfs.shim.fallback.FallbackFileSystem.open(FallbackFileSystem.java:94) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:138) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.store.dfs.BasicFormatMatcher.isReadable(BasicFormatMatcher.java:107) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:232) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:212) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:141) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:58) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:273) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT] at net.hydromatic.optiq.jdbc.SimpleOptiqSchema.getTable(SimpleOptiqSchema.java:75) [optiq-core-0.9-drill-r12.jar:na] at net.hydromatic.optiq.prepare.OptiqCatalogReader.getTableFrom(OptiqCatalogReader.java:87) [optiq-core-0.9-drill-r12.jar:na] at net.hydromatic.optiq.prepare.OptiqCatalogReader.getTable(OptiqCatalogReader.java:70) [optiq-core-0.9-drill-r12.jar:na] at net.hydromatic.optiq.prepare.OptiqCatalogReader.getTable(OptiqCatalogReader.java:42) [optiq-core-0.9-drill-r12.jar:na] at org.eigenbase.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:67) [optiq-core-0.9-drill-r12.jar:na] at org.eigenbase.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:75) [optiq-core-0.9-drill-r12.jar:na] at org.eigenbase.sql.validate.AbstractNamespace.validate(AbstractNamespac
[jira] [Created] (DRILL-2140) RPC Error querying JSON with empty nested maps
Andries Engelbrecht created DRILL-2140:
--

Summary: RPC Error querying JSON with empty nested maps
Key: DRILL-2140
URL: https://issues.apache.org/jira/browse/DRILL-2140
Project: Apache Drill
Issue Type: Bug
Components: Execution - RPC
Affects Versions: 0.7.0
Environment: Centos 4 node MapR cluster
Reporter: Andries Engelbrecht
Assignee: Jacques Nadeau

When querying a large number of documents in multiple directories with multiple JSON files in each, where some documents lack the top-level map that is used in a predicate, Drill produces an RPC error in the log.

Query

select t.retweeted_status.`user`.name as name, count(t.retweeted_status.favorited) as rt_count
from `./nfl` t
where t.retweeted_status.`user`.name is not null
group by t.retweeted_status.`user`.name
order by count(t.retweeted_status.favorited) desc
limit 10;

Screen Error

Query failed: Query failed: Failure while running fragment., index: 0, length: 1 (expected: range(0, 0)) [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]
[ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]

Drillbit log attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2140) RPC Error querying JSON with empty nested maps
[ https://issues.apache.org/jira/browse/DRILL-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andries Engelbrecht updated DRILL-2140:
---
Attachment: drillbit.log

> RPC Error querying JSON with empty nested maps
> --
>
> Key: DRILL-2140
> URL: https://issues.apache.org/jira/browse/DRILL-2140
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - RPC
> Affects Versions: 0.7.0
> Environment: Centos 4 node MapR cluster
> Reporter: Andries Engelbrecht
> Assignee: Jacques Nadeau
> Attachments: drillbit.log
>
> When querying large number of documents in multiple directories with multiple JSON files in each, and some documents have no top level map that is used for a predicate, Drill produces a RPC error in the log.
> Query
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.favorited) as rt_count
> from `./nfl` t
> where t.retweeted_status.`user`.name is not null
> group by t.retweeted_status.`user`.name
> order by count(t.retweeted_status.favorited) desc
> limit 10;
> Screen Error
> Query failed: Query failed: Failure while running fragment., index: 0, length: 1 (expected: range(0, 0)) [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]
> [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]
> Drillbit log attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2141) Data type error in group by and order by for JSON
[ https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andries Engelbrecht updated DRILL-2141:
---
Attachment: drillbit.log

> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Affects Versions: 0.7.0
> Reporter: Andries Engelbrecht
> Assignee: Daniel Barclay (Drill/MapR)
> Attachments: drillbit.log
>
> Doing group by and order by on complex nested JSON produces data type errors.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as rt_count
> from `./nfl` t
> where t.retweeted_status.`user`.name is not null
> group by t.retweeted_status.`user`.name
> order by count(t.retweeted_status.id) desc
> limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.NullableVarCharVector. [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
> at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
> at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
> at sqlline.SqlLine.print(SqlLine.java:1809)
> at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
> at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
> at sqlline.SqlLine.dispatch(SqlLine.java:889)
> at sqlline.SqlLine.begin(SqlLine.java:763)
> at sqlline.SqlLine.start(SqlLine.java:498)
> at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2141) Data type error in group by and order by for JSON
Andries Engelbrecht created DRILL-2141:
---------------------------------------

Summary: Data type error in group by and order by for JSON
Key: DRILL-2141
URL: https://issues.apache.org/jira/browse/DRILL-2141
Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
Affects Versions: 0.7.0
Reporter: Andries Engelbrecht
Assignee: Daniel Barclay (Drill/MapR)
Attachments: drillbit.log

Running group by and order by on complex nested JSON produces data type errors.

Query:

select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as rt_count
from `./nfl` t
where t.retweeted_status.`user`.name is not null
group by t.retweeted_status.`user`.name
order by count(t.retweeted_status.id) desc
limit 10;

Screen output:

Query failed: Query failed: Failure while running fragment., Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.NullableVarCharVector. [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
[ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]

java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
    at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
    at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
    at sqlline.SqlLine.print(SqlLine.java:1809)
    at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
    at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
    at sqlline.SqlLine.dispatch(SqlLine.java:889)
    at sqlline.SqlLine.begin(SqlLine.java:763)
    at sqlline.SqlLine.start(SqlLine.java:498)
    at sqlline.SqlLine.main(SqlLine.java:460)

Drill log attached.
[jira] [Updated] (DRILL-2141) Data type error in group by and order by for JSON
[ https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2141: --- Attachment: new_drillbit.log
[jira] [Commented] (DRILL-2141) Data type error in group by and order by for JSON
[ https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301762#comment-14301762 ]

Andries Engelbrecht commented on DRILL-2141:
--------------------------------------------

When the query is changed to filter on a different field, so that JSON documents without the top-level map are excluded, a different error is returned (similar to DRILL-2140).

New query:

select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as rt_count
from `./nfl` t
where t.`text` like '%RT_@%'
group by t.retweeted_status.`user`.name
order by count(t.retweeted_status.id) desc
limit 10;

Screen output:

Query failed: Query stopped., Undefined failure occurred. [ c480ac84-9dfa-4e1d-922e-d2aabe279b10 on drilldemo:31010 ]

java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
    at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
    at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
    at sqlline.SqlLine.print(SqlLine.java:1809)
    at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
    at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
    at sqlline.SqlLine.dispatch(SqlLine.java:889)
    at sqlline.SqlLine.begin(SqlLine.java:763)
    at sqlline.SqlLine.start(SqlLine.java:498)
    at sqlline.SqlLine.main(SqlLine.java:460)

The new Drill log is attached as new_drillbit.log.

Note that this is a single-node Drill system, and that the session also used alter session set `store.format` = 'json';
[jira] [Created] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters
Andries Engelbrecht created DRILL-2157:
---------------------------------------

Summary: Directory pruning on subdirectories only and data type conversions for directory filters
Key: DRILL-2157
URL: https://issues.apache.org/jira/browse/DRILL-2157
Project: Apache Drill
Issue Type: Improvement
Components: Query Planning & Optimization
Affects Versions: Future
Reporter: Andries Engelbrecht
Assignee: Jinfeng Ni
Priority: Minor

Drill scans all files and directories when only a subdirectory is used as a predicate. Additionally, if the data type in the directory filter is not a string and has to be converted, Drill also scans all the subdirectories and files before applying the filter.

My current observation is that, for a directory structure as listed below, pruning only works if the full tree is provided. If only a lower-level directory is supplied in the filter condition, Drill uses it only as a filter.

With directory structure as below:

/2015
    /01
        /10
        /11
        /12
        /13
        /14

Query:

select count(id) from `/foo` t where dir0='2015' and dir1='01' and dir2='10'

Produces the correct pruning and query plan:

01-02        Project(id=[$3]): rowcount = 3670316.0, cumulative cost = {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 28434
01-03          Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]): rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 28433
01-04            Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]

However:

select count(id) from `/foo` t where dir2='10'

Produces a full scan of all subdirectories and only applies a filter condition after the fact. Note the difference in numFiles between the two plans, even though the base scan lists the columns:

01-04        Filter(condition=[=($0, '10')]): rowcount = 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 27470
01-05          Project(dir2=[$1], id=[$0]): rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 27469
01-06            Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]

Using the wrong data type for the filter also produces a full scan:

select count(id) from `/foo` where dir_year=2015 and dir_month=01 and dir_day=14

Produces:

01-04        Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34910
01-05          Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34909
01-06            Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],
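The behavior requested in DRILL-2157 can be illustrated outside Drill. The following is a minimal Python sketch, not Drill's planner code: `prune_files` and its parameters are hypothetical names, and the file list is invented to mirror the `/foo` layout in the report. It shows that a filter on a single lower-level directory carries enough information to drop files before scanning, provided the filter values are compared as strings against the path components.

```python
from pathlib import PurePosixPath


def prune_files(files, root, dir_filters):
    """Keep only files whose directory components match every filter.

    files       -- iterable of absolute POSIX paths (strings)
    root        -- selection root, e.g. "/foo"
    dir_filters -- dict mapping directory level (0 for dir0, 1 for dir1, ...)
                   to the required directory name, given as a string
    """
    kept = []
    for path in files:
        # Directory components below the root, excluding the file name.
        parts = PurePosixPath(path).relative_to(root).parts[:-1]
        if all(level < len(parts) and parts[level] == value
               for level, value in dir_filters.items()):
            kept.append(path)
    return kept


# Hypothetical file listing that mirrors the layout in the report.
files = [
    "/foo/2015/01/10/a.json",
    "/foo/2015/01/11/b.json",
    "/foo/2015/02/10/c.json",
    "/foo/2014/12/10/d.json",
]

# Full tree supplied, as in the query that prunes correctly.
full = prune_files(files, "/foo", {0: "2015", 1: "01", 2: "10"})

# Only the lowest level supplied; files can still be dropped up front.
lowest = prune_files(files, "/foo", {2: "10"})
```

In this sketch an integer filter value such as `2015` would never equal the string component `"2015"`, which is one way to picture why the report recommends string literals in directory filters.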
[jira] [Commented] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters
[ https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305368#comment-14305368 ]

Andries Engelbrecht commented on DRILL-2157:
--------------------------------------------

In addition, it would be useful to be able to perform directory pruning with < and > expressions.

It would also help to support directory pruning in views that contain a predicate filter. For example, the view below is unable to perform directory pruning, while views without the predicate filter are.

Create or replace view maprfs.views.`retweeted` as
select
  CAST(t.`id` as BIGINT) as `id`,
  CAST(t.retweeted_status.`id` as BIGINT) as `retweet_id`,
  t.dir0 as dir_year,
  t.dir1 as dir_month,
  t.dir2 as dir_day,
  t.dir3 as dir_hour,
  CAST(t.retweeted_status.`created_at` as VARCHAR(40)) as `created_at`,
  to_date ((concat (substring(t.retweeted_status.`created_at`, 5,6),substring(t.retweeted_status.`created_at`, 26,5))), 'MMM dd ') as `date`,
  to_timestamp ((concat (substring(t.retweeted_status.`created_at`, 5,6),substring(t.retweeted_status.`created_at`, 26,5),substring(t.retweeted_status.`created_at`, 11,9))), 'MMM dd HH:mm:ss') as `timestamp`,
  CAST(t.retweeted_status.`text` as VARCHAR(140)) as `tweet`,
  CAST(t.retweeted_status.`user`.`favorites_count` as INT) as `favorites_count`
from maprfs.twitter.`/nfl` t
where t.retweeted_status.`user`.`name` is not null;
[jira] [Created] (DRILL-2263) Directory pruning best practices with Drill
Andries Engelbrecht created DRILL-2263: -- Summary: Directory pruning best practices with Drill Key: DRILL-2263 URL: https://issues.apache.org/jira/browse/DRILL-2263 Project: Apache Drill Issue Type: Improvement Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens Add on for querying directories. https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories Best practices to ensure that directory pruning is properly applied. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill
[ https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2263: --- Description: Add on for querying directories. https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories Best practices to ensure that directory pruning is properly applied. Please see attached document for details and write up. was: Add on for querying directories. https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories Best practices to ensure that directory pruning is properly applied. > Directory pruning best practices with Drill > --- > > Key: DRILL-2263 > URL: https://issues.apache.org/jira/browse/DRILL-2263 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > > Add on for querying directories. > https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories > Best practices to ensure that directory pruning is properly applied. > Please see attached document for details and write up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill
[ https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2263: --- Attachment: Optimizing directory pruning with Drill.docx > Directory pruning best practices with Drill > --- > > Key: DRILL-2263 > URL: https://issues.apache.org/jira/browse/DRILL-2263 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Optimizing directory pruning with Drill.docx > > > Add on for querying directories. > https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories > Best practices to ensure that directory pruning is properly applied. > Please see attached document for details and write up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2265) Drill data exploration function for complex data types
Andries Engelbrecht created DRILL-2265:
---------------------------------------

Summary: Drill data exploration function for complex data types
Key: DRILL-2265
URL: https://issues.apache.org/jira/browse/DRILL-2265
Project: Apache Drill
Issue Type: Improvement
Components: Functions - Drill
Reporter: Andries Engelbrecht
Assignee: Daniel Barclay (Drill/MapR)

Drill data exploration function for complex data types

When dealing with complex data in large volumes, it would be extremely useful to have a function that collects metadata to provide a better view of the total data set.

Taking JSON as an example, a data set can contain an extremely large volume of JSON objects. Each object can have multiple schemas and subschemas, with multiple nested subschemas as well as arrays. Not all objects will have all of the schemas or subschemas. When exploring this data in Drill, a SQL dot notation is used to navigate the complex subschema structure, and it can become very cumbersome to fully understand the total picture of all the data.

The proposal is a function that can explore the JSON objects in a data set (whether a single file with multiple objects, or a single- or multi-level directory structure) and provide the total structure of all the JSON objects, showing every schema, subschema, and array that is available across them. This way a data analyst will be able to see all the schema data that is available within the data set.

Additionally, if the function can provide statistics showing how many of the objects actually contain each of the schemas, subschemas, and arrays (and data in each), this may indicate to an analyst how valuable or important it may be to explore any subschema or array.

To speed up the collection of this data, the function may include an option to set a sample size, so that only a portion of the total volume is sampled and the result is projected onto the total data set. This is a very common operation in prominent RDBMS systems today. Additionally, for data that changes or grows, the metadata collection function will need to be run periodically to update the statistics.

To make the metadata more useful, consider placing the results in a Drill metadata structure, similar to INFORMATION_SCHEMA, but specifically for statistics metadata that analysts use for data exploration. Some security considerations should also be designed in, so that access is only allowed to users with access to the base data.

In addition to its use for data analysts and data exploration, the metadata and statistics could also be used for Drill internal functions in the future, such as query optimization and creation of views.

This example specifically focuses on JSON data, but can similarly be applied to other complex data types that may require a very detailed understanding of the complex data set.
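The kind of exploration function proposed in DRILL-2265 can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions, not an existing Drill function: `walk_paths` and `explore` are hypothetical names, the input is assumed to be one JSON object per line, and sampling is a simple per-object coin flip rather than anything statistically tuned.

```python
import json
import random
from collections import Counter


def walk_paths(value, prefix=""):
    """Yield the dotted path of every field in a decoded JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from walk_paths(child, path)
    elif isinstance(value, list):
        # Mark array levels with [] and descend into the elements.
        for child in value:
            yield from walk_paths(child, prefix + "[]")


def explore(lines, sample_rate=1.0, seed=0):
    """Return (objects sampled, Counter of path -> objects containing it)."""
    rng = random.Random(seed)
    counts = Counter()
    sampled = 0
    for line in lines:
        if sample_rate < 1.0 and rng.random() >= sample_rate:
            continue  # skip this object when sampling
        sampled += 1
        # A set, so a path repeated inside an array counts once per object.
        counts.update(set(walk_paths(json.loads(line))))
    return sampled, counts


# Invented sample documents with differing structure.
docs = [
    '{"id": 1, "user": {"name": "a"}}',
    '{"id": 2, "retweeted_status": {"user": {"name": "b"},'
    ' "tags": [{"text": "x"}, {"text": "y"}]}}',
    '{"id": 3}',
]

sampled, counts = explore(docs)
for path in sorted(counts):
    print(f"{path}: {counts[path]} of {sampled} objects")
```

Dividing each count by the number of sampled objects gives the per-path coverage statistic the issue asks for; with `sample_rate` below 1.0 the same ratio would serve as an estimate for the full data set.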
[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill
[ https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2263: --- Attachment: Optimizing directory pruning with Drill v2.docx > Directory pruning best practices with Drill > --- > > Key: DRILL-2263 > URL: https://issues.apache.org/jira/browse/DRILL-2263 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Optimizing directory pruning with Drill v2.docx > > > Add on for querying directories. > https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories > Best practices to ensure that directory pruning is properly applied. > Please see attached document for details and write up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill
[ https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2263: --- Attachment: (was: Optimizing directory pruning with Drill.docx) > Directory pruning best practices with Drill > --- > > Key: DRILL-2263 > URL: https://issues.apache.org/jira/browse/DRILL-2263 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Optimizing directory pruning with Drill v2.docx > > > Add on for querying directories. > https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories > Best practices to ensure that directory pruning is properly applied. > Please see attached document for details and write up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill
[ https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2263: --- Priority: Minor (was: Major) > Directory pruning best practices with Drill > --- > > Key: DRILL-2263 > URL: https://issues.apache.org/jira/browse/DRILL-2263 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens >Priority: Minor > Attachments: Optimizing directory pruning with Drill v2.docx > > > Add on for querying directories. > https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories > Best practices to ensure that directory pruning is properly applied. > Please see attached document for details and write up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2272) Tibco Spotfire Desktop configuration for Drill documentation
Andries Engelbrecht created DRILL-2272: -- Summary: Tibco Spotfire Desktop configuration for Drill documentation Key: DRILL-2272 URL: https://issues.apache.org/jira/browse/DRILL-2272 Project: Apache Drill Issue Type: Improvement Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens Instructions to configure Tibco Spotfire Desktop with Drill using ODBC to be added to the wiki. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2272) Tibco Spotfire Desktop configuration for Drill documentation
[ https://issues.apache.org/jira/browse/DRILL-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2272: --- Attachment: Spotfire Desktop Drill Config.docx > Tibco Spotfire Desktop configuration for Drill documentation > > > Key: DRILL-2272 > URL: https://issues.apache.org/jira/browse/DRILL-2272 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: Spotfire Desktop Drill Config.docx > > > Instructions to configure Tibco Spotfire Desktop with Drill using ODBC to be > added to the wiki. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2141) Data type error in group by and order by for JSON
[ https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2141: --- Attachment: FlumeData.1422748800086 Sample file size. Issue is more pronounced with larger sample file sizes and larger number of files. > Data type error in group by and order by for JSON > - > > Key: DRILL-2141 > URL: https://issues.apache.org/jira/browse/DRILL-2141 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 0.7.0 >Reporter: Andries Engelbrecht >Assignee: Hanifi Gunes > Fix For: 0.9.0 > > Attachments: FlumeData.1422748800086, drillbit.log, new_drillbit.log
[jira] [Comment Edited] (DRILL-2141) Data type error in group by and order by for JSON
[ https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340566#comment-14340566 ] Andries Engelbrecht edited comment on DRILL-2141 at 2/27/15 6:49 PM: - Sample file attached now. Issue is more pronounced with larger sample file sizes and larger number of files. Let me know if you experience the issue, and perhaps we can test on larger environment. was (Author: aengelbrecht): Sample file size. Issue is more pronounced with larger sample file sizes and larger number of files.
[jira] [Commented] (DRILL-2141) Data type error in group by and order by for JSON
[ https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344376#comment-14344376 ]

Andries Engelbrecht commented on DRILL-2141:
--------------------------------------------

Error still present on a much larger sample set of data in a cluster.

select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as rt_count
from `./nfl` t
where t.`text` like '%RT_@%'
group by t.retweeted_status.`user`.name
order by count(t.retweeted_status.id) desc
limit 10;

+------+----------+
| name | rt_count |
+------+----------+
Query failed: Query stopped., Undefined failure occurred. [ 79f5d0d4-5101-48e6-a6bc-f25c147db6d8 on se-node11.se.lab:31010 ]

java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
    at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
    at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
    at sqlline.SqlLine.print(SqlLine.java:1809)
    at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
    at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
    at sqlline.SqlLine.dispatch(SqlLine.java:889)
    at sqlline.SqlLine.begin(SqlLine.java:763)
    at sqlline.SqlLine.start(SqlLine.java:498)
    at sqlline.SqlLine.main(SqlLine.java:460)
[jira] [Created] (DRILL-2424) Ignore hidden files in directory path
Andries Engelbrecht created DRILL-2424:
---------------------------------------

Summary: Ignore hidden files in directory path
Key: DRILL-2424
URL: https://issues.apache.org/jira/browse/DRILL-2424
Project: Apache Drill
Issue Type: Improvement
Components: Storage - JSON, Storage - Text & CSV
Affects Versions: 0.7.0
Reporter: Andries Engelbrecht
Assignee: Steven Phillips

When streaming data to the DFS, some records can be incomplete during the temporary write phase for the last file(s). These files typically have a different extension such as '.tmp', or can be marked hidden with a '.' prefix. Querying the directory path with Drill then causes a query error, because some records in the temporary files may not be complete.

The ability to have Drill ignore hidden files and/or read only files with a designated extension in the workspace would resolve this problem.

For example, when Flume is used to stream JSON files to a directory structure, the HDFS sink creates .tmp files (which can be hidden with a '.' prefix) that contain incomplete JSON objects until the file is closed and the .tmp extension (or prefix) is removed. Attempting to query the directory structure with Drill then results in errors due to the incomplete JSON object(s) in the tmp files.
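The file selection rule requested in DRILL-2424 can be sketched as a small predicate. This is a hedged Python illustration, not Drill's file-listing code: `is_queryable`, its parameters, and all file names other than the `FlumeData.*.json` pattern are hypothetical. It skips names with a hidden prefix and accepts only names that end in a designated extension.

```python
from pathlib import PurePosixPath


def is_queryable(path, allowed_suffixes=(".json",), hidden_prefixes=(".",)):
    """Return True if a file should be read by a directory query.

    A file is skipped when its name starts with a hidden prefix, or when
    it does not end with one of the designated extensions (which also
    excludes in-flight files such as "name.json.tmp").
    """
    name = PurePosixPath(path).name
    if name.startswith(hidden_prefixes):
        return False
    return name.endswith(allowed_suffixes)


# Invented listing of a directory that a streaming sink is writing to.
listing = [
    "/feed/2015/03/13/17/FlumeData.1426267859699.json",       # closed file
    "/feed/2015/03/13/17/FlumeData.1426267859700.json.tmp",   # still open
    "/feed/2015/03/13/17/.FlumeData.1426267859701.json.tmp",  # hidden, open
    "/feed/2015/03/13/17/.FlumeData.1426267859702.json",      # hidden
]

readable = [p for p in listing if is_queryable(p)]
```

Checking both the prefix and the extension covers either sink configuration described in the issue, since a temporary file may be marked by a suffix, by a hidden prefix, or by both.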
[jira] [Updated] (DRILL-2456) regexp_replace using hex codes fails on larger JSON data sets
[ https://issues.apache.org/jira/browse/DRILL-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2456: --- Attachment: drillbit.log > regexp_replace using hex codes fails on larger JSON data sets > - > > Key: DRILL-2456 > URL: https://issues.apache.org/jira/browse/DRILL-2456 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 0.7.0 > Environment: Drill 0.7 > MapR 4.0.1 > CentOS >Reporter: Andries Engelbrecht >Assignee: Daniel Barclay (Drill) > Attachments: drillbit.log > > > This query works with only 1 file > select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from > dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by > `text` order by count(id) desc limit 10; > This one fails with multiple files > select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from > dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit > 10; > Query failed: Query failed: Failure while trying to start remote fragment, > Encountered an illegal char on line 1, column 31: '' [ > 43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ] > Using text in regexp_replace does work for same dataset. > This query works fine on full data set. > select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id) from > dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit > 10; > Attached snippet drillbit.log for error -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2456) regexp_replace using hex codes fails on larger JSON data sets
Andries Engelbrecht created DRILL-2456: -- Summary: regexp_replace using hex codes fails on larger JSON data sets Key: DRILL-2456 URL: https://issues.apache.org/jira/browse/DRILL-2456 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 0.7.0 Environment: Drill 0.7 MapR 4.0.1 CentOS Reporter: Andries Engelbrecht Assignee: Daniel Barclay (Drill) Attachments: drillbit.log This query works with only 1 file select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by `text` order by count(id) desc limit 10; This one fails with multiple files select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10; Query failed: Query failed: Failure while trying to start remote fragment, Encountered an illegal char on line 1, column 31: '' [ 43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ] Using text in regexp_replace does work for same dataset. This query works fine on full data set. select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10; Attached snippet drillbit.log for error -- This message was sent by Atlassian JIRA (v6.3.4#6332)
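One detail worth noting when comparing the two patterns in this report: they do not describe the same character set. The hex range `\x20-\xad` ends at U+00AD, while the literal class covers U+0020 to U+007E plus U+00A1 to U+00FF. The sketch below uses Python's `re` module purely to illustrate the two classes; it is not Drill's `regexp_replace`, and the helper name `mask` and the sample string are made up for the example.

```python
import re

# Character class written with hex escapes, as in the failing query.
HEX_RANGE = re.compile(r"[^\x20-\xad]")
# Character class written with literal characters, as in the working query.
LITERAL_RANGE = re.compile("[^ -~¡-ÿ]")


def mask(pattern, text, replacement="°"):
    """Replace every character outside the class with the replacement."""
    return pattern.sub(replacement, text)


sample = "tab\there café"
print(mask(HEX_RANGE, sample))      # tab and é both fall outside the hex range
print(mask(LITERAL_RANGE, sample))  # only the tab falls outside the literal class
```

So even where both queries run, their results can differ for characters between U+00AE and U+00FF.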
[jira] [Created] (DRILL-2473) Set query timezone at session level
Andries Engelbrecht created DRILL-2473: -- Summary: Set query timezone at session level Key: DRILL-2473 URL: https://issues.apache.org/jira/browse/DRILL-2473 Project: Apache Drill Issue Type: Improvement Components: Query Planning & Optimization Affects Versions: Future Reporter: Andries Engelbrecht Assignee: Jinfeng Ni Ability to set the user timezone for queries at session level, so that different users querying the same data from different timezones can localize the results to their desired timezone. Allowance for DST should be incorporated where applicable.
[jira] [Commented] (DRILL-2473) Set query timezone at session level
[ https://issues.apache.org/jira/browse/DRILL-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365261#comment-14365261 ] Andries Engelbrecht commented on DRILL-2473: We need to decide whether we want this at connection level or session level, as an application may establish a single connection but serve multiple users in different timezones. > Set query timezone at session level > --- > > Key: DRILL-2473 > URL: https://issues.apache.org/jira/browse/DRILL-2473 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: Future >Reporter: Andries Engelbrecht >Assignee: Jinfeng Ni > > Ability to set the user timezone for queries at session level to allow > different users querying the same data from different timezones to localize > the results to the desired timezone. > Allowance for DST where applicable should be incorporated.
[jira] [Commented] (DRILL-2263) Directory pruning best practices with Drill
[ https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375102#comment-14375102 ] Andries Engelbrecht commented on DRILL-2263: I would suggest that we test a couple of specific conditions with the 0.8 released version first before finalizing. > Directory pruning best practices with Drill > --- > > Key: DRILL-2263 > URL: https://issues.apache.org/jira/browse/DRILL-2263 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens >Priority: Minor > Attachments: Optimizing directory pruning with Drill v2.docx > > > Add on for querying directories. > https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories > Best practices to ensure that directory pruning is properly applied. > Please see attached document for details and write up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters
[ https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375103#comment-14375103 ] Andries Engelbrecht commented on DRILL-2157: Aman, I will test the conditions, casting, and views with the final 0.8 release, and will file a new JIRA if any specific conditions still cause issues. > Directory pruning on subdirectories only and data type conversions for > directory filters > > > Key: DRILL-2157 > URL: https://issues.apache.org/jira/browse/DRILL-2157 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: Future >Reporter: Andries Engelbrecht >Assignee: Aman Sinha >Priority: Minor > Fix For: 0.8.0 > > > Drill will scan all files and directories when using only a subdirectory as a > predicate. Additionally if the data type for the directory filter is not a > string and is converted Drill will also first scan all the subdirectories and > files before applying the filter. > My current observation is that for a directory structure as listed below, > the pruning only works if the full tree is provided. If only a lower level > directory is supplied in the filter condition Drill only uses it as a > filter. 
> With directory structure as below > /2015 > /01 >/10 >/11 >/12 >/13 >/14 > Query: > select count(id) from `/foo` t where dir0='2015' and dir1='01' and > dir2='10' > Produces the correct pruning and query plan > 01-02Project(id=[$3]): rowcount = 3670316.0, cumulative cost = > {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = > 28434 > 01-03 Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]): > rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu, > 0.0 io, 0.0 network, 0.0 memory}, id = 28433 > 01-04Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, > numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`] > However: > select count(id) from `/foo` t where dir2='10' > Produces full scan of all sub directories and only applies a filter > condition after the fact. Notice the numFiles between the 2, even though it > lists columns in the base scan > 01-04Filter(condition=[=($0, '10')]): rowcount = > 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io, > 0.0 network, 0.0 memory}, id = 27470 > 01-05 Project(dir2=[$1], id=[$0]): rowcount = > 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0 > io, 0.0 network, 0.0 memory}, id = 27469 > 01-06Scan(groupscan=[EasyGroupScan > [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`] > Also using the wrong data type for the filter produces a full scan > select count(id) from `/foo` where dir_year=2015 and dir_month=01 and > dir_day=14 > Produces > 01-04Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), > =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = > 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 > io, 0.0 network, 0.0 memory}, id = 34910 > 01-05 Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): > rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 > cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34909 > 01-06Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, > 
numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],
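The behaviour the report asks for, pruning on any directory level rather than only on a full path from the root, can be illustrated with a small sketch. This is a toy model in Python under stated assumptions, not Drill's planner logic: the function `prune`, its keyword arguments, and the sample file names (including the `/2015/02` branch) are hypothetical. Filters are given as strings, matching the string-typed `dir0`, `dir1`, `dir2` predicates in the first query above.

```python
def prune(files, selection_root, **dir_filters):
    """Keep files whose directory levels match every dirN filter.

    dir_filters maps names such as dir0 or dir2 to the expected directory
    name as a string; levels without a filter match anything.
    """
    wanted = {int(key[3:]): value for key, value in dir_filters.items()}
    kept = []
    for path in files:
        relative = path[len(selection_root):].strip("/")
        levels = relative.split("/")[:-1]  # drop the file name
        if all(i < len(levels) and levels[i] == v for i, v in wanted.items()):
            kept.append(path)
    return kept


# Hypothetical files under the /foo selection root.
files = [
    "/foo/2015/01/10/a.json",
    "/foo/2015/01/11/b.json",
    "/foo/2015/02/10/c.json",
]
print(prune(files, "/foo", dir0="2015", dir1="01", dir2="10"))
print(prune(files, "/foo", dir2="10"))
```

In this model a filter on `dir2` alone still narrows the file list before any scan, which is the outcome the second query in the report fails to get (numFiles=405 instead of a pruned count).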
[jira] [Updated] (DRILL-2518) MicroStrategy integration with Apache Drill instructions
[ https://issues.apache.org/jira/browse/DRILL-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andries Engelbrecht updated DRILL-2518: --- Attachment: MicroStrategy-9-Drill-Configuration.docx > MicroStrategy integration with Apache Drill instructions > > > Key: DRILL-2518 > URL: https://issues.apache.org/jira/browse/DRILL-2518 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: MicroStrategy-9-Drill-Configuration.docx > > > Configuration instructions for enabling MicroStrategy with Apache Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2518) MicroStrategy integration with Apache Drill instructions
Andries Engelbrecht created DRILL-2518: -- Summary: MicroStrategy integration with Apache Drill instructions Key: DRILL-2518 URL: https://issues.apache.org/jira/browse/DRILL-2518 Project: Apache Drill Issue Type: Improvement Components: Documentation Reporter: Andries Engelbrecht Assignee: Bridget Bevens Attachments: MicroStrategy-9-Drill-Configuration.docx Configuration instructions for enabling MicroStrategy with Apache Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters
[ https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486406#comment-14486406 ] Andries Engelbrecht commented on DRILL-2157: Can be marked as resolved. Tested with the following use cases, all of which show the correct numFiles being scanned. Casting or string literals are no longer needed for lower-level directories. Seems to be resolved in Drill 0.8 for direct DFS access: select count(id) from `/nfl` where dir0=2015 and dir1=01 and dir2=20 and dir3 between 00 and 05; select count(id) from `/nfl` where dir0=2015 and dir1=01 and dir2 between 20 and 25 and dir3 between 00 and 05; select count(id) from `/nfl` where dir0=2015 and dir1=01 and dir2 between 20 and 25 and dir3 between 00 and 05; select count(id) from `/nfl` where dir2 between 20 and 25 and dir3 between 00 and 05; select count(id) from `/nfl` where dir2>25 and dir3 between 00 and 05; Views also seem to report the correct number of files to be scanned with directory pruning: select count(id) from dfs.views.tweet_base where dir_year=2015 and dir_month=01 and dir_day=26 and dir_hour>20; select count(id) from dfs.views.tweet_base where dir_day between 20 and 26 and dir_hour>20; > Directory pruning on subdirectories only and data type conversions for > directory filters > > > Key: DRILL-2157 > URL: https://issues.apache.org/jira/browse/DRILL-2157 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: Future >Reporter: Andries Engelbrecht >Assignee: Aman Sinha >Priority: Minor > Fix For: 0.8.0 > > > Drill will scan all files and directories when using only a subdirectory as a > predicate. Additionally if the data type for the directory filter is not a > string and is converted Drill will also first scan all the subdirectories and > files before applying the filter. > My current observation is that for a directory structure as listed below, > the pruning only works if the full tree is provided. 
If only a lower level > directory is supplied in the filter condition Drill only uses it as a > filter. > With directory structure as below > /2015 > /01 >/10 >/11 >/12 >/13 >/14 > Query: > select count(id) from `/foo` t where dir0='2015' and dir1='01' and > dir2='10' > Produces the correct pruning and query plan > 01-02Project(id=[$3]): rowcount = 3670316.0, cumulative cost = > {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = > 28434 > 01-03 Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]): > rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu, > 0.0 io, 0.0 network, 0.0 memory}, id = 28433 > 01-04Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, > numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`] > However: > select count(id) from `/foo` t where dir2='10' > Produces full scan of all sub directories and only applies a filter > condition after the fact. Notice the numFiles between the 2, even though it > lists columns in the base scan > 01-04Filter(condition=[=($0, '10')]): rowcount = > 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io, > 0.0 network, 0.0 memory}, id = 27470 > 01-05 Project(dir2=[$1], id=[$0]): rowcount = > 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0 > io, 0.0 network, 0.0 memory}, id = 27469 > 01-06Scan(groupscan=[EasyGroupScan > [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`] > Also using the wrong data type for the filter produces a full scan > select count(id) from `/foo` where dir_year=2015 and dir_month=01 and > dir_day=14 > Produces > 01-04Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), > =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = > 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 > io, 0.0 network, 0.0 memory}, id = 34910 > 01-05 Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): > rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 > cpu, 0.0 
io, 0.0 network, 0.0 memory}, id = 34909 > 01-06Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, > numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],
[jira] [Commented] (DRILL-2263) Directory pruning best practices with Drill
[ https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486357#comment-14486357 ] Andries Engelbrecht commented on DRILL-2263: Please close; the issues are resolved with Drill 0.8, so this is no longer needed. > Directory pruning best practices with Drill > --- > > Key: DRILL-2263 > URL: https://issues.apache.org/jira/browse/DRILL-2263 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens >Priority: Minor > Attachments: Optimizing directory pruning with Drill v2.docx > > > Add on for querying directories. > https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories > Best practices to ensure that directory pruning is properly applied. > Please see attached document for details and write up.
[jira] [Created] (DRILL-2726) Display Drill version in sys.version
Andries Engelbrecht created DRILL-2726: -- Summary: Display Drill version in sys.version Key: DRILL-2726 URL: https://issues.apache.org/jira/browse/DRILL-2726 Project: Apache Drill Issue Type: Improvement Reporter: Andries Engelbrecht Include the Drill version information in sys.version, so it is easy to determine the exact version of Drill being used for support purposes. Adding a version column to sys.version to show the exact version, e.g. mapr-drill-0.8.0.31168-1 or apache-drill-0.8.0.31168-1, will make it easier for users to quickly identify the Drill version being used and provide that information.
[jira] [Commented] (DRILL-2518) MicroStrategy integration with Apache Drill instructions
[ https://issues.apache.org/jira/browse/DRILL-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486356#comment-14486356 ] Andries Engelbrecht commented on DRILL-2518: The documentation was added here: http://drill.apache.org/docs/using-microstrategy-analytics-with-apache-drill/ Please close. Thank you. > MicroStrategy integration with Apache Drill instructions > > > Key: DRILL-2518 > URL: https://issues.apache.org/jira/browse/DRILL-2518 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Andries Engelbrecht >Assignee: Bridget Bevens > Attachments: MicroStrategy-9-Drill-Configuration.docx > > > Configuration instructions for enabling MicroStrategy with Apache Drill.