
[jira] [Created] (DRILL-6708) Flatten operator executes twice on subquery resulting in cartesian of flatten columns, when final query has 2 columns using the flatten column in original query

2018-08-23 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-6708:
--

 Summary: Flatten operator executes twice on subquery resulting in 
cartesian of flatten columns, when final query has 2 columns using the flatten 
column in original query
 Key: DRILL-6708
 URL: https://issues.apache.org/jira/browse/DRILL-6708
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Andries Engelbrecht
 Attachments: campaignclicks_50.json

The following query, which uses a subquery and references the flattened column 
twice in the final result, returns 1137195 rows instead of the expected 140913 
rows.

 
{code:java}
  SELECT
( `Third`.`cust_id`),
( `Third`.`device`),
( `Third`.`prod_id`),
( `Third`.`prod_id`) AS `prod_id2`
FROM (
 SELECT
 ( `Second`.`cust_id`),
 ( `Second`.`device`),
 ( `Second`.`prod_id`)
 FROM
 ( SELECT
 ( `First`.`cust_id`),
 ( `First`.`device`),
 ( `First`.`prod_id`)
 FROM
 `dfs.views`.`clicks_campaign_vw` AS `First` 
 ) AS `Second`
) AS `Third`    {code}
 
This was executed against the Drill view listed below.
 
{code:java}
CREATE or REPLACE VIEW dfs.views.clicks_campaign_vw AS
SELECT CAST(`t`.`trans_id` as BIGINT) as trans_id, CAST(`t`.`date` AS DATE) AS 
`thedate`,
CAST(`t`.`user_info`['cust_id'] AS BIGINT) AS `cust_id`,
CAST(`t`.`user_info`['device'] AS VARCHAR(20)) AS `device`,
CAST(`t`.`user_info`['state'] AS VARCHAR(2)) AS `custstate`,
CAST(FLATTEN(`t`.`trans_info`['prod_id']) AS BIGINT) AS `prod_id`,
CAST(`t`.`trans_info`['purch_flag'] AS VARCHAR(6)) AS `purch_flag`
FROM `dfs`.`clicks`.`campaignclicks_50.json` AS `t`
WHERE `t`.`trans_info`['prod_id'][0] IS NOT NULL;{code}
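
The row inflation described above can be illustrated with a small Python sketch. It is not Drill code: the `flatten` helper and the two sample records are hypothetical, and it only assumes that a flatten emits one output row per array element. Flattening the array once and reusing the value gives one row per element, while flattening two copies of the same array one after the other gives the square of the element count per record.

```python
from typing import Dict, Iterator, List


def flatten(rows: List[Dict], field: str, out: str) -> Iterator[Dict]:
    """Emit one output row per element of row[field], keeping the other columns."""
    for row in rows:
        for item in row[field]:
            new_row = dict(row)
            new_row[out] = item
            yield new_row


# Hypothetical sample data; the contents of the real attachment are not shown here.
records = [
    {"cust_id": 1, "prod_id": [10, 11, 12]},
    {"cust_id": 2, "prod_id": [20]},
]

# Expected behaviour: flatten once and use the value for both output columns.
once = [
    {"cust_id": r["cust_id"], "prod_id": r["p"], "prod_id2": r["p"]}
    for r in flatten(records, "prod_id", "p")
]

# Reported behaviour: the same array is flattened a second time on top of the
# first flatten, so every element is paired with every element.
twice = [
    {"cust_id": r["cust_id"], "prod_id": r["p1"], "prod_id2": r["p2"]}
    for r in flatten(list(flatten(records, "prod_id", "p1")), "prod_id", "p2")
]

print(len(once), len(twice))  # 4 10
```

With these made-up records the single flatten yields 3 + 1 = 4 rows and the double flatten yields 9 + 1 = 10 rows, the same kind of inflation as in the reported row counts.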
Below is the query plan showing FLATTEN invoked twice
 
{code:java}
00-00 Screen : rowType = RecordType(BIGINT cust_id, VARCHAR(20) device, BIGINT prod_id, BIGINT prod_id2): rowcount = 7089.3, cumulative cost = {73965.03 rows, 337056.826 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901702
00-01 ComplexToJson : rowType = RecordType(BIGINT cust_id, VARCHAR(20) device, BIGINT prod_id, BIGINT prod_id2): rowcount = 7089.3, cumulative cost = {73256.1 rows, 336347.897 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901701
00-02 Project(cust_id=[CAST($0):BIGINT], device=[CAST($1):VARCHAR(20) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], prod_id=[CAST($3):BIGINT], prod_id2=[CAST($4):BIGINT]) : rowType = RecordType(BIGINT cust_id, VARCHAR(20) device, BIGINT prod_id, BIGINT prod_id2): rowcount = 7089.3, cumulative cost = {66166.8 rows, 329258.6 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901700
00-03 Flatten(flattenField=[$4]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4, ANY EXPR$5): rowcount = 7089.3, cumulative cost = {59077.501 rows, 215829.8 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901699
00-04 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$4=[$3], EXPR$5=[$2]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4, ANY EXPR$5): rowcount = 7089.3, cumulative cost = {51988.2004 rows, 208740.5 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901698
00-05 Flatten(flattenField=[$3]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4): rowcount = 7089.3, cumulative cost = {44898.9 rows, 173294.0 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901697
00-06 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$4=[$2]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY EXPR$4): rowcount = 7089.3, cumulative cost = {37809.6 rows, 166204.7 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901696
00-07 SelectionVectorRemover : rowType = RecordType(ANY ITEM, ANY ITEM1, ANY ITEM2, ANY ITEM3): rowcount = 7089.3, cumulative cost = {30720.3 rows, 137847.5 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901695
00-08 Filter(condition=[IS NOT NULL($3)]) : rowType = RecordType(ANY ITEM, ANY ITEM1, ANY ITEM2, ANY ITEM3): rowcount = 7089.3, cumulative cost = {23631.0 rows, 130758.2 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901694
00-09 Project(ITEM=[ITEM($0, 'cust_id')], ITEM1=[ITEM($0, 'device')], ITEM2=[ITEM($1, 'prod_id')], ITEM3=[ITEM(ITEM($1, 'prod_id'), 0)]) : rowType = RecordType(ANY ITEM, ANY ITEM1, ANY ITEM2, ANY ITEM3): rowcount = 7877.0, cumulative cost = {15754.0 rows, 70893.0 cpu, 8067011.0 io, 0.0 network, 0.0 memory}, id = 901693
00-10 Scan(table=[[dfs, clicks, campaignclicks_50.json]], groupscan=[EasyGroupScan [selectionRoot=maprfs:/data/nested/clicks/campaignclicks_50.json, numFiles=1, columns=[`user_info`.`cust_id`, `user_info`.`device`, `trans_info`.`prod_id`, `trans_info`.`prod_id`[0]], files=[maprfs:///data/nested/clicks/campaignclicks_50.json]]]) : rowType = RecordType(ANY user_info, ANY trans_info): rowcount = 7877.0, cumulative cost = {7877.0 rows, 15754.0 cpu, 8067011.0 io, 0.0 netw

[jira] [Commented] (DRILL-5617) Spill file name collisions when spill file is on a shared file system

2017-06-29 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068494#comment-16068494
 ] 

Andries Engelbrecht commented on DRILL-5617:


Perhaps proper configuration will avoid this issue.

On most Hadoop distros with HDFS there is a local temp location for MapReduce 
that should be leveraged for Drill spill. Placing spill data on general HDFS 
will cause replication, which can slow things down.

As an example, on MapR there are local volumes with replication 1 that can be 
used; in this case the spill files won't overlap between nodes. See this link 
for configuration:
https://community.mapr.com/community/exchange/blog/2017/05/03/top-5-items-to-configure-with-drill-on-mapr-5x

Similar best practices should be leveraged for other deployments.

> Spill file name collisions when spill file is on a shared file system
> -
>
> Key: DRILL-5617
> URL: https://issues.apache.org/jira/browse/DRILL-5617
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Paul Rogers
>
> Spill location can be configured to be written on hdfs such as:
>   hashagg: {
>     # The partitions divide the work inside the hashagg, to ease
>     # handling spilling. This initial figure is tuned down when
>     # memory is limited.
>     #  Setting this option to 1 disables spilling !
>     num_partitions: 32,
>     spill: {
>       # The 2 options below override the common ones
>       # they should be deprecated in the future
>       directories : [ "/tmp/drill/spill" ],
>       fs : "maprfs:///"
>     }
>   }
> However, this could cause spill filename conflicts, since the naming 
> convention does not include the node name.
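
The collision described in the quoted report can be sketched in Python. This is only an illustration: `spill_file_path`, the file name layout, and the node and query identifiers are hypothetical and do not reflect Drill's actual naming convention. It assumes two nodes derive the spill file name from the same identifiers and write to the same shared directory.

```python
import posixpath
from typing import Optional


def spill_file_path(spill_dir: str, query_id: str, fragment: str,
                    partition: int, node: Optional[str] = None) -> str:
    """Build a spill file path; the naming layout here is illustrative only."""
    name = f"{query_id}_{fragment}_part{partition}"
    if node is not None:
        # Prefixing the node name makes the path unique per node.
        name = f"{node}_{name}"
    return posixpath.join(spill_dir, name)


# Two hypothetical nodes spilling with the same query and fragment identifiers.
nodes = ["node-a", "node-b"]
shared_dir = "/tmp/drill/spill"  # directory from the quoted configuration

without_node = {spill_file_path(shared_dir, "q1", "0-1", 0) for _ in nodes}
with_node = {spill_file_path(shared_dir, "q1", "0-1", 0, node=n) for n in nodes}

print(len(without_node))  # 1: both nodes would write the same file
print(len(with_node))     # 2: each node gets its own file
```

Using a node-local spill directory, as suggested in the comment, sidesteps the problem without changing file names, because the identical paths then live on different file systems.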



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (DRILL-2839) ODBC Driver Doc to point to latest available Driver, also provide compatibility matrix for Drill and ODBC version

2015-04-21 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2839:
--

 Summary: ODBC Driver Doc to point to latest available Driver, also 
provide compatibility matrix for Drill and ODBC version
 Key: DRILL-2839
 URL: https://issues.apache.org/jira/browse/DRILL-2839
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


On the ODBC documentation page the links to the ODBC drivers are hard linked to 
the .0618 version of the drivers.
http://drill.apache.org/docs/step-1-install-the-mapr-drill-odbc-driver-on-windows/

It may be better to point to the latest drivers available in the MapR packages 
directory here. 
http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc/

The challenge is to match the ODBC driver version to the Drill version. It may 
be good to add a compatibility matrix to the doc webpage to identify the 
appropriate ODBC driver for each Drill version.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-2140) RPC Error querying JSON with empty nested maps

2015-04-23 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509191#comment-14509191
 ] 

Andries Engelbrecht commented on DRILL-2140:


Sudheesh, the issue seems to be resolved in 0.8. I have not experienced it on 
the same dataset, so it can be marked as resolved.

> RPC Error querying JSON with empty nested maps
> --
>
> Key: DRILL-2140
> URL: https://issues.apache.org/jira/browse/DRILL-2140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.7.0
> Environment: Centos 4 node MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: 1.0.0
>
> Attachments: drillbit.log
>
>
> When querying a large number of documents in multiple directories with 
> multiple JSON files in each, where some documents lack the top-level map that 
> is used in a predicate, Drill produces an RPC error in the log.
> Query
> {code}
> > select t.retweeted_status.`user`.name as name, 
> > count(t.retweeted_status.favorited) as rt_count from `./nfl` t where 
> > t.retweeted_status.`user`.name is not null group by 
> > t.retweeted_status.`user`.name order by count(t.retweeted_status.favorited) 
> > desc limit 10;
> Query failed: Query failed: Failure while running fragment., index: 0, 
> length: 1 (expected: range(0, 0)) [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on 
> se-node13.se.lab:31010 ]
> [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]
> {code}
> Drillbit log attached
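
For reference, the result the quoted query is meant to produce can be emulated in plain Python. This is a sketch of the intended semantics only, not of how Drill executes the query; the helper names and the sample documents are hypothetical. Documents without a `retweeted_status` map, or with an empty one, are simply skipped by the `IS NOT NULL` predicate.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional, Tuple


def retweet_user_name(doc: Dict[str, Any]) -> Optional[str]:
    """Return retweeted_status.user.name, or None if any level is missing."""
    status = doc.get("retweeted_status") or {}
    user = status.get("user") or {}
    return user.get("name")


def top_retweeted(docs: Iterable[Dict[str, Any]], limit: int = 10) -> List[Tuple[str, int]]:
    """Emulate the GROUP BY name, COUNT(favorited), ORDER BY count DESC, LIMIT query."""
    counts: Counter = Counter()
    for doc in docs:
        name = retweet_user_name(doc)
        if name is None:
            continue  # WHERE ... IS NOT NULL filters these documents out
        # COUNT(column) only counts rows where the column is not null.
        favorited = doc["retweeted_status"].get("favorited")
        counts[name] += 1 if favorited is not None else 0
    return counts.most_common(limit)


# Hypothetical documents, including ones without the top-level map.
docs = [
    {"retweeted_status": {"favorited": False, "user": {"name": "NFL"}}},
    {"retweeted_status": {"favorited": False, "user": {"name": "NFL"}}},
    {"text": "no retweet"},
    {"retweeted_status": {}},
    {"retweeted_status": {"favorited": True, "user": {"name": "ESPN"}}},
]

print(top_retweeted(docs))  # [('NFL', 2), ('ESPN', 1)]
```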




[jira] [Commented] (DRILL-2141) Data type error in group by and order by for JSON

2015-04-30 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522386#comment-14522386
 ] 

Andries Engelbrecht commented on DRILL-2141:


Please mark as resolved as of 0.8

Have not experienced the issue with 0.8 on the same data set.

> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.0.0
>
> Attachments: FlumeData.1422748800086, drillbit.log, new_drillbit.log
>
>
> When doing GROUP BY and ORDER BY on complex nested JSON, the query fails with 
> data type errors.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) 
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null 
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) 
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while 
> reading vector.  Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
> org.apache.drill.exec.vector.NullableVarCharVector. [ 
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached




[jira] [Commented] (DRILL-2105) Query fails when using flatten on JSON data where some documents have an empty array

2015-04-30 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522388#comment-14522388
 ] 

Andries Engelbrecht commented on DRILL-2105:


Please mark as resolved as of Drill 0.8

Have not experienced the issue with Drill 0.8

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-2105
> URL: https://issues.apache.org/jira/browse/DRILL-2105
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
> Environment: MFS with JSON
>Reporter: Andries Engelbrecht
>Assignee: Deneche A. Hakim
> Fix For: 1.0.0
>
>
> A Drill query fails when using flatten on an array where some records contain 
> an empty array, especially with larger data sets where the number of JSON 
> documents is greater than 100k.
> Using twitter data as sample.
> select flatten (entities.hashtags) from dfs.foo.`file.json`;
> Empty array
>   "entities": {
>     "trends": [],
>     "symbols": [],
>     "urls": [
>       {
>         "expanded_url": "http://on.nfl.com/1BkThQF",
>         "indices": [
>           118,
>           140
>         ],
>         "display_url": "on.nfl.com/1BkThQF",
>         "url": "http://t.co/Unr5KFy6hG"
>       }
>     ],
>     "hashtags": [],
>     "user_mentions": [
>       {
>         "id": 19362299,
>         "name": "NFL Network",
>         "indices": [
>           3,
>           14
>         ],
>         "screen_name": "nflnetwork",
>         "id_str": "19362299"
>       }
>     ]
>   },
> Array with content
>   "entities": {
>     "trends": [],
>     "symbols": [],
>     "urls": [],
>     "hashtags": [
>       {
>         "text": "djpreps",
>         "indices": [
>           47,
>           55
>         ]
>       },
>       {
>         "text": "MSPreps",
>         "indices": [
>           56,
>           64
>         ]
>       }
>     ],
>     "user_mentions": []
>   },
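
The behaviour one would expect from flattening these two documents can be sketched in Python. This is not Drill code: `flatten_hashtags` is a hypothetical helper, the documents are abbreviated versions of the two samples above, and the sketch assumes a flatten emits one row per array element, so a record with an empty array contributes no rows rather than causing a failure.

```python
from typing import Any, Dict, Iterable, List


def flatten_hashtags(docs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one row per hashtag; empty or missing arrays contribute nothing."""
    rows = []
    for doc in docs:
        for tag in doc.get("entities", {}).get("hashtags", []):
            rows.append(tag)
    return rows


# Abbreviated versions of the two sample documents from the report.
docs = [
    {"entities": {"hashtags": [], "urls": [{"display_url": "on.nfl.com/1BkThQF"}]}},
    {"entities": {"hashtags": [
        {"text": "djpreps", "indices": [47, 55]},
        {"text": "MSPreps", "indices": [56, 64]},
    ], "urls": []}},
]

rows = flatten_hashtags(docs)
print([row["text"] for row in rows])  # ['djpreps', 'MSPreps']
```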
> Log output
> 2015-01-27 02:26:13,478 [2b3908b9-cf08-3fd5-3bd8-ebb6bb5b70f1:foreman] INFO  
> o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for 
> Parquet metadata file.
> java.io.IOException: Open failed for file: /data/twitter/nfl/2015, error: 
> Invalid argument (22)
> at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:191) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:776) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) 
> ~[hadoop-common-2.4.1-mapr-1408.jar:na]
> at 
> org.apache.drill.exec.store.dfs.shim.fallback.FallbackFileSystem.open(FallbackFileSystem.java:94)
>  ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:138)
>  ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.BasicFormatMatcher.isReadable(BasicFormatMatcher.java:107)
>  ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:232)
>  [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:212)
>  [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:141)
>  [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:58)
>  [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96)
>  [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90)
>  [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:273)
>  [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
> at 
> net.hydromatic.optiq.jdbc.SimpleOptiqSchema.getTable(SimpleOptiqSchema.java:75)
>  [optiq-core-0.9-drill-r12.jar:na]
> at 
> net.hydromatic.optiq.prepare.OptiqCatalogReader.getTableFrom(OptiqCatalogReader.java:87)
>  [optiq-core-0.9-drill-r12.jar:na]
> at 
> net.hy

[jira] [Created] (DRILL-2946) Tableau 9.0 Desktop Enablement Document

2015-05-03 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2946:
--

 Summary: Tableau 9.0 Desktop Enablement Document
 Key: DRILL-2946
 URL: https://issues.apache.org/jira/browse/DRILL-2946
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 0.9.0
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens
 Attachments: Tableau 9 Desktop Drill Configuration.docx

Documentation for Tableau 9.0 Desktop enablement.

Includes authentication with Drill 0.9 and later.




[jira] [Updated] (DRILL-2946) Tableau 9.0 Desktop Enablement Document

2015-05-03 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2946:
---
Attachment: Tableau 9 Desktop Drill Configuration.docx

> Tableau 9.0 Desktop Enablement Document
> ---
>
> Key: DRILL-2946
> URL: https://issues.apache.org/jira/browse/DRILL-2946
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 0.9.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Tableau 9 Desktop Drill Configuration.docx
>
>
> Documentation for Tableau 9.0 Desktop enablement.
> Includes authentication with Drill 0.9 and later.




[jira] [Created] (DRILL-2948) Drill MicroStrategy Document has an incorrect image

2015-05-04 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2948:
--

 Summary: Drill MicroStrategy Document has an incorrect image
 Key: DRILL-2948
 URL: https://issues.apache.org/jira/browse/DRILL-2948
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens
Priority: Minor


The first image is showing a table of a report instead of showing what the 32 
bit version of the ODBC driver should look like. 

Please refer to the original document for references to the pictures.




[jira] [Created] (DRILL-2982) Tableau 9.0 Server Enablement Documentation

2015-05-07 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2982:
--

 Summary: Tableau 9.0 Server Enablement Documentation
 Key: DRILL-2982
 URL: https://issues.apache.org/jira/browse/DRILL-2982
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 0.9.0, 1.0.0
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


Tableau 9.0 Server Enablement document




[jira] [Updated] (DRILL-2982) Tableau 9.0 Server Enablement Documentation

2015-05-07 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2982:
---
Attachment: Tableau 9 Server Drill Configuration.docx

> Tableau 9.0 Server Enablement Documentation
> ---
>
> Key: DRILL-2982
> URL: https://issues.apache.org/jira/browse/DRILL-2982
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 0.9.0, 1.0.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Tableau 9 Server Drill Configuration.docx
>
>
> Tableau 9.0 Server Enablement document




[jira] [Created] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document

2015-05-11 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3025:
--

 Summary: Tibco Spotfire Server - JDBC - Configuration Document
 Key: DRILL-3025
 URL: https://issues.apache.org/jira/browse/DRILL-3025
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


TSS Configuration document - JDBC




[jira] [Updated] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document

2015-05-11 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-3025:
---
Attachment: Tibco Spotfire Server 6.0 Drill Configuration.docx

> Tibco Spotfire Server - JDBC - Configuration Document
> -
>
> Key: DRILL-3025
> URL: https://issues.apache.org/jira/browse/DRILL-3025
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Tibco Spotfire Server 6.0 Drill Configuration.docx
>
>
> TSS Configuration document - JDBC




[jira] [Created] (DRILL-3148) JReport enablement document for Drill

2015-05-19 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3148:
--

 Summary: JReport enablement document for Drill
 Key: DRILL-3148
 URL: https://issues.apache.org/jira/browse/DRILL-3148
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0, Future
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


Enablement document for JReport to work with Drill using JDBC.
In support of JReport certification of Drill 1.0.




[jira] [Updated] (DRILL-3148) JReport enablement document for Drill

2015-05-19 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-3148:
---
Attachment: JReport with Apache Drill v0.3.doc

> JReport enablement document for Drill
> -
>
> Key: DRILL-3148
> URL: https://issues.apache.org/jira/browse/DRILL-3148
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0, Future
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: JReport with Apache Drill v0.3.doc
>
>
> Enablement document for JReport to work with Drill using JDBC.
> In support of JReport certification of Drill 1.0.




[jira] [Updated] (DRILL-3148) JReport enablement document for Drill

2015-06-04 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-3148:
---
Attachment: (was: JReport with Apache Drill v0.3.doc)

> JReport enablement document for Drill
> -
>
> Key: DRILL-3148
> URL: https://issues.apache.org/jira/browse/DRILL-3148
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0, Future
>Reporter: Andries Engelbrecht
>Assignee: Bob Rumsby
> Attachments: JReport with Apache Drill Final-AE.doc
>
>
> Enablement document for JReport to work with Drill using JDBC.
> In support of JReport certification of Drill 1.0.




[jira] [Updated] (DRILL-3148) JReport enablement document for Drill

2015-06-04 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-3148:
---
Attachment: JReport with Apache Drill Final-AE.doc

> JReport enablement document for Drill
> -
>
> Key: DRILL-3148
> URL: https://issues.apache.org/jira/browse/DRILL-3148
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0, Future
>Reporter: Andries Engelbrecht
>Assignee: Bob Rumsby
> Attachments: JReport with Apache Drill Final-AE.doc
>
>
> Enablement document for JReport to work with Drill using JDBC.
> In support of JReport certification of Drill 1.0.




[jira] [Created] (DRILL-3276) Broken links on Apache Drill documentation

2015-06-10 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3276:
--

 Summary: Broken links on Apache Drill documentation
 Key: DRILL-3276
 URL: https://issues.apache.org/jira/browse/DRILL-3276
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens
Priority: Minor


The following links result in a page not found error:

http://drill.apache.org/docs/supported-data-types-for-casting
http://drill.apache.org/docs/explicit-type-casting-maps


The links are found on this page
http://drill.apache.org/docs/data-type-conversion/#cast





[jira] [Commented] (DRILL-3148) JReport enablement document for Drill

2015-06-19 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593480#comment-14593480
 ] 

Andries Engelbrecht commented on DRILL-3148:


Looks good, thank you.

> JReport enablement document for Drill
> -
>
> Key: DRILL-3148
> URL: https://issues.apache.org/jira/browse/DRILL-3148
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0, Future
>Reporter: Andries Engelbrecht
>Assignee: Bob Rumsby
> Fix For: 1.1.0
>
> Attachments: JReport with Apache Drill Final-AE.doc
>
>
> Enablement document for JReport to work with Drill using JDBC.
> In support of JReport certification of Drill 1.0.




[jira] [Commented] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document

2015-06-19 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593482#comment-14593482
 ] 

Andries Engelbrecht commented on DRILL-3025:


Do you know when this will be published?

> Tibco Spotfire Server - JDBC - Configuration Document
> -
>
> Key: DRILL-3025
> URL: https://issues.apache.org/jira/browse/DRILL-3025
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Andries Engelbrecht
>Assignee: Bob Rumsby
> Attachments: Tibco Spotfire Server 6.0 Drill Configuration.docx
>
>
> TSS Configuration document - JDBC




[jira] [Reopened] (DRILL-2272) Tibco Spotfire Desktop configuration for Drill documentation

2015-06-25 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht reopened DRILL-2272:


Can we please change the heading to 
Using Tibco Spotfire Desktop with Drill


Once the Spotfire Server document is published it will create confusion when 
looking at the index.

> Tibco Spotfire Desktop configuration for Drill documentation
> 
>
> Key: DRILL-2272
> URL: https://issues.apache.org/jira/browse/DRILL-2272
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Spotfire Desktop Drill Config.docx
>
>
> Instructions to configure Tibco Spotfire Desktop with Drill using ODBC to be 
> added to the wiki.




[jira] [Commented] (DRILL-3025) Tibco Spotfire Server - JDBC - Configuration Document

2015-06-25 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601555#comment-14601555
 ] 

Andries Engelbrecht commented on DRILL-3025:


Thank you, please let me know when it is published.

Andries

> Tibco Spotfire Server - JDBC - Configuration Document
> -
>
> Key: DRILL-3025
> URL: https://issues.apache.org/jira/browse/DRILL-3025
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Andries Engelbrecht
>Assignee: Bob Rumsby
> Attachments: Tibco Spotfire Server 6.0 Drill Configuration.docx
>
>
> TSS Configuration document - JDBC




[jira] [Updated] (DRILL-3416) QlikSense Drill Configuration document

2015-06-29 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-3416:
---
Attachment: Qlik Sense and Drill Configuration v3.0.docx

> QlikSense Drill Configuration document
> --
>
> Key: DRILL-3416
> URL: https://issues.apache.org/jira/browse/DRILL-3416
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Qlik Sense and Drill Configuration v3.0.docx
>
>
> QlikSense - Drill 1.0 configuration 




[jira] [Created] (DRILL-3416) QlikSense Drill Configuration document

2015-06-29 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3416:
--

 Summary: QlikSense Drill Configuration document
 Key: DRILL-3416
 URL: https://issues.apache.org/jira/browse/DRILL-3416
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens
 Attachments: Qlik Sense and Drill Configuration v3.0.docx

QlikSense - Drill 1.0 configuration 




[jira] [Created] (DRILL-3436) MicroStrategy configuration documentation

2015-06-30 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3436:
--

 Summary: MicroStrategy configuration documentation
 Key: DRILL-3436
 URL: https://issues.apache.org/jira/browse/DRILL-3436
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


The MicroStrategy configuration document contains broken links that need to be 
updated:
https://drill.apache.org/docs/using-microstrategy-analytics-with-apache-drill/

Step 1.2 should point to link below (not old wiki link)
https://drill.apache.org/docs/installing-the-driver-on-windows/

Step 1.3 should point to link below (not old wiki link)
https://drill.apache.org/docs/configuring-odbc-on-windows/




[jira] [Created] (DRILL-3437) 2 Listings of Tibco Spotfire Desktop in drop down list

2015-06-30 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3437:
--

 Summary: 2 Listings of Tibco Spotfire Desktop in drop down list
 Key: DRILL-3437
 URL: https://issues.apache.org/jira/browse/DRILL-3437
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


There are 2 listings for Tibco Spotfire Desktop in the left-hand pane when 
expanding Using Drill with BI Tools.
https://drill.apache.org/docs/using-tibco-spotfire-desktop-with-drill/

One needs to be removed; they seem to be duplicates.




[jira] [Created] (DRILL-3438) Broken Links in Tibco Spotfire Desktop documentation

2015-06-30 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3438:
--

 Summary: Broken Links in Tibco Spotfire Desktop documentation
 Key: DRILL-3438
 URL: https://issues.apache.org/jira/browse/DRILL-3438
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


On the Tibco Spotfire Documentation page the ODBC links are broken.
https://drill.apache.org/docs/using-tibco-spotfire-desktop-with-drill/

Step 1.2 should point to
https://drill.apache.org/docs/installing-the-driver-on-windows/

Step 1.3 should point to 
https://drill.apache.org/docs/configuring-odbc-on-windows/




[jira] [Created] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation

2017-03-30 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-5403:
--

 Summary: Tableau 10.2 with Drill 1.10 integration documentation
 Key: DRILL-5403
 URL: https://issues.apache.org/jira/browse/DRILL-5403
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Andries Engelbrecht


Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See 
attached document with details.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation

2017-03-30 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-5403:
---
Attachment: Tableau 10.2 Drill Configuration.docx

> Tableau 10.2 with Drill 1.10 integration documentation
> --
>
> Key: DRILL-5403
> URL: https://issues.apache.org/jira/browse/DRILL-5403
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Andries Engelbrecht
> Attachments: Tableau 10.2 Drill Configuration.docx
>
>
> Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See 
> attached document with details.




[jira] [Commented] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation

2017-03-30 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949863#comment-15949863
 ] 

Andries Engelbrecht commented on DRILL-5403:


[~bbevens] Thank you!

> Tableau 10.2 with Drill 1.10 integration documentation
> --
>
> Key: DRILL-5403
> URL: https://issues.apache.org/jira/browse/DRILL-5403
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Tableau 10.2 Drill Configuration.docx
>
>
> Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See 
> attached document with details.




[jira] [Updated] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation

2017-03-30 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-5403:
---
Attachment: Edit-2-Tableau-10.2-Drill-Configuration.docx

> Tableau 10.2 with Drill 1.10 integration documentation
> --
>
> Key: DRILL-5403
> URL: https://issues.apache.org/jira/browse/DRILL-5403
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Fix For: 1.10.0
>
> Attachments: BB-Edit-Tableau-10.2-Drill-Configuration.docx, 
> Edit-2-Tableau-10.2-Drill-Configuration.docx, Tableau 10.2 Drill 
> Configuration.docx
>
>
> Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See 
> attached document with details.




[jira] [Commented] (DRILL-5403) Tableau 10.2 with Drill 1.10 integration documentation

2017-03-30 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1594#comment-1594
 ] 

Andries Engelbrecht commented on DRILL-5403:


[~bbevens] Thanks for the quick turnaround.
I made a few edits to clarify that Drill is now available as a data source for 
Tableau on Mac (where it wasn't before). I also made a few small changes, as the 
basic steps are used with Tableau Server as well.
See the Edit-2 attached document for the updates.

> Tableau 10.2 with Drill 1.10 integration documentation
> --
>
> Key: DRILL-5403
> URL: https://issues.apache.org/jira/browse/DRILL-5403
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Fix For: 1.10.0
>
> Attachments: BB-Edit-Tableau-10.2-Drill-Configuration.docx, 
> Edit-2-Tableau-10.2-Drill-Configuration.docx, Tableau 10.2 Drill 
> Configuration.docx
>
>
> Documentation to be added for Tableau 10.2 and Drill 1.10 integration. See 
> attached document with details.




[jira] [Created] (DRILL-4866) Provide TABLE and PARTITION information in INFORMATION_SCHEMA for parquet tables created by Drill

2016-08-30 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4866:
--

 Summary: Provide TABLE and PARTITION information in 
INFORMATION_SCHEMA for parquet tables created by Drill
 Key: DRILL-4866
 URL: https://issues.apache.org/jira/browse/DRILL-4866
 Project: Apache Drill
  Issue Type: Improvement
  Components: Metadata, Storage - Parquet
Reporter: Andries Engelbrecht


Provide the Table and Partition information on parquet tables created by Drill 
in INFORMATION_SCHEMA. This can be utilized by tools and users looking to 
optimize Drill queries by referencing the table and partition metadata from 
within Drill, as opposed to querying the parquet metadata underneath.

Potentially extend INFORMATION_SCHEMA with an additional PARTITIONS table 
similar to MySQL to provide information on column(s) used for partitioning.





[jira] [Updated] (DRILL-2744) Provide error message when trying to query MapR-DB or HBase tables with insufficient priviliges

2016-08-31 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2744:
---
Labels: security  (was: )

> Provide error message when trying to query MapR-DB or HBase tables with 
> insufficient priviliges
> ---
>
> Key: DRILL-2744
> URL: https://issues.apache.org/jira/browse/DRILL-2744
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: 0.8.0
>Reporter: Andries Engelbrecht
>  Labels: security
> Fix For: Future
>
>
> When creating MapR-DB tables with different privileges Drill will return no 
> results for tables with insufficient privileges. Propose an error is returned 
> so the user is aware of the issue, instead of simply no data being returned. 
> This can be a serious issue with complex queries when joining data across 
> multiple data sources.
> Creating 2 tables - one with mapr user and the other as root.
> lr. 1 mapr mapr 2 Apr  9 17:51 customers -> 
> mapr::table::2057.45.1574734
> lr. 1 root root 2 Apr 10 00:21 test -> mapr::table::2057.48.1574740
> hbase(main):005:0> get "test", "r1"
> COLUMN               CELL
>  col1:               timestamp=1428625497000, value=a
>  col2:               timestamp=1428625506268, value=b
> 2 row(s) in 0.0380 seconds
> 0: jdbc:drill:zk=drilldemo:5181> show tables;
> +---------------+-------------+
> | TABLE_SCHEMA  | TABLE_NAME  |
> +---------------+-------------+
> | maprdb        | test        |
> | maprdb        | customers   |
> +---------------+-------------+
> 2 rows selected (0.098 seconds)
> querying test tables simply returns no results instead of an error.
> 0: jdbc:drill:zk=drilldemo:5181> select * from test;
> +--+
> |  |
> +--+
> +--+
> No rows selected (0.059 seconds)
> Customers does return data due to sufficient privileges.
> 0: jdbc:drill:zk=drilldemo:5181> select * from customers limit 1;
> +++++
> |  row_key   |  address   |  loyalty   |  personal  |
> +++++
> | [B@6e22c013 | {"state":"InZhIg=="} | 
> {"agg_rev":"MTk3","membership":"InNpbHZlciI="} | 
> {"age":"IjE1LTIwIg==","gender":"IkZFTUFMRSI=","name":"IkNvcnJpbmUgTWVjaGFtIg=="}
>  |
> +++++
> 1 row selected (0.236 seconds)




[jira] [Created] (DRILL-4899) Hive Plugin goes to disabled status with restart of Drill and ZK

2016-09-21 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4899:
--

 Summary: Hive Plugin goes to disabled status with restart of Drill 
and ZK
 Key: DRILL-4899
 URL: https://issues.apache.org/jira/browse/DRILL-4899
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Affects Versions: 1.8.0
Reporter: Andries Engelbrecht


After ZK and Drill are restarted, the Hive storage plugin comes up disabled and 
must be re-enabled manually.






[jira] [Commented] (DRILL-4899) Hive Plugin goes to disabled status with restart of Drill and ZK

2016-09-21 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511572#comment-15511572
 ] 

Andries Engelbrecht commented on DRILL-4899:


In this case the Hive Plugin config details are retained, but the plugin itself 
is disabled on startup although it was enabled before shutdown.

> Hive Plugin goes to disabled status with restart of Drill and ZK
> 
>
> Key: DRILL-4899
> URL: https://issues.apache.org/jira/browse/DRILL-4899
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.8.0
>Reporter: Andries Engelbrecht
>
> When restarting ZK and Drill the Hive storage plugin is disabled by default 
> and requires manual steps to enable. 




[jira] [Created] (DRILL-4973) Sqlline history

2016-10-27 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4973:
--

 Summary: Sqlline history
 Key: DRILL-4973
 URL: https://issues.apache.org/jira/browse/DRILL-4973
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - CLI
Reporter: Andries Engelbrecht
Priority: Minor


Currently the history in sqlline stops working after 500 queries have been 
logged in the user's .sqlline/history file.




[jira] [Commented] (DRILL-2848) Disable decimal data type by default

2015-12-17 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062508#comment-15062508
 ] 

Andries Engelbrecht commented on DRILL-2848:


Is it feasible to enable decimal by default in future versions?

A number of BI and analytical software tools that work with Drill have requested 
this.

> Disable decimal data type by default
> 
>
> Key: DRILL-2848
> URL: https://issues.apache.org/jira/browse/DRILL-2848
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Mehant Baid
>Assignee: Jinfeng Ni
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2848-part1.patch, DRILL-2848-part2.patch
>
>
> Due to the difference in the storage format of decimal data type in parquet 
> versus the in-memory format within Drill using the decimal data type is not 
> performant. Also some of the rules for calculating the scale and precision 
> need to be changed. These two concerns will be addressed post 1.0.0 release 
> and to prevent users from running into this we are disabling decimal data 
> type by default. 




[jira] [Commented] (DRILL-4239) Update documentation to reflect 64bit requirement to run Drill on Windows.

2016-01-13 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096483#comment-15096483
 ] 

Andries Engelbrecht commented on DRILL-4239:


Kristine Hahn, the issue here relates only to actually running Drill on a 32bit 
Windows machine, which is a poor platform choice with likely minimal user 
adoption (compared to 64bit Windows). 

However, the ODBC drivers are a different topic, as they are for client systems. 
Both the 32bit and 64bit ODBC drivers work on a 64bit Windows machine, as certain 
client software may need a 32bit ODBC driver.



> Update documentation to reflect 64bit requirement to run Drill on Windows. 
> ---
>
> Key: DRILL-4239
> URL: https://issues.apache.org/jira/browse/DRILL-4239
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Peder Jakobsen
>Assignee: Kristine Hahn
>  Labels: newbie
> Fix For: 1.5.0
>
>
> Winutils.exe has been compiled to run on the 64 bit version of Windows.  For 
> this reason, the parts of the documentation that suggest that Drill can run 
> on 32bit Windows must be fixed.  Furthermore, although few users run 32 bit 
> Windows these days, it would be helpful to make this requirement more 
> explicit.  In particular, rapid installation of Windows on VirtualBox will 
> often result in the 32 bit version being installed by default, since it's the 
> preselected default during the installation process. 




[jira] [Created] (DRILL-4312) JDBC PlugIN - MySQL Causes errors in Drill INFORMATION_SCHEMA

2016-01-26 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4312:
--

 Summary: JDBC PlugIN - MySQL Causes errors in Drill 
INFORMATION_SCHEMA
 Key: DRILL-4312
 URL: https://issues.apache.org/jira/browse/DRILL-4312
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.4.0
Reporter: Andries Engelbrecht


When connecting to MySQL with the JDBC plugin, queries against INFORMATION_SCHEMA 
fail, specifically for COLUMNS and on mysql.performance_schema.

{query}
SELECT DISTINCT TABLE_SCHEMA as NAME_SPACE, TABLE_NAME as TAB_NAME FROM 
INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA <>'INFORMATION_SCHEMA' and 
TABLE_SCHEMA <> 'sys';
{/query}

{result}
Error: SYSTEM ERROR: MySQLSyntaxErrorException: Unknown table engine 
'PERFORMANCE_SCHEMA'

Fragment 0:0
{/result}

{query}
0: jdbc:drill:> select * from INFORMATION_SCHEMA.`COLUMNS` where TABLE_SCHEMA = 
'mysql.performance_schema';
{/query}

{result}
Error: SYSTEM ERROR: MySQLSyntaxErrorException: Unknown table engine 
'PERFORMANCE_SCHEMA'

Fragment 0:0
{/result}



{drillbit.log}
[Error Id: 45d23eb8-0bcf-41e2-84e2-4626e7fb0d33 on drilldemo:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
 ~[drill-common-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
 [drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
 [drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
 [drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.4.0.jar:1.4.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_51]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: java.lang.RuntimeException: Exception while reading definition of 
table 'cond_instances'
at 
org.apache.calcite.adapter.jdbc.JdbcTable.getRowType(JdbcTable.java:103) 
~[calcite-core-1.4.0-drill-1.4.0-mapr-r1.jar:1.4.0-drill-1.4.0-mapr-r1]
at 
org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:140)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:120)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:120)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.store.ischema.RecordGenerator.scanSchema(RecordGenerator.java:108)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.store.ischema.SelectedTable.getRecordReader(SelectedTable.java:57)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch(InfoSchemaBatchCreator.java:36)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch(InfoSchemaBatchCreator.java:30)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:147)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:127)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:101)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) 
~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230)
 [drill-java-exec-1.4.0.jar:1.4.0]
... 4 common frames omitted
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown 
table engine 'PERFORMANCE_SCHEMA'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) ~[na:1.8.0_51]
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 ~[na:1.8.0_51]
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[na:1.8.0_51]
at java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
~[na:1.8.0_51]
at com.mysql.jdbc.Util.handleNewInstance(Util.java:404) 
~[mysql-connector-java-5.1.38

[jira] [Created] (DRILL-4440) Host file location for Windows incorrect in doc

2016-02-25 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4440:
--

 Summary: Host file location for Windows incorrect in doc
 Key: DRILL-4440
 URL: https://issues.apache.org/jira/browse/DRILL-4440
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Andries Engelbrecht
Priority: Minor


The hosts file location on the page
https://drill.apache.org/docs/installing-the-driver-on-windows/

shows /etc/hosts, which is for Linux/Mac.

It should point to 

\Windows\system32\drivers\etc\hosts 

for Windows systems.
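
For scripts that need to locate the file on either platform, a minimal Python 
sketch follows. The hosts_file_path helper and its parameters are hypothetical, 
and falling back to C:\Windows when SystemRoot is unset is an assumption, not 
something taken from the Drill documentation.

```python
import ntpath
import os
import platform
from typing import Mapping, Optional


def hosts_file_path(system: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the conventional hosts file location for the given OS name."""
    system = platform.system() if system is None else system
    env = os.environ if env is None else env
    if system == "Windows":
        # Assumption: Windows is installed under %SystemRoot%; default to C:\Windows.
        root = env.get("SystemRoot", r"C:\Windows")
        # ntpath is used so the result has backslashes on any host OS.
        return ntpath.join(root, "System32", "drivers", "etc", "hosts")
    # Linux and macOS both use /etc/hosts.
    return "/etc/hosts"


if __name__ == "__main__":
    print(hosts_file_path())
```

Passing the OS name and environment explicitly is only there to make the helper 
easy to check on a machine other than the one being documented.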




[jira] [Updated] (DRILL-4526) WebFOCUS Configuration Document

2016-03-21 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-4526:
---
Attachment: WebFocus 8.2 Configuration with Drill-v1.1.doc

> WebFOCUS Configuration Document
> ---
>
> Key: DRILL-4526
> URL: https://issues.apache.org/jira/browse/DRILL-4526
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: Andries Engelbrecht
> Attachments: WebFocus 8.2 Configuration with Drill-v1.1.doc
>
>
> Please add attached configuration document for Information Builders WebFOCUS.




[jira] [Created] (DRILL-4526) WebFOCUS Configuration Document

2016-03-21 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4526:
--

 Summary: WebFOCUS Configuration Document
 Key: DRILL-4526
 URL: https://issues.apache.org/jira/browse/DRILL-4526
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.4.0
Reporter: Andries Engelbrecht
 Attachments: WebFocus 8.2 Configuration with Drill-v1.1.doc

Please add attached configuration document for Information Builders WebFOCUS.




[jira] [Commented] (DRILL-4526) WebFOCUS Configuration Document

2016-03-21 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205582#comment-15205582
 ] 

Andries Engelbrecht commented on DRILL-4526:


Medium priority, will be good to get it done over the next few weeks.

> WebFOCUS Configuration Document
> ---
>
> Key: DRILL-4526
> URL: https://issues.apache.org/jira/browse/DRILL-4526
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: WebFocus 8.2 Configuration with Drill-v1.1.doc
>
>
> Please add attached configuration document for Information Builders WebFOCUS.




[jira] [Commented] (DRILL-4533) Error should be more informative when creating/updating storage plugin definition

2016-03-24 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210435#comment-15210435
 ] 

Andries Engelbrecht commented on DRILL-4533:


Also, for plugins that require credentials, the error should clearly state 
when the credentials are invalid.

> Error should be more informative when creating/updating storage plugin 
> definition
> -
>
> Key: DRILL-4533
> URL: https://issues.apache.org/jira/browse/DRILL-4533
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Information Schema
>Reporter: Chris Matta
>Priority: Minor
>
> When updating or creating the definition of a storage plugin fails the error 
> {code}error (unable to create/ update storage)"{code} 
> isn't descriptive enough. It should provide a hint of what's actually wrong:
> * Is the JSON invalid? If so maybe a hint to where the linter encountered the 
> problem 
> * Unexpected Field? 
> * Deeper issues could maybe include the stack trace? 




[jira] [Created] (DRILL-4626) WebUI https and http not lsited correct

2016-04-22 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4626:
--

 Summary: WebUI https and http not lsited correct
 Key: DRILL-4626
 URL: https://issues.apache.org/jira/browse/DRILL-4626
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.6.0, 1.5.0, 1.4.0, 1.3.0, 1.2.0
Reporter: Andries Engelbrecht
Priority: Minor


The documentation says to connect to https:// by default for Drill 1.2 and later. 
However, https:// is only used if Drill is configured for it; the default is still 
http://






[jira] [Commented] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2016-05-12 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281619#comment-15281619
 ] 

Andries Engelbrecht commented on DRILL-3510:


What is the latest status on this?

We still see a lot of tools using double quotes. Many cannot change the quote 
character, or changing it is cumbersome for users/partners.
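
As a possible client-side stopgap where the tool's SQL can be intercepted, 
ANSI double-quoted identifiers could be rewritten to backticks before the 
statement reaches Drill. The Python sketch below is illustrative only: 
ansi_to_backtick is a hypothetical helper, not part of Drill or any driver, 
and it does not handle SQL comments.

```python
import re

# One alternative per token type; the regex engine scans left to right, so a
# double quote that sits inside a string literal is consumed by the first branch.
_TOKEN = re.compile(
    r"""
    '(?:[^']|'')*'      # single-quoted string literal, '' is an escaped quote
    |
    "(?:[^"]|"")*"      # double-quoted identifier, "" is an escaped quote
    """,
    re.VERBOSE,
)


def ansi_to_backtick(sql: str) -> str:
    """Rewrite "identifier" to `identifier`, leaving string literals untouched."""

    def swap(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.startswith("'"):
            return token  # string literal: keep as-is
        name = token[1:-1].replace('""', '"')
        if "`" in name:
            # Assumption: refuse rather than guess how to escape a backtick.
            raise ValueError("identifier contains a backtick: " + name)
        return "`" + name + "`"

    return _TOKEN.sub(swap, sql)


if __name__ == "__main__":
    print(ansi_to_backtick('SELECT "t"."cust_id" FROM "dfs"."clicks" AS "t"'))
```

This only papers over the problem for tools whose generated SQL can be 
filtered; a parser option in Drill itself would still be the cleaner fix.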

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
> Fix For: Future
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses backtick as identifier quotes, the same as 
> what MySQL does. However, this is different from ANSI SQL specification, 
> where double quote is used as identifier quotes.  
> MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. 
> Drill should follow the same way, so that Drill users do not have to rewrite 
> their existing queries, if their queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>




[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2016-05-12 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281633#comment-15281633
 ] 

Andries Engelbrecht commented on DRILL-786:
---

Any movement on this?

Multiple tools (Tableau and MicroStrategy, for example) generate cross joins with 
dimension tables when building dashboards/analytics.
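
For reference, the result such tools expect from the query in the original 
report can be sketched with Python's built-in sqlite3 module. SQLite stands in 
for Drill here purely to show the standard CROSS JOIN semantics; the table 
contents are made up for illustration and cross_join_rows is a hypothetical 
helper name.

```python
import sqlite3


def cross_join_rows():
    """Run the student/voter cross join against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Illustrative data only; the real tables from the report are not available.
    conn.executescript("""
        CREATE TABLE student (name TEXT, age INTEGER, studentnum INTEGER);
        CREATE TABLE voter (name TEXT, age INTEGER);
        INSERT INTO student VALUES ('ann', 20, 1), ('bob', 21, 2), ('cy', 20, 3);
        INSERT INTO voter VALUES ('v1', 20), ('v2', 20), ('v3', 35);
    """)
    rows = conn.execute("""
        SELECT student.name, student.age, student.studentnum
        FROM student CROSS JOIN voter
        WHERE student.age = 20 AND voter.age = 20
        ORDER BY student.studentnum
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cross_join_rows():
        print(row)
```

Each matching student row is repeated once per matching voter row, which is 
the Cartesian product restricted by the two filters.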

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
> Fix For: Future
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.59049001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 
> network}
> rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).

[jira] [Created] (DRILL-4682) Allow full schema identifier in SELECT clause

2016-05-17 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-4682:
--

 Summary: Allow full schema identifier in SELECT clause
 Key: DRILL-4682
 URL: https://issues.apache.org/jira/browse/DRILL-4682
 Project: Apache Drill
  Issue Type: Improvement
  Components: SQL Parser
Reporter: Andries Engelbrecht


Currently Drill requires aliases to identify columns in the SELECT clause when 
working with multiple tables/workspaces.

Many BI/analytical and other tools will, by default, use the full schema 
identifier in the SELECT clause when generating SQL statements for execution 
against generic JDBC or ODBC sources. Not supporting this causes issues and 
slows adoption of Drill as an execution engine within the larger analytical 
SQL community.

Propose to support 

SELECT ... FROM 
..

Also see DRILL-3510 for double quote support as per ANSI_QUOTES

SELECT ""."".""."" FROM 
""."".""

This is very common generic SQL generated by most tools when dealing 
with a generic SQL data source.





[jira] [Commented] (DRILL-6186) Document support for delegationUID client side ODBC/JDBC property in open source JDBC driver

2018-03-01 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382781#comment-16382781
 ] 

Andries Engelbrecht commented on DRILL-6186:


[~veeran] Can you please file a Jira for support for the delegationUID parameter 
in the open source JDBC driver, then link back to this one so the documentation 
can be updated for delegationUID?

ODBC doesn't have an open source driver, so it may be good to update the Jira name 
to just specify JDBC.

> Document support for delegationUID client side ODBC/JDBC property in open 
> source JDBC driver
> 
>
> Key: DRILL-6186
> URL: https://issues.apache.org/jira/browse/DRILL-6186
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.12.0
>Reporter: Veera Naranammalpuram
>Priority: Major
>  Labels: security
>
> There is no documentation around the "delegationUID" property in the open 
> source documentation. We at MapR ask our customers to use this property as 
> one form of impersonation. Because sqlline ships with the open source JDBC 
> driver, if users want to use delegationUID from sqlline because they use it 
> from ODBC/JDBC/ BI tools, they should be able to but there's no documentation 
> on drill.apache.org on how to do so. There is documentation on ODBC driver 
> but not on JDBC. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (DRILL-6221) Decimal aggregations for NULL values result in 0.0 value

2018-03-08 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-6221:
--

 Summary: Decimal aggregations for NULL values result in 0.0 value
 Key: DRILL-6221
 URL: https://issues.apache.org/jira/browse/DRILL-6221
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.12.0
Reporter: Andries Engelbrecht


If you sum a packed decimal field that contains a null value, you get 0.0 instead of null.

 

select id, amt from hive.`default`.`packtest`

1 2.3

2 null

3 4.5

 

select sum(amt) from hive.`default`.`packtest` group by id

1 2.3

2 0.0

3 4.5
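
For comparison, a minimal Python sketch of the result that standard SQL SUM
semantics would give for the rows above. The helper name sql_sum is
hypothetical and None stands in for SQL NULL; this models the expected
behavior only and is not Drill's implementation.

```python
from decimal import Decimal
from typing import Iterable, Optional


def sql_sum(values: Iterable[Optional[Decimal]]) -> Optional[Decimal]:
    """SUM as standard SQL defines it: NULL inputs are skipped, and the
    result is NULL (None) when there is no non-NULL input at all."""
    total = None
    for value in values:
        if value is None:
            continue  # NULL inputs do not contribute to the sum
        total = value if total is None else total + value
    return total


# Rows from the report: (id, amt); None stands in for SQL NULL.
rows = [(1, Decimal("2.3")), (2, None), (3, Decimal("4.5"))]

# GROUP BY id, then SUM(amt) per group.
groups = {}
for row_id, amt in rows:
    groups.setdefault(row_id, []).append(amt)

result = {row_id: sql_sum(amts) for row_id, amts in groups.items()}
print(result)
```

Under these semantics the group for id 2 comes back as None (NULL) rather
than 0.0.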




[jira] [Created] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions

2015-08-05 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3610:
--

 Summary: TimestampAdd/Diff (SQL_TSI_) functions
 Key: DRILL-3610
 URL: https://issues.apache.org/jira/browse/DRILL-3610
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Reporter: Andries Engelbrecht
Assignee: Mehant Baid


Add TimestampAdd and TimestampDiff (SQL_TSI) functions for year, quarter, 
month, week, day, hour, minute, second.

Examples
SELECT CAST(TIMESTAMPADD(SQL_TSI_QUARTER,1,Date('2013-03-31'), SQL_DATE) AS 
`column_quarter`
FROM `table_in`
HAVING (COUNT(1) > 0)

SELECT `table_in`.`datetime` AS `column1`,
  `table`.`Key` AS `column_Key`,
  TIMESTAMPDIFF(SQL_TSI_MINUTE,to_timestamp('2004-07-04', 
'yyyy-MM-dd'),`table_in`.`datetime`) AS `sum_datediff_minute`
FROM `calcs`





[jira] [Commented] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions

2015-08-05 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658856#comment-14658856
 ] 

Andries Engelbrecht commented on DRILL-3610:


TIMESTAMPDIFF(SQL_TSI_) returns the difference between the 2 supplied
timestamps as an integer, counted in the specified interval, where the interval
can be second, minute, hour, day, week, month, quarter or year.

EXTRACT/Date_Part and Timestamp functions can be used as substitutes, but they
require more extensive SQL to achieve the same operation. That can be very
cumbersome in queries with several of these operations, and also with machine
generated queries.

Date_ADD can be substituted for TIMESTAMPADD, but it lacks the QUARTER interval
commonly used in financial analysis.
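
To make the requested semantics concrete, here is a minimal Python sketch of
a QUARTER add and a MINUTE diff. The function names add_quarters and
diff_minutes are hypothetical, and both clamping the day to the end of the
target month and truncating partial minutes are assumptions about how such
functions would behave, not a statement of what Drill does.

```python
import calendar
from datetime import datetime


def add_quarters(ts: datetime, quarters: int) -> datetime:
    """Shift ts by whole quarters (3 months each), clamping the day to the
    last day of the target month."""
    month_index = ts.year * 12 + (ts.month - 1) + 3 * quarters
    year, month = divmod(month_index, 12)
    month += 1  # back to a 1-based month number
    last_day = calendar.monthrange(year, month)[1]
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


def diff_minutes(start: datetime, end: datetime) -> int:
    """Whole minutes from start to end, truncated toward zero."""
    seconds = (end - start).total_seconds()
    return int(seconds / 60)


print(add_quarters(datetime(2013, 3, 31), 1))
print(diff_minutes(datetime(2004, 7, 4), datetime(2004, 7, 4, 1, 30, 45)))
```

Doing the same with EXTRACT/Date_Part needs the month arithmetic spelled out
in every query, which is the verbosity described above.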


[jira] [Commented] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions

2015-08-06 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660188#comment-14660188
 ] 

Andries Engelbrecht commented on DRILL-3610:


That can potentially be used, but it still leaves a gap in the syntax for a
common DATETIME function, since ADD and DIFF would look very different.




[jira] [Created] (DRILL-3623) Hive query hangs with limit 0 clause

2015-08-10 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-3623:
--

 Summary: Hive query hangs with limit 0 clause
 Key: DRILL-3623
 URL: https://issues.apache.org/jira/browse/DRILL-3623
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Affects Versions: 1.1.0
 Environment: MapR cluster
Reporter: Andries Engelbrecht
Assignee: Venki Korukanti


Running select * from hive.table limit 0 does not return (it hangs).
Select * from hive.table limit 1 works fine.

The Hive table is about 6GB in 330 Parquet files using Snappy compression.
Data types are int, bigint, string and double.

Querying the directory with the Parquet files through the DFS plugin works fine:
select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3721) Regarding drill with big file

2015-08-28 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719296#comment-14719296
 ] 

Andries Engelbrecht commented on DRILL-3721:


Check what the query memory per node is set to and increase it to see if that
resolves your problem.

The parameter is planner.memory.max_query_memory_per_node

Query sys.options to see its current value and use ALTER SYSTEM to modify it.

https://drill.apache.org/docs/configuring-drill-memory/

https://drill.apache.org/docs/alter-system/

https://drill.apache.org/docs/configuration-options-introduction/
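
The two steps can be sketched as the statements below, built here with a
small Python helper so they are easy to adapt. The helper names are
hypothetical and the 4 GB value is only a placeholder, not a recommendation;
pick a value that fits the direct memory available on the node.

```python
OPTION = "planner.memory.max_query_memory_per_node"


def show_option_sql(name: str) -> str:
    """Statement that reads the current setting from sys.options."""
    return f"SELECT * FROM sys.options WHERE name = '{name}'"


def alter_system_sql(name: str, value_bytes: int) -> str:
    """Statement that changes the option system-wide; the option name is
    back-tick quoted because it contains dots."""
    return f"ALTER SYSTEM SET `{name}` = {value_bytes}"


# 4 GB is a placeholder value for illustration only.
four_gb = 4 * 1024 ** 3
print(show_option_sql(OPTION))
print(alter_system_sql(OPTION, four_gb))
```

Run the first statement before and after the change to confirm the new value
took effect.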


> Regarding drill with big file
> -
>
> Key: DRILL-3721
> URL: https://issues.apache.org/jira/browse/DRILL-3721
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: kunal
> Attachments: sample.json, sqlline.log
>
>
> I am new to apache drill. I have configured apache drill on machine with 
> centos.
> "DRILL_MAX_DIRECT_MEMORY" = 25g
> "DRILL_HEAP" = 4g
> I have a 600 MB and a 3 GB JSON file [sample file attached]. When I run a 
> query on a relatively small file everything works fine, but when I run the 
> same query on the 600 MB and 3 GB files it gives the following error [stack trace attached].
> Query - 
> select tbl5.product_id product_id,tbl5.gender gender,tbl5.item_number 
> item_number,tbl5.price price,tbl5.description 
> description,tbl5.color_swatch.image image,tbl5.color_swatch.color color from
> (select tbl4.product_id product_id,tbl4.gender gender,tbl4.item_number 
> item_number,tbl4.price price,tbl4.size.description 
> description,FLATTEN(tbl4.size.color_swatch) color_swatch from
> (select tbl3.product_id product_id,tbl3.catalog_item.gender 
> gender,tbl3.catalog_item.item_number item_number,tbl3.catalog_item.price 
> price,FLATTEN(tbl3.catalog_item.size) size from 
> (select tbl2.product.product_id as 
> product_id,FLATTEN(tbl2.product.catalog_item) as catalog_item from 
> (select FLATTEN(tbl1.catalog.product) product from dfs.root.`demo.json` tbl1) 
> tbl2) tbl3) tbl4) tbl5
> --
> Error -
> SYSTEM ERROR: IllegalArgumentException: initialCapacity: -2147483648 
> (expectd: 0+)
> Fragment 0:0
> [Error Id: 60cf1b95-762d-4a0d-8cae-a2db418d4ea9 on sinhagad:31010]
> --
> 1) Am I doing something wrong or missing something (probably because I am not 
> using a cluster?).
> Please guide me through this.




[jira] [Commented] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2015-10-05 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943769#comment-14943769
 ] 

Andries Engelbrecht commented on DRILL-3510:


Will this be part of Drill 1.2?

Also if ANSI_QUOTES flag is set, will the back tick ` identifier still work?

The challenge is that in environments where multiple users access a Drill 
cluster they may be using different tools and queries. Some may now use back 
tick and some double quote identifiers.

Will it be feasible for the SQL Parser to support both at the same time? That
would allow the JDBC driver to return the ANSI standard quote as the default
identifier quote.

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
> Fix For: Future
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses backtick as identifier quotes, the same as 
> what MySQL does. However, this is different from ANSI SQL specification, 
> where double quote is used as identifier quotes.  
> MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. 
> Drill should follow the same way, so that Drill users do not have to rewrite 
> their existing queries, if their queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>




[jira] [Commented] (DRILL-3979) Add support for "replace" in CTAS similar to views

2015-10-26 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974984#comment-14974984
 ] 

Andries Engelbrecht commented on DRILL-3979:


Most databases do not support the REPLACE clause for CTAS. With a VIEW there is
no inherent data loss, while with CTAS there can be data loss if the existing
table is dropped. That is part of why most databases require an explicit DROP
TABLE command.

Perhaps a better option would be a TEMP TABLE clause, which implies the data is
transient.

> Add support for "replace" in CTAS similar to views
> --
>
> Key: DRILL-3979
> URL: https://issues.apache.org/jira/browse/DRILL-3979
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - Writer
>Affects Versions: 1.2.0
>Reporter: Abhishek Girish
>
> Drill could support "create or replace table" syntax, similar to the existing 
> "create or replace view" syntax. 
> Given that "drop table" is now supported, I think it might be possible to 
> support this. This could be helpful in automating tests and in SQL scripting. 




[jira] [Commented] (DRILL-4036) logs/sqlline_queries.json can not be accessed by user mapr

2015-11-05 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991914#comment-14991914
 ] 

Andries Engelbrecht commented on DRILL-4036:


This can happen if sqlline on the node was first executed as root.

Delete the file and the next time it will work fine. It will also be created 
again with the rw-rw-r-- permissions.

The root cause is starting sqlline as root the first time on the node. If mapr
is used first and root uses sqlline later, it is not an issue.

> logs/sqlline_queries.json can not be accessed by user mapr 
> ---
>
> Key: DRILL-4036
> URL: https://issues.apache.org/jira/browse/DRILL-4036
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Writer
>Affects Versions: 1.3.0
>Reporter: Khurram Faraaz
>Priority: Minor
>
> Drill was installed using RPM and when I try to connect to Drill from sqlline 
> as mapr user it results in permission denied error. That file 
> sqlline_queries.json is always empty, it has no content in it, and it is 
> owned by root and others can not write to it.
> The change was made using the below commit
> https://github.com/apache/drill/commit/42d5f818a5501dbd05808c53959db86e66202792
> {code}
> I logged in as root 
> [root@centos-01 bin]# id
> uid=0(root) gid=0(root) groups=0(root)
> Note that the file is owned by root, and non-root users can not write to that 
> file.
> [root@centos-01 bin]# ls -lrt 
> /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json
> -rw-r--r-- 1 root root 0 Nov  2 20:56 
> /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json
> and then I connect to Drill as mapr user
>  
> [root@centos-01 bin]# su - mapr
> -bash-4.1$ pwd
> /home/mapr
> -bash-4.1$ cd /opt/mapr/drill/drill-1.3.0/bin/
> -bash-4.1$ ./sqlline -u "jdbc:drill:schema=dfs.tmp -n mapr -p mapr"
> 23:30:38,366 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could 
> NOT find resource [logback.groovy]
> 23:30:38,366 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could 
> NOT find resource [logback-test.xml]
> 23:30:38,367 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found 
> resource [logback.xml] at [file:/opt/mapr/drill/drill-1.3.0/conf/logback.xml]
> 23:30:38,565 |-INFO in 
> ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not 
> set
> 23:30:38,571 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - 
> About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
> 23:30:38,583 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - 
> Naming appender as [STDOUT]
> 23:30:38,613 |-INFO in 
> ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default 
> type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] 
> property
> 23:30:38,693 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - 
> About to instantiate appender of type 
> [ch.qos.logback.core.rolling.RollingFileAppender]
> 23:30:38,696 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - 
> Naming appender as [QUERY]
> 23:30:38,722 |-INFO in 
> ch.qos.logback.core.rolling.FixedWindowRollingPolicy@69663655 - No 
> compression will be used
> 23:30:38,736 |-INFO in 
> ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default 
> type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] 
> property
> 23:30:38,737 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[QUERY] 
> - Active log file name: /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json
> 23:30:38,737 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[QUERY] 
> - File property is set to 
> [/opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json]
> 23:30:38,739 |-ERROR in 
> ch.qos.logback.core.rolling.RollingFileAppender[QUERY] - 
> openFile(/opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json,true) call 
> failed. java.io.FileNotFoundException: 
> /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json (Permission denied)
>   at java.io.FileNotFoundException: 
> /opt/mapr/drill/drill-1.3.0/logs/sqlline_queries.json (Permission denied)
>   at  at java.io.FileOutputStream.open(Native Method)
>   at  at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>   at  at 
> ch.qos.logback.core.recovery.ResilientFileOutputStream.<init>(ResilientFileOutputStream.java:28)
>   at  at 
> ch.qos.logback.core.FileAppender.openFile(FileAppender.java:149)
>   at  at ch.qos.logback.core.FileAppender.start(FileAppender.java:108)
>   at  at 
> ch.qos.logback.core.rolling.RollingFileAppender.start(RollingFileAppender.java:86)
>   at  at 
> ch.qos.logback.core.joran.action.AppenderAction.end(AppenderAction.java:96)
>   at  at 
> ch.qos.logback.core.joran.spi.Interpreter.ca

[jira] [Commented] (DRILL-4114) drill not support limit 1,100

2015-11-19 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013746#comment-15013746
 ] 

Andries Engelbrecht commented on DRILL-4114:


You should use limit 1100; don't put a comma in the number.

This is not a bug.
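
For reference, the reported query can also be read as the MySQL-style
LIMIT offset, count form. That reading is an assumption, since the reporter
does not say which was meant. The parse error lists OFFSET as an accepted
token after LIMIT, so under that reading the equivalent would be
LIMIT 100 OFFSET 1. A minimal Python sketch of such a rewrite, where
rewrite_mysql_limit is a hypothetical helper and not part of Drill:

```python
import re

# Matches a trailing "LIMIT <offset>, <count>" with an optional semicolon.
_MYSQL_LIMIT = re.compile(r"\bLIMIT\s+(\d+)\s*,\s*(\d+)\s*;?\s*$", re.IGNORECASE)


def rewrite_mysql_limit(sql: str) -> str:
    """Rewrite a trailing 'LIMIT offset, count' as 'LIMIT count OFFSET offset'.
    Statements without that form are returned unchanged; a trailing
    semicolon on a rewritten statement is dropped."""
    match = _MYSQL_LIMIT.search(sql)
    if match is None:
        return sql
    offset, count = match.group(1), match.group(2)
    return f"{sql[:match.start()]}LIMIT {count} OFFSET {offset}"


print(rewrite_mysql_limit("select * from tab order by tab.column1 limit 1,100"))
```

The sketch only handles a LIMIT clause at the very end of the statement.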

> drill not support limit 1,100
> -
>
> Key: DRILL-4114
> URL: https://issues.apache.org/jira/browse/DRILL-4114
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: Future
>Reporter: david_hudavy
>
> when
> select * from tab order by tab.column1 limit 1,100
> throws exception:
> [Client-1] INFO  o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#31] Query 
> failed:
> org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> Encountered "," at line 1, column 48.
> Was expecting one of:
> 
> "OFFSET" ...
> "FETCH" ...
> [Error Id: 68dde852-13df-4c39-bfd3-5d970dbc2549 on vm1-4:31010]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>  [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>  [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>  [drill-java-exec-1.2.0.jar:1.2.0]
> at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>  [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at

[jira] [Commented] (DRILL-6383) View column types, modes are plan-time guesses, not actual types

2018-05-04 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464094#comment-16464094
 ] 

Andries Engelbrecht commented on DRILL-6383:


Most BI/Viz tools were developed with RDBMS data sources in mind. Since Drill
is not an RDBMS and doesn't own the data, we rely on Views to make metadata
available to these tools in a usable form. Many tools request the data source
metadata upon connection, which runs counter to Drill's default behavior of
"Let's discover the data". For this reason we use Views as a crutch to make the
metadata available in a columnar format.

However, a poorly defined View (i.e. select *) is not very helpful for these
tools, and we have published best practices in this regard. Such a View can
also be very expensive for the numerous metadata operations and SQL prepare
statements being converted by the client drivers. As an example, see the
metadata available in INFORMATION_SCHEMA for the columns in the View, as this
is what many tools interrogate to get the metadata available from the source.
That leads to the question of whether Drill should do some work at View
creation time to actually define the underlying data of the View, vs. just
lazily creating a logical View and then waiting for it to be used.

We have had discussions with tool vendors about utilizing the data discovery
capabilities in Drill, but that is a major development effort for most of
them, and only large market demand will get them to move quicker in this regard.

> View column types, modes are plan-time guesses, not actual types
> 
>
> Key: DRILL-6383
> URL: https://issues.apache.org/jira/browse/DRILL-6383
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Create a view and look at the list of columns within the view. You'll 
> see that they are often wrong in name, type and mode.
> Consider a very simple CSV file with headers:
> {noformat}
> custId,name,balance,status
> 123,Fred,456.78
> 125,Betty,98.76,VIP
> 128,Barney,1.23,PAST DUE,30
> {noformat}
> Define the simplest possible view:
> {noformat}
> CREATE VIEW myView2 AS SELECT * FROM `csvh/cust.csvh`;
> {noformat}
> Then look at the view file:
> {noformat}
> {
>   "name" : "myView2",
>   "sql" : "SELECT *\nFROM `csvh/cust.csvh`",
>   "fields" : [ {
> "name" : "**",
> "type" : "DYNAMIC_STAR",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "local", "data" ]
> }
> {noformat}
> It is clear that the view simply captured the plan-time list of the new 
> double-star for the wildcard. Since this is not a true type, it should not 
> have an `isNullable` attribute.
> OK, we have to spell out the columns:
> {noformat}
> CREATE VIEW myView3 AS SELECT custId  FROM `csvh/cust.csvh`;
> {noformat}
> Let's look at the view file:
> {noformat}
> {
>   "name" : "myView3",
>   "sql" : "SELECT `custId`\nFROM `csvh/cust.csvh`",
>   "fields" : [ {
> "name" : "custId",
> "type" : "ANY",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "local", "data" ]
> }
> {noformat}
> The name is correct. The type is `ANY`, which is wrong. Since this is a CSV 
> file, the column type is `VARCHAR`. Further, because this is a CSV file with 
> headers, the mode is REQUIRED, but is listed as nullable. To verify:
> {noformat}
> SELECT sqlTypeOf(custId), modeOf(custId) FROM myView3 LIMIT 1;
> +--------------------+-----------+
> |       EXPR$0       |  EXPR$1   |
> +--------------------+-----------+
> | CHARACTER VARYING  | NOT NULL  |
> +--------------------+-----------+
> {noformat}
> Now, let's try a CSV file without headers:
> {noformat}
> 123,Fred,456.78
> 125,Betty,98.76,VIP
> {noformat}
> {noformat}
> CREATE VIEW myView4 AS SELECT columns FROM `csv/cust.csv`;
> SELECT * FROM myView4;
> +--------------------------------+
> |            columns             |
> +--------------------------------+
> | ["123","Fred","456.78"]        |
> | ["125","Betty","98.76","VIP"]  |
> +--------------------------------+
> {noformat}
> Let's look at the view file:
> {noformat}
> {
>   "name" : "myView4",
>   "sql" : "SELECT `columns`\nFROM `csv/cust.csv`",
>   "fields" : [ {
> "name" : "columns",
> "type" : "ANY",
> "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ "local", "data" ]
> }
> {noformat}
> This is almost nonsensical. `columns` is reported as type `ANY` and 
> nullable. But, `columns` is Repeated `VARCHAR` and repeated types cannot be 
> nullable.
> The conclusion is that the type information is virtually worthless and the 
> `isNullable` information is worse than worthless: it is plain wrong.
> The type information is valid only if the planner can infer types:
> {noformat}
> CREATE VIEW myView5 AS
>   SELECT CAST(cus

[jira] [Commented] (DRILL-4973) Sqlline history

2018-05-31 Thread Andries Engelbrecht (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497310#comment-16497310
 ] 

Andries Engelbrecht commented on DRILL-4973:


[~kkhatua] In old versions of sqlline the history feature stopped working after
500 commands. Newer versions (not sure when this was resolved) no longer seem
to have this issue. I tested Drill 1.13, which keeps a rolling history of the
last 500 commands, which is sufficient for most use cases. I think this can be
marked as fixed now.
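
As an illustration of what a rolling history means here, a minimal Python
sketch of a bounded buffer that keeps only the newest 500 entries. This shows
the observed behavior only; it is not how sqlline implements its history
file, and the commands are made up.

```python
from collections import deque

HISTORY_LIMIT = 500  # figure taken from the comment above

# A deque with maxlen drops the oldest entry once the limit is reached,
# so new commands keep being recorded instead of history stopping.
history = deque(maxlen=HISTORY_LIMIT)
for n in range(1, 601):
    history.append(f"select {n};")

print(len(history), history[0], history[-1])
```

After 600 commands the buffer still holds 500 entries, with the first 100
discarded.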

> Sqlline history
> ---
>
> Key: DRILL-4973
> URL: https://issues.apache.org/jira/browse/DRILL-4973
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - CLI
>Reporter: Andries Engelbrecht
>Priority: Minor
>
> Currently the history on sqlline stops working after 500 queries have been 
> logged in the user's .sqlline/history file.




[jira] [Commented] (DRILL-2049) NoClassDefFoundError: org/apache/commons/lang/StringEscapeUtils in JDBC Driver

2015-01-21 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286285#comment-14286285
 ] 

Andries Engelbrecht commented on DRILL-2049:


I tested the updated jdbc driver and it no longer requires the jar with v2 
classes. Thanks!

> NoClassDefFoundError: org/apache/commons/lang/StringEscapeUtils in JDBC Driver
> --
>
> Key: DRILL-2049
> URL: https://issues.apache.org/jira/browse/DRILL-2049
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Patrick Wong
>Assignee: Aditya Kishore
> Attachments: 
> DRILL-2049-NoClassDefFoundError-org-apache-commons-l.patch, 
> DRILL-2049.1.patch.txt
>
>
> Original request by Andries Engelbrecht (aengelbre...@maprtech.com)




[jira] [Created] (DRILL-2105) Query fails when using flatten on JSON data where some documents have an empty array

2015-01-28 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2105:
--

 Summary: Query fails when using flatten on JSON data where some 
documents have an empty array
 Key: DRILL-2105
 URL: https://issues.apache.org/jira/browse/DRILL-2105
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Operators
Affects Versions: 0.7.0
 Environment: MFS with JSON
Reporter: Andries Engelbrecht
Assignee: Chris Westin


A Drill query fails when using flatten on an array where some records contain
an empty array. This happens especially with larger data sets where the number
of JSON documents is greater than 100k.

Using Twitter data as a sample.

select flatten (entities.hashtags) from dfs.foo.`file.json`;

Empty array
  "entities": {
    "trends": [],
    "symbols": [],
    "urls": [
      {
        "expanded_url": "http://on.nfl.com/1BkThQF",
        "indices": [
          118,
          140
        ],
        "display_url": "on.nfl.com/1BkThQF",
        "url": "http://t.co/Unr5KFy6hG"
      }
    ],
    "hashtags": [],
    "user_mentions": [
      {
        "id": 19362299,
        "name": "NFL Network",
        "indices": [
          3,
          14
        ],
        "screen_name": "nflnetwork",
        "id_str": "19362299"
      }
    ]
  },

Array with content

  "entities": {
    "trends": [],
    "symbols": [],
    "urls": [],
    "hashtags": [
      {
        "text": "djpreps",
        "indices": [
          47,
          55
        ]
      },
      {
        "text": "MSPreps",
        "indices": [
          56,
          64
        ]
      }
    ],
    "user_mentions": []
  },


Log output

2015-01-27 02:26:13,478 [2b3908b9-cf08-3fd5-3bd8-ebb6bb5b70f1:foreman] INFO  
o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for 
Parquet metadata file.
java.io.IOException: Open failed for file: /data/twitter/nfl/2015, error: 
Invalid argument (22)
at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:191) 
~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:776) 
~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) 
~[hadoop-common-2.4.1-mapr-1408.jar:na]
at 
org.apache.drill.exec.store.dfs.shim.fallback.FallbackFileSystem.open(FallbackFileSystem.java:94)
 ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:138)
 ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.store.dfs.BasicFormatMatcher.isReadable(BasicFormatMatcher.java:107)
 ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:232)
 [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:212)
 [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:141)
 [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:58)
 [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96)
 [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90)
 [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:273)
 [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at 
net.hydromatic.optiq.jdbc.SimpleOptiqSchema.getTable(SimpleOptiqSchema.java:75) 
[optiq-core-0.9-drill-r12.jar:na]
at 
net.hydromatic.optiq.prepare.OptiqCatalogReader.getTableFrom(OptiqCatalogReader.java:87)
 [optiq-core-0.9-drill-r12.jar:na]
at 
net.hydromatic.optiq.prepare.OptiqCatalogReader.getTable(OptiqCatalogReader.java:70)
 [optiq-core-0.9-drill-r12.jar:na]
at 
net.hydromatic.optiq.prepare.OptiqCatalogReader.getTable(OptiqCatalogReader.java:42)
 [optiq-core-0.9-drill-r12.jar:na]
at 
org.eigenbase.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:67) 
[optiq-core-0.9-drill-r12.jar:na]
at 
org.eigenbase.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:75)
 [optiq-core-0.9-drill-r12.jar:na]
at 
org.eigenbase.sql.validate.AbstractNamespace.validate(AbstractNamespac

[jira] [Created] (DRILL-2140) RPC Error querying JSON with empty nested maps

2015-02-02 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2140:
--

 Summary: RPC Error querying JSON with empty nested maps
 Key: DRILL-2140
 URL: https://issues.apache.org/jira/browse/DRILL-2140
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - RPC
Affects Versions: 0.7.0
 Environment: Centos 4 node MapR cluster
Reporter: Andries Engelbrecht
Assignee: Jacques Nadeau


When querying a large number of documents in multiple directories with multiple 
JSON files in each, where some documents lack the top-level map that is used in a 
predicate, Drill produces an RPC error in the log.

Query
select t.retweeted_status.`user`.name as name, 
count(t.retweeted_status.favorited) as rt_count from `./nfl` t where 
t.retweeted_status.`user`.name is not null group by 
t.retweeted_status.`user`.name order by count(t.retweeted_status.favorited) 
desc limit 10;

Screen Error
Query failed: Query failed: Failure while running fragment., index: 0, length: 
1 (expected: range(0, 0)) [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on 
se-node13.se.lab:31010 ]
[ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]

Drillbit log attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (DRILL-2140) RPC Error querying JSON with empty nested maps

2015-02-02 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2140:
---
Attachment: drillbit.log

> RPC Error querying JSON with empty nested maps
> --
>
> Key: DRILL-2140
> URL: https://issues.apache.org/jira/browse/DRILL-2140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 0.7.0
> Environment: Centos 4 node MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jacques Nadeau
> Attachments: drillbit.log
>
>
> When querying a large number of documents in multiple directories with multiple 
> JSON files in each, where some documents lack the top-level map that is used in 
> a predicate, Drill produces an RPC error in the log.
> Query
> select t.retweeted_status.`user`.name as name, 
> count(t.retweeted_status.favorited) as rt_count from `./nfl` t where 
> t.retweeted_status.`user`.name is not null group by 
> t.retweeted_status.`user`.name order by count(t.retweeted_status.favorited) 
> desc limit 10;
> Screen Error
> Query failed: Query failed: Failure while running fragment., index: 0, 
> length: 1 (expected: range(0, 0)) [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on 
> se-node13.se.lab:31010 ]
> [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]
> Drillbit log attached




[jira] [Updated] (DRILL-2141) Data type error in group by and order by for JSON

2015-02-02 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2141:
---
Attachment: drillbit.log

> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Daniel Barclay (Drill/MapR)
> Attachments: drillbit.log
>
>
> When doing group by and order by on complex nested JSON, data type errors 
> occur.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) 
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null 
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) 
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while 
> reading vector.  Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
> org.apache.drill.exec.vector.NullableVarCharVector. [ 
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached




[jira] [Created] (DRILL-2141) Data type error in group by and order by for JSON

2015-02-02 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2141:
--

 Summary: Data type error in group by and order by for JSON
 Key: DRILL-2141
 URL: https://issues.apache.org/jira/browse/DRILL-2141
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.7.0
Reporter: Andries Engelbrecht
Assignee: Daniel Barclay (Drill/MapR)
 Attachments: drillbit.log

When doing group by and order by on complex nested JSON, data type errors occur.

Query:
select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as 
rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null group 
by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) desc 
limit 10;

Screen output:
Query failed: Query failed: Failure while running fragment., Failure while 
reading vector.  Expected vector class of 
org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
org.apache.drill.exec.vector.NullableVarCharVector. [ 
c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
[ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]


java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
query.
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
at sqlline.SqlLine.print(SqlLine.java:1809)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
at sqlline.SqlLine.dispatch(SqlLine.java:889)
at sqlline.SqlLine.begin(SqlLine.java:763)
at sqlline.SqlLine.start(SqlLine.java:498)
at sqlline.SqlLine.main(SqlLine.java:460)

Drill log attached




[jira] [Updated] (DRILL-2141) Data type error in group by and order by for JSON

2015-02-02 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2141:
---
Attachment: new_drillbit.log

> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Daniel Barclay (Drill/MapR)
> Attachments: drillbit.log, new_drillbit.log
>
>
> When doing group by and order by on complex nested JSON, data type errors 
> occur.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) 
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null 
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) 
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while 
> reading vector.  Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
> org.apache.drill.exec.vector.NullableVarCharVector. [ 
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached




[jira] [Commented] (DRILL-2141) Data type error in group by and order by for JSON

2015-02-02 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301762#comment-14301762
 ] 

Andries Engelbrecht commented on DRILL-2141:


When the query is changed to use a different field to filter out JSON docs 
without the top-level map, a different error is received (similar to DRILL-2140).

New Query:
select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as 
rt_count from `./nfl` t where t.`text` like '%RT_@%' group by 
t.retweeted_status.`user`.name order by count(t.retweeted_status.id) desc limit 
10;

Screen Output:
Query failed: Query stopped., Undefined failure occurred. [ 
c480ac84-9dfa-4e1d-922e-d2aabe279b10 on drilldemo:31010 ]


java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
query.
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
at sqlline.SqlLine.print(SqlLine.java:1809)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
at sqlline.SqlLine.dispatch(SqlLine.java:889)
at sqlline.SqlLine.begin(SqlLine.java:763)
at sqlline.SqlLine.start(SqlLine.java:498)
at sqlline.SqlLine.main(SqlLine.java:460)

New Drill log attached as new_drillbit.log

Also note that this is a single-node Drill system, and that I also used alter 
session set `store.format` = 'json';



> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Daniel Barclay (Drill/MapR)
> Attachments: drillbit.log, new_drillbit.log
>
>
> When doing group by and order by on complex nested JSON, data type errors 
> occur.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) 
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null 
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) 
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while 
> reading vector.  Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
> org.apache.drill.exec.vector.NullableVarCharVector. [ 
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached




[jira] [Created] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters

2015-02-03 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2157:
--

 Summary: Directory pruning on subdirectories only and data type 
conversions for directory filters
 Key: DRILL-2157
 URL: https://issues.apache.org/jira/browse/DRILL-2157
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Affects Versions: Future
Reporter: Andries Engelbrecht
Assignee: Jinfeng Ni
Priority: Minor


Drill will scan all files and directories when only a subdirectory is used as a 
predicate. Additionally, if the data type of the directory filter is not a 
string and has to be converted, Drill will also first scan all the 
subdirectories and files before applying the filter.

My current observation is that, for a directory structure as listed below,
pruning only works if the full tree is provided. If only a lower-level
directory is supplied in the filter condition, Drill uses it only as a
filter.

With directory structure as below
/2015
    /01
        /10
        /11
        /12
        /13
        /14

Query:
select count(id) from `/foo` t where dir0='2015' and dir1='01' and
dir2='10'

Produces the correct pruning and query plan

01-02Project(id=[$3]): rowcount = 3670316.0, cumulative cost =
{1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
28434
01-03  Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]):
rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu,
0.0 io, 0.0 network, 0.0 memory}, id = 28433
01-04Scan(groupscan=[EasyGroupScan [selectionRoot=/foo,
numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]


However:

select count(id) from `/foo` t where dir2='10'

Produces a full scan of all subdirectories and only applies the filter
condition after the fact. Note the difference in numFiles between the two
plans (24 vs. 405), even though the dir columns are listed in the base scan

01-04Filter(condition=[=($0, '10')]): rowcount =
9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io,
0.0 network, 0.0 memory}, id = 27470
01-05  Project(dir2=[$1], id=[$0]): rowcount =
6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0
io, 0.0 network, 0.0 memory}, id = 27469
01-06Scan(groupscan=[EasyGroupScan
[selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]

Also using the wrong data type for the filter produces a full scan

select count(id) from `/foo` where dir_year=2015 and dir_month=01 and dir_day=14

Produces

01-04Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), 
=(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = 
212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 io, 
0.0 network, 0.0 memory}, id = 34910
01-05  Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): 
rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 cpu, 
0.0 io, 0.0 network, 0.0 memory}, id = 34909
01-06Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, 
numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],
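
To make the requested behaviour concrete, here is a minimal Python sketch. It is not Drill code and does not reflect how Drill's planner is implemented; the file list and the helper names (dir_levels, matches, prune) are made up for illustration. It shows pruning a file list on any subset of dirN filters, and comparing a numeric literal such as 1 against a directory name such as '01'.

```python
from pathlib import PurePosixPath

def dir_levels(path, root):
    """Return the directory components of `path` below `root` (dir0, dir1, ...)."""
    rel = PurePosixPath(path).relative_to(root)
    return rel.parts[:-1]  # drop the file name

def matches(component, wanted):
    """Compare one directory name against a filter literal."""
    if isinstance(wanted, int):
        # Numeric literal: compare numerically so that 1 matches '01'.
        return component.isdigit() and int(component) == wanted
    return component == wanted

def prune(paths, root, filters):
    """Keep only paths whose dirN components satisfy every filter.

    `filters` maps a level index N (for dirN) to the literal it is compared to.
    """
    kept = []
    for p in paths:
        levels = dir_levels(p, root)
        if all(n < len(levels) and matches(levels[n], v) for n, v in filters.items()):
            kept.append(p)
    return kept

# Hypothetical file list, laid out like the /foo tree described above.
files = [
    "/foo/2015/01/10/a.json",
    "/foo/2015/01/11/b.json",
    "/foo/2015/02/10/c.json",
    "/foo/2014/12/10/d.json",
]

full_tree = prune(files, "/foo", {0: "2015", 1: "01", 2: "10"})  # dir0, dir1, dir2
leaf_only = prune(files, "/foo", {2: "10"})                      # dir2 only
numeric = prune(files, "/foo", {0: 2015, 1: 1, 2: 10})           # non-string literals
```

In this sketch the dir2-only filter still removes files before any of them is read, which is the pruning asked for in this issue.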






[jira] [Commented] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters

2015-02-04 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305368#comment-14305368
 ] 

Andries Engelbrecht commented on DRILL-2157:


In addition, the ability to perform directory pruning with < and > expressions 
would be useful.

So would support for directory pruning in views that have a predicate filter, 
i.e. the view below is unable to perform directory pruning, while views without 
the predicate filter are.

Create or replace view maprfs.views.`retweeted` as
select CAST(t.`id` as BIGINT) as `id`, 
CAST(t.retweeted_status.`id` as BIGINT) as `retweet_id`,
t.dir0 as dir_year,
t.dir1 as dir_month,
t.dir2 as dir_day,
t.dir3 as dir_hour,
CAST(t.retweeted_status.`created_at` as VARCHAR(40)) as `created_at`,
to_date ((concat (substring(t.retweeted_status.`created_at`, 
5,6),substring(t.retweeted_status.`created_at`, 26,5))), 'MMM dd ') as 
`date`,
to_timestamp ((concat (substring(t.retweeted_status.`created_at`, 
5,6),substring(t.retweeted_status.`created_at`, 
26,5),substring(t.retweeted_status.`created_at`, 11,9))), 'MMM dd  
HH:mm:ss') as `timestamp`,
CAST(t.retweeted_status.`text` as VARCHAR(140)) as `tweet`,
CAST(t.retweeted_status.`user`.`favorites_count` as INT) as `favorites_count`
from maprfs.twitter.`/nfl` t where t.retweeted_status.`user`.`name` is not null;


> Directory pruning on subdirectories only and data type conversions for 
> directory filters
> 
>
> Key: DRILL-2157
> URL: https://issues.apache.org/jira/browse/DRILL-2157
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: Future
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
>Priority: Minor
>
> Drill will scan all files and directories when only a subdirectory is used as 
> a predicate. Additionally, if the data type of the directory filter is not a 
> string and has to be converted, Drill will also first scan all the 
> subdirectories and files before applying the filter.
> My current observation is that, for a directory structure as listed below,
> pruning only works if the full tree is provided. If only a lower-level
> directory is supplied in the filter condition, Drill uses it only as a
> filter.
> With directory structure as below
> /2015
>     /01
>         /10
>         /11
>         /12
>         /13
>         /14
> Query:
> select count(id) from `/foo` t where dir0='2015' and dir1='01' and
> dir2='10'
> Produces the correct pruning and query plan
> 01-02Project(id=[$3]): rowcount = 3670316.0, cumulative cost =
> {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
> 28434
> 01-03  Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]):
> rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu,
> 0.0 io, 0.0 network, 0.0 memory}, id = 28433
> 01-04Scan(groupscan=[EasyGroupScan [selectionRoot=/foo,
> numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]
> However:
> select count(id) from `/foo` t where dir2='10'
> Produces a full scan of all subdirectories and only applies the filter
> condition after the fact. Note the difference in numFiles between the two
> plans (24 vs. 405), even though the dir columns are listed in the base scan
> 01-04Filter(condition=[=($0, '10')]): rowcount =
> 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io,
> 0.0 network, 0.0 memory}, id = 27470
> 01-05  Project(dir2=[$1], id=[$0]): rowcount =
> 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 27469
> 01-06Scan(groupscan=[EasyGroupScan
> [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]
> Also using the wrong data type for the filter produces a full scan
> select count(id) from `/foo` where dir_year=2015 and dir_month=01 and 
> dir_day=14
> Produces
> 01-04Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), 
> =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = 
> 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 34910
> 01-05  Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): 
> rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34909
> 01-06Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, 
> numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],




[jira] [Created] (DRILL-2263) Directory pruning best practices with Drill

2015-02-18 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2263:
--

 Summary: Directory pruning best practices with Drill
 Key: DRILL-2263
 URL: https://issues.apache.org/jira/browse/DRILL-2263
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


Add on for querying directories.
https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
Best practices to ensure that directory pruning is properly applied.





[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill

2015-02-18 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2263:
---
Description: 
Add on for querying directories.
https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
Best practices to ensure that directory pruning is properly applied.

Please see attached document for details and write up.


  was:
Add on for querying directories.
https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
Best practices to ensure that directory pruning is properly applied.



> Directory pruning best practices with Drill
> ---
>
> Key: DRILL-2263
> URL: https://issues.apache.org/jira/browse/DRILL-2263
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
>
> Add on for querying directories.
> https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
> Best practices to ensure that directory pruning is properly applied.
> Please see attached document for details and write up.




[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill

2015-02-18 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2263:
---
Attachment: Optimizing directory pruning with Drill.docx

> Directory pruning best practices with Drill
> ---
>
> Key: DRILL-2263
> URL: https://issues.apache.org/jira/browse/DRILL-2263
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Optimizing directory pruning with Drill.docx
>
>
> Add on for querying directories.
> https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
> Best practices to ensure that directory pruning is properly applied.
> Please see attached document for details and write up.




[jira] [Created] (DRILL-2265) Drill data exploration function for complex data types

2015-02-18 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2265:
--

 Summary: Drill data exploration function for complex data types
 Key: DRILL-2265
 URL: https://issues.apache.org/jira/browse/DRILL-2265
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Reporter: Andries Engelbrecht
Assignee: Daniel Barclay (Drill/MapR)


Drill data exploration function for complex data types

When dealing with complex data in large volumes, it would be extremely useful 
to have a function that collects metadata to provide a better view of the total 
data set.

Using JSON as an example, a data set can have an extremely large volume of 
JSON objects. Each object can have multiple schemas and subschemas, with 
multiple nested subschemas as well as arrays. Not all objects will have all of 
the schemas or subschemas. When exploring this data in Drill, SQL dot notation 
is used to navigate the complex subschema structure, and it can become very 
cumbersome to fully understand the total picture of all the data.

What is needed is a function that can explore the JSON objects in a data set 
(whether a single file with multiple objects, or a single or multilevel 
directory structure) and provide the total structure of all the JSON objects, 
showing every schema, subschema and array that is available across them. This 
way a data analyst will be able to see all the schema data that is available 
within the data set. Additionally, if the function can provide statistics 
showing how many of the objects actually contain each of the schemas, 
subschemas and arrays (and data in each), this may indicate to an analyst how 
valuable or important it may be to explore any subschema or array.

To speed up the collection of this data, the function may include an option to 
set a sample size, so that only a portion of the total volume is sampled and 
the result is projected onto the total data set. This is a very common 
operation in prominent RDBMSs today. Additionally, for data that changes or 
grows, the metadata collection function will need to be run periodically to 
update the statistics.

To make the metadata more useful, consider placing the results in a Drill 
metadata structure, similar to INFORMATION_SCHEMA, but specifically for 
statistics metadata to be used only by analysts for data exploration. Some 
security considerations should also be designed in, to allow access only to 
users with access to the base data.

In addition to their use by data analysts for data exploration, the metadata 
and statistics can also be used for Drill internal functions in the future, 
such as query optimization and creation of views.

This example focuses specifically on JSON data, but it can similarly be applied 
to other complex data types that may require a very detailed understanding of 
the complex data set.
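
As a rough illustration of the kind of output such a function could produce, here is a minimal Python sketch. It runs outside Drill, and the names walk and profile, the sample_rate option and the three sample documents are all assumptions made for this example. It walks newline-delimited JSON objects and counts, per dotted path, how many objects contain that path and which value types were seen.

```python
import json
import random
from collections import Counter

def walk(value, prefix=""):
    """Yield (path, type_name) for every nested field in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path, type(child).__name__
            yield from walk(child, path)
    elif isinstance(value, list):
        for child in value:
            # All elements of an array share one path, marked with [].
            yield f"{prefix}[]", type(child).__name__
            yield from walk(child, f"{prefix}[]")

def profile(lines, sample_rate=1.0, seed=0):
    """Count, per path, how many sampled objects contain it and with which types.

    Type names are Python's (dict, list, str, int, ...), not Drill data types.
    """
    rng = random.Random(seed)
    present = Counter()  # path -> number of objects containing it
    types = Counter()    # (path, type_name) -> number of objects
    sampled = 0
    for line in lines:
        if sample_rate < 1.0 and rng.random() >= sample_rate:
            continue  # skip this object when sampling
        sampled += 1
        seen = set(walk(json.loads(line)))
        present.update({path for path, _ in seen})
        types.update(seen)
    return sampled, present, types

# Made-up documents: one has the nested map, one lacks it, one has a
# different type for "id" plus an array of maps.
docs = [
    '{"id": 1, "text": "a", "retweeted_status": {"id": 7, "user": {"name": "x"}}}',
    '{"id": 2, "text": "b"}',
    '{"id": "3", "tags": [{"text": "nfl"}]}',
]

sampled, present, types = profile(docs)
for path in sorted(present):
    print(f"{path}: {present[path]}/{sampled}")
```

A report like this shows at a glance which subschemas are sparse (here retweeted_status appears in one of three objects) and which paths carry more than one type.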




[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill

2015-02-18 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2263:
---
Attachment: Optimizing directory pruning with Drill v2.docx

> Directory pruning best practices with Drill
> ---
>
> Key: DRILL-2263
> URL: https://issues.apache.org/jira/browse/DRILL-2263
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Optimizing directory pruning with Drill v2.docx
>
>
> Add on for querying directories.
> https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
> Best practices to ensure that directory pruning is properly applied.
> Please see attached document for details and write up.




[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill

2015-02-18 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2263:
---
Attachment: (was: Optimizing directory pruning with Drill.docx)

> Directory pruning best practices with Drill
> ---
>
> Key: DRILL-2263
> URL: https://issues.apache.org/jira/browse/DRILL-2263
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Optimizing directory pruning with Drill v2.docx
>
>
> Add on for querying directories.
> https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
> Best practices to ensure that directory pruning is properly applied.
> Please see attached document for details and write up.




[jira] [Updated] (DRILL-2263) Directory pruning best practices with Drill

2015-02-19 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2263:
---
Priority: Minor  (was: Major)

> Directory pruning best practices with Drill
> ---
>
> Key: DRILL-2263
> URL: https://issues.apache.org/jira/browse/DRILL-2263
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
>Priority: Minor
> Attachments: Optimizing directory pruning with Drill v2.docx
>
>
> Add on for querying directories.
> https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
> Best practices to ensure that directory pruning is properly applied.
> Please see attached document for details and write up.




[jira] [Created] (DRILL-2272) Tibco Spotfire Desktop configuration for Drill documentation

2015-02-19 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2272:
--

 Summary: Tibco Spotfire Desktop configuration for Drill 
documentation
 Key: DRILL-2272
 URL: https://issues.apache.org/jira/browse/DRILL-2272
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens


Instructions to configure Tibco Spotfire Desktop with Drill using ODBC to be 
added to the wiki.




[jira] [Updated] (DRILL-2272) Tibco Spotfire Desktop configuration for Drill documentation

2015-02-19 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2272:
---
Attachment: Spotfire Desktop Drill Config.docx

> Tibco Spotfire Desktop configuration for Drill documentation
> 
>
> Key: DRILL-2272
> URL: https://issues.apache.org/jira/browse/DRILL-2272
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: Spotfire Desktop Drill Config.docx
>
>
> Instructions to configure Tibco Spotfire Desktop with Drill using ODBC to be 
> added to the wiki.




[jira] [Updated] (DRILL-2141) Data type error in group by and order by for JSON

2015-02-27 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2141:
---
Attachment: FlumeData.1422748800086

Sample file size.

Issue is more pronounced with larger sample file sizes and larger number of 
files.

> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Hanifi Gunes
> Fix For: 0.9.0
>
> Attachments: FlumeData.1422748800086, drillbit.log, new_drillbit.log
>
>
> When doing group by and order by on complex nested JSON, getting data type 
> errors.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) 
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null 
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) 
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while 
> reading vector.  Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
> org.apache.drill.exec.vector.NullableVarCharVector. [ 
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached




[jira] [Comment Edited] (DRILL-2141) Data type error in group by and order by for JSON

2015-02-27 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340566#comment-14340566
 ] 

Andries Engelbrecht edited comment on DRILL-2141 at 2/27/15 6:49 PM:
-

Sample file attached now.

Issue is more pronounced with larger sample file sizes and larger number of 
files.

Let me know if you experience the issue, and perhaps we can test on a larger 
environment.


was (Author: aengelbrecht):
Sample file size.

Issue is more pronounced with larger sample file sizes and larger number of 
files.

> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Hanifi Gunes
> Fix For: 0.9.0
>
> Attachments: FlumeData.1422748800086, drillbit.log, new_drillbit.log
>
>
> When doing group by and order by on complex nested JSON, getting data type 
> errors.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) 
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null 
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) 
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while 
> reading vector.  Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
> org.apache.drill.exec.vector.NullableVarCharVector. [ 
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached




[jira] [Commented] (DRILL-2141) Data type error in group by and order by for JSON

2015-03-02 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344376#comment-14344376
 ] 

Andries Engelbrecht commented on DRILL-2141:


Error still present on a much larger sample set of data in a cluster.

select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as 
rt_count from `./nfl` t where t.`text` like '%RT_@%' group by 
t.retweeted_status.`user`.name order by count(t.retweeted_status.id) desc limit 
10;
+------------+------------+
|    name    |  rt_count  |
+------------+------------+
Query failed: Query stopped., Undefined failure occurred. [ 
79f5d0d4-5101-48e6-a6bc-f25c147db6d8 on se-node11.se.lab:31010 ]


java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
query.
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
at sqlline.SqlLine.print(SqlLine.java:1809)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
at sqlline.SqlLine.dispatch(SqlLine.java:889)
at sqlline.SqlLine.begin(SqlLine.java:763)
at sqlline.SqlLine.start(SqlLine.java:498)
at sqlline.SqlLine.main(SqlLine.java:460)


> Data type error in group by and order by for JSON
> -
>
> Key: DRILL-2141
> URL: https://issues.apache.org/jira/browse/DRILL-2141
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Hanifi Gunes
> Fix For: 0.9.0
>
> Attachments: FlumeData.1422748800086, drillbit.log, new_drillbit.log
>
>
> When doing group by and order by on complex nested JSON, getting data type 
> errors.
> Query:
> select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) 
> as rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null 
> group by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) 
> desc limit 10;
> Screen output:
> Query failed: Query failed: Failure while running fragment., Failure while 
> reading vector.  Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
> org.apache.drill.exec.vector.NullableVarCharVector. [ 
> c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> [ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> Drill log attached




[jira] [Created] (DRILL-2424) Ignore hidden files in directory path

2015-03-10 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2424:
--

 Summary: Ignore hidden files in directory path
 Key: DRILL-2424
 URL: https://issues.apache.org/jira/browse/DRILL-2424
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - JSON, Storage - Text & CSV
Affects Versions: 0.7.0
Reporter: Andries Engelbrecht
Assignee: Steven Phillips


When streaming data to the DFS, some records can be incomplete during the 
temporary write phase for the last file(s). These files typically have a 
different extension such as '.tmp', or are marked hidden with a '.' prefix.

Querying the directory path with Drill then causes a query error, because some 
records in the temporary files may be incomplete. The ability to have Drill 
ignore hidden files and/or read only files with a designated extension in the 
workspace would resolve this problem.

An example is using Flume to stream JSON files to a directory structure: the 
HDFS sink creates .tmp files (which can be hidden with a '.' prefix) that 
contain incomplete JSON objects until the file is closed and the .tmp extension 
(or prefix) is removed. Attempting to query the directory structure with Drill 
then results in errors due to the incomplete JSON object(s) in the .tmp files.
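
The requested behavior can be illustrated with a small Python sketch. This is 
not Drill code and says nothing about how Drill implements file selection; the 
function name, the default extension list, and the sample paths are 
hypothetical and only show the filtering rule described above (skip names that 
start with '.', keep only designated extensions).

```python
from pathlib import PurePosixPath


def is_scannable(path, allowed_extensions=(".json",)):
    """Return True if a file should be read by the query engine.

    Hypothetical rule: skip hidden files (name starts with '.') and any
    file whose name does not end with one of the designated extensions.
    """
    name = PurePosixPath(path).name
    if name.startswith("."):
        return False
    return name.endswith(tuple(allowed_extensions))


# Made-up listing of a Flume target directory: one closed file and two
# temporary files that may still hold incomplete JSON objects.
files = [
    "/feed/2015/03/13/17/FlumeData.1426267859699.json",
    "/feed/2015/03/13/17/.FlumeData.1426267859700.json.tmp",
    "/feed/2015/03/13/17/FlumeData.1426267859701.json.tmp",
]

scannable = [f for f in files if is_scannable(f)]
print(scannable)
```

Only the closed .json file passes the filter; both the hidden file and the 
visible .tmp file are skipped.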




[jira] [Updated] (DRILL-2456) regexp_replace using hex codes fails on larger JSON data sets

2015-03-13 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2456:
---
Attachment: drillbit.log

> regexp_replace using hex codes fails on larger JSON data sets
> -
>
> Key: DRILL-2456
> URL: https://issues.apache.org/jira/browse/DRILL-2456
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 0.7.0
> Environment: Drill 0.7
> MapR 4.0.1
> CentOS
>Reporter: Andries Engelbrecht
>Assignee: Daniel Barclay (Drill)
> Attachments: drillbit.log
>
>
> This query works with only 1 file
> select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id)  from 
> dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by 
> `text` order by count(id) desc limit 10;
> This one fails with multiple files
> select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id)  from 
> dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 
> 10;
> Query failed: Query failed: Failure while trying to start remote fragment, 
> Encountered an illegal char on line 1, column 31: '' [ 
> 43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ]
> Using text in regexp_replace does work for same dataset.
> This query works fine on full data set.
> select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id)  from 
> dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 
> 10;
> Attached snippet drillbit.log for error




[jira] [Created] (DRILL-2456) regexp_replace using hex codes fails on larger JSON data sets

2015-03-13 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2456:
--

 Summary: regexp_replace using hex codes fails on larger JSON data 
sets
 Key: DRILL-2456
 URL: https://issues.apache.org/jira/browse/DRILL-2456
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 0.7.0
 Environment: Drill 0.7
MapR 4.0.1
CentOS
Reporter: Andries Engelbrecht
Assignee: Daniel Barclay (Drill)
 Attachments: drillbit.log

This query works with only 1 file

select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id)  from 
dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by `text` 
order by count(id) desc limit 10;

This one fails with multiple files

select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id)  from 
dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;

Query failed: Query failed: Failure while trying to start remote fragment, 
Encountered an illegal char on line 1, column 31: '' [ 
43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ]

Using literal characters instead of hex codes in regexp_replace does work for 
the same data set. This query works fine on the full data set:

select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id)  from 
dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;

Attached snippet drillbit.log for error
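
For readers comparing the two character classes, here is a minimal Python 
sketch. It uses Python's re module, not Drill's regexp_replace, so it only 
illustrates what the patterns match and does not reproduce the failure; the 
helper name and sample strings are made up.

```python
import re

# Character class written with hex escapes, as in the failing query.
HEX_CLASS = re.compile(r"[^\x20-\xad]")
# Character class written with literal characters, as in the working query.
LITERAL_CLASS = re.compile("[^ -~\u00a1-\u00ff]")


def mask(text, pattern):
    """Replace every character matched by the pattern with a degree sign."""
    return pattern.sub("\u00b0", text)


sample = "touchdown\u2764 NFL\n"
print(mask(sample, HEX_CLASS))
print(mask(sample, LITERAL_CLASS))
print(mask("caf\u00e9", HEX_CLASS), mask("caf\u00e9", LITERAL_CLASS))
```

In this sketch both patterns mask the heart symbol and the newline, but they 
are not strictly equivalent: the hex range stops at \xad, so a character such 
as U+00E9 is masked by the hex version and kept by the literal version.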




[jira] [Created] (DRILL-2473) Set query timezone at session level

2015-03-15 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2473:
--

 Summary: Set query timezone at session level
 Key: DRILL-2473
 URL: https://issues.apache.org/jira/browse/DRILL-2473
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Affects Versions: Future
Reporter: Andries Engelbrecht
Assignee: Jinfeng Ni


Ability to set the user timezone for queries at session level to allow 
different users querying the same data from different timezones to localize the 
results to the desired timezone.

Allowance for DST where applicable should be incorporated.




[jira] [Commented] (DRILL-2473) Set query timezone at session level

2015-03-17 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365261#comment-14365261
 ] 

Andries Engelbrecht commented on DRILL-2473:


Need to think if we want connection or session level, as an application may 
establish a single connection but serve multiple users from different timezones.

> Set query timezone at session level
> ---
>
> Key: DRILL-2473
> URL: https://issues.apache.org/jira/browse/DRILL-2473
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: Future
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
>
> Ability to set the user timezone for queries at session level to allow 
> different users querying the same data from different timezones to localize 
> the results to the desired timezone.
> Allowance for DST where applicable should be incorporated.




[jira] [Commented] (DRILL-2263) Directory pruning best practices with Drill

2015-03-22 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375102#comment-14375102
 ] 

Andries Engelbrecht commented on DRILL-2263:


I would suggest that we test a couple of specific conditions with the 0.8 
released version first before finalizing. 

> Directory pruning best practices with Drill
> ---
>
> Key: DRILL-2263
> URL: https://issues.apache.org/jira/browse/DRILL-2263
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
>Priority: Minor
> Attachments: Optimizing directory pruning with Drill v2.docx
>
>
> Add on for querying directories.
> https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
> Best practices to ensure that directory pruning is properly applied.
> Please see attached document for details and write up.




[jira] [Commented] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters

2015-03-22 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375103#comment-14375103
 ] 

Andries Engelbrecht commented on DRILL-2157:


Aman, I will test the conditions, casting, and views with the final 0.8 release.

Will file a new JIRA if there are any specific conditions that may still cause 
issues.

> Directory pruning on subdirectories only and data type conversions for 
> directory filters
> 
>
> Key: DRILL-2157
> URL: https://issues.apache.org/jira/browse/DRILL-2157
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: Future
>Reporter: Andries Engelbrecht
>Assignee: Aman Sinha
>Priority: Minor
> Fix For: 0.8.0
>
>
> Drill will scan all files and directories when using only a subdirectory as a 
> predicate. Additionally, if the data type for the directory filter is not a 
> string and is converted, Drill will also first scan all the subdirectories and 
> files before applying the filter.
> My current observation is that for a directory structure as listed below,
> the pruning only works if the full tree is provided. If only a lower level
> directory is supplied in the filter condition Drill only uses it as a
> filter.
> With directory structure as below
> /2015
> /01
>/10
>/11
>/12
>/13
>/14
> Query:
> select count(id) from `/foo` t where dir0='2015' and dir1='01' and
> dir2='10'
> Produces the correct pruning and query plan
> 01-02Project(id=[$3]): rowcount = 3670316.0, cumulative cost =
> {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
> 28434
> 01-03  Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]):
> rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu,
> 0.0 io, 0.0 network, 0.0 memory}, id = 28433
> 01-04Scan(groupscan=[EasyGroupScan [selectionRoot=/foo,
> numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]
> However:
> select count(id) from `/foo` t where dir2='10'
> Produces full scan of all sub directories and only applies a filter
> condition after the fact. Notice the numFiles between the 2, even though it
> lists columns in the base scan
> 01-04Filter(condition=[=($0, '10')]): rowcount =
> 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io,
> 0.0 network, 0.0 memory}, id = 27470
> 01-05  Project(dir2=[$1], id=[$0]): rowcount =
> 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 27469
> 01-06Scan(groupscan=[EasyGroupScan
> [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]
> Also using the wrong data type for the filter produces a full scan
> select count(id) from `/foo` where dir_year=2015 and dir_month=01 and 
> dir_day=14
> Produces
> 01-04Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), 
> =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = 
> 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 34910
> 01-05  Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): 
> rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34909
> 01-06Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, 
> numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],




[jira] [Updated] (DRILL-2518) MicroStrategy integration with Apache Drill instructions

2015-03-23 Thread Andries Engelbrecht (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andries Engelbrecht updated DRILL-2518:
---
Attachment: MicroStrategy-9-Drill-Configuration.docx

> MicroStrategy integration with Apache Drill instructions
> 
>
> Key: DRILL-2518
> URL: https://issues.apache.org/jira/browse/DRILL-2518
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: MicroStrategy-9-Drill-Configuration.docx
>
>
> Configuration instructions for enabling MicroStrategy with Apache Drill.




[jira] [Created] (DRILL-2518) MicroStrategy integration with Apache Drill instructions

2015-03-23 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2518:
--

 Summary: MicroStrategy integration with Apache Drill instructions
 Key: DRILL-2518
 URL: https://issues.apache.org/jira/browse/DRILL-2518
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Reporter: Andries Engelbrecht
Assignee: Bridget Bevens
 Attachments: MicroStrategy-9-Drill-Configuration.docx

Configuration instructions for enabling MicroStrategy with Apache Drill.




[jira] [Commented] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters

2015-04-08 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486406#comment-14486406
 ] 

Andries Engelbrecht commented on DRILL-2157:


Can be marked as resolved.

Tested with the following use cases; each shows the correct numFiles being 
scanned. Casting to string is no longer needed for lower-level directories.

Seems to be resolved in Drill 0.8 for direct DFS access:

select count(id) from `/nfl` where dir0=2015 and dir1=01 and dir2=20 and dir3 
between 00 and 05;

select count(id) from `/nfl` where dir0=2015 and dir1=01 and dir2 between 20 
and 25 and dir3 between 00 and 05;

select count(id) from `/nfl` where dir2 between 20 and 25 and dir3 between 00 
and 05;

select count(id) from `/nfl` where dir2>25 and dir3 between 00 and 05;

Views also seem to report the correct number of files to be scanned with 
directory pruning:

select count(id) from dfs.views.tweet_base where dir_year=2015 and dir_month=01 
and dir_day=26 and dir_hour>20;

select count(id) from dfs.views.tweet_base where dir_day between 20 and 26 and 
dir_hour>20;
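
As a rough illustration of what directory pruning means for filters like the 
ones above, here is a minimal Python sketch. It is not Drill's planner logic; 
the function, the path layout (year/month/day/hour), and the sample 
directories are assumptions made for the example. The point is that 
predicates on directory levels are applied to the directory names before any 
file is read, including when only lower levels such as dir2 and dir3 are 
filtered.

```python
def prune_directories(paths, predicates):
    """Keep only the paths whose directory levels satisfy every predicate.

    paths: relative directory paths such as "2015/01/20/03".
    predicates: dict mapping a level index (0 for dir0, 1 for dir1, ...)
                to a function that takes the level's integer value.
    """
    kept = []
    for path in paths:
        parts = path.strip("/").split("/")
        if all(
            level < len(parts) and test(int(parts[level]))
            for level, test in predicates.items()
        ):
            kept.append(path)
    return kept


# Hypothetical year/month/day/hour layout with 12 leaf directories.
paths = [
    f"2015/01/{day:02d}/{hour:02d}"
    for day in (19, 20, 25, 26)
    for hour in (0, 5, 6)
]

# Comparable to: where dir2 between 20 and 25 and dir3 between 00 and 05
selected = prune_directories(
    paths,
    {2: lambda d: 20 <= d <= 25, 3: lambda h: 0 <= h <= 5},
)
print(len(paths), len(selected))
print(selected)
```

Only 4 of the 12 directories remain in this example, which is the kind of 
reduction the numFiles value in the Scan line of the query plan reflects.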

> Directory pruning on subdirectories only and data type conversions for 
> directory filters
> 
>
> Key: DRILL-2157
> URL: https://issues.apache.org/jira/browse/DRILL-2157
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: Future
>Reporter: Andries Engelbrecht
>Assignee: Aman Sinha
>Priority: Minor
> Fix For: 0.8.0
>
>
> Drill will scan all files and directories when using only a subdirectory as a 
> predicate. Additionally, if the data type for the directory filter is not a 
> string and is converted, Drill will also first scan all the subdirectories and 
> files before applying the filter.
> My current observation is that for a directory structure as listed below,
> the pruning only works if the full tree is provided. If only a lower level
> directory is supplied in the filter condition Drill only uses it as a
> filter.
> With directory structure as below
> /2015
> /01
>/10
>/11
>/12
>/13
>/14
> Query:
> select count(id) from `/foo` t where dir0='2015' and dir1='01' and
> dir2='10'
> Produces the correct pruning and query plan
> 01-02Project(id=[$3]): rowcount = 3670316.0, cumulative cost =
> {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
> 28434
> 01-03  Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]):
> rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu,
> 0.0 io, 0.0 network, 0.0 memory}, id = 28433
> 01-04Scan(groupscan=[EasyGroupScan [selectionRoot=/foo,
> numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]
> However:
> select count(id) from `/foo` t where dir2='10'
> Produces full scan of all sub directories and only applies a filter
> condition after the fact. Notice the numFiles between the 2, even though it
> lists columns in the base scan
> 01-04Filter(condition=[=($0, '10')]): rowcount =
> 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io,
> 0.0 network, 0.0 memory}, id = 27470
> 01-05  Project(dir2=[$1], id=[$0]): rowcount =
> 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 27469
> 01-06Scan(groupscan=[EasyGroupScan
> [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]
> Also using the wrong data type for the filter produces a full scan
> select count(id) from `/foo` where dir_year=2015 and dir_month=01 and 
> dir_day=14
> Produces
> 01-04Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), 
> =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = 
> 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 34910
> 01-05  Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): 
> rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34909
> 01-06Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, 
> numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],




[jira] [Commented] (DRILL-2263) Directory pruning best practices with Drill

2015-04-08 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486357#comment-14486357
 ] 

Andries Engelbrecht commented on DRILL-2263:


Please close, issues resolved with Drill 0.8
No longer needed

> Directory pruning best practices with Drill
> ---
>
> Key: DRILL-2263
> URL: https://issues.apache.org/jira/browse/DRILL-2263
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
>Priority: Minor
> Attachments: Optimizing directory pruning with Drill v2.docx
>
>
> Add on for querying directories.
> https://cwiki.apache.org/confluence/display/DRILL/Querying+Directories
> Best practices to ensure that directory pruning is properly applied.
> Please see attached document for details and write up.




[jira] [Created] (DRILL-2726) Display Drill version in sys.version

2015-04-08 Thread Andries Engelbrecht (JIRA)

Andries Engelbrecht created DRILL-2726:
--

 Summary: Display Drill version in sys.version
 Key: DRILL-2726
 URL: https://issues.apache.org/jira/browse/DRILL-2726
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Andries Engelbrecht


Include the Drill version information in sys.version, so it is easy to 
determine the exact version of Drill being used for support purposes.

Adding a version column to sys.version to show the exact version i.e.
mapr-drill-0.8.0.31168-1
or
apache-drill-0.8.0.31168-1

Will make it easier for users to quickly identify the Drill version being used, 
and provide that information.






[jira] [Commented] (DRILL-2518) MicroStrategy integration with Apache Drill instructions

2015-04-08 Thread Andries Engelbrecht (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486356#comment-14486356
 ] 

Andries Engelbrecht commented on DRILL-2518:


The documentation was added here

http://drill.apache.org/docs/using-microstrategy-analytics-with-apache-drill/

Please close

Thank you

> MicroStrategy integration with Apache Drill instructions
> 
>
> Key: DRILL-2518
> URL: https://issues.apache.org/jira/browse/DRILL-2518
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Andries Engelbrecht
>Assignee: Bridget Bevens
> Attachments: MicroStrategy-9-Drill-Configuration.docx
>
>
> Configuration instructions for enabling MicroStrategy with Apache Drill.



