[jira] [Created] (DRILL-4865) Add ANSI format for date/time functions

2016-08-30 Thread Serge Harnyk (JIRA)
Serge Harnyk created DRILL-4865:
---

 Summary: Add ANSI format for date/time functions
 Key: DRILL-4865
 URL: https://issues.apache.org/jira/browse/DRILL-4865
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Serge Harnyk
Assignee: Serge Harnyk
 Fix For: 1.9.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4864) Add ANSI format for date/time functions

2016-08-30 Thread Serge Harnyk (JIRA)
Serge Harnyk created DRILL-4864:
---

 Summary: Add ANSI format for date/time functions
 Key: DRILL-4864
 URL: https://issues.apache.org/jira/browse/DRILL-4864
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Serge Harnyk
Assignee: Serge Harnyk
 Fix For: 1.9.0








[jira] [Closed] (DRILL-4865) Add ANSI format for date/time functions

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk closed DRILL-4865.
---
Resolution: Duplicate

> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4865
> URL: https://issues.apache.org/jira/browse/DRILL-4865
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serge Harnyk
>Assignee: Serge Harnyk
> Fix For: 1.9.0
>
>






[jira] [Updated] (DRILL-4458) JDBC plugin case sensitive table names

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4458:

Assignee: (was: Serge Harnyk)

> JDBC plugin case sensitive table names
> --
>
> Key: DRILL-4458
> URL: https://issues.apache.org/jira/browse/DRILL-4458
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
> Environment: Drill embedded mode on OSX, connecting to MS SQLServer
>Reporter: Paul Mogren
>Priority: Minor
>
> I just tried Drill with MS SQL Server and I found that Drill treats table
> names case-sensitively, contrary to
> https://drill.apache.org/docs/lexical-structure/ which indicates that
> table names are "case-insensitive unless enclosed in double quotation
> marks". This presents a problem for users and existing SQL scripts that
> expect table names to be case-insensitive.
> This works: select * from mysandbox.dbo.AD_Role
> This does not work: select * from mysandbox.dbo.ad_role
> Mailing list reference including stack trace: 
> http://mail-archives.apache.org/mod_mbox/drill-user/201603.mbox/%3ccajrw0otv8n5ybmvu6w_efe4npgenrdk5grmh9jtbxu9xnni...@mail.gmail.com%3e
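
The lookup rule quoted from the lexical-structure documentation can be sketched as follows. This is only an illustration in Python, not the JDBC storage plugin's code; the function name resolve_table and its parameters are hypothetical.

```python
def resolve_table(requested, available, quoted=False):
    """Return the stored table name matching `requested`, or None.

    Hypothetical helper: models the documented rule that table names are
    case-insensitive unless they are quoted.
    """
    if quoted:
        # Quoted identifiers keep their case, so only an exact match counts.
        return requested if requested in available else None
    # Unquoted identifiers are compared case-insensitively.
    matches = [name for name in available if name.lower() == requested.lower()]
    # Refuse to guess when two stored names differ only by case.
    return matches[0] if len(matches) == 1 else None


tables = ["AD_Role", "AD_User"]
print(resolve_table("ad_role", tables))               # unquoted lookup
print(resolve_table("ad_role", tables, quoted=True))  # quoted lookup
```

Under this rule the unquoted name ad_role would resolve to the stored table AD_Role, which is the behavior the report expects.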





[jira] [Updated] (DRILL-4436) Result data gets mixed up when various tables have a column "label"

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4436:

Assignee: (was: Serge Harnyk)

> Result data gets mixed up when various tables have a column "label"
> ---
>
> Key: DRILL-4436
> URL: https://issues.apache.org/jira/browse/DRILL-4436
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
> Environment: Drill 1.5.0 with Zookeeper on CentOS 7.0 
>Reporter: Vincent Uribe
>
> We have two tables in a MySQL database:
> CREATE TABLE `Gender` (
>   `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `label` varchar(15) NOT NULL,
>   PRIMARY KEY (`genderId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
> CREATE TABLE `Civility` (
>   `civilityId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `abbreviation` varchar(15) NOT NULL,
>   `label` varchar(60) DEFAULT NULL,
>   PRIMARY KEY (`civilityId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
> With a query on these two tables that selects Gender.label as 'gender' and 
> Civility.label as 'civility', we obtain, depending on the query:
> * gender in civility
> * civility in the gender
> * NULL in the other column (gender or civility)
> If we drop the table Gender and recreate it like this:
> CREATE TABLE `Gender` (
>   `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `label2` varchar(15) NOT NULL,
>   PRIMARY KEY (`genderId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
> Everything is fine.
> I guess something is wrong with the metadata...





[jira] [Updated] (DRILL-4396) Generates invalid cast specification in re-written query to Postgres

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4396:

Assignee: (was: Serge Harnyk)

> Generates invalid cast specification in re-written query to Postgres
> 
>
> Key: DRILL-4396
> URL: https://issues.apache.org/jira/browse/DRILL-4396
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> select vint.rnum, tflt.rnum from postgres.public.vint , postgres.public.tflt 
> where vint.cint = tflt.cflt
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query. 
> sql SELECT *
> FROM (SELECT "rnum", CAST("cint" AS DOUBLE) AS "$f2"
> FROM "public"."vint") AS "t"
> INNER JOIN "public"."tflt" ON "t"."$f2" = "tflt"."cflt"
> plugin postgres
> Fragment 0:0
> [Error Id: 9985ca6b-1faf-43e0-9465-b7a6e8876c6d on centos1:31010]
>   (org.postgresql.util.PSQLException) ERROR: type "double" does not exist
>   Position: 46
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse():2182
> org.postgresql.core.v3.QueryExecutorImpl.processResults():1911
> org.postgresql.core.v3.QueryExecutorImpl.execute():173
> org.postgresql.jdbc.PgStatement.execute():622
> org.postgresql.jdbc.PgStatement.executeWithFlags():458
> org.postgresql.jdbc.PgStatement.executeQuery():374
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177
> org.apache.drill.exec.physical.impl.ScanBatch.<init>():108
> org.apache.drill.exec.physical.impl.ScanBatch.<init>():136
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():147
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():127
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():127
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():79
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():230
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> SQLState:  null
> ErrorCode: 0
> create table TINT ( RNUM integer  not null , CINT integer   ) ;
> create view VINT as select * from TINT;
> create table TFLT ( RNUM integer  not null , CFLT float   ) ;
> create view VFLT as select * from TFLT;
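
The failure comes from the type name in the generated CAST: Postgres does not accept DOUBLE, but it does accept DOUBLE PRECISION. A minimal sketch of a per-dialect type-name lookup is shown below. It is an assumption about how such a rewrite could look, not the plugin's actual code; POSTGRES_TYPE_NAMES and render_cast are hypothetical names.

```python
# Hypothetical per-dialect overrides for type names used in generated casts.
POSTGRES_TYPE_NAMES = {
    "DOUBLE": "DOUBLE PRECISION",  # Postgres has no type called plain "double"
}


def render_cast(column, sql_type, type_names=POSTGRES_TYPE_NAMES):
    """Render CAST("column" AS type), substituting a dialect-specific type name."""
    target = type_names.get(sql_type.upper(), sql_type)
    return f'CAST("{column}" AS {target})'


print(render_cast("cint", "DOUBLE"))
```

With this mapping the pushed-down expression would read CAST("cint" AS DOUBLE PRECISION) rather than the form that Postgres rejected above.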





[jira] [Updated] (DRILL-4395) equi-inner join of two tables in Postgres returns null one of the projected columns

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4395:

Assignee: (was: Serge Harnyk)

> equi-inner join of two tables in Postgres returns null one of the projected 
> columns
> ---
>
> Key: DRILL-4395
> URL: https://issues.apache.org/jira/browse/DRILL-4395
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> This query should return 1,2,3,4 in both columns but returns null in the 
> second column. Both tables are in a Postgres 9.5 server mapped under Drill
> select tint.rnum, tbint.rnum from postgres.public.tint , 
> postgres.public.tbint where tint.cint = tbint.cbint
> create table TINT ( RNUM integer  not null , CINT integer   ) ;
> insert into TINT(RNUM, CINT) values ( 0, NULL);
> insert into TINT(RNUM, CINT) values ( 1, -1);
> insert into TINT(RNUM, CINT) values ( 2, 0);
> insert into TINT(RNUM, CINT) values ( 3, 1);
> insert into TINT(RNUM, CINT) values ( 4, 10);
> create table TBINT ( RNUM integer  not null , CBINT bigint   ) ;
> insert into TBINT(RNUM, CBINT) values ( 0, NULL);
> insert into TBINT(RNUM, CBINT) values ( 1, -1);
> insert into TBINT(RNUM, CBINT) values ( 2, 0);
> insert into TBINT(RNUM, CBINT) values ( 3, 1);
> insert into TBINT(RNUM, CBINT) values ( 4, 10);





[jira] [Updated] (DRILL-4399) query using OVERLAPS function executes and returns 0 rows

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4399:

Assignee: (was: Serge Harnyk)

> query using OVERLAPS function executes and returns 0 rows
> -
>
> Key: DRILL-4399
> URL: https://issues.apache.org/jira/browse/DRILL-4399
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> The doc set makes no mention of this, but the query parses and executes:
> select 1 from postgres.public.tdt where (date '1999-12-01' , date 
> '2001-12-31' ) overlaps  ( date '2001-01-01' , tdt.cdt ) and rnum=0
> This query executed by Postgres would return 1 row
> create table TDT ( RNUM integer  not null , CDT date   ) ;
> comment on table TDT is 'This describes table TDT.';
> grant select on table TDT to public;
> insert into TDT(RNUM, CDT) values ( 0, NULL);
> insert into TDT(RNUM, CDT) values ( 1, DATE '1996-01-01');
> insert into TDT(RNUM, CDT) values ( 2, DATE '2000-01-01');
> insert into TDT(RNUM, CDT) values ( 3, DATE '2000-12-31');





[jira] [Updated] (DRILL-4409) projecting literal will result in an empty resultset

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4409:

Assignee: (was: Serge Harnyk)

> projecting literal will result in an empty resultset
> 
>
> Key: DRILL-4409
> URL: https://issues.apache.org/jira/browse/DRILL-4409
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> A query which projects a literal as shown against a Postgres table will 
> result in an empty result set being returned. 
> select 'BB' from postgres.public.tversion





[jira] [Updated] (DRILL-4864) Add ANSI format for date/time functions

2016-08-30 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4864:

Description: 
The TO_DATE() is exposing the Joda string formatting conventions into the SQL 
layer. This is not following SQL conventions used by ANSI and many other 
database engines on the market.

Add new UDF "ansi_to_joda(string1, string2)", that takes string that represents 
ANSI datetime format and returns string that represents equal Joda format.
Add new session option "drill.exec.fn.to_date_format" that can be one of two 
values - "JODA"(default) and "ANSI".
If option is set to "JODA" queries with to_date() function would work in usual 
way.
If option is set to "ANSI" second argument would be wrapped with ansi_to_joda() 
function, that allows user to use ANSI datetime format
Wrapping is used in to_date(), to_time() and to_timestamp() functions.

  was:
The TO_DATE() is exposing the Joda string formatting conventions into the SQL 
layer. This is not following SQL conventions used by ANSI and many other 
database engines on the market.

Add new UDF "ansi_to_joda(string1, string2)", that takes string that represents 
ANSI datetime format and returns string that represents equal Joda format.
Add new session option "drill.exec.fn.to_date_format" that can be one of two 
values - "JODA"(default) and "ANSI".
If option is set to "JODA" queries with to_date() function would work in usual 
way.
If option is set to "ANSI" second argument would be wrapped with ansi_to_joda() 
function, that allows user to use ANSI datetime format.
Wrapping is used in to_date(), to_time() and to_timestamp() functions.


> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serge Harnyk
>Assignee: Serge Harnyk
> Fix For: 1.9.0
>
>
> The TO_DATE() is exposing the Joda string formatting conventions into the SQL 
> layer. This is not following SQL conventions used by ANSI and many other 
> database engines on the market.
> Add new UDF "ansi_to_joda(string1, string2)", that takes string that 
> represents ANSI datetime format and returns string that represents equal Joda 
> format.
> Add new session option "drill.exec.fn.to_date_format" that can be one of two 
> values - "JODA"(default) and "ANSI".
> If option is set to "JODA" queries with to_date() function would work in 
> usual way.
> If option is set to "ANSI" second argument would be wrapped with 
> ansi_to_joda() function, that allows user to use ANSI datetime format
> Wrapping is used in to_date(), to_time() and to_timestamp() functions.





[jira] [Updated] (DRILL-4864) Add ANSI format for date/time functions

2016-08-30 Thread Serge Harnyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Harnyk updated DRILL-4864:

Description: 
The TO_DATE() is exposing the Joda string formatting conventions into the SQL 
layer. This is not following SQL conventions used by ANSI and many other 
database engines on the market.

Add new UDF "ansi_to_joda(string1, string2)", that takes string that represents 
ANSI datetime format and returns string that represents equal Joda format.
Add new session option "drill.exec.fn.to_date_format" that can be one of two 
values - "JODA"(default) and "ANSI".
If option is set to "JODA" queries with to_date() function would work in usual 
way.
If option is set to "ANSI" second argument would be wrapped with ansi_to_joda() 
function, that allows user to use ANSI datetime format.
Wrapping is used in to_date(), to_time() and to_timestamp() functions.

> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serge Harnyk
>Assignee: Serge Harnyk
> Fix For: 1.9.0
>
>
> The TO_DATE() is exposing the Joda string formatting conventions into the SQL 
> layer. This is not following SQL conventions used by ANSI and many other 
> database engines on the market.
> Add new UDF "ansi_to_joda(string1, string2)", that takes string that 
> represents ANSI datetime format and returns string that represents equal Joda 
> format.
> Add new session option "drill.exec.fn.to_date_format" that can be one of two 
> values - "JODA"(default) and "ANSI".
> If option is set to "JODA" queries with to_date() function would work in 
> usual way.
> If option is set to "ANSI" second argument would be wrapped with 
> ansi_to_joda() function, that allows user to use ANSI datetime format.
> Wrapping is used in to_date(), to_time() and to_timestamp() functions.
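
The format translation described above can be sketched as a token-by-token rewrite. The Python below is only an illustration under stated assumptions: the issue does not list the supported tokens, so the table is a guess at common ones (YYYY, MM, DD, HH24, HH12, MI, SS), and the function takes a single argument rather than the two-argument signature named in the description.

```python
# Assumed token table: ANSI-style format tokens -> Joda-Time pattern letters.
# The issue text does not enumerate the tokens; these are illustrative only.
ANSI_TO_JODA = {
    "YYYY": "yyyy",  # four-digit year
    "MM": "MM",      # month of year
    "DD": "dd",      # day of month
    "HH24": "HH",    # hour of day, 0-23
    "HH12": "hh",    # clock hour of half day, 1-12
    "MI": "mm",      # minute of hour
    "SS": "ss",      # second of minute
}


def ansi_to_joda(ansi_format):
    """Translate an ANSI-style datetime format string into a Joda pattern."""
    # Try longer tokens first so that HH24 is not split into smaller pieces.
    tokens = sorted(ANSI_TO_JODA, key=len, reverse=True)
    upper = ansi_format.upper()
    out = []
    i = 0
    while i < len(ansi_format):
        for token in tokens:
            if upper.startswith(token, i):
                out.append(ANSI_TO_JODA[token])
                i += len(token)
                break
        else:
            # Separators such as '-', ':' and ' ' pass through unchanged.
            # Literal letters would need Joda quoting, which this sketch omits.
            out.append(ansi_format[i])
            i += 1
    return "".join(out)


print(ansi_to_joda("YYYY-MM-DD HH24:MI:SS"))
```

With the session option set to "ANSI", the second argument of to_date(), to_time() and to_timestamp() would pass through a translation of this kind before reaching the Joda-based parser.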





[jira] [Created] (DRILL-4866) Provide TABLE and PARTITION information in INFORMATION_SCHEMA for parquet tables created by Drill

2016-08-30 Thread Andries Engelbrecht (JIRA)
Andries Engelbrecht created DRILL-4866:
--

 Summary: Provide TABLE and PARTITION information in 
INFORMATION_SCHEMA for parquet tables created by Drill
 Key: DRILL-4866
 URL: https://issues.apache.org/jira/browse/DRILL-4866
 Project: Apache Drill
  Issue Type: Improvement
  Components: Metadata, Storage - Parquet
Reporter: Andries Engelbrecht


Provide the Table and Partition information on parquet tables created by Drill 
in INFORMATION_SCHEMA. This can be utilized by tools and users looking to 
optimize Drill queries by referencing the table and partition metadata from 
within Drill, as opposed to querying the parquet metadata underneath.

Potentially extend INFORMATION_SCHEMA with an additional PARTITIONS table 
similar to MySQL to provide information on column(s) used for partitioning.






[jira] [Created] (DRILL-4867) Scan fragment placement algorithm for parquet is not taking partition pruning into consideration

2016-08-30 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-4867:
---

 Summary: Scan fragment placement algorithm for parquet is not 
taking partition pruning into consideration
 Key: DRILL-4867
 URL: https://issues.apache.org/jira/browse/DRILL-4867
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.8.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
Priority: Critical


Drill decides how many scan fragments to run on each node based on the node's 
endpoint affinity (the ratio of the number of bytes on the host to the total 
bytes for the whole scan). However, pruned rowGroups are not removed from the 
calculation. This skews the placement of scan fragments and can cause remote reads. 
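
A small worked example of the affinity ratio shows how pruned row groups distort it. The Python below is a sketch with made-up numbers; the dictionary layout and the function name endpoint_affinity are hypothetical and do not reflect Drill's internal classes.

```python
from collections import defaultdict


def endpoint_affinity(row_groups, include_pruned):
    """Return {host: bytes on host / total bytes} for the row groups counted."""
    bytes_per_host = defaultdict(int)
    for rg in row_groups:
        if rg["pruned"] and not include_pruned:
            continue  # a pruned row group is never read, so it should not count
        bytes_per_host[rg["host"]] += rg["bytes"]
    total = sum(bytes_per_host.values())
    if total == 0:
        return {}
    return {host: size / total for host, size in bytes_per_host.items()}


# Made-up layout: most of node1's bytes belong to a row group that was pruned.
row_groups = [
    {"host": "node1", "bytes": 300, "pruned": True},
    {"host": "node1", "bytes": 100, "pruned": False},
    {"host": "node2", "bytes": 100, "pruned": False},
]

print(endpoint_affinity(row_groups, include_pruned=True))   # behavior described in this report
print(endpoint_affinity(row_groups, include_pruned=False))  # pruned row groups excluded
```

In this example, counting the pruned row group gives node1 four fifths of the affinity even though the data actually scanned is split evenly between the two nodes, so fragments placed on node1 would end up reading node2's data remotely.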





[jira] [Commented] (DRILL-4770) ParquetRecordReader throws NPE querying a single int64 column file

2016-08-30 Thread Padma Penumarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450191#comment-15450191
 ] 

Padma Penumarthy commented on DRILL-4770:
-

The problem occurs because this Parquet file uses data page type v2, which the 
Drill fast reader does not support. 

I tried the Parquet library reader (from parquet-mr) by setting the option
set store.parquet.use_new_reader=true;
This does not work either, because the version of the library that we use 
(1.8.1-drill-r0) lacks int64 delta encoding support. Support for this was 
added to parquet-mr recently:
https://issues.apache.org/jira/browse/PARQUET-225

So, we have 2 choices:
1. Add support for data page v2 to the Drill fast reader.
2. Wait for a new parquet-mr release so we can pick up the fix for 
int64 delta encoding. In this case, we will not be supporting this in the fast 
reader.


> ParquetRecordReader throws NPE querying a single int64 column file
> --
>
> Key: DRILL-4770
> URL: https://issues.apache.org/jira/browse/DRILL-4770
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Chun Chang
>Assignee: Padma Penumarthy
> Fix For: 1.8.0
>
> Attachments: int64_10_bs10k_ps1k_uncompressed.parquet
>
>
> I have a parquet file with a single int64 column. 
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar 
> parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump 
> /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int64_10_bs10k_ps1k_uncompressed.parquet
> row group 0
> 
> int64_field_required:  INT64 UNCOMPRESSED DO:0 FPO:4 SZ:55/55/1.00 VC:10 
> [more]...
> int64_field_required TV=10 RL=0 DL=0
> 
> 
> page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  
> [more]... VC:10
> INT64 int64_field_required
> 
> *** row group 1 of 1, values 1 to 10 ***
> value 1:  R:0 D:0 V:0
> value 2:  R:0 D:0 V:1
> value 3:  R:0 D:0 V:2
> value 4:  R:0 D:0 V:3
> value 5:  R:0 D:0 V:4
> value 6:  R:0 D:0 V:5
> value 7:  R:0 D:0 V:6
> value 8:  R:0 D:0 V:7
> value 9:  R:0 D:0 V:8
> value 10: R:0 D:0 V:9
> {noformat}
> Drill version:
> {noformat}
> 0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
> | version | commit_id | commit_message | commit_time | build_email | build_time |
> | 1.8.0-SNAPSHOT | 05c42eae79ce3e309028b3824f9449b98e329f29 | DRILL-4707: Fix memory leak or incorrect query result in case two column names are case-insensitive identical. | 29.06.2016 @ 08:15:13 PDT | inram...@gmail.com | 07.07.2016 @ 10:50:40 PDT |
> 1 row selected (0.44 seconds)
> {noformat}
> drill throws NPE:
> {noformat}
> 2016-07-08 11:08:55,156 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 288013c7-f122-f6be-936e-c18ebe9b92ef: select * from 
> dfs.`drill/testdata/parquet_storage/int64_10_bs10k_ps1k_uncompressed.parquet`
> 2016-07-08 11:08:55,292 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2016-07-08 11:08:55,295 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 2ms total, 2.423069ms avg, 2ms max.
> 2016-07-08 11:08:55,295 [288013c7-f122-f6be-936e-c18ebe9b92ef:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 1.347000 μs, Latest start: 1

[jira] [Updated] (DRILL-4866) Provide TABLE and PARTITION information in INFORMATION_SCHEMA for parquet tables created by Drill

2016-08-30 Thread Neeraja (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neeraja updated DRILL-4866:
---
Assignee: Zelaine Fong

> Provide TABLE and PARTITION information in INFORMATION_SCHEMA for parquet 
> tables created by Drill
> -
>
> Key: DRILL-4866
> URL: https://issues.apache.org/jira/browse/DRILL-4866
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata, Storage - Parquet
>Reporter: Andries Engelbrecht
>Assignee: Zelaine Fong
>
> Provide the Table and Partition information on parquet tables created by 
> Drill in INFORMATION_SCHEMA. This can be utilized by tools and users looking 
> to optimize Drill queries by referencing the table and partition metadata 
> from within Drill, as opposed to querying the parquet metadata underneath.
> Potentially extend INFORMATION_SCHEMA with an additional PARTITIONS table 
> similar to MySQL to provide information on column(s) used for partitioning.





[jira] [Updated] (DRILL-4627) Drill should protect data placed into Zookeeper/ZK

2016-08-30 Thread Keys Botzum (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keys Botzum updated DRILL-4627:
---
Labels: security  (was: )

> Drill should protect data placed into Zookeeper/ZK
> --
>
> Key: DRILL-4627
> URL: https://issues.apache.org/jira/browse/DRILL-4627
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Keys Botzum
>Priority: Minor
>  Labels: security
>
> Drill is striving to improve its security posture and is improving rapidly.
> One key item in a secure system is protection of all relevant data that an 
> attacker could use to cause harm. Today Drill does not protect the data in 
> ZK. This means that an attacker could alter it.
> I recommend that Drill create appropriate ZK ACLs on the data in ZK and 
> establish an appropriate authentication mechanism to ZK - that's likely 
> Kerberos for most Hadoop clusters but MapR Native Security for MapR.





[jira] [Updated] (DRILL-4335) Apache Drill should support network encryption

2016-08-30 Thread Keys Botzum (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keys Botzum updated DRILL-4335:
---
Labels: security  (was: )

> Apache Drill should support network encryption
> --
>
> Key: DRILL-4335
> URL: https://issues.apache.org/jira/browse/DRILL-4335
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Keys Botzum
>  Labels: security
>
> This is clearly related to DRILL-291, but I wanted to make explicit that this 
> needs to include network-level encryption and not just authentication. This 
> is particularly important for the client connection to Drill, which will often 
> be sending passwords in the clear until there is encryption.





[jira] [Created] (DRILL-4868) Hive functions should update writerIndex accordingly when return binary type

2016-08-30 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4868:
--

 Summary: Hive functions should update writerIndex accordingly when 
return binary type
 Key: DRILL-4868
 URL: https://issues.apache.org/jira/browse/DRILL-4868
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


unhex is a Hive function. The binary buffer it returns cannot be consumed by 
convert_from, as shown below.

0: jdbc:drill:zk=10.10.88.128:5181> select 
convert_from(unhex('0a5f710b'),'int_be') from (values(1));
Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex(0) + length(4) 
exceeds writerIndex(0): DrillBuf[31], udle: [25 0..1024]
Fragment 0:0
[Error Id: 5e72ce4a-6164-4260-8317-ca2bb6325013 on atsqa4-128.qa.lab:31010] 
(state=,code=0)
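
The error message points at a buffer whose bytes were stored without the writer index being advanced. The toy model below illustrates that failure mode in Python. It is not DrillBuf or the Hive function wrapper; ToyBuf, unhex_into and convert_from_int_be are hypothetical names used only for this sketch.

```python
class ToyBuf:
    """Toy stand-in for a buffer with separate reader and writer indexes."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.reader_index = 0
        self.writer_index = 0

    def set_bytes(self, index, payload):
        # Absolute write: stores the bytes but leaves writer_index untouched.
        self.data[index:index + len(payload)] = payload

    def read_bytes(self, length):
        # Relative read: only bytes below writer_index count as readable.
        if self.reader_index + length > self.writer_index:
            raise IndexError(
                f"readerIndex({self.reader_index}) + length({length}) "
                f"exceeds writerIndex({self.writer_index})"
            )
        start = self.reader_index
        self.reader_index += length
        return bytes(self.data[start:start + length])


def unhex_into(buf, hex_text, update_writer_index):
    """Store the decoded bytes, optionally marking them as written."""
    payload = bytes.fromhex(hex_text)
    buf.set_bytes(0, payload)
    if update_writer_index:
        buf.writer_index = len(payload)
    return buf


def convert_from_int_be(buf):
    """Read four bytes and decode them as a big-endian signed integer."""
    return int.from_bytes(buf.read_bytes(4), byteorder="big", signed=True)


try:
    convert_from_int_be(unhex_into(ToyBuf(1024), "0a5f710b", update_writer_index=False))
except IndexError as exc:
    print("without index update:", exc)

print("with index update:", convert_from_int_be(unhex_into(ToyBuf(1024), "0a5f710b", update_writer_index=True)))
```

In this model the consumer can read the four bytes only after the producer has moved the writer index past them, which is the change the issue title asks for.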





[jira] [Updated] (DRILL-4854) Incorrect logic in log directory checks in drill-config.sh

2016-08-30 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4854:
-
Reviewer: Abhishek Girish

> Incorrect logic in log directory checks in drill-config.sh
> --
>
> Key: DRILL-4854
> URL: https://issues.apache.org/jira/browse/DRILL-4854
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.8.0
>
>
> The recent changes to the launch scripts introduced a subtle bug in the logic 
> that verifies the log directory:
>   if [[ ! -d "$DRILL_LOG_DIR" && ! -w "$DRILL_LOG_DIR" ]]; then
> ...
> if [[ ! -d "$DRILL_LOG_DIR" && ! -w "$DRILL_LOG_DIR" ]]; then
> In both cases, the operator should be or ("||").
> That is, if either the item is not a directory, or it is a directory but is 
> not writable, then do the fall-back steps.
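
The difference between the two operators can be checked with a small model of the test. The Python below mirrors the shell conditions with os.path.isdir and os.access; it is an illustration only, and the function names are hypothetical.

```python
import os
import tempfile


def needs_fallback_buggy(path):
    # Mirrors [[ ! -d "$DIR" && ! -w "$DIR" ]]: both tests must fail.
    return (not os.path.isdir(path)) and (not os.access(path, os.W_OK))


def needs_fallback_fixed(path):
    # Mirrors [[ ! -d "$DIR" || ! -w "$DIR" ]]: either failure triggers the fall-back.
    return (not os.path.isdir(path)) or (not os.access(path, os.W_OK))


# A writable regular file is not a usable log directory.
with tempfile.NamedTemporaryFile() as f:
    print("buggy:", needs_fallback_buggy(f.name))
    print("fixed:", needs_fallback_fixed(f.name))
```

For a writable path that is not a directory, the "&&" form skips the fall-back because the writability test passes, while the "||" form takes it. The same holds for a directory that exists but is not writable.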





[jira] [Updated] (DRILL-4846) Eliminate extra operations during metadata cache pruning

2016-08-30 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4846:
-
Reviewer: Dechang Gu

> Eliminate extra operations during metadata cache pruning
> 
>
> Key: DRILL-4846
> URL: https://issues.apache.org/jira/browse/DRILL-4846
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> While doing performance testing for DRILL-4530 using a new data set and 
> queries, we found two potential performance issues: (a) the metadata cache 
> was being read twice in some cases and (b) the checking for directory 
> modification time was being done twice, once as part of the first phase of 
> directory-based pruning and subsequently after the second phase pruning.   
> This check gets expensive for a large number of directories. Creating this 
> JIRA to track fixes for these issues. 





[jira] [Updated] (DRILL-4857) When no partition pruning occurs with metadata caching there's a performance regression

2016-08-30 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4857:
-
Reviewer: Dechang Gu

> When no partition pruning occurs with metadata caching there's a performance 
> regression
> ---
>
> Key: DRILL-4857
> URL: https://issues.apache.org/jira/browse/DRILL-4857
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> After DRILL-4530, we see the (expected) performance improvements in planning 
> time with metadata cache for cases where partition pruning got applied. 
> However, in cases where it did not get applied and for a sufficiently large 
> number of files (tested with up to 400K files), there is a performance 
> regression. Part of this was addressed by DRILL-4846. This JIRA is to 
> track some remaining fixes to address the regression.





[jira] [Updated] (DRILL-3898) NPE during external sort when there is not enough space for spilling

2016-08-30 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3898:

Assignee: Boaz Ben-Zvi  (was: Deneche A. Hakim)

> NPE during external sort when there is not enough space for spilling
> 
>
> Key: DRILL-3898
> URL: https://issues.apache.org/jira/browse/DRILL-3898
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Boaz Ben-Zvi
> Fix For: Future
>
> Attachments: drillbit.log
>
>
> While verifying DRILL-3732 I ran into a new problem.
> I think Drill somehow loses track of the out-of-disk exception and does not 
> cancel the rest of the query, which results in an NPE.
> Reproduction is the same as in DRILL-3732:
> {code}
> 0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, 
> ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) 
> partition by (ss_promo_sk) as
> . . . . . . . . . . . . >  select 
> . . . . . . . . . . . . >  case when columns[2] = '' then cast(null as 
> varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . . >  case when columns[3] = '' then cast(null as 
> varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . . >  case when columns[4] = '' then cast(null as 
> varchar(100)) else cast(columns[4] as varchar(100)) end, 
> . . . . . . . . . . . . >  case when columns[5] = '' then cast(null as 
> varchar(100)) else cast(columns[5] as varchar(100)) end, 
> . . . . . . . . . . . . >  case when columns[0] = '' then cast(null as 
> varchar(100)) else cast(columns[0] as varchar(100)) end, 
> . . . . . . . . . . . . >  case when columns[8] = '' then cast(null as 
> varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . . >  from 
> . . . . . . . . . . . . >   `store_sales.dat` ss 
> . . . . . . . . . . . . > ;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 1:16
> [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> This exception in drillbit.log should have triggered query cancellation:
> {code}
> 2015-10-06 17:01:34,463 [WorkManager-2] ERROR 
> o.apache.drill.exec.work.WorkManager - 
> org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
> ~[na:1.7.0_71]
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
> ~[na:1.7.0_71]
> at java.io.FilterOutputStream.close(FilterOutputStream.java:157) 
> ~[na:1.7.0_71]
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
> ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
> ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) 
> ~[drill-common-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>