[jira] [Resolved] (DRILL-3745) Hive CHAR not supported

2016-03-19 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-3745.
-
   Resolution: Fixed
Fix Version/s: 1.6.0

> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.6.0
>
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using SQuirreL.  We have the Hive plugin enabled with tables using CHAR.
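For pre-fix versions, a common workaround was to expose the CHAR column through
a Hive view that casts it to a type Drill already supported (a sketch; the view
name is hypothetical and this is not taken from the ticket):
{noformat}
-- In Hive: wrap the CHAR column so older Drill versions read it as VARCHAR
create view development.foo_v as
select cast(bad as varchar(10)) as bad from development.foo;
{noformat}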



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2223) Empty parquet file created with Limit 0 query errors out when querying

2016-03-19 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200792#comment-15200792
 ] 

Khurram Faraaz commented on DRILL-2223:
---

Yes, this is not reproducible on Drill 1.7.0. The point is that the CTAS with a 
LIMIT 0 query was successful, and since every successful CTAS creates a valid 
parquet file, one would expect CTAS in this case to create an empty parquet 
file that has the metadata information in the parquet footer but no actual 
data, because the query was a LIMIT 0 query.

{noformat}
0: jdbc:drill:schema=dfs.tmp> create table t_2223 as select firstName, 
lastName, isAlive, age, height_cm, address, phoneNumbers, hobbies from 
`employee.json` LIMIT 0;
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 0  |
+---++
1 row selected (0.31 seconds)
0: jdbc:drill:schema=dfs.tmp> select * from t_2223;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 20: Table 
't_2223' not found

SQL Query null

[Error Id: 18273406-da54-415d-b8fe-aa96c6cc3c85 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

> Empty parquet file created with Limit 0 query errors out when querying
> --
>
> Key: DRILL-2223
> URL: https://issues.apache.org/jira/browse/DRILL-2223
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
>Reporter: Aman Sinha
> Fix For: Future
>
>
> Doing a CTAS with limit 0 creates a 0 length parquet file which errors out 
> during querying.  This should at least write the schema information and 
> metadata which will allow queries to run. 
> {code}
> 0: jdbc:drill:zk=local> create table tt_nation2 as select n_nationkey, 
> n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0;
> ++---+
> |  Fragment  | Number of records written |
> ++---+
> | 0_0| 0 |
> ++---+
> 1 row selected (0.315 seconds)
> 0: jdbc:drill:zk=local> select n_nationkey from tt_nation2;
> Query failed: RuntimeException: file:/tmp/tt_nation2/0_0_0.parquet is not a 
> Parquet file (too small)
> {code}
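One reason the footer metadata matters even for a zero-row table: BI tools
routinely issue LIMIT 0 probes to discover a table's schema. A minimal
illustration against the table created above (expected to return no rows but a
fully typed column list once the file carries its schema):
{code}
select n_nationkey from tt_nation2 limit 0;
{code}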



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dr

2016-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200148#comment-15200148
 ] 

ASF GitHub Bot commented on DRILL-4510:
---

GitHub user hsuanyi opened a pull request:

https://github.com/apache/drill/pull/433

DRILL-4510: Force Union-All to happen in a single node



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsuanyi/incubator-drill DRILL-4510

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/433.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #433


commit 6c025c90ad23d16b0aefca5f03c033362f93
Author: Hsuan-Yi Chu 
Date:   2016-03-16T04:52:17Z

DRILL-4510: Force Union-All to happen in a single node




> IllegalStateException: Failure while reading vector.  Expected vector class 
> of org.apache.drill.exec.vector.NullableIntVector but was holding vector 
> class org.apache.drill.exec.vector.NullableVarCharVector
> -
>
> Key: DRILL-4510
> URL: https://issues.apache.org/jira/browse/DRILL-4510
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Chun Chang
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
>
> Hit the following regression running advanced automation. Regression happened 
> between commit b979bebe83d7017880b0763adcbf8eb80acfcee8 and 
> 1f23b89623c72808f2ee866cec9b4b8a48929d68
> {noformat}
> Execution Failures:
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
> Query: 
> -- start query 66 in stream 0 using template query66.tpl 
> SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>ship_carriers, 
>year1,
>Sum(jan_sales) AS jan_sales, 
>Sum(feb_sales) AS feb_sales, 
>Sum(mar_sales) AS mar_sales, 
>Sum(apr_sales) AS apr_sales, 
>Sum(may_sales) AS may_sales, 
>Sum(jun_sales) AS jun_sales, 
>Sum(jul_sales) AS jul_sales, 
>Sum(aug_sales) AS aug_sales, 
>Sum(sep_sales) AS sep_sales, 
>Sum(oct_sales) AS oct_sales, 
>Sum(nov_sales) AS nov_sales, 
>Sum(dec_sales) AS dec_sales, 
>Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
>Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
>Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
>Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
>Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
>Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
>Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
>Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
>Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
>Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
>Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
>Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
>Sum(jan_net)   AS jan_net, 
>Sum(feb_net)   AS feb_net, 
>Sum(mar_net)   AS mar_net, 
>Sum(apr_net)   AS apr_net, 
>Sum(may_net)   AS may_net, 
>Sum(jun_net)   AS jun_net, 
>Sum(jul_net)   AS jul_net, 
>Sum(aug_net)   AS aug_net, 
>Sum(sep_net)   AS sep_net, 
>Sum(oct_net)   AS oct_net, 
>Sum(nov_net)   AS nov_net, 
>Sum(dec_net)   AS dec_net 
> FROM   (SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>

[jira] [Resolved] (DRILL-4372) Drill Operators and Functions should correctly expose their types within Calcite

2016-03-19 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4372.
---
   Resolution: Fixed
Fix Version/s: 1.7.0

Fixed in commit: c0293354ec79b42ff27ce4ad2113a2ff52a934bd

> Drill Operators and Functions should correctly expose their types within 
> Calcite
> 
>
> Key: DRILL-4372
> URL: https://issues.apache.org/jira/browse/DRILL-4372
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> Currently, for most operators / functions, Drill always claims the return 
> type to be nullable ANY. 
> However, in many cases (such as Hive, views, etc.), the types of the input 
> columns are known. So, along with resolving to the correct operators / 
> functions, we can infer the output types during planning. 
> Having this mechanism can help speed up many applications, especially those 
> where schemas alone are sufficient (e.g., LIMIT 0).
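A minimal sketch of the kind of query this helps (illustrative, not taken from
the ticket): with correct type exposure, the planner knows the output schema
without executing the plan.
{noformat}
-- Schema-only probe: with proper return-type inference this can be answered
-- at planning time instead of running the full query
select n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0;
{noformat}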



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dr

2016-03-19 Thread Chun Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197687#comment-15197687
 ] 

Chun Chang commented on DRILL-4510:
---

Now, with commit id 050ff9679d99b5cdacc86f5501802c3d2a6dd3e3 (1.7.0-SNAPSHOT), 
the error message becomes:

{noformat}
Failed with exception
java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not 
support schema changes

Fragment 2:0

[Error Id: ebd53ba1-f281-4441-8b20-105bd8bb2e06 on atsqa6c88.qa.lab:31010]
at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:321)
at 
oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:172)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:203)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:93)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes

Fragment 2:0

[Error Id: ebd53ba1-f281-4441-8b20-105bd8bb2e06 on atsqa6c88.qa.lab:31010]
at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
at 
oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
at 
oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
at 
oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 

[jira] [Comment Edited] (DRILL-4398) SYSTEM ERROR: IllegalStateException: Memory was leaked by query

2016-03-19 Thread Matt Keranen (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201810#comment-15201810
 ] 

Matt Keranen edited comment on DRILL-4398 at 3/18/16 5:25 PM:
--

Getting a similar error in 1.6.0 with a CTAS into Parquet from CSV data stored 
in HDFS:

{noformat}
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory 
leaked: (523264)
Allocator(op:1:12:5:ExternalSort) 2000/523264/343731840/357913941 
(res/actual/peak/limit)


Fragment 1:12

[Error Id: be0fef1f-e02a-422e-808f-2fe171ae7875 on es05:31010] (state=,code=0)
{noformat}


was (Author: mattk):
Getting a similar error in 1.6.0 with a CTAS into Parquet from CSV data stored 
in HDFS.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query
> ---
>
> Key: DRILL-4398
> URL: https://issues.apache.org/jira/browse/DRILL-4398
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Taras Supyk
>
> Several queries fail with memory leaked errors
> select tjoin2.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 as c2j2 from 
> postgres.public.tjoin1 full outer join postgres.public.tjoin2 on tjoin1.c1 = 
> tjoin2.c1
> select tjoin1.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 from 
> postgres.public.tjoin1, lateral ( select tjoin2.c1, tjoin2.c2 from 
> postgres.public.tjoin2 where tjoin1.c1=tjoin2.c1) tjoin2
> SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory 
> leaked: (40960)
> Allocator(op:0:0:3:JdbcSubScan) 100/40960/135168/100 
> (res/actual/peak/limit)
> create table TJOIN1 (RNUM integer   not null , C1 integer, C2 integer);
> insert into TJOIN1 (RNUM, C1, C2) values ( 0, 10, 15);
> insert into TJOIN1 (RNUM, C1, C2) values ( 1, 20, 25);
> insert into TJOIN1 (RNUM, C1, C2) values ( 2, NULL, 50);
> create table TJOIN2 (RNUM integer   not null , C1 integer, C2 char(2));
> insert into TJOIN2 (RNUM, C1, C2) values ( 0, 10, 'BB');
> insert into TJOIN2 (RNUM, C1, C2) values ( 1, 15, 'DD');
> insert into TJOIN2 (RNUM, C1, C2) values ( 2, NULL, 'EE');
> insert into TJOIN2 (RNUM, C1, C2) values ( 3, 10, 'FF');



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4338) Concurrent query remains in CANCELLATION_REQUESTED state

2016-03-19 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197905#comment-15197905
 ] 

Khurram Faraaz commented on DRILL-4338:
---

The problem is reproducible on Drill 1.6.0, JDK 7, git commit ID 
64ab0a8ec9d98bf96f4d69274dddc180b8efe263.

> Concurrent query remains in CANCELLATION_REQUESTED state 
> -
>
> Key: DRILL-4338
> URL: https://issues.apache.org/jira/browse/DRILL-4338
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.4.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
> Attachments: ConcurrencyTest.java, 
> query_In_cancellation_requested_state.png
>
>
> Execute queries concurrently through a Java program and, while the program is 
> still executing the SQL queries, issue Ctrl-C at the prompt where the program 
> was started.
> Here are two observations: 
> (1) There is an Exception in drillbit.log.
> (2) Once Ctrl-C was issued to the java program, queries that were under 
> execution at that point move from FAILED state to CANCELLATION_REQUESTED 
> state; they do not end up in CANCELED state. Ideally the last state of these 
> queries should be CANCELED, not CANCELLATION_REQUESTED. 
> Snippet from drillbit.log
> {noformat}
> 2016-02-02 06:21:21,903 [294fb51d-8a4c-c099-dc90-97434056e3d7:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2016-02-02 06:21:21,903 [294fb51d-8a4c-c099-dc90-97434056e3d7:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State to report: RUNNING
> 2016-02-02 06:21:48,560 [UserServer-1] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.100.201:31010 <--> /10.10.100.201:45087 (user client).  
> Closing connection.
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer
> 2016-02-02 06:21:48,562 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State change requested RUNNING --> 
> FAILED
> 2016-02-02 06:21:48,562 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-f424-6adc-d668-1659e4353698:0:0: State change requested RUNNING --> 
> FAILED
> 2016-02-02 06:21:48,562 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-c7f6-2c8f-0689-af9de21a6d20:0:0: State change requested RUNNING --> 
> FAILED
> 2016-02-02 06:21:48,563 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51e-5de9-0919-be56-52f75a0532f1:0:0: State change requested RUNNING --> 
> FAILED
> 2016-02-02 06:21:48,573 [CONTROL-rpc-event-queue] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-f424-6adc-d668-1659e4353698:0:0: State change requested FAILED --> 
> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,573 [CONTROL-rpc-event-queue] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-f424-6adc-d668-1659e4353698:0:0: Ignoring unexpected state 
> transition FAILED --> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,580 [CONTROL-rpc-event-queue] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51e-5de9-0919-be56-52f75a0532f1:0:0: State change requested FAILED --> 
> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,580 [CONTROL-rpc-event-queue] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51e-5de9-0919-be56-52f75a0532f1:0:0: Ignoring unexpected state 
> transition FAILED --> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,588 [CONTROL-rpc-event-queue] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-c7f6-2c8f-0689-af9de21a6d20:0:0: State change requested FAILED --> 
> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,588 [CONTROL-rpc-event-queue] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-c7f6-2c8f-0689-af9de21a6d20:0:0: Ignoring unexpected state 
> transition FAILED --> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,596 [CONTROL-rpc-event-queue] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State change requested FAILED --> 
> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,596 [CONTROL-rpc-event-queue] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: Ignoring unexpected state 
> transition FAILED --> CANCELLATION_REQUESTED
> 2016-02-02 06:21:48,597 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 294fb51d-f424-6adc-d668-1659e4353698:0:0: State change requested FAILED --> 
> FAILED
> 2016-02-02 06:21:48,599 [UserServer-1] WARN  
> o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed channel.  
> Connection: /10.10.100.201:31010 <--> 

[jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files

2016-03-19 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197383#comment-15197383
 ] 

Deneche A. Hakim commented on DRILL-4317:
-

[~hgunes] can you please review? Thanks.

> Exceptions on SELECT and CTAS with large CSV files
> --
>
> Key: DRILL-4317
> URL: https://issues.apache.org/jira/browse/DRILL-4317
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.4.0, 1.5.0
> Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
>Reporter: Matt Keranen
>Assignee: Hanifi Gunes
> Fix For: 1.7.0
>
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB, a table of 4 key columns followed by 39 numeric data 
> columns, otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 
> 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,
> 2015-10-17 
> 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,
> 2015-10-17 
> 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 
> 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for 10's of minutes and eventually 
> results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: 
> range(0, 547681))
> at 
> io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
> at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
> at 
> io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at 
> org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
> at 
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
> at 
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
> at 
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
> at 
> net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
> at sqlline.Rows$Row.(Rows.java:157)
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
> at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1593)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:746)
> at sqlline.SqlLine.begin(SqlLine.java:621)
> at sqlline.SqlLine.start(SqlLine.java:375)
> at sqlline.SqlLine.main(SqlLine.java:268)
> {noformat}
> A CTAS on the same file with storage as Parquet results in:
> {noformat}
> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
> Fragment 1:2
> [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
>   (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
> io.netty.buffer.AbstractByteBuf.checkIndex():1131
> io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
> 

[jira] [Updated] (DRILL-4518) Two or more columns present in left side of IN predicate, query returns wrong results.

2016-03-19 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-4518:
--
Attachment: f_20160316.json

Attached the data file used in the test.
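For comparison, the standard SQL form of a multi-column IN predicate pairs a
row constructor on the left with row constructors on the right; the failing
query in the description below mixes a two-column constructor with scalar
values. A sketch of the standard form (the second value pair is hypothetical;
not verified against Drill):
{noformat}
SELECT * FROM `f_20160316.json` t
WHERE (t.c1, t.c2) IN ((1234, 345643), (5678, 91011));
{noformat}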

> Two or more columns present in left side of IN predicate, query 
> returns wrong results.
> --
>
> Key: DRILL-4518
> URL: https://issues.apache.org/jira/browse/DRILL-4518
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
> Attachments: f_20160316.json
>
>
> Two or more columns present in left side of IN predicate, query 
> returns wrong results.
> Drill 1.7.0-SNAPSHOT git commit ID: 245da979
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter system set 
> `store.json.all_text_mode`=true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | store.json.all_text_mode updated.  |
> +---++
> 1 row selected (0.15 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c1) 
> IN (1234,345643);
> +---+
> |  c1   |
> +---+
> | 1234  |
> +---+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c2) 
> IN (1234,345643);
> +---+
> |  c1   |
> +---+
> | null  |
> +---+
> 1 row selected (0.224 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE 
> (t.c1,t.c2) IN (1234,345643);
> Error: VALIDATION ERROR: From line 1, column 35 to line 1, column 68: Values 
> passed to IN operator must have compatible types
> SQL Query null
> [Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 on centos-04.qa.lab:31010] 
> (state=,code=0)
> Stack trace from drillbit.log for above failure.
> 2016-03-17 06:57:40,227 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 2915aa9b-381a-119d-2814-711fea9dd07c: SELECT * FROM `f_20160316.json` t WHERE 
> (t.c1,t.c2) IN (1234,345643)
> 2016-03-17 06:57:40,286 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO  
> o.a.d.exec.planner.sql.SqlConverter - User Error Occurred
> org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line 
> 1, column 35 to line 1, column 68: Values passed to IN operator must have 
> compatible types
> SQL Query null
> [Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:157)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:581)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:192)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:927) 
> [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:251) 
> [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, 
> column 35 to line 1, column 68: Values passed to IN operator must have 
> compatible types
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[na:1.7.0_45]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  ~[na:1.7.0_45]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  ~[na:1.7.0_45]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
> ~[na:1.7.0_45]
> at 
> 

[jira] [Updated] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-03-19 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-2282:
---
Issue Type: Improvement  (was: Bug)

> Eliminate spaces, special characters from names in function templates
> -
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
> Attachments: DRILL-2282-updated.patch, DRILL-2282.patch
>
>
> Having spaces in the names of the functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3745) Hive CHAR not supported

2016-03-19 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197552#comment-15197552
 ] 

Arina Ielchiieva commented on DRILL-3745:
-

Commit id - dd4f03b.

> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using SQuirreL.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4443) MIN/MAX on VARCHAR throw a NullPointerException

2016-03-19 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4443:
-
Reviewer: Khurram Faraaz

> MIN/MAX on VARCHAR throw a NullPointerException
> ---
>
> Key: DRILL-4443
> URL: https://issues.apache.org/jira/browse/DRILL-4443
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: DRILL_4443.parquet, test4443.csv
>
>
> Using a simple csv file that contains at least 2 groups of rows:
> {noformat}
> a,
> a,
> a,
> b,
> {noformat}
> Running a query with min/max throws a NullPointerException:
> {noformat}
> SELECT MIN(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> {noformat}
> SELECT MAX(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> The problem is caused by {{VarCharAggrFunctions.java}}, which is not 
> resetting its internal buffer properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-19 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4474:
-
Reviewer: Khurram Faraaz

> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances ( 3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
> Fix For: 1.6.0
>
>
> {quote}
> * We are using Drill to retrieve business data from game analytics. 
> * We are running the queries below on a table of size 50GB (parquet).
> * We have found some major inconsistencies in the data when we use the COUNT 
> function.
> * Below are the case-by-case queries and their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or point me to an earlier JIRA if 
> one has already been created). 
> * Hopefully this will be fixed in later versions; if not, please do the 
> needful.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---+
> |   count   |
> +---+
> | 27645752  |
> +---+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---+---+
> |  EXPR$0   |  cnt  |
> +---+---+
> | 37772844  | 2108  |
> +---+---+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-+---+
> | EXPR$0  |cnt|
> +-+---+
> | 201941  | 37772844  |
> +-+---+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and 
> t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +--+
> | cnt  |
> +--+
> | 525  |
> +--+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid),
> . . . . . . . > count(distinct sessionid)
> . . . . . . . > from 

[jira] [Updated] (DRILL-4519) File system directory-based partition pruning doesn't work correctly with parquet metadata

2016-03-19 Thread Miroslav Holubec (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miroslav Holubec updated DRILL-4519:

Description: 
We have parquet files in folders with the following convention: YYYY/MM/DD/HH.
Without Drill's parquet metadata, directory pruning works seamlessly.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = YYYY,  dir1 = MM, dir2 = DD, dir3 = HH
{noformat}
After creating metadata and executing the same query, dir0 contains the HH 
folder name instead of the yearly folder name; dir1...3 are null.
{noformat}
refresh table metadata hdfs.test.indexed;
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
{noformat}



  was:
We have parquet files in folders with the following convention: YYYY/MM/DD/HH.
Without Drill's parquet metadata, directory pruning works seamlessly.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = YYYY,  dir1 = MM, dir2 = DD, dir3 = HH
{noformat}
After creating metadata and executing the same query, dir0 contains the HH 
folder name instead of the yearly folder name; dir1...3 are null.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
{noformat}




> File system directory-based partition pruning doesn't work correctly with 
> parquet metadata
> --
>
> Key: DRILL-4519
> URL: https://issues.apache.org/jira/browse/DRILL-4519
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Miroslav Holubec
>
> We have parquet files in folders with the following convention: YYYY/MM/DD/HH.
> Without Drill's parquet metadata, directory pruning works seamlessly.
> {noformat}
> select dir0, dir1, dir2 from hdfs.test.indexed;
> dir0 = YYYY,  dir1 = MM, dir2 = DD, dir3 = HH
> {noformat}
> After creating metadata and executing the same query, dir0 contains the HH 
> folder name instead of the yearly folder name; dir1...3 are null.
> {noformat}
> refresh table metadata hdfs.test.indexed;
> select dir0, dir1, dir2 from hdfs.test.indexed;
> dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
> {noformat}
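For reference, directory pruning maps dir0..dirN to successive folder levels,
so a filter like the following (illustrative values) is exactly what breaks
when dir0 picks up the wrong level:
{noformat}
select * from hdfs.test.indexed
where dir0 = '2016' and dir1 = '03' and dir2 = '18' and dir3 = '06';
{noformat}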



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-03-19 Thread Ian Hellstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201352#comment-15201352
 ] 

Ian Hellstrom commented on DRILL-3562:
--

Is this a duplicate of 
[DRILL-2217|http://issues.apache.org/jira/browse/DRILL-2217]?

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
> Fix For: Future
>
>
> Drill query fails when using flatten on data where some records contain an 
> empty array.
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?
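A commonly suggested workaround is to filter out records whose array is empty
before flattening, e.g. with repeated_count (a sketch; not verified against
this data set):
{noformat}
SELECT COUNT(*) FROM (
  SELECT FLATTEN(t.a.b.c) AS c
  FROM dfs.`flat.json` t
  WHERE repeated_count(t.a.b.c) > 0) flat
WHERE flat.c.d.e = 'f' LIMIT 1;
{noformat}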



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4376) Wrong results when doing a count(*) on part of directories with metadata cache

2016-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197701#comment-15197701
 ] 

ASF GitHub Bot commented on DRILL-4376:
---

Github user adeneche closed the pull request at:

https://github.com/apache/drill/pull/422


> Wrong results when doing a count(*) on part of directories with metadata cache
> --
>
> Key: DRILL-4376
> URL: https://issues.apache.org/jira/browse/DRILL-4376
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
>
> First create some parquet tables in multiple subfolders:
> {noformat}
> create table dfs.tmp.`test/201501` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201502` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201601` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> create table dfs.tmp.`test/201602` as select employee_id, full_name from 
> cp.`employee.json` limit 2;
> {noformat}
> Running the following query gives the expected count:
> {noformat}
> select count(*) from dfs.tmp.`test/20160*`;
> +-+
> | EXPR$0  |
> +-+
> | 4   |
> +-+
> {noformat}
> But once you create the metadata cache files, the query no longer returns the 
> correct results:
> {noformat}
> refresh table metadata dfs.tmp.`test`;
> select count(*) from dfs.tmp.`test/20160*`;
> +-+
> | EXPR$0  |
> +-+
> | 2   |
> +-+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4515) Fix a documentation error related to text file splitting

2016-03-19 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4515:
---

 Summary: Fix a documentation error related to text file splitting
 Key: DRILL-4515
 URL: https://issues.apache.org/jira/browse/DRILL-4515
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Reporter: Deneche A. Hakim


In this documentation page:

http://drill.apache.org/docs/text-files-csv-tsv-psv/

We can read the following:

{quote}
Using a distributed file system, such as HDFS, instead of a local file system 
to query the files also improves performance because currently Drill does not 
split files on block splits.
{quote}

Drill actually attempts to split files on block boundaries when running on 
HDFS and MapRFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4518) Two or more columns present in of IN predicate, query returns wrong results.

2016-03-19 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4518:
-

 Summary: Two or more columns present in left side of 
IN predicate, query returns wrong results.
 Key: DRILL-4518
 URL: https://issues.apache.org/jira/browse/DRILL-4518
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.7.0
 Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz


Two or more columns present in left side of IN predicate, query 
returns wrong results.
Drill 1.7.0-SNAPSHOT git commit ID: 245da979

{noformat}

0: jdbc:drill:schema=dfs.tmp> alter system set `store.json.all_text_mode`=true;
+---++
|  ok   |  summary   |
+---++
| true  | store.json.all_text_mode updated.  |
+---++
1 row selected (0.15 seconds)

0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c1) IN 
(1234,345643);
+---+
|  c1   |
+---+
| 1234  |
+---+
1 row selected (0.292 seconds)

0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c2) IN 
(1234,345643);
+---+
|  c1   |
+---+
| null  |
+---+
1 row selected (0.224 seconds)

0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE 
(t.c1,t.c2) IN (1234,345643);
Error: VALIDATION ERROR: From line 1, column 35 to line 1, column 68: Values 
passed to IN operator must have compatible types

SQL Query null

[Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 on centos-04.qa.lab:31010] 
(state=,code=0)

Stack trace from drillbit.log for above failure.

2016-03-17 06:57:40,227 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
2915aa9b-381a-119d-2814-711fea9dd07c: SELECT * FROM `f_20160316.json` t WHERE 
(t.c1,t.c2) IN (1234,345643)
2016-03-17 06:57:40,286 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO  
o.a.d.exec.planner.sql.SqlConverter - User Error Occurred
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line 
1, column 35 to line 1, column 68: Values passed to IN operator must have 
compatible types

SQL Query null

[Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:157) 
[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:581)
 [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:192)
 [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
 [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94)
 [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:927) 
[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:251) 
[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_45]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, 
column 35 to line 1, column 68: Values passed to IN operator must have 
compatible types
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) ~[na:1.7.0_45]
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 ~[na:1.7.0_45]
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[na:1.7.0_45]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
~[na:1.7.0_45]
at 
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:714) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:702) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:3931)
 

[jira] [Closed] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-03-19 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-4392.
---

> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Blocker
> Fix For: 1.6.0
>
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-+---+-++-++
> | version | commit_id |   
> commit_message|commit_time
>  |   build_email   | build_time |
> +-+---+-++-++
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: 
> Remove dependency on drill-logical from vector package  | 16.02.2016 @ 
> 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-+---+-++-
> {code}
> Parquet tables created by Drill's CTAS statement have one internal field, 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R".   This additional field does not 
> impact non-star queries, but causes incorrect results for star queries.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--++--+-++
> | n_nationkey  | n_name | n_regionkey  |  
>   n_comment   
>  | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--++--+-++
> | 5| ETHIOPIA   | 0| ven packages wake quickly. 
> regu  
>| true   |
> | 15   | MOROCCO| 0| rns. blithely bold courts 
> among the closely regular packages use furiously bold platelets?  
> | false  |
> | 14   | KENYA  | 0|  pending excuses haggle 
> furiously deposits. pending, express pinto beans wake fluffily past t 
>   | false  |
> | 0| ALGERIA| 0|  haggle. carefully final 
> deposits detect slyly agai
>  | false  |
> | 16   | MOZAMBIQUE | 0| s. ironic, unusual 
> asymptotes wake blithely r
>| false  |
> | 24   | UNITED STATES  | 1| y final packages. slow foxes 
> cajole quickly. quickly silent platelets breach ironic accounts. unusual 
> pinto be  | true
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with 
> partition support. 
> It also fails one of the pre-commit functional tests [1]:
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4516) Transform SUM(1) query to COUNT(1)

2016-03-19 Thread Sudip Mukherjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudip Mukherjee updated DRILL-4516:
---
Affects Version/s: 1.4.0

> Transform SUM(1) query to COUNT(1)
> --
>
> Key: DRILL-4516
> URL: https://issues.apache.org/jira/browse/DRILL-4516
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.4.0
>Reporter: Sudip Mukherjee
>
> If we connect Drill with Tableau, we see query requests like {{select sum(1) 
> from tablename}}. 
> This results in pulling all the records out of the underlying datasource and 
> aggregating them to get the row count.
> The behavior can be optimized if the query gets transformed into a count(1) 
> query, which is likely to be optimized at the datasource level.
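One caveat for the rewrite (worth noting because it constrains when the
transform is valid): the two forms differ on empty input. A sketch, with a
hypothetical table t:
{noformat}
-- Both count rows on non-empty input, but over an empty table:
select sum(1)   from t;  -- returns NULL
select count(1) from t;  -- returns 0
-- so a faithful rewrite is e.g.
--   case when count(1) = 0 then null else count(1) end
{noformat}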



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4402) pushing unsupported full outer join to Postgres

2016-03-19 Thread Taras Supyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199542#comment-15199542
 ] 

Taras Supyk commented on DRILL-4402:


Looks like this bug is already fixed in a newer version of Calcite.

> pushing unsupported full outer join to Postgres
> ---
>
> Key: DRILL-4402
> URL: https://issues.apache.org/jira/browse/DRILL-4402
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Taras Supyk
>
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query. 
> sql SELECT *
> FROM "public"."tjoin1"
> FULL JOIN "public"."tjoin2" ON "tjoin1"."c1" < "tjoin2"."c1"
> plugin postgres
> Fragment 0:0
> [Error Id: bc54cf76-f4ff-474c-b3df-fa357bdf0ff8 on centos1:31010]
>   (org.postgresql.util.PSQLException) ERROR: FULL JOIN is only supported with 
> merge-joinable or hash-joinable join conditions
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse():2182
> org.postgresql.core.v3.QueryExecutorImpl.processResults():1911
> org.postgresql.core.v3.QueryExecutorImpl.execute():173
> org.postgresql.jdbc.PgStatement.execute():622
> org.postgresql.jdbc.PgStatement.executeWithFlags():458
> org.postgresql.jdbc.PgStatement.executeQuery():374
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177
> org.apache.drill.exec.physical.impl.ScanBatch.():108
> org.apache.drill.exec.physical.impl.ScanBatch.():136
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():147
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():127
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():127
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():79
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():230
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> SQLState:  null
> ErrorCode: 0
> create table TJOIN1 (RNUM integer   not null , C1 integer, C2 integer);
> insert into TJOIN1 (RNUM, C1, C2) values ( 0, 10, 15);
> insert into TJOIN1 (RNUM, C1, C2) values ( 1, 20, 25);
> insert into TJOIN1 (RNUM, C1, C2) values ( 2, NULL, 50);
> create table TJOIN2 (RNUM integer   not null , C1 integer, C2 char(2));
> insert into TJOIN2 (RNUM, C1, C2) values ( 0, 10, 'BB');
> insert into TJOIN2 (RNUM, C1, C2) values ( 1, 15, 'DD');
> insert into TJOIN2 (RNUM, C1, C2) values ( 2, NULL, 'EE');
> insert into TJOIN2 (RNUM, C1, C2) values ( 3, 10, 'FF');
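Until the Calcite fix is picked up, the usual emulation of a FULL JOIN with a
non-equi condition is a LEFT JOIN plus an anti-join for the unmatched
right-hand rows (a sketch against the tables above; not verified against this
plugin):
{noformat}
SELECT tjoin1.rnum, tjoin1.c1, tjoin2.c1 AS c1j2, tjoin2.c2 AS c2j2
FROM "public"."tjoin1" tjoin1
LEFT JOIN "public"."tjoin2" tjoin2 ON tjoin1.c1 < tjoin2.c1
UNION ALL
-- right-hand rows with no matching left row, left columns as NULL
SELECT NULL, NULL, tjoin2.c1, tjoin2.c2
FROM "public"."tjoin2" tjoin2
WHERE NOT EXISTS (SELECT 1 FROM "public"."tjoin1" tjoin1
                  WHERE tjoin1.c1 < tjoin2.c1);
{noformat}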



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config

2016-03-19 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4383:
-
Reviewer: Chun Chang

> Allow passing custom configuration options to a file system through the 
> storage plugin config
> -
>
> Key: DRILL-4383
> URL: https://issues.apache.org/jira/browse/DRILL-4383
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.6.0
>
>
> A similar feature already exists in the Hive and HBase plugins; it simply 
> provides a key/value map for passing custom configuration options to the 
> underlying storage system.
> This would be useful for the filesystem plugin to configure S3 without 
> needing to create a core-site.xml file or restart Drill.
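A hedged sketch of what such a filesystem plugin configuration could look like
(the "config" field name and the property keys shown are assumptions for
illustration, not confirmed syntax):
{noformat}
{
  "type": "file",
  "connection": "s3a://my-bucket",
  "config": {
    "fs.s3a.access.key": "<access-key>",
    "fs.s3a.secret.key": "<secret-key>"
  },
  "workspaces": { "root": { "location": "/", "writable": false } }
}
{noformat}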



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4484) NPE when querying empty directory

2016-03-19 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-4484.
-
Resolution: Fixed

Fixed in 71608ca

> NPE when querying an empty directory
> ---
>
> Key: DRILL-4484
> URL: https://issues.apache.org/jira/browse/DRILL-4484
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> {code}
> 0: jdbc:drill:drillbit=localhost> select count(*) from 
> dfs.`/drill/xyz/201604*`;
> Error: VALIDATION ERROR: null
> SQL Query null
> 0: jdbc:drill:drillbit=localhost> select count(*) from 
> dfs.`/drill/xyz/20160401`;
> Error: VALIDATION ERROR: null
> SQL Query null
> [Error Id: 87366a2d-fc90-42f3-a076-aed5efdd27cb on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:drillbit=localhost> select count(*) from 
> dfs.`/drill/xyz/20160401/`;
> Error: VALIDATION ERROR: null
> SQL Query null
> [Error Id: ac122243-488e-4fb8-b89f-dc01c7e5c63a on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> {code}
> [Mon Mar 07 15:00:19 root@/drill/xyz ] # ls -lR
> .:
> total 5
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160101
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160102
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160103
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160104
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160105
> drwxr-xr-x 2 root root   1 Feb 26 16:31 20160201
> drwxr-xr-x 2 root root   3 Feb 26 16:31 20160202
> drwxr-xr-x 2 root root   4 Feb 26 16:31 20160301
> drwxr-xr-x 2 root root   0 Feb 26 16:31 20160401
> ./20160101:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160102:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160103:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160104:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160105:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160201:
> total 0
> ./20160202:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet
> ./20160301:
> total 2
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 2_0_0.parquet
> ./20160401:
> total 0
> {code}
> Hakim's analysis:
> {code}
> More details about the NPE, actually it's an IllegalArgumentException: what 
> happens is that during planning no file meets the wildcard selection and the 
> query should fail during planning with a "Table not found" message; instead 
> execution starts and the scanners fail because no files were assigned to them
> {code}
> Drill version:
> {code}
> #Generated by Git-Commit-Id-Plugin
> #Mon Mar 07 19:38:24 UTC 2016
> git.commit.id.abbrev=a2fec78
> git.commit.user.email=adene...@gmail.com
> git.commit.message.full=DRILL-4457\: Difference in results returned by window 
> function over BIGINT data\n\nthis closes \#410\n
> git.commit.id=a2fec78695df979e240231cb9d32c7f18274a333
> git.commit.message.short=DRILL-4457\: Difference in results returned by 
> window function over BIGINT data
> git.commit.user.name=adeneche
> git.build.user.name=Unknown
> git.commit.id.describe=0.9.0-625-ga2fec78-dirty
> git.build.user.email=Unknown
> git.branch=master
> git.commit.time=07.03.2016 @ 17\:38\:42 UTC
> git.build.time=07.03.2016 @ 19\:38\:24 UTC
> git.remote.origin.url=https\://github.com/apache/drill
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4521) Drill doesn't correctly treat VARIANCE and STDDEV as two-phase aggregates

2016-03-19 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4521:
-

 Summary: Drill doesn't correctly treat VARIANCE and STDDEV as 
two-phase aggregates
 Key: DRILL-4521
 URL: https://issues.apache.org/jira/browse/DRILL-4521
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jacques Nadeau
Assignee: MinJi Kim


These are supposed to be synonyms of STDDEV_POP and VARIANCE_POP but they are 
handled differently. This causes the reduce-aggregates rule not to reduce them, 
and thus they are handled as single-phase aggregates.
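
For reference, a sketch (not from the report) of the reduce-aggregates rewrite 
that makes these functions two-phase capable: variance can be expressed through 
SUM and COUNT, which are decomposable and can therefore be partially computed 
per fragment and merged:

{code}
-- hypothetical table t with numeric column x; VAR_SAMP-style reduction:
SELECT (SUM(x * x) - SUM(x) * SUM(x) / COUNT(x)) / (COUNT(x) - 1) AS var_x
FROM t;
-- SUM(x), SUM(x * x) and COUNT(x) can each be computed per minor fragment
-- (phase 1) and then combined across fragments (phase 2), unlike a
-- monolithic VARIANCE evaluated in a single phase.
{code}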



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException

2016-03-19 Thread Tobias (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199442#comment-15199442
 ] 

Tobias commented on DRILL-4517:
---

Is this fixed on head (1.7) as mentioned in DRILL-2223? If so, we can build our 
own version.

> Reading empty Parquet file fails with java.lang.IllegalArgumentException
> -
>
> Key: DRILL-4517
> URL: https://issues.apache.org/jira/browse/DRILL-4517
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: Tobias
>
> When querying a Parquet file that has a schema but no rows, the Drill server 
> will fail with the error below.
> This looks similar to DRILL-3557.
> {noformat}
> {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
>   required int64 MEMBER_ACCOUNT_ID;
>   required int64 TIMESTAMP_IN_HOUR;
>   optional int64 APPLICATION_ID;
> }
> , metadata: {}}}, blocks: []}
> {noformat}
> {noformat}
> Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read 
> entries assigned
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Project.accept(Project.java:51) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4520) Error parsing JSON (a column with different datatypes)

2016-03-19 Thread Shankar (JIRA)
Shankar created DRILL-4520:
--

 Summary: Error parsing JSON (a column with different datatypes)
 Key: DRILL-4520
 URL: https://issues.apache.org/jira/browse/DRILL-4520
 Project: Apache Drill
  Issue Type: Test
Reporter: Shankar


I am stuck in the middle of this. Could you please help me resolve the error 
below?
I am running a query on Drill 1.6.0 in a cluster against JSON log data (a 
150 GB log file, one JSON document per line).

{quote}
Possible solutions, in my opinion:
1. Either Drill should be able to ignore those lines (ANY data type) while 
reading or creating the table (CTAS).
2. Or the data should be stored as-is with the ANY data type whenever a field's 
data type differs across records. This would be useful when the other columns 
(excluding the ANY-typed columns) carry important information.
{quote}


h4. -- test.json --

About the data:
1. I have extracted just 3 lines from the logs for test purposes.
2. The field called "ajaxUrl" differs in data type: sometimes it contains a 
string, sometimes an array of JSON objects, and sometimes null.
3. In our case some events in the 150 GB JSON file differ in structure like 
this. I would say only about 0.1% (per 150 GB JSON file) are such events.


{noformat}
{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
{"gameId":"https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"2507001010530109","id":56312439,"a":0,"st":145777140,"e":"0.0","j":0,"n":"Loot
 Qualifier 
1","tc":94,"et":0,"syst":1457771456,"rc":14577,"s":5,"t":1,"tk":false,"prnId":56311896,"jc":1,"tp":"10.0","ro":14540,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
{noformat}


h4. -- Select Query (ERROR) --

{noformat}
select
`timestamp`,
sessionid,
gameid,
ajaxUrl,
ajaxData
from dfs.`/tmp/test.json` t
;
{noformat}

{color:red}
Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are 
using a ValueWriter of type NullableVarCharWriterImpl.

File  /tmp/test.json
Record  2
Fragment 0:0
{color}


h4. -- Select Query (works fine with UNION type) --
Tried the UNION type (an experimental feature):
set `exec.enable_union_type` = true;

{noformat}

set `exec.enable_union_type` = true;
+---+--+
|  ok   | summary  |
+---+--+
| true  | exec.enable_union_type updated.  |
+---+--+
1 row selected (0.193 seconds)



select
`timestamp`,
sessionid,
gameid,
ajaxUrl,
ajaxData
from dfs.`/tmp/test.json` t
;

+----------------+---------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
|   timestamp    |            sessionid            |                                      gameid                                      |        ajaxUrl        | ajaxData  |
+----------------+---------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
| 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3   | null                                                                             | /player/updatebonus1  | null      |
| 1457771458873  | D18104E8CA3071C7A8F4E141B127    | https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043  | []                    | null      |
| 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3   | null                                                                             | /player/updatebonus2  | null      |
+----------------+---------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
3 rows selected (0.965 seconds)

{noformat}



h4. -- CTAS Query (ERROR) --


{noformat}

set `exec.enable_union_type` = true;
+---+--+
|  ok   | summary  |
+---+--+
| true  | exec.enable_union_type updated.  |
+---+--+
1 row selected (0.193 seconds)


create table dfs.tmp.test1 AS 
select
`timestamp`,
sessionid,
gameid,
ajaxUrl,

[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-03-19 Thread Laurent Breuillard (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199633#comment-15199633
 ] 

Laurent Breuillard commented on DRILL-4203:
---

Hi all,

Is there any news about the pull request for this fix? I saw that it is 
flagged 1.6.0, but the issue is still unresolved and release 1.6.0 has been 
available since March 16, 2016.

Thank you

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Jason Altekruse
>Priority: Critical
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
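
Worth noting about the corrupted value (a hypothesis, not a confirmed root 
cause): 4881176 is exactly twice 2440588, the Julian day number of 1970-01-01, 
which is consistent with a Julian-epoch offset being added twice when the date 
is written:

{noformat}
2440588     = Julian day number of 1970-01-01 (the Unix epoch)
2 * 2440588 = 4881176 = the stored epoch_date
{noformat}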



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException

2016-03-19 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199373#comment-15199373
 ] 

Khurram Faraaz commented on DRILL-4517:
---

This needs to be fixed soon, because we need to use empty Parquet files in 
Union All tests to verify that empty input on either side of the Union All 
operator is handled properly.

> Reading empty Parquet file fails with java.lang.IllegalArgumentException
> -
>
> Key: DRILL-4517
> URL: https://issues.apache.org/jira/browse/DRILL-4517
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: Tobias
>
> When querying a Parquet file that has a schema but no rows, the Drill server 
> will fail with the error below.
> This looks similar to DRILL-3557.
> {noformat}
> {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
>   required int64 MEMBER_ACCOUNT_ID;
>   required int64 TIMESTAMP_IN_HOUR;
>   optional int64 APPLICATION_ID;
> }
> , metadata: {}}}, blocks: []}
> {noformat}
> {noformat}
> Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read 
> entries assigned
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Project.accept(Project.java:51) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) 
> ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134)
>  ~[drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) 
> [drill-java-exec-1.5.0.jar:1.5.0]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3745) Hive CHAR not supported

2016-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197551#comment-15197551
 ] 

ASF GitHub Bot commented on DRILL-3745:
---

Github user arina-ielchiieva closed the pull request at:

https://github.com/apache/drill/pull/399


> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using Squirrel.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4519) File system directory-based partition pruning doesn't work correctly with parquet metadata

2016-03-19 Thread Miroslav Holubec (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miroslav Holubec updated DRILL-4519:

Description: 
We have parquet files in folders with the following convention YYYY/MM/DD/HH.
Without Drill's parquet metadata, directory pruning works seamlessly.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = YYYY,  dir1 = MM, dir2 = DD, dir3 = HH
{noformat}
After creating metadata and executing the same query, dir0 contains the HH 
folder name instead of the yearly folder name. dir1...3 are null.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
{noformat}



  was:
We have parquet files in folders with the following convention YYYY/MM/DD/HH.
Without Drill's parquet metadata, directory pruning works seamlessly.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = YYYY,  dir1 = MM, dir2 = DD, dir3 = HH
{noformat}
After creating metadata and executing the same query, dir0 contains the HH 
folder name instead of the yearly folder name. dir1...4 are null.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
{noformat}




> File system directory-based partition pruning doesn't work correctly with 
> parquet metadata
> --
>
> Key: DRILL-4519
> URL: https://issues.apache.org/jira/browse/DRILL-4519
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Miroslav Holubec
>
> We have parquet files in folders with the following convention YYYY/MM/DD/HH.
> Without Drill's parquet metadata, directory pruning works seamlessly.
> {noformat}
> select dir0, dir1, dir2 from hdfs.test.indexed;
> dir0 = YYYY,  dir1 = MM, dir2 = DD, dir3 = HH
> {noformat}
> After creating metadata and executing the same query, dir0 contains the HH 
> folder name instead of the yearly folder name. dir1...3 are null.
> {noformat}
> select dir0, dir1, dir2 from hdfs.test.indexed;
> dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4501) Complete MapOrListWriter for all supported data types

2016-03-19 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore resolved DRILL-4501.
---
Resolution: Fixed

Resolved by 
[245da97|https://fisheye6.atlassian.com/changelog/incubator-drill?cs=245da9790813569c5da9404e0fc5e45cc88e22bb].

> Complete MapOrListWriter for all supported data types
> -
>
> Key: DRILL-4501
> URL: https://issues.apache.org/jira/browse/DRILL-4501
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Fix For: 1.7.0
>
>
> This interface, at this time, does not include support for many data types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2223) Empty parquet file created with Limit 0 query errors out when querying

2016-03-19 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199393#comment-15199393
 ] 

Khurram Faraaz commented on DRILL-2223:
---

Like @amansinha100 said, this should at least write the schema information and 
metadata, which will allow queries to run. I believe that is the correct 
approach to solving this problem.

> Empty parquet file created with Limit 0 query errors out when querying
> --
>
> Key: DRILL-2223
> URL: https://issues.apache.org/jira/browse/DRILL-2223
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
>Reporter: Aman Sinha
> Fix For: Future
>
>
> Doing a CTAS with limit 0 creates a 0 length parquet file which errors out 
> during querying.  This should at least write the schema information and 
> metadata which will allow queries to run. 
> {code}
> 0: jdbc:drill:zk=local> create table tt_nation2 as select n_nationkey, 
> n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0;
> ++---+
> |  Fragment  | Number of records written |
> ++---+
> | 0_0| 0 |
> ++---+
> 1 row selected (0.315 seconds)
> 0: jdbc:drill:zk=local> select n_nationkey from tt_nation2;
> Query failed: RuntimeException: file:/tmp/tt_nation2/0_0_0.parquet is not a 
> Parquet file (too small)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2610) Local File System Storage Plugin

2016-03-19 Thread Austin Chungath Vincent (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201345#comment-15201345
 ] 

Austin Chungath Vincent commented on DRILL-2610:


Interesting, having access to the logs on every node is cool. I am going to try 
working on this.

> Local File System Storage Plugin
> 
>
> Key: DRILL-2610
> URL: https://issues.apache.org/jira/browse/DRILL-2610
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.8.0
>Reporter: Sudheesh Katkam
> Fix For: Future
>
>
> Create a storage plugin to query files on the local file system on the nodes 
> in the cluster. For example, users should be able to query log files in 
> /var/log/drill/ on all nodes.
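
A sketch of what such a plugin config could look like, reusing the dfs-style 
JSON shape with a local file:/// connection (hypothetical; it leaves open how 
the plugin would fan a query out across every node):

{code}
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "logs": {
      "location": "/var/log/drill",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "json": {
      "type": "json"
    }
  }
}
{code}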



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files

2016-03-19 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197208#comment-15197208
 ] 

Deneche A. Hakim commented on DRILL-4317:
-

I found a bug in TextInput.updateLengthBasedOnConstraint() when Drill splits 
CSV files. In most cases it works fine, but when the split line ends with an 
empty value AND one of the previous rows in the same last batch contains a 
value in the last column, we see the exception described above.
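
A minimal illustration of the trigger condition, with hypothetical data: if 
the split boundary falls after the second line, the first row carries a value 
in the last column while the split line ends with an empty last value:

{noformat}
a,b,1
c,d,
{noformat}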

> Exceptions on SELECT and CTAS with large CSV files
> --
>
> Key: DRILL-4317
> URL: https://issues.apache.org/jira/browse/DRILL-4317
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.4.0, 1.5.0
> Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
>Reporter: Matt Keranen
>Assignee: Deneche A. Hakim
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB, a table of 4 key columns followed by 39 numeric data 
> columns, otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 
> 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,
> 2015-10-17 
> 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,
> 2015-10-17 
> 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 
> 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for 10's of minutes and eventually 
> results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: 
> range(0, 547681))
> at 
> io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
> at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
> at 
> io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at 
> org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
> at 
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
> at 
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
> at 
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
> at 
> net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
> at sqlline.Rows$Row.(Rows.java:157)
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
> at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1593)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:746)
> at sqlline.SqlLine.begin(SqlLine.java:621)
> at sqlline.SqlLine.start(SqlLine.java:375)
> at sqlline.SqlLine.main(SqlLine.java:268)
> {noformat}
> A CTAS on the same file with storage as Parquet results in:
> {noformat}
> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
> Fragment 1:2
> [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on 

[jira] [Commented] (DRILL-4436) Result data gets mixed up when various tables have a column "label"

2016-03-19 Thread Serge Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199767#comment-15199767
 ] 

Serge Harnyk commented on DRILL-4436:
-

I think the problem is more serious. I created two identical tables, Gender2 
and Civility2, with a column "label2" in BOTH tables.

select gender2.label2 as label2 from postgres.public.gender2 join 
postgres.public.civility2 on genderId = civilityId
returns: civilityLabel

select civility2.label2 as label2 from postgres.public.gender2 join 
postgres.public.civility2 on genderId = civilityId
returns: null

select gender2.label2, civility2.label2 from postgres.public.gender2 join 
postgres.public.civility2 on genderId = civilityId
returns: civilityLabel  null

The Project step works incorrectly when we select a column that has the same 
name as a column in the second table, regardless of the alias.

> Result data gets mixed up when various tables have a column "label"
> ---
>
> Key: DRILL-4436
> URL: https://issues.apache.org/jira/browse/DRILL-4436
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
> Environment: Drill 1.5.0 with Zookeeper on CentOS 7.0 
>Reporter: Vincent Uribe
>Assignee: Serge Harnyk
>
> We have two tables in a MySQL database:
> CREATE TABLE `Gender` (
>   `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `label` varchar(15) NOT NULL,
>   PRIMARY KEY (`genderId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
> CREATE TABLE `Civility` (
>   `civilityId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `abbreviation` varchar(15) NOT NULL,
> `label` varchar(60) DEFAULT NULL,
>   PRIMARY KEY (`civilityId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
> With a query on these two tables with Gender.label as 'gender' and 
> Civility.label as 'civility', we obtain, depending of the query :
> * gender in civility
> * civility in the gender
> * NULL in the other column (gender or civility)
> if we drop the table Gender and recreate it with like this:
> CREATE TABLE `Gender` (
>   `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `label2` varchar(15) NOT NULL,
> PRIMARY KEY (`genderId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
> Everything is fine.
> I guess something is wrong with the metadata...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4398) SYSTEM ERROR: IllegalStateException: Memory was leaked by query

2016-03-19 Thread Matt Keranen (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201810#comment-15201810
 ] 

Matt Keranen commented on DRILL-4398:
-

Getting a similar error in 1.6.0 with CTAS into Parquet from CSV data stored in HDFS.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query
> ---
>
> Key: DRILL-4398
> URL: https://issues.apache.org/jira/browse/DRILL-4398
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Taras Supyk
>
> Several queries fail with memory leaked errors
> select tjoin2.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 as c2j2 from 
> postgres.public.tjoin1 full outer join postgres.public.tjoin2 on tjoin1.c1 = 
> tjoin2.c1
> select tjoin1.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 from 
> postgres.public.tjoin1, lateral ( select tjoin2.c1, tjoin2.c2 from 
> postgres.public.tjoin2 where tjoin1.c1=tjoin2.c1) tjoin2
> SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory 
> leaked: (40960)
> Allocator(op:0:0:3:JdbcSubScan) 100/40960/135168/100 
> (res/actual/peak/limit)
> create table TJOIN1 (RNUM integer   not null , C1 integer, C2 integer);
> insert into TJOIN1 (RNUM, C1, C2) values ( 0, 10, 15);
> insert into TJOIN1 (RNUM, C1, C2) values ( 1, 20, 25);
> insert into TJOIN1 (RNUM, C1, C2) values ( 2, NULL, 50);
> create table TJOIN2 (RNUM integer   not null , C1 integer, C2 char(2));
> insert into TJOIN2 (RNUM, C1, C2) values ( 0, 10, 'BB');
> insert into TJOIN2 (RNUM, C1, C2) values ( 1, 15, 'DD');
> insert into TJOIN2 (RNUM, C1, C2) values ( 2, NULL, 'EE');
> insert into TJOIN2 (RNUM, C1, C2) values ( 3, 10, 'FF');



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4405) invalid Postgres SQL generated for CONCAT (literal, literal)

2016-03-19 Thread Serge Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197767#comment-15197767
 ] 

Serge Harnyk commented on DRILL-4405:
-

Calcite doesn't have CONCAT() as a function, only the "||" operator.
When Drill parses the query it sets DrillSqlOperator as the SqlOperator, and 
DrillSqlOperator at the inferReturnType step has only two options for the 
return type: Boolean for MinorType.BIT and "ANY" for everything else. That 
affects a lot of non-Calcite functions, like "PI".
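
As a possible workaround (untested here), the "||" operator that Calcite does 
support should avoid the CONCAT code path entirely:

{code}
select 'FF' || 'FF' from postgres.public.tversion;
{code}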


> invalid Postgres SQL generated for CONCAT (literal, literal) 
> -
>
> Key: DRILL-4405
> URL: https://issues.apache.org/jira/browse/DRILL-4405
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Serge Harnyk
>
> select concat( 'FF' , 'FF' )  from postgres.public.tversion
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query. 
> sql SELECT CAST('' AS ANY) AS "EXPR$0"
> FROM "public"."tversion"
> plugin postgres
> Fragment 0:0
> [Error Id: c3f24106-8d75-4a57-a638-ac5f0aca0769 on centos1:31010]
>   (org.postgresql.util.PSQLException) ERROR: syntax error at or near "ANY"
>   Position: 23
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse():2182
> org.postgresql.core.v3.QueryExecutorImpl.processResults():1911
> org.postgresql.core.v3.QueryExecutorImpl.execute():173
> org.postgresql.jdbc.PgStatement.execute():622
> org.postgresql.jdbc.PgStatement.executeWithFlags():458
> org.postgresql.jdbc.PgStatement.executeQuery():374
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177
> org.apache.drill.exec.physical.impl.ScanBatch.():108
> org.apache.drill.exec.physical.impl.ScanBatch.():136
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():147
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():79
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():230
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> SQLState:  null
> ErrorCode: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4409) projecting literal will result in an empty resultset

2016-03-19 Thread Serge Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197798#comment-15197798
 ] 

Serge Harnyk commented on DRILL-4409:
-

The PostgreSQL JDBC driver returns the metadata type code java.sql.Types.OTHER 
for the literal parts of a query; PostgreSQL itself doesn't treat literals as 
VARCHAR or any other string type.
For example, the MySQL driver returns type code 12, which is 
java.sql.Types.VARCHAR.
When Drill encounters java.sql.Types.OTHER it skips work on the cell: 
org/apache/drill/exec/store/jdbc/JdbcRecordReader.java, line 190

I think Taras couldn't reproduce this bug on Oracle for the same reason.
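
If that is the cause, an explicit cast (an untested assumption) should make 
the driver report a VARCHAR type code instead of java.sql.Types.OTHER:

{code}
select cast('BB' as varchar(2)) from postgres.public.tversion;
{code}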

> projecting literal will result in an empty resultset
> 
>
> Key: DRILL-4409
> URL: https://issues.apache.org/jira/browse/DRILL-4409
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Serge Harnyk
>
> A query which projects a literal as shown against a Postgres table will 
> result in an empty result set being returned. 
> select 'BB' from postgres.public.tversion



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException

2016-03-19 Thread Tobias (JIRA)
Tobias created DRILL-4517:
-

 Summary: Reading empty Parquet file fails with 
java.lang.IllegalArgumentException
 Key: DRILL-4517
 URL: https://issues.apache.org/jira/browse/DRILL-4517
 Project: Apache Drill
  Issue Type: Bug
  Components:  Server
Reporter: Tobias


When querying a Parquet file that has a schema but no rows, the Drill server 
will fail with the error below.
This looks similar to DRILL-3557.
{noformat}
{{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
  required int64 MEMBER_ACCOUNT_ID;
  required int64 TIMESTAMP_IN_HOUR;
  optional int64 APPLICATION_ID;
}
, metadata: {}}}, blocks: []}
{noformat}

{noformat}
Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read 
entries assigned
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) 
~[guava-14.0.1.jar:na]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.physical.config.Project.accept(Project.java:51) 
~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) 
~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134)
 ~[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) 
[drill-java-exec-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) 
[drill-java-exec-1.5.0.jar:1.5.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) 
[drill-java-exec-1.5.0.jar:1.5.0]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4459) SchemaChangeException while querying hive json table

2016-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197507#comment-15197507
 ] 

ASF GitHub Bot commented on DRILL-4459:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/431#discussion_r56354881
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/fn/hive/TestInbuiltHiveUDFs.java
 ---
@@ -43,4 +47,17 @@ public void testEncode() throws Exception {
 .baselineValues(new Object[] { null })
 .go();
   }
+
+   @Test // DRILL-4459
+   public void testGetJsonObject() throws Exception {
+setColumnWidths(new int[]{260});
+String query = "select * from hive.simple_json where 
GET_JSON_OBJECT(simple_json.json, '$.DocId') = 'DocId2'";
+List results = testSqlWithResults(query);
+String expected = "json\n" + 
"{\"DocId\":\"DocId2\",\"User\":{\"Id\":122,\"Username\":\"larry122\",\"Name\":"
 +
--- End diff --

Can you specify this baseline as a complex object instead of a string? The 
testBuilder can be used to check results against Java POJOs and it includes 
helper methods listOf/mapOf for building up complex structures.
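
A rough sketch of that suggestion, assuming the usual BaseTestQuery/TestBuilder 
setup with a static import of TestBuilder.mapOf; the field values are 
placeholders taken from the quoted expected string:

{code}
@Test // DRILL-4459
public void testGetJsonObject() throws Exception {
    setColumnWidths(new int[]{260});
    final String query = "select * from hive.simple_json where "
        + "GET_JSON_OBJECT(simple_json.json, '$.DocId') = 'DocId2'";
    testBuilder()
        .sqlQuery(query)
        .unOrdered()
        .baselineColumns("json")
        // baseline built as a complex object instead of a raw JSON string
        .baselineValues(mapOf(
            "DocId", "DocId2",
            "User", mapOf("Id", 122L, "Username", "larry122")))
        .go();
}
{code}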


> SchemaChangeException while querying hive json table
> 
>
> Key: DRILL-4459
> URL: https://issues.apache.org/jira/browse/DRILL-4459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 1.4.0
> Environment: MapR-Drill 1.4.0
> Hive-1.2.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> getting the SchemaChangeException while querying json documents stored in 
> hive table.
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> {noformat}
> minimum reproduce
> {noformat}
> created sample json documents using the attached script(randomdata.sh)
> hive>create table simplejson(json string);
> hive>load data local inpath '/tmp/simple.json' into table simplejson;
> now query it through Drill.
> Drill Version
> select * from sys.version;
> +---++-+-++
> | commit_id | commit_message | commit_time | build_email | build_time |
> +---++-+-++
> | eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d | DRILL-3901:  Don't do early 
> expansion of directory in the non-metadata-cache case because it already 
> happens during ParquetGroupScan's metadata gathering operation. | 07.10.2015 
> @ 17:12:57 UTC | Unknown | 07.10.2015 @ 17:36:16 UTC |
> +---++-+-++
> 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
> GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 1:1
> [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
> (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3745) Hive CHAR not supported

2016-03-19 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3745:

Fix Version/s: (was: 1.6.0)
   1.7.0

> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using Squirrel.  We have the Hive plugin enabled with tables using CHAR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3993) Rebase Drill on Calcite 1.7.0 release

2016-03-19 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3993:
--
Summary: Rebase Drill on Calcite 1.7.0 release  (was: Rebase Drill on 
Calcite 1.5.0 release)

> Rebase Drill on Calcite 1.7.0 release
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>Assignee: Jacques Nadeau
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files

2016-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197382#comment-15197382
 ] 

ASF GitHub Bot commented on DRILL-4317:
---

GitHub user adeneche opened a pull request:

https://github.com/apache/drill/pull/432

DRILL-4317: Exceptions on SELECT and CTAS with large CSV files



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adeneche/incubator-drill DRILL-4317

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #432


commit 5813c8684c1900a156a82c0651914f97aeb87f6f
Author: adeneche 
Date:   2016-03-16T13:47:18Z

DRILL-4317: Exceptions on SELECT and CTAS with large CSV files




> Exceptions on SELECT and CTAS with large CSV files
> --
>
> Key: DRILL-4317
> URL: https://issues.apache.org/jira/browse/DRILL-4317
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.4.0, 1.5.0
> Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
>Reporter: Matt Keranen
>Assignee: Deneche A. Hakim
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB, a table of 4 key columns followed by 39 numeric data 
> columns, otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 
> 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,
> 2015-10-17 
> 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,
> 2015-10-17 
> 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 
> 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for 10's of minutes and eventually 
> results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: 
> range(0, 547681))
> at 
> io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
> at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
> at 
> io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at 
> org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
> at 
> org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
> at 
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
> at 
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
> at 
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
> at 
> net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
> at sqlline.Rows$Row.(Rows.java:157)
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
> at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1593)
> at sqlline.Commands.execute(Commands.java:852)
> at 

[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-03-19 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200100#comment-15200100
 ] 

Victoria Markman commented on DRILL-4392:
-

This is fixed now; the test is passing in the latest nightly precommit run: 
http://10.10.104.91:8080/view/Nightly/job/Functional-Baseline-104.61/151/consoleFull

> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Blocker
> Fix For: 1.6.0
>
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-+---+-++-++
> | version | commit_id |   
> commit_message|commit_time
>  |   build_email   | build_time |
> +-+---+-++-++
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: 
> Remove dependency on drill-logical from vector package  | 16.02.2016 @ 
> 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-+---+-++-
> {code}
> Parquet table created by Drill's CTAS statement has one internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R".   This additional field would not 
> impact non-star query, but would cause incorrect result for star query.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--++--+-++
> | n_nationkey  | n_name | n_regionkey  |  
>   n_comment   
>  | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--++--+-++
> | 5| ETHIOPIA   | 0| ven packages wake quickly. 
> regu  
>| true   |
> | 15   | MOROCCO| 0| rns. blithely bold courts 
> among the closely regular packages use furiously bold platelets?  
> | false  |
> | 14   | KENYA  | 0|  pending excuses haggle 
> furiously deposits. pending, express pinto beans wake fluffily past t 
>   | false  |
> | 0| ALGERIA| 0|  haggle. carefully final 
> deposits detect slyly agai
>  | false  |
> | 16   | MOZAMBIQUE | 0| s. ironic, unusual 
> asymptotes wake blithely r
>| false  |
> | 24   | UNITED STATES  | 1| y final packages. slow foxes 
> cajole quickly. quickly silent platelets breach ironic accounts. unusual 
> pinto be  | true
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with 
> partition support. 
> Also, it will also fail one of the Pre-commit functional test [1]
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3549) Default value for planner.memory.max_query_memory_per_node needs to be increased

2016-03-19 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197897#comment-15197897
 ] 

Victoria Markman commented on DRILL-3549:
-

I hope this setting will not be hard-coded, but calculated based on cluster 
settings plus whatever else needs to be taken into consideration ...

> Default value for planner.memory.max_query_memory_per_node needs to be 
> increased
> 
>
> Key: DRILL-3549
> URL: https://issues.apache.org/jira/browse/DRILL-3549
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Abhishek Girish
>Assignee: Deneche A. Hakim
>Priority: Critical
>  Labels: usability
> Fix For: 1.7.0
>
>
> The current default value for planner.memory.max_query_memory_per_node is 
> 2147483648 (2 GB). This value is not enough, given the addition of window 
> function support. Most queries on reasonably sized data & cluster setup fail 
> with OOM due to insufficient memory. 
> To improve usability, the default needs to be increased to a reasonably 
> sized value (could be determined based on Drill Max Direct Memory).
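
In the meantime the option can be raised per system or session while the 
default is being discussed; the value is in bytes (4 GB shown here as an 
example):

{noformat}
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 4294967296;
{noformat}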



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)