[jira] [Created] (DRILL-7147) Source order of "drill-env.sh" and "distrib-env.sh" should be swapped

2019-04-01 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-7147:
--

 Summary: Source order of "drill-env.sh" and "distrib-env.sh" 
should be swapped
 Key: DRILL-7147
 URL: https://issues.apache.org/jira/browse/DRILL-7147
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.15.0
Reporter: Hao Zhu


In bin/drill-config.sh, the description of the source order is:
{code:java}
# Variables may be set in one of four places:
#
#   Environment (per run)
#   drill-env.sh (per site)
#   distrib-env.sh (per distribution)
#   drill-config.sh (this file, Drill defaults)
#
# Properties "inherit" from items lower on the list, and may be "overridden" by 
items
# higher on the list. In the environment, just set the variable:
{code}
However, bin/drill-config.sh actually sources drill-env.sh first, and then distrib-env.sh:
{code:java}
drillEnv="$DRILL_CONF_DIR/drill-env.sh"
if [ -r "$drillEnv" ]; then
  . "$drillEnv"
fi
...

distribEnv="$DRILL_CONF_DIR/distrib-env.sh"
if [ -r "$distribEnv" ]; then
  . "$distribEnv"
else
  distribEnv="$DRILL_HOME/conf/distrib-env.sh"
  if [ -r "$distribEnv" ]; then
    . "$distribEnv"
  fi
fi

{code}
We need to swap the source order of drill-env.sh and distrib-env.sh so that per-site settings override per-distribution settings, matching the documented order. A sketch of the corrected order is below.
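
A minimal sketch of the corrected order, reusing the existing blocks unchanged (only their order is swapped; the distrib-env.sh fallback logic stays as-is):
{code:java}
# Source distrib-env.sh (per distribution) first, so that drill-env.sh
# (per site) can override it.
distribEnv="$DRILL_CONF_DIR/distrib-env.sh"
if [ -r "$distribEnv" ]; then
  . "$distribEnv"
else
  distribEnv="$DRILL_HOME/conf/distrib-env.sh"
  if [ -r "$distribEnv" ]; then
    . "$distribEnv"
  fi
fi

drillEnv="$DRILL_CONF_DIR/drill-env.sh"
if [ -r "$drillEnv" ]; then
  . "$drillEnv"
fi
{code}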



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-5436) Need a way to input password which contains space when calling sqlline

2017-04-13 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-5436:
--

 Summary: Need a way to input password which contains space when 
calling sqlline
 Key: DRILL-5436
 URL: https://issues.apache.org/jira/browse/DRILL-5436
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Affects Versions: 1.10.0
Reporter: Hao Zhu


Create a user named "spaceuser" with password "hello world".
All of the following fail:
{code}
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p 'hello world'
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p "hello world"
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p 'hello\ world'
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p "hello\ world"
{code}

We need a way to pass a password containing a space when calling sqlline.
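
One possible workaround to verify (hypothetical; assumes the Drill JDBC driver accepts user/password as URL properties, which bypasses the -p argument entirely):
{code}
sqlline -u "jdbc:drill:zk=xxx;user=spaceuser;password=hello world"
{code}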



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4101) The argument 'pattern' of Function 'like' has to be constant!

2016-06-02 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312582#comment-15312582
 ] 

Hao Zhu commented on DRILL-4101:


Hi [~david_hudavy]

Could you open a case for this one?
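
In the meantime, a possible rewrite that avoids the non-constant LIKE pattern (a sketch; substr and length are standard SQL functions, and this assumes a prefix match is intended):
{code}
SELECT count(*) AS cnt
FROM dfs.tmp.ta, dfs.tmp.tb
WHERE ta.rdn_4 = tb.rdn_4
AND substr(ta.imsi, 1, length(tb.epsvplmnid)) <> tb.epsvplmnid;
{code}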

Thanks,
Hao

> The argument 'pattern' of Function 'like' has to be constant!
> -
>
> Key: DRILL-4101
> URL: https://issues.apache.org/jira/browse/DRILL-4101
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
> Environment: drill1.2
>Reporter: david_hudavy
>
> 0: jdbc:drill:zk=local> select * from dfs.tmp.ta limit 10;
> +--+--+
> |  rdn_4   |   imsi   |
> +--+--+
> | mscId=UPG00494500412500  | 272004500412500  |
> | mscId=UPG00494500436500  | 272004500436500  |
> | mscId=UPG00494501833000  | 272004501833000  |
> | mscId=UPG00494502712000  | 272004502712000  |
> | mscId=UPG00494502732500  | 272004502732500  |
> | mscId=UPG00494502845500  | 272004502845500  |
> | mscId=UPG00494505721000  | 272004505721000  |
> | mscId=UPG00494507227500  | 272004507227500  |
> | mscId=UPG00494509548500  | 272004509548500  |
> | mscId=UPG00494501644500  | 272004501644500  |
> +--+--+
> 10 rows selected (0.344 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.tmp.tb;
> +-+-+
> |  rdn_4  | epsvplmnid  |
> +-+-+
> | mscId=149000579913  | 46000   |
> | mscId=149000579912  | 262280  |
> +-+-+
> 2 rows selected (0.112 seconds)
> SELECT count(*) AS cnt
> FROM dfs.tmp.ta,dfs.tmp.tb
> WHERE ta.rdn_4 = tb.rdn_4
> AND ta.imsi NOT LIKE concat(tb.epsvplmnid,'%')
> Error: SYSTEM ERROR: DrillRuntimeException: The argument 'pattern' of 
> Function 'like' has to be constant!
> Fragment 0:0
> [Error Id: f103529c-60f4-4b8f-8d7a-b1f0619aab30 on vm1-4:31010] 
> (state=,code=0)
> [Error Id: f103529c-60f4-4b8f-8d7a-b1f0619aab30 on vm1-4:31010]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>  [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>  [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>  [drill-java-exec-1.2.0.jar:1.2.0]
> at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) 
> [drill-java-exec-1.2.0.jar:1.2.0]
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>  [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transp

[jira] [Created] (DRILL-3798) Cannot group by the functions without ()

2015-09-17 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3798:
--

 Summary: Cannot group by the functions without ()
 Key: DRILL-3798
 URL: https://issues.apache.org/jira/browse/DRILL-3798
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Mehant Baid


Drill cannot GROUP BY niladic functions (those written without parentheses), e.g.:
{code}
SELECT CURRENT_DATE 
FROM hive.h1db.testdate
group  by CURRENT_DATE;


  Caused By (org.apache.calcite.sql.validate.SqlValidatorException) Column 
'CURRENT_DATE' not found in any table
{code}

Bad ones:
{code}
SELECT CURRENT_TIME 
FROM hive.h1db.testdate
group  by CURRENT_TIME;

SELECT CURRENT_TIMESTAMP 
FROM hive.h1db.testdate
group  by CURRENT_TIMESTAMP;

SELECT LOCALTIME 
FROM hive.h1db.testdate
group  by LOCALTIME;

SELECT LOCALTIMESTAMP 
FROM hive.h1db.testdate
group  by LOCALTIMESTAMP;
{code}

Good ones:
{code}
SELECT NOW()
FROM hive.h1db.testdate
group  by NOW();

SELECT TIMEOFDAY()
FROM hive.h1db.testdate
group  by TIMEOFDAY();

SELECT UNIX_TIMESTAMP()
FROM hive.h1db.testdate
group  by UNIX_TIMESTAMP();

SELECT PI()
FROM hive.h1db.testdate
group  by  PI();
{code}
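
An untested workaround sketch (assumption: aliasing the niladic function inside a subquery gives the planner a real column to group on):
{code}
SELECT t.cd
FROM (SELECT CURRENT_DATE AS cd FROM hive.h1db.testdate) t
GROUP BY t.cd;
{code}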



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3735) Directory pruning is not happening when number of files is larger than 64k

2015-09-02 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3735:
--

 Summary: Directory pruning is not happening when number of files 
is larger than 64k
 Key: DRILL-3735
 URL: https://issues.apache.org/jira/browse/DRILL-3735
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Jinfeng Ni


When the number of files is larger than the 64k limit, directory pruning does not happen.
We need to increase this limit to handle most use cases.

My proposal is to separate the code for directory pruning from partition pruning.
Say a parent directory contains 100 directories and 1 million files.
If we only query files from one directory, we should first read the 100 directories and narrow down to the relevant one, and only then read that directory's file paths into memory and do the rest of the work.

The current behavior is that Drill first reads all 1 million file paths into memory, and then does directory pruning or partition pruning.
This is neither performance efficient nor memory efficient, and it does not scale.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3727) Drill should return NULL instead of failure if cast column is empty

2015-08-31 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724471#comment-14724471
 ] 

Hao Zhu commented on DRILL-3727:


PostgreSQL's behavior is similar to Drill's:
{code}
test=# create table testempty(col0 varchar);
CREATE TABLE

test=# insert into testempty values
test-# ('');
INSERT 0 1

test=# insert into testempty values('2015-01-01');
INSERT 0 1

test=# select * from testempty ;
    col0
------------

 2015-01-01
(2 rows)

test=# select cast(col0 as date) from testempty;
ERROR:  invalid input syntax for type date: ""

test=# select case when col0='' then null else cast(col0 as date) end from testempty;
    col0
------------

 2015-01-01
(2 rows)

{code}

> Drill should return NULL instead of failure if cast column is empty
> ---
>
> Key: DRILL-3727
> URL: https://issues.apache.org/jira/browse/DRILL-3727
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Affects Versions: 1.1.0
> Environment: 1.1
>Reporter: Hao Zhu
>Assignee: Mehant Baid
>
> When Drill casts an empty string to date, it fails with:
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must
> be in the range [1,12]
> However, Hive simply returns NULL instead.
> I think it makes sense for Drill to match Hive's behavior in this case.
> Repro:
> Hive:
> {code}
> create table h1db.testempty(col0 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE
> ;
> hive> select * from h1db.testempty ;
> OK
> 2015-01-01
> Time taken: 0.28 seconds, Fetched: 2 row(s)
> hive> select cast(col0 as date) from  h1db.testempty;
> OK
> NULL
> 2015-01-01
> Time taken: 0.078 seconds, Fetched: 2 row(s)
> {code}
> Drill:
> {code}
> use hive;
> > select * from h1db.testempty ;
> +-+
> |col0 |
> +-+
> | |
> | 2015-01-01  |
> +-+
> 2 rows selected (0.232 seconds)
> > select cast(col0 as date) from  h1db.testempty;
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must 
> be in the range [1,12]
> {code}
> Workaround:
> {code}
> > select case when col0='' then null else cast(col0 as date) end from  
> > h1db.testempty;
> +-+
> |   EXPR$0|
> +-+
> | null|
> | 2015-01-01  |
> +-+
> 2 rows selected (0.287 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3727) Drill should return NULL instead of failure if cast column is empty

2015-08-31 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3727:
--

 Summary: Drill should return NULL instead of failure if cast 
column is empty
 Key: DRILL-3727
 URL: https://issues.apache.org/jira/browse/DRILL-3727
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Hive
Affects Versions: 1.1.0
 Environment: 1.1
Reporter: Hao Zhu
Assignee: Mehant Baid


When Drill casts an empty string to date, it fails with:
Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must
be in the range [1,12]
However, Hive simply returns NULL instead.
I think it makes sense for Drill to match Hive's behavior in this case.

Repro:
Hive:
{code}
create table h1db.testempty(col0 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
;

hive> select * from h1db.testempty ;
OK

2015-01-01
Time taken: 0.28 seconds, Fetched: 2 row(s)

hive> select cast(col0 as date) from  h1db.testempty;
OK
NULL
2015-01-01
Time taken: 0.078 seconds, Fetched: 2 row(s)
{code}

Drill:
{code}
use hive;
> select * from h1db.testempty ;
+-+
|col0 |
+-+
| |
| 2015-01-01  |
+-+
2 rows selected (0.232 seconds)

> select cast(col0 as date) from  h1db.testempty;
Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must 
be in the range [1,12]
{code}

Workaround:
{code}
> select case when col0='' then null else cast(col0 as date) end from  
> h1db.testempty;
+-+
|   EXPR$0|
+-+
| null|
| 2015-01-01  |
+-+
2 rows selected (0.287 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2015-08-25 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712244#comment-14712244
 ] 

Hao Zhu commented on DRILL-3710:


a. No optimization
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in 
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19);
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02Project($f0=[1])
00-03  SelectionVectorRemover
00-04Filter(condition=[OR(=(CAST($0):INTEGER, 1), 
=(CAST($0):INTEGER, 2), =(CAST($0):INTEGER, 3), =(CAST($0):INTEGER, 4), 
=(CAST($0):INTEGER, 5), =(CAST($0):INTEGER, 6), =(CAST($0):INTEGER, 7), 
=(CAST($0):INTEGER, 8), =(CAST($0):INTEGER, 9), =(CAST($0):INTEGER, 10), 
=(CAST($0):INTEGER, 11), =(CAST($0):INTEGER, 12), =(CAST($0):INTEGER, 13), 
=(CAST($0):INTEGER, 14), =(CAST($0):INTEGER, 15), =(CAST($0):INTEGER, 16), 
=(CAST($0):INTEGER, 17), =(CAST($0):INTEGER, 18), =(CAST($0):INTEGER, 19))])
00-05  Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:h1_passwords), 
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], 
columns=[`col2`], partitions= null]])
{code}
b. With optimization
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in 
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02Project($f0=[1])
00-03  Project(f6=[$1], ROW_VALUE=[$0])
00-04MergeJoin(condition=[=($1, $0)], joinType=[inner])
00-06  SelectionVectorRemover
00-08Sort(sort0=[$0], dir0=[ASC])
00-10  HashAgg(group=[{0}])
00-12Values
00-05  SelectionVectorRemover
00-07Sort(sort0=[$0], dir0=[ASC])
00-09  Project(f6=[CAST($0):INTEGER])
00-11Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:h1_passwords), 
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], 
columns=[`col2`], partitions= null]])
{code}

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Jinfeng Ni
>
> If an in-list has more than 20 values, Drill can optimize it by converting
> the in-list into a small in-memory hash table and doing a table join instead.
> This can improve the performance of queries with many in-list values.
> Could we make the threshold of "20" configurable, so that we do not need to
> pad the list with duplicate/junk values to exceed 20?
> A sample query is:
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3710) Make the 20 in-list optimization configurable

2015-08-25 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3710:
--

 Summary: Make the 20 in-list optimization configurable
 Key: DRILL-3710
 URL: https://issues.apache.org/jira/browse/DRILL-3710
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Jinfeng Ni


If an in-list has more than 20 values, Drill can optimize it by converting the
in-list into a small in-memory hash table and doing a table join instead.
This can improve the performance of queries with many in-list values.
Could we make the threshold of "20" configurable, so that we do not need to pad
the list with duplicate/junk values to exceed 20?

A sample query is:
select count(*) from table where col in 
(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3688) Drill should honor "skip.header.line.count" attribute of Hive table

2015-08-21 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3688:
--

 Summary: Drill should honor "skip.header.line.count" attribute of 
Hive table
 Key: DRILL-3688
 URL: https://issues.apache.org/jira/browse/DRILL-3688
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
 Environment: 1.1
Reporter: Hao Zhu
Assignee: Jinfeng Ni


Currently Drill does not honor the "skip.header.line.count" table property of a Hive table.
Besides returning the header as data, this can also cause type conversion failures downstream.

Reproduce:

1. Create a Hive table
{code}
create table h1db.testheader(col0 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
tblproperties("skip.header.line.count"="1");
{code}
2. Prepare a sample data:
{code}
# cat test.data
col0
2015-01-01
{code}
3. Load sample data into Hive
{code}
LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
{code}
4. Hive
{code}
hive> select * from h1db.testheader ;
OK
2015-01-01
Time taken: 0.254 seconds, Fetched: 1 row(s)
{code}
5. Drill
{code}
>  select * from hive.h1db.testheader ;
+-+
|col0 |
+-+
| col0|
| 2015-01-01  |
+-+
2 rows selected (0.257 seconds)

> select cast(col0 as date) from hive.h1db.testheader ;
Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must 
be in the range [1,12]

Fragment 0:0

[Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]

  (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be in 
the range [1,12]
org.joda.time.field.FieldUtils.verifyValueBounds():236
org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():147
org.apache.drill.exec.physical.impl.BaseRootExec.next():83
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79
org.apache.drill.exec.physical.impl.BaseRootExec.next():73
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1566
org.apache.drill.exec.work.fragment.FragmentExecutor.run():255
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{code}
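
A possible workaround sketch until the property is honored (assumes the header literal 'col0' never occurs as real data):
{code}
select cast(col0 as date)
from hive.h1db.testheader
where col0 <> 'col0';
{code}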



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3678) Plan generating for Drill on Hive takes huge java heap size

2015-08-20 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3678:
--

 Summary: Plan generating for Drill on Hive takes huge java heap 
size
 Key: DRILL-3678
 URL: https://issues.apache.org/jira/browse/DRILL-3678
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
 Environment: 1.1
Reporter: Hao Zhu
Assignee: Jinfeng Ni


===Env===
Drill 1.0 on Hive 0.13
(Also tested Drill 1.1 and got the same behavior.)

8-node Drill cluster.
Java heap size is set to 8G and direct memory is set to 96G on each drillbit.

===Symptom===
The table is a Hive parquet partitioned table with multi-level partitions.
Its size is several TB, with tens of thousands of leaf partitions.

When running a "select * from table limit 10", the query stays in the "pending"
state while generating the SQL plan, and the drillbits finally crash with a Java heap OOM.
{code}
java.lang.OutOfMemoryError: Java heap space
at 
hive.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
at 
hive.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
at 
hive.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
at 
hive.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
at 
hive.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:95)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at 
org.apache.drill.exec.store.hive.HiveRecordReader.init(HiveRecordReader.java:246)
at 
org.apache.drill.exec.store.hive.HiveRecordReader.(HiveRecordReader.java:138)
at 
org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch(HiveScanBatchCreator.java:58)
at 
org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch(HiveScanBatchCreator.java:34)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:150)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:106)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:81)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:235)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Node ran out of Heap memory, exiting.
java.lang.OutOfMemoryError: Java heap space
{code}

We captured some stack traces of the foreman thread on the foreman drillbit; here are two samples:
{code}
2a482cd9-7fb2-c492-1356-d049e90870c8:foreman id=115 state=RUNNABLE
at org.apache.xerces.dom.DeferredElementNSImpl.synchronizeData(Unknown 
Source)
at org.apache.xerces.dom.ElementImpl.getTagName(Unknown Source)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:871)
at 
org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2069)
at org.apache.hadoop.mapred.JobConf.(JobConf.java:421)
at org.apache.drill.exec.store.hive.HiveScan.splitInput(HiveScan.java:178)
at org.apache.drill.exec.store.hive.HiveScan.getSplits(HiveScan.java:167)
at org.apache.drill.exec.store.hive.HiveScan.access$000(HiveScan.java:69)
at org.apache.drill.exec.store.hive.HiveScan$1.run(HiveScan.java:146)
at org.apache.drill.exec.store.hive.HiveScan$1.run(HiveScan.java:144)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at 
org.apache.drill.exec.store.hive.HiveScan.getSplitsWithUGI(HiveScan.java:144)
at org.apache.drill.exec.store.hive.HiveScan.(HiveScan.java:119)
at 
org.apache.drill.exec.store.hive.HiveStoragePlugin.getPhysicalScan(HiveStoragePlugin.java:78)
at 
org.apache.drill.exec.store.hive.HiveStoragePlugin.getPhysicalScan(HiveStoragePlugin.java:41)
at 
org.apache.drill.exec.store.AbstractStoragePlugin.getP

[jira] [Updated] (DRILL-3621) Wrong results when Drill on Hbase query contains rowkey "or" or "IN"

2015-08-10 Thread Hao Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Zhu updated DRILL-3621:
---
Component/s: (was: Execution - Flow)
 Query Planning & Optimization

> Wrong results when Drill on Hbase query contains rowkey "or" or "IN"
> 
>
> Key: DRILL-3621
> URL: https://issues.apache.org/jira/browse/DRILL-3621
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Chris Westin
>Priority: Critical
>
> If a Drill-on-HBase query filters row_key with "IN" or "OR", it produces
> wrong results.
> For example:
> 1. Create a hbase table
> {code}
> create 'testrowkey','cf'
> put 'testrowkey','DUMMY1','cf:c','value1'
> put 'testrowkey','DUMMY2','cf:c','value2'
> put 'testrowkey','DUMMY3','cf:c','value3'
> put 'testrowkey','DUMMY4','cf:c','value4'
> put 'testrowkey','DUMMY5','cf:c','value5'
> put 'testrowkey','DUMMY6','cf:c','value6'
> put 'testrowkey','DUMMY7','cf:c','value7'
> put 'testrowkey','DUMMY8','cf:c','value8'
> put 'testrowkey','DUMMY9','cf:c','value9'
> put 'testrowkey','DUMMY10','cf:c','value10'
> {code}
> 2. Drill queries:
> {code}
> 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
> CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = 
> 'DUMMY10';
> +--+
> |RK|
> +--+
> | DUMMY10  |
> +--+
> 1 row selected (1.186 seconds)
> 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
> CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = 
> 'DUMMY1';
> +-+
> |   RK|
> +-+
> | DUMMY1  |
> +-+
> 1 row selected (0.691 seconds)
> 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
> CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN 
> ('DUMMY1' , 'DUMMY10');
> +-+
> |   RK|
> +-+
> | DUMMY1  |
> +-+
> 1 row selected (0.71 seconds)
> 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
> CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY 
> ='DUMMY1' OR ROW_KEY = 'DUMMY10';
> +-+
> |   RK|
> +-+
> | DUMMY1  |
> +-+
> 1 row selected (0.693 seconds)
> {code}
> From the explain plan, the filter is pushed down to the HBase scan layer.
> {code}
> 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> explain plan for SELECT 
> CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN 
> ('DUMMY1' , 'DUMMY10');
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(RK=[CONVERT_FROMUTF8($0)])
> 00-02Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
> [tableName=testrowkey, startRow=DUMMY1, stopRow=DUMMY10, filter=null], 
> columns=[`row_key`]]])
>  |
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3621) Wrong results when Drill on Hbase query contains rowkey "or" or "IN"

2015-08-10 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3621:
--

 Summary: Wrong results when Drill on Hbase query contains rowkey 
"or" or "IN"
 Key: DRILL-3621
 URL: https://issues.apache.org/jira/browse/DRILL-3621
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Chris Westin
Priority: Critical


If a Drill-on-HBase query filters row_key with "IN" or "OR", it produces wrong
results.

For example:
1. Create an hbase table
{code}
create 'testrowkey','cf'
put 'testrowkey','DUMMY1','cf:c','value1'
put 'testrowkey','DUMMY2','cf:c','value2'
put 'testrowkey','DUMMY3','cf:c','value3'
put 'testrowkey','DUMMY4','cf:c','value4'
put 'testrowkey','DUMMY5','cf:c','value5'
put 'testrowkey','DUMMY6','cf:c','value6'
put 'testrowkey','DUMMY7','cf:c','value7'
put 'testrowkey','DUMMY8','cf:c','value8'
put 'testrowkey','DUMMY9','cf:c','value9'
put 'testrowkey','DUMMY10','cf:c','value10'
{code}

2. Drill queries:
{code}
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = 
'DUMMY10';
+--+
|RK|
+--+
| DUMMY10  |
+--+
1 row selected (1.186 seconds)
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = 
'DUMMY1';
+-+
|   RK|
+-+
| DUMMY1  |
+-+
1 row selected (0.691 seconds)
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN 
('DUMMY1' , 'DUMMY10');
+-+
|   RK|
+-+
| DUMMY1  |
+-+
1 row selected (0.71 seconds)
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT 
CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY ='DUMMY1' 
OR ROW_KEY = 'DUMMY10';
+-+
|   RK|
+-+
| DUMMY1  |
+-+
1 row selected (0.693 seconds)
{code}

From the explain plan, the filter is pushed down to the HBase scan layer as a range scan [startRow=DUMMY1, stopRow=DUMMY10). Since the stop row is exclusive and 'DUMMY10' sorts between 'DUMMY1' and 'DUMMY2' lexicographically, the scan returns only DUMMY1.
{code}
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> explain plan for SELECT 
CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN 
('DUMMY1' , 'DUMMY10');
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(RK=[CONVERT_FROMUTF8($0)])
00-02Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
[tableName=testrowkey, startRow=DUMMY1, stopRow=DUMMY10, filter=null], 
columns=[`row_key`]]])
 |
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3579) Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__

2015-07-29 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3579:
--

 Summary: Drill on Hive query fails if partition table has 
__HIVE_DEFAULT_PARTITION__
 Key: DRILL-3579
 URL: https://issues.apache.org/jira/browse/DRILL-3579
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Hive
Affects Versions: 1.1.0
 Environment: Drill 1.1 on Hive 1.0
Reporter: Hao Zhu
Assignee: Mehant Baid


If a Hive partitioned table contains a __HIVE_DEFAULT_PARTITION__ directory
(created when the partition column has null values), Drill-on-Hive queries against it fail.

Minimum reproduce:
1.Hive:
{code}
CREATE TABLE h1_testpart2(id INT) PARTITIONED BY(id2 int);
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , 20150101 
as id2 from h1_passwords limit 1;
INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , null as 
id2 from h1_passwords limit 1;

{code}

2. Filesystem looks like:
{code}
h1 h1_testpart2]# ls -altr
total 2
drwxrwxrwx 89 mapr mapr 87 Jul 30 00:04 ..
drwxr-xr-x  2 mapr mapr  1 Jul 30 00:05 id2=20150101
drwxr-xr-x  2 mapr mapr  1 Jul 30 00:05 id2=__HIVE_DEFAULT_PARTITION__
drwxr-xr-x  4 mapr mapr  2 Jul 30 00:05 .
{code}

3.Drill will fail:
{code}
select * from h1_testpart2;
Error: SYSTEM ERROR: NumberFormatException: For input string: 
"__HIVE_DEFAULT_PARTITION__"

Fragment 0:0

[Error Id: 509eb392-db9a-42f3-96ea-fb597425f49f on h1.poc.com:31010]

  (java.lang.reflect.UndeclaredThrowableException) null
org.apache.hadoop.security.UserGroupInformation.doAs():1581
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) 
Failure while initializing HiveRecordReader: For input string: 
"__HIVE_DEFAULT_PARTITION__"
org.apache.drill.exec.store.hive.HiveRecordReader.init():241
org.apache.drill.exec.store.hive.HiveRecordReader.():138
org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
org.apache.drill.exec.physical.impl.ImplCreator$2.run():138
org.apache.drill.exec.physical.impl.ImplCreator$2.run():136
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1566
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (java.lang.NumberFormatException) For input string: 
"__HIVE_DEFAULT_PARTITION__"
java.lang.NumberFormatException.forInputString():65
java.lang.Integer.parseInt():580
java.lang.Integer.parseInt():615
org.apache.drill.exec.store.hive.HiveRecordReader.convertPartitionType():605
org.apache.drill.exec.store.hive.HiveRecordReader.init():236
org.apache.drill.exec.store.hive.HiveRecordReader.():138
org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
org.apache.drill.exec.physical.impl.ImplCreator$2.run():138
org.apache.drill.exec.physical.impl.ImplCreator$2.run():136
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1566
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
or

[jira] [Created] (DRILL-3578) UnsupportedOperationException: Unable to get value vector class for minor type [FIXEDBINARY] and mode [OPTIONAL]

2015-07-29 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3578:
--

 Summary: UnsupportedOperationException: Unable to get value vector 
class for minor type [FIXEDBINARY] and mode [OPTIONAL]
 Key: DRILL-3578
 URL: https://issues.apache.org/jira/browse/DRILL-3578
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Hanifi Gunes


The issue is that Drill fails to read the "timestamp" type in Parquet files
generated by Hive.

How to reproduce:
1. Create a external Hive CSV table in hive 1.0:
{code}
create external table type_test_csv
(
  id1 int,
  id2 string,
  id3 timestamp,
  id4 double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/xxx/testcsv';
{code}
2. Put sample data for above external table:
{code}
1,One,2015-01-01 00:01:00,1.0
2,Two,2015-01-02 00:02:00,2.0
{code}

3. Create a parquet hive table:
{code}
create external table type_test
(
  id1 int,
  id2 string,
  id3 timestamp,
  id4 double
)
STORED AS PARQUET
LOCATION '/xxx/type_test';

INSERT OVERWRITE TABLE type_test
  SELECT * FROM type_test_csv;
{code}
4. Then querying the parquet file directly through filesystem storage plugin:
{code}
> select * from dfs.`xxx/type_test`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unable to get value vector 
class for minor type [FIXEDBINARY] and mode [OPTIONAL]

Fragment 0:0

[Error Id: fccfe8b2-6427-46e5-8bfd-cac639e526e8 on h3.poc.com:31010] 
(state=,code=0)
{code}
5. If the sample data is only 1 row:
{code}
1,One,2015-01-01 00:01:00,1.0
{code}
Then the error message would become:
{code}
> select * from dfs.`xxx/type_test`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type:INT96


[Error Id: b52b5d46-63a8-4be6-a11d-999a1b46c7c2 on h3.poc.com:31010] 
(state=,code=0)
{code}

Using the Hive storage plugin works fine; this issue only applies to the
filesystem storage plugin. For reference, a sketch of the working path is below.
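
(A sketch; this assumes the table lives in Hive's default database and the Hive storage plugin is registered as "hive".)
{code}
select id1, id2, id3, id4 from hive.`default`.type_test;
{code}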



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-1773) Issues when using JAVA code through Drill JDBC driver

2015-06-30 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609024#comment-14609024
 ] 

Hao Zhu edited comment on DRILL-1773 at 6/30/15 8:42 PM:
-

I tested on Drill 1.0 and hit the same issue: "DEBUG" messages are still showing.

However, the second issue appears fixed; I no longer need to press Ctrl-C.


was (Author: haozhu):
I tested on Drill 1.0 and the same issue that "DEBUG" messages are showing.

> Issues when using JAVA code through Drill JDBC driver
> -
>
> Key: DRILL-1773
> URL: https://issues.apache.org/jira/browse/DRILL-1773
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 0.6.0, 0.7.0
> Environment: Tested on 0.6R3
>Reporter: Hao Zhu
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.2.0
>
> Attachments: DrillHandler.patch, DrillJdbcExample.java
>
>
> When executing the attached simple Java code through the Drill JDBC driver
> (0.6 R3), the query executes and returns the correct result; however, there
> are 2 issues:
> 1. It keeps printing DEBUG information.
> Is it default behavior or is there any way to disable DEBUG?
> eg:
> {code}
> 13:30:44.702 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
> 13:30:44.706 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher
> 13:30:44.708 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher
> 13:30:44.717 [Client-1] DEBUG io.netty.util.Recycler - 
> -Dio.netty.recycler.maxCapacity.default: 262144
> {code}
> 2. After the query finishes, it seems the connection is not closed and
> control does not return to the shell prompt.
> I have to manually press Ctrl-C to stop it.
> {code}
> 13:31:11.239 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 0ms
> 13:31:24.573 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 1ms
> 13:31:37.906 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 0ms
> ^CAdministrators-MacBook-Pro-40:xxx$ 
> {code}
> 
> The DrillJdbcExample.java is attached.
> Command to run:
> {code}
> javac DrillJdbcExample.java
> java DrillJdbcExample
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1773) Issues when using JAVA code through Drill JDBC driver

2015-06-30 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609024#comment-14609024
 ] 

Hao Zhu commented on DRILL-1773:


I tested on Drill 1.0 and hit the same issue: "DEBUG" messages are still showing.
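
A possible way to quiet this (a sketch; assumes the client logs via slf4j/logback, so a logback configuration raising the root level to WARN should suppress the DEBUG output):
{code}
# Hypothetical file name; point logback at a config whose root level is WARN
java -Dlogback.configurationFile=/path/to/logback-warn.xml DrillJdbcExample
{code}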

> Issues when using JAVA code through Drill JDBC driver
> -
>
> Key: DRILL-1773
> URL: https://issues.apache.org/jira/browse/DRILL-1773
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 0.6.0, 0.7.0
> Environment: Tested on 0.6R3
>Reporter: Hao Zhu
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.2.0
>
> Attachments: DrillHandler.patch, DrillJdbcExample.java
>
>
> When executing the attached simple Java code through the Drill JDBC driver
> (0.6 R3), the query executes and returns the correct result; however, there
> are 2 issues:
> 1. It keeps printing DEBUG information.
> Is it default behavior or is there any way to disable DEBUG?
> eg:
> {code}
> 13:30:44.702 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
> 13:30:44.706 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher
> 13:30:44.708 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher
> 13:30:44.717 [Client-1] DEBUG io.netty.util.Recycler - 
> -Dio.netty.recycler.maxCapacity.default: 262144
> {code}
> 2. After the query finishes, it seems the connection is not closed and
> control does not return to the shell prompt.
> I have to manually press Ctrl-C to stop it.
> {code}
> 13:31:11.239 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 0ms
> 13:31:24.573 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 1ms
> 13:31:37.906 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 0ms
> ^CAdministrators-MacBook-Pro-40:xxx$ 
> {code}
> 
> The DrillJdbcExample.java is attached.
> Command to run:
> {code}
> javac DrillJdbcExample.java
> java DrillJdbcExample
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3336) to_date(to_timestamp) with group-by in hbase/maprdb table fails with "java.lang.UnsupportedOperationException"

2015-06-22 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3336:
--

 Summary: to_date(to_timestamp) with group-by in hbase/maprdb table 
fails with "java.lang.UnsupportedOperationException"
 Key: DRILL-3336
 URL: https://issues.apache.org/jira/browse/DRILL-3336
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow, Functions - Drill
Affects Versions: 1.0.0
 Environment: 1.0 GA version
Reporter: Hao Zhu
Assignee: Chris Westin
Priority: Critical


1. Create an hbase/maprdb table in the hbase shell:
{code}
create '/tables/esr52','cf'
put '/tables/esr52','1434998909','cf:c','abc'
> scan '/tables/esr52'
ROW            COLUMN+CELL
 1434998909    column=cf:c, timestamp=1434998994785, value=abc
{code}

2. The SQLs below work fine in Drill:
{code}
>  select * from maprdb.esr52;
+--+---+
|   row_key|  cf   |
+--+---+
| [B@5bafd971  | {"c":"YWJj"}  |
+--+---+
1 row selected (0.095 seconds)

> select to_date(to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as 
> int))) from maprdb.esr52 esrtable;
+-+
|   EXPR$0|
+-+
| 2015-06-22  |
+-+
1 row selected (0.127 seconds)
{code}

3. However, the same expression with GROUP BY fails:
{code}
select to_date(to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as 
int))),count(*) from maprdb.esr52 esrtable 
group by to_date(to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as 
int)));

Error: SYSTEM ERROR: java.lang.UnsupportedOperationException: Failure finding 
function that runtime code generation expected.  Signature: 
compare_to_nulls_high( VAR16CHAR:OPTIONAL, VAR16CHAR:OPTIONAL ) returns 
INT:REQUIRED

Fragment 3:0

[Error Id: 26003311-d40e-4a95-9d3c-68793459ad6d on h1.poc.com:31010]

  (java.lang.UnsupportedOperationException) Failure finding function that 
runtime code generation expected.  Signature: compare_to_nulls_high( 
VAR16CHAR:OPTIONAL, VAR16CHAR:OPTIONAL ) returns INT:REQUIRED

org.apache.drill.exec.expr.fn.FunctionGenerationHelper.getFunctionExpression():109

org.apache.drill.exec.expr.fn.FunctionGenerationHelper.getOrderingComparator():62

org.apache.drill.exec.expr.fn.FunctionGenerationHelper.getOrderingComparatorNullsHigh():79

org.apache.drill.exec.physical.impl.common.ChainedHashTable.setupIsKeyMatchInternal():257

org.apache.drill.exec.physical.impl.common.ChainedHashTable.createAndSetupHashTable():206
org.apache.drill.exec.test.generated.HashAggregatorGen1.setup():273

org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal():240

org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator():163
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():110
org.apache.drill.exec.record.AbstractRecordBatch.next():127
org.apache.drill.exec.record.AbstractRecordBatch.next():105
org.apache.drill.exec.record.AbstractRecordBatch.next():95
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():146
org.apache.drill.exec.physical.impl.BaseRootExec.next():83

org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():95
org.apache.drill.exec.physical.impl.BaseRootExec.next():73
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1566
org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{code}

4. If we remove to_date and group by only to_timestamp, it works fine (see the workaround sketch after this step):
{code}
select to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int)) from 
maprdb.esr52 esrtable;
++
| EXPR$0 |
++
| 2015-06-22 18:48:29.0  |
++
1 row selected (0.084 seconds)

select to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as 
int)),count(*) from maprdb.esr52 esrtable
group by to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int));
++-+
| EXPR$0 | EXPR$1  |
++-+
| 2015-06-22 18:48:29.0  | 1   |
++-+
1 row selected (0.641 seconds)
{code}
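
Since grouping by to_timestamp works, a possible workaround sketch is to aggregate on the timestamp first and apply to_date in an outer query (untested):
{code}
select to_date(t.ts) as dt, t.cnt
from (
  select to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int)) as ts,
         count(*) as cnt
  from maprdb.esr52 esrtable
  group by to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int))
) t;
{code}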


[jira] [Commented] (DRILL-3121) Hive partition pruning is not happening

2015-05-17 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547451#comment-14547451
 ] 

Hao Zhu commented on DRILL-3121:


DFS is working fine.

{code}
> explain plan for select * from dfs.drill.`part1` where dir0='2015' and (dir1 
> >= '02' and dir1 <= '03');
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(*=[$0])
00-02Project(*=[$0])
00-03  Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/part1, 
numFiles=2, columns=[`*`], files=[maprfs:/drill/part1/2015/02/02.csv, 
maprfs:/drill/part1/2015/03/03.csv]]])
{code}

> Hive partition pruning is not happening
> ---
>
> Key: DRILL-3121
> URL: https://issues.apache.org/jira/browse/DRILL-3121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
>Reporter: Hao Zhu
>Assignee: Chris Westin
> Fix For: 1.1.0
>
>
> Tested on 1.0.0 with below commit id, and hive 0.13.
> {code}
> >  select * from sys.version;
> +---+++--++
> | commit_id |   
> commit_message   |commit_time | 
> build_email  | build_time |
> +---+++--++
> | d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: 
> TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 
> 01:18:03 EDT  | Unknown  | 15.05.2015 @ 03:07:10 EDT  |
> +---+++--++
> 1 row selected (0.083 seconds)
> {code}
> How to reproduce:
> 1. Use Hive to create the partitioned table below:
> {code}
> CREATE TABLE partition_table(id INT, username string)
>  PARTITIONED BY(year STRING, month STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
> insert into table partition_table PARTITION(year='2014',month='11') select 
> 1,'u' from passwords limit 1;
> insert into table partition_table PARTITION(year='2014',month='12') select 
> 2,'s' from passwords limit 1;
> insert into table partition_table PARTITION(year='2015',month='01') select 
> 3,'e' from passwords limit 1;
> insert into table partition_table PARTITION(year='2015',month='02') select 
> 4,'r' from passwords limit 1;
> insert into table partition_table PARTITION(year='2015',month='03') select 
> 5,'n' from passwords limit 1;
> {code}
> 2. Hive can do partition pruning for the 2 queries below:
> {code}
> hive>  explain EXTENDED select * from partition_table where year='2015' and 
> month in ( '02','03') ;
> partition values:
>   month 02
>   year 2015
> partition values:
>   month 03
>   year 2015  
> explain EXTENDED select * from partition_table where year='2015' and (month 
> >= '02' and month <= '03') ;
> partition values:
>   month 02
>   year 2015
> partition values:
>   month 03
>   year 2015
> {code}
> Hive only scans 2 partitions -- 2015/02 and 2015/03.
> 3. Drill cannot do partition pruning for the 2 queries below:
> {code}
> > explain plan for select * from hive.partition_table where `year`='2015' and 
> > `month` in ('02','03');
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
> 00-02SelectionVectorRemover
> 00-03  Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, 
> '03')))])
> 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, 
> tableName:partition_table), 
> inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4,
>  maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, 
> maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], 
> columns=[`*`], partitions= [Partition(values:[2015, 01]), 
> Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
> > explain plan for select * from hive.partition_table where `year`='2015' and 
> > (`month` >= '02' and `month` <= '03' );
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
> 00-02SelectionVectorRemover
> 00-03  Filter(condition=[AND(=($2, '2015'), >=($3, 

[jira] [Created] (DRILL-3121) Hive partition pruning is not happening

2015-05-17 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3121:
--

 Summary: Hive partition pruning is not happening
 Key: DRILL-3121
 URL: https://issues.apache.org/jira/browse/DRILL-3121
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Hao Zhu
Assignee: Chris Westin


Tested on 1.0.0 with below commit id, and hive 0.13.
{code}
>  select * from sys.version;
+---+++--++
| commit_id |   
commit_message   |commit_time | 
build_email  | build_time |
+---+++--++
| d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: 
TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 01:18:03 
EDT  | Unknown  | 15.05.2015 @ 03:07:10 EDT  |
+---+++--++
1 row selected (0.083 seconds)
{code}

How to reproduce:
1. Use Hive to create the partitioned table below:
{code}
CREATE TABLE partition_table(id INT, username string)
 PARTITIONED BY(year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

insert into table partition_table PARTITION(year='2014',month='11') select 
1,'u' from passwords limit 1;
insert into table partition_table PARTITION(year='2014',month='12') select 
2,'s' from passwords limit 1;
insert into table partition_table PARTITION(year='2015',month='01') select 
3,'e' from passwords limit 1;
insert into table partition_table PARTITION(year='2015',month='02') select 
4,'r' from passwords limit 1;
insert into table partition_table PARTITION(year='2015',month='03') select 
5,'n' from passwords limit 1;
{code}

2. Hive can do partition pruning for the 2 queries below:
{code}
hive>  explain EXTENDED select * from partition_table where year='2015' and 
month in ( '02','03') ;
partition values:
  month 02
  year 2015

partition values:
  month 03
  year 2015  

explain EXTENDED select * from partition_table where year='2015' and (month >= 
'02' and month <= '03') ;
partition values:
  month 02
  year 2015

partition values:
  month 03
  year 2015
{code}
Hive only scans 2 partitions -- 2015/02 and 2015/03.

3. Drill cannot do partition pruning for the 2 queries below:
{code}
> explain plan for select * from hive.partition_table where `year`='2015' and 
> `month` in ('02','03');
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
00-02SelectionVectorRemover
00-03  Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, 
'03')))])
00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:partition_table), 
inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4,
 maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, 
maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], 
columns=[`*`], partitions= [Partition(values:[2015, 01]), 
Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])

> explain plan for select * from hive.partition_table where `year`='2015' and 
> (`month` >= '02' and `month` <= '03' );
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
00-02SelectionVectorRemover
00-03  Filter(condition=[AND(=($2, '2015'), >=($3, '02'), <=($3, 
'03'))])
00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:partition_table), 
inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4,
 maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, 
maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], 
columns=[`*`], partitions= [Partition(values:[2015, 01]), 
Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
{code}
Drill scans 3 partitions -- 2015/01, 2015/02 and 2015/03.

Note: if the in-list has only 1 value, Drill does partition pruning correctly (see the workaround sketch after the plan below):
{code}
>  explain plan for select * from hive.partition_table where `year`='2015' and 
> `month` in ('02');
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(id=[$0], username=[
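
Given that single-value predicates prune correctly, a possible workaround sketch until this is fixed is to split the in-list into a UNION ALL of single-partition queries (untested rewrite):
{code}
select * from hive.partition_table where `year`='2015' and `month`='02'
union all
select * from hive.partition_table where `year`='2015' and `month`='03';
{code}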

[jira] [Created] (DRILL-3119) Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory

2015-05-16 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3119:
--

 Summary: Query stays in "CANCELLATION_REQUESTED" status in UI 
after OOM of Direct buffer memory
 Key: DRILL-3119
 URL: https://issues.apache.org/jira/browse/DRILL-3119
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Hao Zhu
Assignee: Chris Westin


Tested in 1.0.0 with below commit id:
{code}
> select * from sys.version;
+---+++--++
| commit_id |   
commit_message   |commit_time | 
build_email  | build_time |
+---+++--++
| d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: 
TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 01:18:03 
EDT  | Unknown  | 15.05.2015 @ 03:07:10 EDT  |
+---+++--++
1 row selected (0.26 seconds)
{code}

How to reproduce:
1. Single node cluster.
2. Reduce DRILL_MAX_DIRECT_MEMORY to "2G" (see the sketch after the query below).
3. Run a hash join big enough to trigger an OOM, e.g.:
{code}
select count(*) from
(
select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, 
dfs.root.`user/hive/warehouse/passwords_csv_big` b
where a.columns[1]=b.columns[1]
);
{code}
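
A sketch of step 2, assuming the standard drill-env.sh override mechanism:
{code}
# In $DRILL_HOME/conf/drill-env.sh (picked up at drillbit startup)
export DRILL_MAX_DIRECT_MEMORY="2G"
{code}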

After that, drillbit.log shows OOM:
{code}
2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 2aa866ba-8939-b184-0ba2-291734329f88:4:4: 
State change requested from RUNNING --> FINISHED for
2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO  
o.a.d.e.w.f.AbstractStatusReporter - State changed for 
2aa866ba-8939-b184-0ba2-291734329f88:4:4. New state: FINISHED
2015-05-16 19:24:38,561 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.0.0.31:31012 <--> 
/10.0.0.31:41923 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
buffer memory
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
 ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
 [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at 
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_45]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
~[na:1.8.0_45]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
~[na:1.8.0_45]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) 
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
at io.netty.buffer.PoolArena.a

[jira] [Commented] (DRILL-3118) "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column

2015-05-16 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546873#comment-14546873
 ] 

Hao Zhu commented on DRILL-3118:


Setting it at the session level works fine. Thanks Jacques. 
Could we correct the error message so that it is more readable?
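For context, the session-level workaround was presumably along these lines -- a hedged sketch, assuming the option being set is the partition-column label (`drill.exec.storage.file.partition.column.label`, default "dir"), which renames the implicit dirN columns so they no longer shadow a real "dir999" data column:
{code}
-- Hedged sketch: rename the implicit partition columns away from "dir*".
alter session set `drill.exec.storage.file.partition.column.label` = 'pdir';
select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
{code}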


> "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column
> 
>
> Key: DRILL-3118
> URL: https://issues.apache.org/jira/browse/DRILL-3118
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
>Reporter: Hao Zhu
>Assignee: Chris Westin
>
> Tested on 1.0 with commit id:
> {code}
> select commit_id from sys.version;
> +---+
> | commit_id |
> +---+
> | d8b19759657698581cc0d01d7038797952888123  |
> +---+
> 1 row selected (0.097 seconds)
> {code}
> When the source data has a column name like "dir0" or "dir1", the query may fail 
> with "java.lang.IndexOutOfBoundsException".
> For example:
> {code}
> > select `dir999` from 
> > dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
> Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 
> (expected: range(0, 0))
> Fragment 0:0
> [Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet 
> record reader.
> Message:
> Hadoop path: 
> /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
> Total records read: 0
> Mock records read: 0
> Records to read: 32768
> Row group index: 0
> Records in row group: 1
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
>   optional int32 id;
>   optional binary dir999;
> }
> , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] 
> INT32  [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] 
> BINARY  [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
> 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
> 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
> org.apache.drill.exec.physical.impl.ScanBatch.next():175
> org.apache.drill.exec.physical.impl.BaseRootExec.next():83
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
> org.apache.drill.exec.physical.impl.BaseRootExec.next():73
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
> java.security.AccessController.doPrivileged():-2
>   optional int32 id;
>   optional binary dir999;
> }
> , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] 
> INT32  [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] 
> BINARY  [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
> 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
> 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
> org.apache.drill.exec.physical.impl.ScanBatch.next():175
> org.apache.drill.exec.physical.impl.BaseRootExec.next():83
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
> org.apache.drill.exec.physical.impl.BaseRootExec.next():73
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1469
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 
> (expected: range(0, 0))
> io.netty.buffer.DrillBuf.checkIndexD():189
> io.netty.buffer.DrillBuf.chk():211
> io.netty.buffer.DrillBuf.getInt():491
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
> org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481
> 
> org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408
> 
> org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513
> 
> org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78
> 
> org.apache.drill.exec.store.parquet.col

[jira] [Created] (DRILL-3118) "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column

2015-05-16 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3118:
--

 Summary: "java.lang.IndexOutOfBoundsException" if the source data 
has a "dir0" column
 Key: DRILL-3118
 URL: https://issues.apache.org/jira/browse/DRILL-3118
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Hao Zhu
Assignee: Chris Westin


Tested on 1.0 with commit id:
{code}
select commit_id from sys.version;
+---+
| commit_id |
+---+
| d8b19759657698581cc0d01d7038797952888123  |
+---+
1 row selected (0.097 seconds)
{code}

When the source data has a column name like "dir0" or "dir1", the query may fail 
with "java.lang.IndexOutOfBoundsException".

For example:
{code}
> select `dir999` from 
> dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 
(expected: range(0, 0))

Fragment 0:0

[Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]

  (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet 
record reader.
Message:
Hadoop path: 
/user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
Total records read: 0
Mock records read: 0
Records to read: 32768
Row group index: 0
Records in row group: 1
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
  optional int32 id;
  optional binary dir999;
}
, metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32 
 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY  
[PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}

org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339

org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
org.apache.drill.exec.physical.impl.ScanBatch.next():175
org.apache.drill.exec.physical.impl.BaseRootExec.next():83
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
org.apache.drill.exec.physical.impl.BaseRootExec.next():73
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
java.security.AccessController.doPrivileged():-2
  optional int32 id;
  optional binary dir999;
}
, metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32 
 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY  
[PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}

org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339

org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
org.apache.drill.exec.physical.impl.ScanBatch.next():175
org.apache.drill.exec.physical.impl.BaseRootExec.next():83
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
org.apache.drill.exec.physical.impl.BaseRootExec.next():73
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1469
org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 
(expected: range(0, 0))
io.netty.buffer.DrillBuf.checkIndexD():189
io.netty.buffer.DrillBuf.chk():211
io.netty.buffer.DrillBuf.getInt():491
org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481

org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408

org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513

org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78

org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425
org.apache.drill.exec.physical.impl.ScanBatch.next():175
org.apache.drill.exec.physical.impl.BaseRootExec.next():83
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
org.apache.drill.exec.physical.impl.BaseRootExec.next():73
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject

[jira] [Commented] (DRILL-2100) Drill not deleting spooling files

2015-05-15 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546512#comment-14546512
 ] 

Hao Zhu commented on DRILL-2100:


Tested on Drill 1.0: when a query finishes successfully, the spill directories 
remain but the files in them are deleted.
The minimal reproduction is on a single-node cluster:
{code}
alter system set `planner.memory.max_query_memory_per_node`=21474836;
select count(*) from
(
select columns[5] from dfs.root.`user/hive/warehouse/passwords_csv_middle` 
order by columns[0], columns[1],columns[2]
);
{code}
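To double-check that the lowered memory option is in effect before running the query -- a hedged sketch against the sys.options table:
{code}
-- Hedged sketch: confirm the per-query memory limit that forces the
-- external sort to spill.
select name, num_val from sys.options
where name = 'planner.memory.max_query_memory_per_node';
{code}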

The table "passwords_csv_middle" is about 400MB.

{code}
[root@h1 spill]# ls -altr 2aa9600f-016a-5283-f98e-ef22942981c2/*/*/*/
2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_5/operator_2/:
total 8
drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 ..
drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 .

2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_4/operator_2/:
total 8
drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 ..
drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 .

2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_3/operator_2/:
total 8
drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 ..
drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 .

2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_2/operator_2/:
total 8
drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 ..
drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 .

2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_1/operator_2/:
total 8
drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 ..
drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 .

2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_0/operator_2/:
total 8
drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 ..
drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 .
[root@h1 spill]# pwd
/tmp/drill/spill
{code}

I would suggest that when a SQL statement finishes successfully, the whole 
directory for its profile ID be removed.





> Drill not deleting spooling files
> -
>
> Key: DRILL-2100
> URL: https://issues.apache.org/jira/browse/DRILL-2100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Steven Phillips
> Fix For: 1.1.0
>
>
> Currently, after forcing queries to use an external sort by switching off 
> hash join/agg causes spill-to-disk files accumulating. 
> This causes issues with disk space availability when the spill is configured 
> to be on the local file system (/tmp/drill). Also not optimal when configured 
> to use DFS (custom). 
> Drill must clean up all temporary files created after a query completes or 
> after a drillbit restart. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.

2015-05-15 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546385#comment-14546385
 ] 

Hao Zhu commented on DRILL-3110:


This time the error is due to an OOM of direct memory on one node:

{code}
2015-05-15 23:29:14,590 [BitServer-7] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.0.0.28:31012 <--> 
/10.0.0.31:38972 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
buffer memory
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:346)
 ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229)
 ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
 [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at 
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_45]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
~[na:1.8.0_45]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
~[na:1.8.0_45]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) 
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:140)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:4.0.27.Final]
at 
io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:4.0.27.Final]
at 
org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:98)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:106)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.ProtobufLengthDecoder.decode(ProtobufLengthDecoder.java:83)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.data.DataProtobufLengthDecoder$Server.decode(DataProtobufLengthDecoder.java:52)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315)
 ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
... 12 common frames omitted
{code}

> org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
> -
>
> Key: DRILL-3110
> URL: https://issues.apache.org/jira/browse/DRILL-3110
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
> Environment: > select commit_id from sys.version;
> ++
> | commit_id  |
> ++
> | 5

[jira] [Commented] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.

2015-05-15 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546381#comment-14546381
 ] 

Hao Zhu commented on DRILL-3110:


However, after increasing drill.exec.buffer.size to 1000, I triggered this 
issue again with a slightly different error:
{code}
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select a.* from 
dfs.root.`user/hive/warehouse/passwords_csv_big` a, 
dfs.root.`user/hive/warehouse/passwords_csv_big` b
. . . . . . . . . . . . . . . . . . . . . . .> where a.columns[1]=b.columns[1] 
limit 5;
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
java.lang.IllegalStateException: Failure while closing accountor.  Expected 
private and shared pools to be set to initial values.  However, one or more 
were not.  Stats are
zone    init    allocated   delta
private 100 100 0
shared  00  9998631712  368288.

Fragment 2:0

[Error Id: 89bc66e8-b5ec-41fc-bcf2-08e330077138 on h3.poc.com:31010]

  (java.lang.IllegalStateException) Failure while closing accountor.  Expected 
private and shared pools to be set to initial values.  However, one or more 
were not.  Stats are
zone    init    allocated   delta
private 100 100 0
shared  00  9998631712  368288.
org.apache.drill.exec.memory.AtomicRemainder.close():200
org.apache.drill.exec.memory.Accountor.close():386
org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close():325
org.apache.drill.exec.ops.OperatorContextImpl.close():116
org.apache.drill.exec.ops.FragmentContext.suppressingClose():405
org.apache.drill.exec.ops.FragmentContext.close():394
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():349
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():175
org.apache.drill.exec.work.fragment.FragmentExecutor.run():293
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
at 
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:77)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:106)
at sqlline.SqlLine.print(SqlLine.java:1583)
at sqlline.Commands.execute(Commands.java:852)
at sqlline.Commands.sql(Commands.java:751)
at sqlline.SqlLine.dispatch(SqlLine.java:738)
at sqlline.SqlLine.begin(SqlLine.java:612)
at sqlline.SqlLine.start(SqlLine.java:366)
at sqlline.SqlLine.main(SqlLine.java:259)
{code}

> org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
> -
>
> Key: DRILL-3110
> URL: https://issues.apache.org/jira/browse/DRILL-3110
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
> Environment: > select commit_id from sys.version;
> ++
> | commit_id  |
> ++
> | 583ca4a95df2c45b5ba20b517cb1aeed48c7548e |
> ++
> 1 row selected (0.098 seconds)
>Reporter: Hao Zhu
>Assignee: Chris Westin
>
> Joining two 1 GB CSV tables results in the error below:
> {code}
> > select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, 
> > dfs.root.`user/hive/warehouse/passwords_csv_big` b
> . . . . . . . . . . . . . . . . . . . . . . .> where 
> a.columns[1]=b.columns[1] limit 5;
> ++
> |  columns   |
> ++
> | ["1","787148","92921","158596","17776","896094","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
> Fragment 5:15
> [Error Id: dd25cee9-1d1d-4658-9a83-cdefcafb7031 on h3.poc.com:31010]
>   (org.apache.drill.exec.rpc.RpcException) Data not accepted downstream.
> org.apache.drill.exec.ops.StatusHandler.success():54
> org.apache.drill.exec.ops.StatusHandler.success():29
> org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():55
> org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():46
> 
> org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():133
> 
> org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():116
> org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.set():98
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.de

[jira] [Commented] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.

2015-05-15 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546362#comment-14546362
 ] 

Hao Zhu commented on DRILL-3110:


It seems this is already fixed per DRILL-3061.

I tried the latest RPM build below, and so far I have not seen this error.
{code}
> select commit_id from sys.version;
+---+
| commit_id |
+---+
| d8b19759657698581cc0d01d7038797952888123  |
+---+
1 row selected (0.06 seconds)
{code}

> org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
> -
>
> Key: DRILL-3110
> URL: https://issues.apache.org/jira/browse/DRILL-3110
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
> Environment: > select commit_id from sys.version;
> ++
> | commit_id  |
> ++
> | 583ca4a95df2c45b5ba20b517cb1aeed48c7548e |
> ++
> 1 row selected (0.098 seconds)
>Reporter: Hao Zhu
>Assignee: Chris Westin
>
> Joining two 1 GB CSV tables results in the error below:
> {code}
> > select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, 
> > dfs.root.`user/hive/warehouse/passwords_csv_big` b
> . . . . . . . . . . . . . . . . . . . . . . .> where 
> a.columns[1]=b.columns[1] limit 5;
> ++
> |  columns   |
> ++
> | ["1","787148","92921","158596","17776","896094","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> | ["1","787148","10930","348699","534058","778852","2"] |
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
> Fragment 5:15
> [Error Id: dd25cee9-1d1d-4658-9a83-cdefcafb7031 on h3.poc.com:31010]
>   (org.apache.drill.exec.rpc.RpcException) Data not accepted downstream.
> org.apache.drill.exec.ops.StatusHandler.success():54
> org.apache.drill.exec.ops.StatusHandler.success():29
> org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():55
> org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():46
> 
> org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():133
> 
> org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():116
> org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.set():98
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():243
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():188
> io.netty.handler.codec.MessageToMessageDecoder.channelRead():89
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
> io.netty.handler.timeout.IdleStateHandler.channelRead():254
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
> io.netty.handler.codec.MessageToMessageDecoder.channelRead():103
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
> io.netty.handler.codec.ByteToMessageDecoder.channelRead():242
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead():86
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
> io.netty.channel.DefaultChannelPipeline.fireChannelRead():847
> 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady():618
> io.netty.channel.epoll.EpollEventLoop.processReady():329
> io.netty.channel.epoll.EpollEventLoop.run():250
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run():111
> java.lang.Thread.run():745
> at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
> at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
> at sqlline.SqlLine.print(SqlLine.java:1809)
> at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
> at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
> at sqlline.SqlLine.dispatch(SqlLine.java:889)
> at sqlline.SqlLine.begin(SqlLine.java:763)
> at sqlline.SqlLine.start(SqlLine.java:498)
> at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> It can be workarounded by cha

[jira] [Created] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.

2015-05-15 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-3110:
--

 Summary: org.apache.drill.exec.rpc.RpcException: Data not accepted 
downstream.
 Key: DRILL-3110
 URL: https://issues.apache.org/jira/browse/DRILL-3110
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
 Environment: > select commit_id from sys.version;
++
| commit_id  |
++
| 583ca4a95df2c45b5ba20b517cb1aeed48c7548e |
++
1 row selected (0.098 seconds)
Reporter: Hao Zhu
Assignee: Chris Westin


Joining two 1 GB CSV tables results in the error below:
{code}
> select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, 
> dfs.root.`user/hive/warehouse/passwords_csv_big` b
. . . . . . . . . . . . . . . . . . . . . . .> where a.columns[1]=b.columns[1] 
limit 5;
++
|  columns   |
++
| ["1","787148","92921","158596","17776","896094","2"] |
| ["1","787148","10930","348699","534058","778852","2"] |
| ["1","787148","10930","348699","534058","778852","2"] |
| ["1","787148","10930","348699","534058","778852","2"] |
| ["1","787148","10930","348699","534058","778852","2"] |
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.

Fragment 5:15

[Error Id: dd25cee9-1d1d-4658-9a83-cdefcafb7031 on h3.poc.com:31010]

  (org.apache.drill.exec.rpc.RpcException) Data not accepted downstream.
org.apache.drill.exec.ops.StatusHandler.success():54
org.apache.drill.exec.ops.StatusHandler.success():29
org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():55
org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():46

org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():133

org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():116
org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.set():98
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():243
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():188
io.netty.handler.codec.MessageToMessageDecoder.channelRead():89
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
io.netty.handler.timeout.IdleStateHandler.channelRead():254
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
io.netty.handler.codec.MessageToMessageDecoder.channelRead():103
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
io.netty.handler.codec.ByteToMessageDecoder.channelRead():242
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
io.netty.channel.ChannelInboundHandlerAdapter.channelRead():86
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324
io.netty.channel.DefaultChannelPipeline.fireChannelRead():847

io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady():618
io.netty.channel.epoll.EpollEventLoop.processReady():329
io.netty.channel.epoll.EpollEventLoop.run():250
io.netty.util.concurrent.SingleThreadEventExecutor$2.run():111
java.lang.Thread.run():745

at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
at sqlline.SqlLine.print(SqlLine.java:1809)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
at sqlline.SqlLine.dispatch(SqlLine.java:889)
at sqlline.SqlLine.begin(SqlLine.java:763)
at sqlline.SqlLine.start(SqlLine.java:498)
at sqlline.SqlLine.main(SqlLine.java:460)
{code}

It can be worked around by changing drill.exec.buffer.size.
My understanding is that "drill.exec.buffer.size" should only affect performance; 
it should not cause SQL to fail, right?
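For reference, drill.exec.buffer.size is a boot-time option; a hedged sketch to inspect its current value, assuming it is exposed through sys.boot:
{code}
-- Hedged sketch: inspect the boot-time receive buffer setting.
select name, num_val from sys.boot where name like '%buffer.size%';
{code}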




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-2927) Pending query in resource queue starts after timeout

2015-04-30 Thread Hao Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Zhu updated DRILL-2927:
---
Attachment: Screen Shot 2015-04-30 at 11.07.21 AM.png

Pic 2

> Pending query in resource queue starts after timeout
> 
>
> Key: DRILL-2927
> URL: https://issues.apache.org/jira/browse/DRILL-2927
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.8.0
> Environment: Drill 0.8 released version.
>Reporter: Hao Zhu
>Assignee: Chris Westin
> Attachments: Screen Shot 2015-04-30 at 11.01.25 AM.png, Screen Shot 
> 2015-04-30 at 11.07.21 AM.png
>
>
> I set the small queue to allow only 1 concurrent query:
> alter system set `exec.queue.enable`=TRUE;
> alter system set `exec.queue.small`=1;
> When running 2 small queries, one of them is pending, which is expected.
> (See pic 1)
> After about 5 minutes (exec.queue.timeout_millis), the pending SQL starts; now we 
> have 2 queries running in the small queue. 
> (See pic 2)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-2927) Pending query in resource queue starts after timeout

2015-04-30 Thread Hao Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Zhu updated DRILL-2927:
---
Attachment: Screen Shot 2015-04-30 at 11.01.25 AM.png

Pic 1

> Pending query in resource queue starts after timeout
> 
>
> Key: DRILL-2927
> URL: https://issues.apache.org/jira/browse/DRILL-2927
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.8.0
> Environment: Drill 0.8 released version.
>Reporter: Hao Zhu
>Assignee: Chris Westin
> Attachments: Screen Shot 2015-04-30 at 11.01.25 AM.png
>
>
> I set the small queue to allow only 1 concurrent query:
> alter system set `exec.queue.enable`=TRUE;
> alter system set `exec.queue.small`=1;
> When running 2 small queries, one of them is pending, which is expected.
> (See pic 1)
> After about 5 minutes (exec.queue.timeout_millis), the pending SQL starts; now we 
> have 2 queries running in the small queue. 
> (See pic 2)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-2927) Pending query in resource queue starts after timeout

2015-04-30 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-2927:
--

 Summary: Pending query in resource queue starts after timeout
 Key: DRILL-2927
 URL: https://issues.apache.org/jira/browse/DRILL-2927
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 0.8.0
 Environment: Drill 0.8 released version.
Reporter: Hao Zhu
Assignee: Chris Westin
 Attachments: Screen Shot 2015-04-30 at 11.01.25 AM.png

I set the small queue to allow only 1 concurrent query:
alter system set `exec.queue.enable`=TRUE;
alter system set `exec.queue.small`=1;

When running 2 small queries, one of them is pending, which is expected.
(See pic 1)

After about 5 minutes (exec.queue.timeout_millis), the pending SQL starts; now we 
have 2 queries running in the small queue. 
(See pic 2)
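A hedged check of the timeout that admits the pending query, assuming exec.queue.timeout_millis is visible in sys.options:
{code}
-- Hedged sketch: show the queue timeout after which a pending query
-- is started anyway.
select name, num_val from sys.options
where name = 'exec.queue.timeout_millis';
{code}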




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1289) Creating storage plugin for "hdfs:///" failed with "Unable to create/ update plugin: myhdfs"

2015-02-18 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326353#comment-14326353
 ] 

Hao Zhu commented on DRILL-1289:


It is marked as fixed in Drill 0.5.
Which version are you using, and which storage plugin are you using?

> Creating storage plugin for "hdfs:///" failed with "Unable to create/ update 
> plugin: myhdfs"
> 
>
> Key: DRILL-1289
> URL: https://issues.apache.org/jira/browse/DRILL-1289
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 0.4.0
> Environment: OS: Centos 6.4
> HDFS: CDH5.1
> Drill: 0.4.0
>Reporter: Hao Zhu
> Fix For: 0.5.0
>
>
> In the web GUI, I can successfully create a new storage plugin named "myhdfs" 
> using "file:///":
> {code}
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "file:///",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "storageformat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "storageformat": "csv"
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> }
>   }
> }
> {code}
> However, if I try to change "file:///" to "hdfs:///" to point to HDFS rather 
> than the local file system, the drill log errors out with "[qtp416200645-67] DEBUG 
> o.a.d.e.server.rest.StorageResources - Unable to create/ update plugin: 
> myhdfs".
> {code}
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "hdfs:///",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "storageformat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "storageformat": "csv"
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> }
>   }
> }
> {code}
> On my cluster, I am using CDH5 HDFS, and all client configurations are 
> valid. For example, on the drillbit server:
> {code}
> [root@hdm ~]# hdfs dfs -ls /
> Found 3 items
> drwxr-xr-x   - hbase hbase   0 2014-08-04 22:55 /hbase
> drwxrwxrwt   - hdfs  supergroup  0 2014-07-31 16:31 /tmp
> drwxr-xr-x   - hdfs  supergroup  0 2014-07-11 12:06 /user
> {code}
> Is there anything wrong with the storage plugin syntax for HDFS?
> If so, can the drill log print more debug info to show why it failed?
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1773) Issues when using JAVA code thourgh Drill JDBC driver

2015-02-04 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305575#comment-14305575
 ] 

Hao Zhu commented on DRILL-1773:


Hi Oleg,

Thanks for looking into it.
So the default behavior in the JDBC driver is drill.exec.debug.error_on_leak=true? 

What do you think about disabling it by default?
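A hedged sketch to check the current value, assuming the option is exposed through sys.boot:
{code}
-- Hedged sketch: check whether error_on_leak is enabled at boot time.
select name, bool_val from sys.boot where name like '%error_on_leak%';
{code}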

Thanks,
Hao

> Issues when using JAVA code thourgh Drill JDBC driver
> -
>
> Key: DRILL-1773
> URL: https://issues.apache.org/jira/browse/DRILL-1773
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 0.6.0, 0.7.0
> Environment: Tested on 0.6R3
>Reporter: Hao Zhu
>Assignee: Daniel Barclay (Drill/MapR)
> Fix For: 0.8.0
>
> Attachments: DrillHandler.patch, DrillJdbcExample.java
>
>
> When executing the attached simple Java code through the Drill JDBC driver (0.6 R3), 
> the query executed and returned the correct result; however, there are 2 
> issues:
> 1. It keeps printing DEBUG information.
> Is this the default behavior, and is there any way to disable DEBUG?
> eg:
> {code}
> 13:30:44.702 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
> 13:30:44.706 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher
> 13:30:44.708 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator 
> - Generated: 
> io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher
> 13:30:44.717 [Client-1] DEBUG io.netty.util.Recycler - 
> -Dio.netty.recycler.maxCapacity.default: 262144
> {code}
> 2. After the query finished, it did not seem to close the connection and did not 
> return to the shell prompt. 
> I have to manually issue "Ctrl-C" to stop it.
> {code}
> 13:31:11.239 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 0ms
> 13:31:24.573 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 1ms
> 13:31:37.906 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
> 0x1497d1d0d040839 after 0ms
> ^CAdministrators-MacBook-Pro-40:xxx$ 
> {code}
> 
> The DrillJdbcExample.java is attached.
> Command to run:
> {code}
> javac DrillJdbcExample.java
> java DrillJdbcExample
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-2055) Drill should error out for Invalid json file if it has the same map key names.

2015-01-21 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-2055:
--

 Summary: Drill should error out for Invalid json file if it has 
the same map key names.
 Key: DRILL-2055
 URL: https://issues.apache.org/jira/browse/DRILL-2055
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.7.0
Reporter: Hao Zhu
Assignee: Jinfeng Ni
Priority: Minor


For a JSON file with duplicate map key names:
{code}
{
"a" : "x",
"a" : "y"
}
{code}
Should we consider this invalid JSON and error out?
Ref:
http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object#answer-23195243
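A minimal reproduction sketch, assuming the two-key object above is saved as /tmp/dup.json and the dfs.tmp workspace points at /tmp:
{code}
-- Hedged sketch: querying a JSON file with duplicate keys currently
-- succeeds instead of erroring out.
select a from dfs.tmp.`dup.json`;
{code}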




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1794) Can not make files with extension "log" to be recognized as json format?

2014-12-01 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230444#comment-14230444
 ] 

Hao Zhu commented on DRILL-1794:


Hi Team,

Yep, I tried to set it, but saving the storage plugin failed.

{code}
"log": {
  "type": "json", "extensions": [ "log" ], 
},
{code}

Or 

{code}
"log": {
  "type": "json", "extensions": [ "log" ]
},
{code}

> Can not make files with extension "log" to be recognized as json format?
> 
>
> Key: DRILL-1794
> URL: https://issues.apache.org/jira/browse/DRILL-1794
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.6.0
> Environment: 0.6R3
>Reporter: Hao Zhu
>
> We want to use ".log" as the file extension and have it recognized as JSON 
> format. I tried the storage plugin configuration below, but failed 
> to read the .log file.
> {code}
>   "formats": {
> "log": {
>   "type": "json"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "delimiter": ","
> }
>   }
> {code} 
> {code}
> 0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from  
> logtest.`test.json`;
> +++++
> |   field1   |   field2   |   field3   |   field4   |
> +++++
> | data1  | 100.0  | more data1 | 123.001|
> +++++
> 1 row selected (0.159 seconds)
> 0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from  
> logtest.`test.log`;
> Query failed: Failure while validating sql : 
> org.eigenbase.util.EigenbaseContextException: From line 1, column 16 to line 
> 1, column 22: Table 'logtest.test.log' not found
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Do we support the above requirement?
> If so, what should the storage plugin configuration look like?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-1794) Can not make files with extension "log" to be recognized as json format?

2014-12-01 Thread Hao Zhu (JIRA)
Hao Zhu created DRILL-1794:
--

 Summary: Can not make files with extension "log" to be recognized 
as json format?
 Key: DRILL-1794
 URL: https://issues.apache.org/jira/browse/DRILL-1794
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.6.0
 Environment: 0.6R3
Reporter: Hao Zhu


We want to use ".log" as the file extension and have it recognized as JSON 
format. I tried the storage plugin configuration below, but failed to 
read the .log file.


{code}
  "formats": {
"log": {
  "type": "json"
},
"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "delimiter": ","
}
  }
{code} 


{code}
0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from  logtest.`test.json`;
+++++
|   field1   |   field2   |   field3   |   field4   |
+++++
| data1  | 100.0  | more data1 | 123.001|
+++++
1 row selected (0.159 seconds)
0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from  logtest.`test.log`;
Query failed: Failure while validating sql : 
org.eigenbase.util.EigenbaseContextException: From line 1, column 16 to line 1, 
column 22: Table 'logtest.test.log' not found

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
{code}

Do we support the above requirement?
If so, what should the storage plugin configuration look like?
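For reference: later Drill versions (1.4 and up) can override the format per query with a table function -- a hedged sketch, not verified on 0.6:
{code}
-- Hedged sketch (Drill 1.4+): force JSON parsing of a .log file per query.
select * from table(logtest.`test.log`(type => 'json'));
{code}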



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)