[jira] [Created] (DRILL-7147) Source order of "drill-env.sh" and "distrib-env.sh" should be swapped
Hao Zhu created DRILL-7147:
--
Summary: Source order of "drill-env.sh" and "distrib-env.sh" should be swapped
Key: DRILL-7147
URL: https://issues.apache.org/jira/browse/DRILL-7147
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.15.0
Reporter: Hao Zhu

In bin/drill-config.sh, the description of the source order is:
{code:java}
# Variables may be set in one of four places:
#
#   Environment (per run)
#   drill-env.sh (per site)
#   distrib-env.sh (per distribution)
#   drill-config.sh (this file, Drill defaults)
#
# Properties "inherit" from items lower on the list, and may be "overridden" by items
# higher on the list. In the environment, just set the variable:
{code}
However, bin/drill-config.sh actually sources drill-env.sh first, and then distrib-env.sh:
{code:java}
drillEnv="$DRILL_CONF_DIR/drill-env.sh"
if [ -r "$drillEnv" ]; then
  . "$drillEnv"
fi
...
distribEnv="$DRILL_CONF_DIR/distrib-env.sh"
if [ -r "$distribEnv" ]; then
  . "$distribEnv"
else
  distribEnv="$DRILL_HOME/conf/distrib-env.sh"
  if [ -r "$distribEnv" ]; then
    . "$distribEnv"
  fi
fi
{code}
Since a file sourced later overwrites variables assigned by a file sourced earlier, distrib-env.sh currently overrides drill-env.sh, which is the opposite of the documented precedence. We need to swap the source order of drill-env.sh and distrib-env.sh.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
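The fix is purely an ordering change: with plain variable assignments, whichever env file is sourced last wins. A minimal sketch (temporary files and a made-up DRILL_HEAP variable stand in for the real config files):

```shell
# Two stand-in env files assigning the same variable: the file sourced
# LAST wins. To honor the documented precedence, the per-site
# drill-env.sh must therefore be sourced AFTER distrib-env.sh.
tmpdir=$(mktemp -d)
echo 'DRILL_HEAP="4G"' > "$tmpdir/distrib-env.sh"   # per-distribution default
echo 'DRILL_HEAP="8G"' > "$tmpdir/drill-env.sh"     # per-site override

. "$tmpdir/distrib-env.sh"   # distribution defaults first
. "$tmpdir/drill-env.sh"     # site settings last, so they win
echo "$DRILL_HEAP"

rm -rf "$tmpdir"
```

With the current (unswapped) order the two `source` lines are reversed, and the distribution default would silently override the site setting.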
[jira] [Created] (DRILL-5436) Need a way to input password which contains space when calling sqlline
Hao Zhu created DRILL-5436:
--
Summary: Need a way to input password which contains space when calling sqlline
Key: DRILL-5436
URL: https://issues.apache.org/jira/browse/DRILL-5436
Project: Apache Drill
Issue Type: Bug
Components: Client - CLI
Affects Versions: 1.10.0
Reporter: Hao Zhu

Create a user named "spaceuser" with the password "hello world". All of the below fail:
{code}
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p 'hello world'
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p "hello world"
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p 'hello\ world'
sqlline -u jdbc:drill:zk=xxx -n spaceuser -p "hello\ world"
{code}
We need a way to input a password which contains a space when calling sqlline.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
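A common cause of this class of bug is unquoted argument forwarding in a launcher script; whether that is the exact fault in the sqlline wrapper is not verified here, but the pitfall is easy to reproduce with hypothetical wrapper functions:

```shell
# Forwarding arguments with an unquoted $@ re-splits them on whitespace,
# so a password like "hello world" arrives as two separate arguments.
# Quoting "$@" preserves each argument intact.
count_args() { echo "$#"; }
bad_forward() { count_args $@; }      # unquoted: re-splits on spaces
good_forward() { count_args "$@"; }   # quoted: arguments preserved

BAD=$(bad_forward -p "hello world")
GOOD=$(good_forward -p "hello world")
echo "unquoted forwarding saw $BAD args; quoted forwarding saw $GOOD args"
```

If the wrapper re-splits, no amount of quoting on the user's command line can keep the space inside the `-p` value.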
[jira] [Commented] (DRILL-4101) The argument 'pattern' of Function 'like' has to be constant!
[ https://issues.apache.org/jira/browse/DRILL-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312582#comment-15312582 ] Hao Zhu commented on DRILL-4101: Hi [~david_hudavy] Could you open a case for this one? Thanks, Hao > The argument 'pattern' of Function 'like' has to be constant! > - > > Key: DRILL-4101 > URL: https://issues.apache.org/jira/browse/DRILL-4101 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.3.0 > Environment: drill1.2 >Reporter: david_hudavy > > 0: jdbc:drill:zk=local> select * from dfs.tmp.ta limit 10; > +--+--+ > | rdn_4 | imsi | > +--+--+ > | mscId=UPG00494500412500 | 272004500412500 | > | mscId=UPG00494500436500 | 272004500436500 | > | mscId=UPG00494501833000 | 272004501833000 | > | mscId=UPG00494502712000 | 272004502712000 | > | mscId=UPG00494502732500 | 272004502732500 | > | mscId=UPG00494502845500 | 272004502845500 | > | mscId=UPG00494505721000 | 272004505721000 | > | mscId=UPG00494507227500 | 272004507227500 | > | mscId=UPG00494509548500 | 272004509548500 | > | mscId=UPG00494501644500 | 272004501644500 | > +--+--+ > 10 rows selected (0.344 seconds) > 0: jdbc:drill:zk=local> select * from dfs.tmp.tb; > +-+-+ > | rdn_4 | epsvplmnid | > +-+-+ > | mscId=149000579913 | 46000 | > | mscId=149000579912 | 262280 | > +-+-+ > 2 rows selected (0.112 seconds) > SELECT count(*) AS cnt > FROM dfs.tmp.ta,dfs.tmp.tb > WHERE ta.rdn_4 = tb.rdn_4 > AND ta.imsi NOT LIKE concat(tb.epsvplmnid,'%') > Error: SYSTEM ERROR: DrillRuntimeException: The argument 'pattern' of > Function 'like' has to be constant! 
> Fragment 0:0 > [Error Id: f103529c-60f4-4b8f-8d7a-b1f0619aab30 on vm1-4:31010] > (state=,code=0) > [Error Id: f103529c-60f4-4b8f-8d7a-b1f0619aab30 on vm1-4:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32) > [drill-java-exec-1.2.0.jar:1.2.0] > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) > [drill-java-exec-1.2.0.jar:1.2.0] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > [netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) > [netty-handler-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > [netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transp
[jira] [Created] (DRILL-3798) Cannot group by the functions without ()
Hao Zhu created DRILL-3798:
--
Summary: Cannot group by the functions without ()
Key: DRILL-3798
URL: https://issues.apache.org/jira/browse/DRILL-3798
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Mehant Baid

Drill cannot GROUP BY the niladic functions, i.e. those written without (). E.g.:
{code}
SELECT CURRENT_DATE FROM hive.h1db.testdate group by CURRENT_DATE;
Caused By (org.apache.calcite.sql.validate.SqlValidatorException) Column 'CURRENT_DATE' not found in any table
{code}
Bad ones:
{code}
SELECT CURRENT_TIME FROM hive.h1db.testdate group by CURRENT_TIME;
SELECT CURRENT_TIMESTAMP FROM hive.h1db.testdate group by CURRENT_TIMESTAMP;
SELECT LOCALTIME FROM hive.h1db.testdate group by LOCALTIME;
SELECT LOCALTIMESTAMP FROM hive.h1db.testdate group by LOCALTIMESTAMP;
{code}
Good ones:
{code}
SELECT NOW() FROM hive.h1db.testdate group by NOW();
SELECT TIMEOFDAY() FROM hive.h1db.testdate group by TIMEOFDAY();
SELECT UNIX_TIMESTAMP() FROM hive.h1db.testdate group by UNIX_TIMESTAMP();
SELECT PI() FROM hive.h1db.testdate group by PI();
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3735) Directory pruning is not happening when number of files is larger than 64k
Hao Zhu created DRILL-3735:
--
Summary: Directory pruning is not happening when number of files is larger than 64k
Key: DRILL-3735
URL: https://issues.apache.org/jira/browse/DRILL-3735
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Jinfeng Ni

When the number of files is larger than the 64k limit, directory pruning does not happen. We need to increase this limit further to handle most use cases.

My proposal is to separate the code for directory pruning from partition pruning. Say a parent directory contains 100 directories and 1 million files. If we only query the files from one directory, we should first read the 100 directory names and narrow down to the right directory, and only then read that directory's file paths into memory and do the rest of the work.

The current behavior is that Drill first reads the paths of all 1 million files into memory, and then does directory pruning or partition pruning. This is neither performance efficient nor memory efficient, and it does not scale.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
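The proposed two-phase approach can be sketched with ordinary filesystem tools (tiny made-up numbers stand in for the 100 directories / 1 million files):

```shell
# Build a toy layout: 3 partition directories with 2 files each.
tmpdir=$(mktemp -d)
for d in d1 d2 d3; do
  mkdir "$tmpdir/$d"
  touch "$tmpdir/$d/f1" "$tmpdir/$d/f2"
done

# Phase 1: directory-level pruning reads only the directory names and
# picks the one the query actually touches.
target="$tmpdir/d2"

# Phase 2: enumerate file paths only under the surviving directory,
# instead of listing every file under the parent up front.
PRUNED=$(find "$target" -type f | wc -l | tr -d ' ')
TOTAL=$(find "$tmpdir" -type f | wc -l | tr -d ' ')
echo "enumerated $PRUNED of $TOTAL file paths"
rm -rf "$tmpdir"
```

With 100 directories and 1 million files, phase 1 touches ~100 names instead of 1 million paths, which is the memory saving the proposal is after.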
[jira] [Commented] (DRILL-3727) Drill should return NULL instead of failure if cast column is empty
[ https://issues.apache.org/jira/browse/DRILL-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724471#comment-14724471 ]

Hao Zhu commented on DRILL-3727:

PostgreSQL's behavior is similar to Drill's:
{code}
test=# create table testempty(col0 varchar);
CREATE TABLE
test=# insert into testempty values
test-# ('');
INSERT 0 1
test=# insert into testempty values('2015-01-01');
INSERT 0 1
test=# select * from testempty ;
 col0
 2015-01-01
(2 rows)
test=# select cast(col0 as date) from testempty;
ERROR: invalid input syntax for type date: ""
test=# select case when col0='' then null else cast(col0 as date) end from testempty;
 col0
 2015-01-01
(2 rows)
{code}

> Drill should return NULL instead of failure if cast column is empty
> ---
>
> Key: DRILL-3727
> URL: https://issues.apache.org/jira/browse/DRILL-3727
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive
> Affects Versions: 1.1.0
> Environment: 1.1
> Reporter: Hao Zhu
> Assignee: Mehant Baid
>
> If Drill is casting an empty string to date, it will fail with error:
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
> However Hive can just return a NULL instead.
> I think it makes sense for Drill to have the same behavior as Hive in this case.
> Repro: > Hive: > {code} > create table h1db.testempty(col0 string) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' > STORED AS TEXTFILE > ; > hive> select * from h1db.testempty ; > OK > 2015-01-01 > Time taken: 0.28 seconds, Fetched: 2 row(s) > hive> select cast(col0 as date) from h1db.testempty; > OK > NULL > 2015-01-01 > Time taken: 0.078 seconds, Fetched: 2 row(s) > {code} > Drill: > {code} > use hive; > > select * from h1db.testempty ; > +-+ > |col0 | > +-+ > | | > | 2015-01-01 | > +-+ > 2 rows selected (0.232 seconds) > > select cast(col0 as date) from h1db.testempty; > Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must > be in the range [1,12] > {code} > Workaround: > {code} > > select case when col0='' then null else cast(col0 as date) end from > > h1db.testempty; > +-+ > | EXPR$0| > +-+ > | null| > | 2015-01-01 | > +-+ > 2 rows selected (0.287 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3727) Drill should return NULL instead of failure if cast column is empty
Hao Zhu created DRILL-3727: -- Summary: Drill should return NULL instead of failure if cast column is empty Key: DRILL-3727 URL: https://issues.apache.org/jira/browse/DRILL-3727 Project: Apache Drill Issue Type: Bug Components: Functions - Hive Affects Versions: 1.1.0 Environment: 1.1 Reporter: Hao Zhu Assignee: Mehant Baid If Drill is casting an empty string to date, it will fail with error: Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12] However Hive can just return a NULL instead. I think it makes sense for Drill to have the same behavior as Hive in this case. Repro: Hive: {code} create table h1db.testempty(col0 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE ; hive> select * from h1db.testempty ; OK 2015-01-01 Time taken: 0.28 seconds, Fetched: 2 row(s) hive> select cast(col0 as date) from h1db.testempty; OK NULL 2015-01-01 Time taken: 0.078 seconds, Fetched: 2 row(s) {code} Drill: {code} use hive; > select * from h1db.testempty ; +-+ |col0 | +-+ | | | 2015-01-01 | +-+ 2 rows selected (0.232 seconds) > select cast(col0 as date) from h1db.testempty; Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12] {code} Workaround: {code} > select case when col0='' then null else cast(col0 as date) end from > h1db.testempty; +-+ | EXPR$0| +-+ | null| | 2015-01-01 | +-+ 2 rows selected (0.287 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712244#comment-14712244 ] Hao Zhu commented on DRILL-3710: a. No optimization {code} explain plan for select count(1) from h1_passwords where cast(col2 as int) in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) 00-02Project($f0=[1]) 00-03 SelectionVectorRemover 00-04Filter(condition=[OR(=(CAST($0):INTEGER, 1), =(CAST($0):INTEGER, 2), =(CAST($0):INTEGER, 3), =(CAST($0):INTEGER, 4), =(CAST($0):INTEGER, 5), =(CAST($0):INTEGER, 6), =(CAST($0):INTEGER, 7), =(CAST($0):INTEGER, 8), =(CAST($0):INTEGER, 9), =(CAST($0):INTEGER, 10), =(CAST($0):INTEGER, 11), =(CAST($0):INTEGER, 12), =(CAST($0):INTEGER, 13), =(CAST($0):INTEGER, 14), =(CAST($0):INTEGER, 15), =(CAST($0):INTEGER, 16), =(CAST($0):INTEGER, 17), =(CAST($0):INTEGER, 18), =(CAST($0):INTEGER, 19))]) 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h1_passwords), inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], columns=[`col2`], partitions= null]]) {code} b. 
With optimization {code} explain plan for select count(1) from h1_passwords where cast(col2 as int) in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) 00-02Project($f0=[1]) 00-03 Project(f6=[$1], ROW_VALUE=[$0]) 00-04MergeJoin(condition=[=($1, $0)], joinType=[inner]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 HashAgg(group=[{0}]) 00-12Values 00-05 SelectionVectorRemover 00-07Sort(sort0=[$0], dir0=[ASC]) 00-09 Project(f6=[CAST($0):INTEGER]) 00-11Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h1_passwords), inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], columns=[`col2`], partitions= null]]) {code} > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Jinfeng Ni > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3710) Make the 20 in-list optimization configurable
Hao Zhu created DRILL-3710:
--
Summary: Make the 20 in-list optimization configurable
Key: DRILL-3710
URL: https://issues.apache.org/jira/browse/DRILL-3710
Project: Apache Drill
Issue Type: Improvement
Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Jinfeng Ni

If a query has more than 20 in-list values, Drill can optimize it by converting the in-list into a small in-memory hash table and doing a table join instead. This can improve the performance of queries with many in-list values.

Could we make "20" configurable? Then we would not need to pad the list with duplicate/junk values just to push it past 20.

A sample query is:
{code}
select count(*) from table where col in (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3688) Drill should honor "skip.header.line.count" attribute of Hive table
Hao Zhu created DRILL-3688:
--
Summary: Drill should honor "skip.header.line.count" attribute of Hive table
Key: DRILL-3688
URL: https://issues.apache.org/jira/browse/DRILL-3688
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.1.0
Environment: 1.1
Reporter: Hao Zhu
Assignee: Jinfeng Ni

Currently Drill does not honor the "skip.header.line.count" attribute of a Hive table. This can also cause format conversion issues, because the header row is read as data.

Reproduce:
1. Create a Hive table:
{code}
create table h1db.testheader(col0 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
tblproperties("skip.header.line.count"="1");
{code}
2. Prepare sample data:
{code}
# cat test.data
col0
2015-01-01
{code}
3. Load the sample data into Hive:
{code}
LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
{code}
4. Hive:
{code}
hive> select * from h1db.testheader ;
OK
2015-01-01
Time taken: 0.254 seconds, Fetched: 1 row(s)
{code}
5. Drill:
{code}
> select * from hive.h1db.testheader ;
+-+
|col0 |
+-+
| col0|
| 2015-01-01 |
+-+
2 rows selected (0.257 seconds)
> select cast(col0 as date) from hive.h1db.testheader ;
Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
Fragment 0:0
[Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]
(org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be in the range [1,12]
org.joda.time.field.FieldUtils.verifyValueBounds():236
org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 org.apache.drill.exec.record.AbstractRecordBatch.next():147 org.apache.drill.exec.physical.impl.BaseRootExec.next():83 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79 org.apache.drill.exec.physical.impl.BaseRootExec.next():73 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1566 org.apache.drill.exec.work.fragment.FragmentExecutor.run():255 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
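Until the attribute is honored, one generic workaround is to strip the header line from the data file itself before it is loaded, so every reader sees only data rows. A sketch with `tail` (paths are illustrative):

```shell
# Recreate the sample file from the repro, then drop its first (header)
# line: tail -n +2 keeps everything from line 2 onward.
printf 'col0\n2015-01-01\n' > /tmp/test.data
tail -n +2 /tmp/test.data > /tmp/test.noheader
FIRST=$(head -n 1 /tmp/test.noheader)
echo "$FIRST"
```

Loading the stripped file avoids both the spurious "col0" row and the downstream cast failure.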
[jira] [Created] (DRILL-3678) Plan generating for Drill on Hive takes huge java heap size
Hao Zhu created DRILL-3678:
--
Summary: Plan generating for Drill on Hive takes huge java heap size
Key: DRILL-3678
URL: https://issues.apache.org/jira/browse/DRILL-3678
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.1.0
Environment: 1.1
Reporter: Hao Zhu
Assignee: Jinfeng Ni

===Env===
Drill 1.0 on Hive 0.13 (also tested Drill 1.1 with the same behavior).
An 8-node drill cluster. The Java heap size is set to 8G and direct memory is set to 96G on each drillbit.

===Symptom===
This is a Hive parquet partition table which has multi-level partitions. The Hive table size is several TB with tens of thousands of leaf partitions. When doing a "select * from table limit 10", the query stays in the "pending" state while generating the SQL plan, and finally the drillbits crash with a Java heap OOM.
{code}
java.lang.OutOfMemoryError: Java heap space
at hive.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
at hive.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
at hive.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
at hive.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
at hive.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:95)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.drill.exec.store.hive.HiveRecordReader.init(HiveRecordReader.java:246)
at org.apache.drill.exec.store.hive.HiveRecordReader.(HiveRecordReader.java:138)
at org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch(HiveScanBatchCreator.java:58)
at org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch(HiveScanBatchCreator.java:34)
at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:150)
at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:173)
at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:106)
at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:81)
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:235)
at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Node ran out of Heap memory, exiting.
java.lang.OutOfMemoryError: Java heap space
{code}
We captured some stack traces of the foreman thread on the foreman drillbit; here are examples from two captures:
{code}
2a482cd9-7fb2-c492-1356-d049e90870c8:foreman id=115 state=RUNNABLE
at org.apache.xerces.dom.DeferredElementNSImpl.synchronizeData(Unknown Source)
at org.apache.xerces.dom.ElementImpl.getTagName(Unknown Source)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:871)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2069)
at org.apache.hadoop.mapred.JobConf.(JobConf.java:421)
at org.apache.drill.exec.store.hive.HiveScan.splitInput(HiveScan.java:178)
at org.apache.drill.exec.store.hive.HiveScan.getSplits(HiveScan.java:167)
at org.apache.drill.exec.store.hive.HiveScan.access$000(HiveScan.java:69)
at org.apache.drill.exec.store.hive.HiveScan$1.run(HiveScan.java:146)
at org.apache.drill.exec.store.hive.HiveScan$1.run(HiveScan.java:144)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at org.apache.drill.exec.store.hive.HiveScan.getSplitsWithUGI(HiveScan.java:144)
at org.apache.drill.exec.store.hive.HiveScan.(HiveScan.java:119)
at org.apache.drill.exec.store.hive.HiveStoragePlugin.getPhysicalScan(HiveStoragePlugin.java:78)
at org.apache.drill.exec.store.hive.HiveStoragePlugin.getPhysicalScan(HiveStoragePlugin.java:41)
at org.apache.drill.exec.store.AbstractStoragePlugin.getP
[jira] [Updated] (DRILL-3621) Wrong results when Drill on Hbase query contains rowkey "or" or "IN"
[ https://issues.apache.org/jira/browse/DRILL-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Zhu updated DRILL-3621: --- Component/s: (was: Execution - Flow) Query Planning & Optimization > Wrong results when Drill on Hbase query contains rowkey "or" or "IN" > > > Key: DRILL-3621 > URL: https://issues.apache.org/jira/browse/DRILL-3621 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Chris Westin >Priority: Critical > > If Drill on Hbase query contains row_key "in" or "or", it produces wrong > results. > For example: > 1. Create a hbase table > {code} > create 'testrowkey','cf' > put 'testrowkey','DUMMY1','cf:c','value1' > put 'testrowkey','DUMMY2','cf:c','value2' > put 'testrowkey','DUMMY3','cf:c','value3' > put 'testrowkey','DUMMY4','cf:c','value4' > put 'testrowkey','DUMMY5','cf:c','value5' > put 'testrowkey','DUMMY6','cf:c','value6' > put 'testrowkey','DUMMY7','cf:c','value7' > put 'testrowkey','DUMMY8','cf:c','value8' > put 'testrowkey','DUMMY9','cf:c','value9' > put 'testrowkey','DUMMY10','cf:c','value10' > {code} > 2. 
Drill queries: > {code} > 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT > CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = > 'DUMMY10'; > +--+ > |RK| > +--+ > | DUMMY10 | > +--+ > 1 row selected (1.186 seconds) > 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT > CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = > 'DUMMY1'; > +-+ > | RK| > +-+ > | DUMMY1 | > +-+ > 1 row selected (0.691 seconds) > 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT > CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN > ('DUMMY1' , 'DUMMY10'); > +-+ > | RK| > +-+ > | DUMMY1 | > +-+ > 1 row selected (0.71 seconds) > 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT > CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY > ='DUMMY1' OR ROW_KEY = 'DUMMY10'; > +-+ > | RK| > +-+ > | DUMMY1 | > +-+ > 1 row selected (0.693 seconds) > {code} > From explain plan, filter is pushed down to hbase scan layer. > {code} > 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> explain plan for SELECT > CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN > ('DUMMY1' , 'DUMMY10'); > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(RK=[CONVERT_FROMUTF8($0)]) > 00-02Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec > [tableName=testrowkey, startRow=DUMMY1, stopRow=DUMMY10, filter=null], > columns=[`row_key`]]]) > | > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3621) Wrong results when Drill on Hbase query contains rowkey "or" or "IN"
Hao Zhu created DRILL-3621:
--
Summary: Wrong results when Drill on Hbase query contains rowkey "or" or "IN"
Key: DRILL-3621
URL: https://issues.apache.org/jira/browse/DRILL-3621
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Chris Westin
Priority: Critical

If a Drill on HBase query filters on row_key with "IN" or "OR", it produces wrong results.

For example:
1. Create an HBase table:
{code}
create 'testrowkey','cf'
put 'testrowkey','DUMMY1','cf:c','value1'
put 'testrowkey','DUMMY2','cf:c','value2'
put 'testrowkey','DUMMY3','cf:c','value3'
put 'testrowkey','DUMMY4','cf:c','value4'
put 'testrowkey','DUMMY5','cf:c','value5'
put 'testrowkey','DUMMY6','cf:c','value6'
put 'testrowkey','DUMMY7','cf:c','value7'
put 'testrowkey','DUMMY8','cf:c','value8'
put 'testrowkey','DUMMY9','cf:c','value9'
put 'testrowkey','DUMMY10','cf:c','value10'
{code}
2. Drill queries:
{code}
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = 'DUMMY10';
+--+
|RK|
+--+
| DUMMY10 |
+--+
1 row selected (1.186 seconds)
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY = 'DUMMY1';
+-+
| RK|
+-+
| DUMMY1 |
+-+
1 row selected (0.691 seconds)
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN ('DUMMY1' , 'DUMMY10');
+-+
| RK|
+-+
| DUMMY1 |
+-+
1 row selected (0.71 seconds)
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> SELECT CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY ='DUMMY1' OR ROW_KEY = 'DUMMY10';
+-+
| RK|
+-+
| DUMMY1 |
+-+
1 row selected (0.693 seconds)
{code}
From the explain plan, the filter is pushed down to the HBase scan layer.
{code}
0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> explain plan for SELECT CONVERT_FROM(ROW_KEY,'UTF8') RK FROM hbase.testrowkey T WHERE ROW_KEY IN ('DUMMY1' , 'DUMMY10');
+--+--+
| text | json |
+--+--+
| 00-00 Screen
00-01 Project(RK=[CONVERT_FROMUTF8($0)])
00-02 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=testrowkey, startRow=DUMMY1, stopRow=DUMMY10, filter=null], columns=[`row_key`]]])
|
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
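The 1-row result is consistent with the range shown in the plan: assuming the usual HBase Scan semantics of a half-open [startRow, stopRow) range with an exclusive stopRow, the pushed-down range [DUMMY1, DUMMY10) excludes DUMMY10 itself, and every other key sorts outside it. A quick byte-order check with plain `sort` (not HBase):

```shell
# Byte-wise, "DUMMY10" sorts between "DUMMY1" and "DUMMY2" (a prefix
# sorts before its extensions, and '1' < '2'). So a scan range
# [DUMMY1, DUMMY10) with an exclusive stop row contains only DUMMY1
# of the table's keys -- matching the wrong 1-row result.
MIDDLE=$(printf 'DUMMY1\nDUMMY2\nDUMMY10\n' | LC_ALL=C sort | sed -n '2p')
echo "$MIDDLE"
```

This suggests the bug is in collapsing an IN/OR list into a single min/max range rather than in the scan itself.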
[jira] [Created] (DRILL-3579) Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__
Hao Zhu created DRILL-3579:
--
Summary: Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__
Key: DRILL-3579
URL: https://issues.apache.org/jira/browse/DRILL-3579
Project: Apache Drill
Issue Type: Bug
Components: Functions - Hive
Affects Versions: 1.1.0
Environment: Drill 1.1 on Hive 1.0
Reporter: Hao Zhu
Assignee: Mehant Baid

If a Hive partition table has a __HIVE_DEFAULT_PARTITION__ partition (created for null values in the partition column), Drill on Hive queries will fail.

Minimum reproduce:
1. Hive:
{code}
CREATE TABLE h1_testpart2(id INT) PARTITIONED BY(id2 int);
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , 20150101 as id2 from h1_passwords limit 1;
INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , null as id2 from h1_passwords limit 1;
{code}
2. The filesystem looks like:
{code}
h1 h1_testpart2]# ls -altr
total 2
drwxrwxrwx 89 mapr mapr 87 Jul 30 00:04 ..
drwxr-xr-x 2 mapr mapr 1 Jul 30 00:05 id2=20150101
drwxr-xr-x 2 mapr mapr 1 Jul 30 00:05 id2=__HIVE_DEFAULT_PARTITION__
drwxr-xr-x 4 mapr mapr 2 Jul 30 00:05 .
{code} 3.Drill will fail: {code} select * from h1_testpart2; Error: SYSTEM ERROR: NumberFormatException: For input string: "__HIVE_DEFAULT_PARTITION__" Fragment 0:0 [Error Id: 509eb392-db9a-42f3-96ea-fb597425f49f on h1.poc.com:31010] (java.lang.reflect.UndeclaredThrowableException) null org.apache.hadoop.security.UserGroupInformation.doAs():1581 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) Failure while initializing HiveRecordReader: For input string: "__HIVE_DEFAULT_PARTITION__" org.apache.drill.exec.store.hive.HiveRecordReader.init():241 org.apache.drill.exec.store.hive.HiveRecordReader.():138 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 org.apache.drill.exec.physical.impl.ImplCreator$2.run():138 org.apache.drill.exec.physical.impl.ImplCreator$2.run():136 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1566 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 Caused By (java.lang.NumberFormatException) For input string: "__HIVE_DEFAULT_PARTITION__" java.lang.NumberFormatException.forInputString():65 java.lang.Integer.parseInt():580 java.lang.Integer.parseInt():615 org.apache.drill.exec.store.hive.HiveRecordReader.convertPartitionType():605 org.apache.drill.exec.store.hive.HiveRecordReader.init():236 org.apache.drill.exec.store.hive.HiveRecordReader.():138 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 org.apache.drill.exec.physical.impl.ImplCreator$2.run():138 org.apache.drill.exec.physical.impl.ImplCreator$2.run():136 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1566 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 or
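The trace shows convertPartitionType() handing the partition directory name straight to Integer.parseInt(). A minimal sketch of the guard the reader would need (hypothetical code, not Drill's actual implementation; the class and method names are invented for illustration):

```java
// Hypothetical sketch, not Drill's actual code: an INT partition value must
// special-case Hive's sentinel directory name for NULL partition keys before
// the string reaches Integer.parseInt().
public class PartitionValueParser {
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Returns null (a NULL partition key) for the sentinel instead of
    // letting Integer.parseInt() throw NumberFormatException.
    static Integer parseIntPartition(String dirValue) {
        if (HIVE_DEFAULT_PARTITION.equals(dirValue)) {
            return null;
        }
        return Integer.parseInt(dirValue);
    }

    public static void main(String[] args) {
        System.out.println(parseIntPartition("20150101"));                   // 20150101
        System.out.println(parseIntPartition("__HIVE_DEFAULT_PARTITION__")); // null
    }
}
```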
[jira] [Created] (DRILL-3578) UnsupportedOperationException: Unable to get value vector class for minor type [FIXEDBINARY] and mode [OPTIONAL]
Hao Zhu created DRILL-3578: -- Summary: UnsupportedOperationException: Unable to get value vector class for minor type [FIXEDBINARY] and mode [OPTIONAL] Key: DRILL-3578 URL: https://issues.apache.org/jira/browse/DRILL-3578 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.1.0 Reporter: Hao Zhu Assignee: Hanifi Gunes The issue is that Drill fails to read the "timestamp" type in Parquet files generated by Hive. How to reproduce: 1. Create an external Hive CSV table in Hive 1.0:
{code}
create external table type_test_csv
(
  id1 int,
  id2 string,
  id3 timestamp,
  id4 double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '/xxx/testcsv';
{code}
2. Put sample data in the above external table:
{code}
1,One,2015-01-01 00:01:00,1.0
2,Two,2015-01-02 00:02:00,2.0
{code}
3. Create a Parquet Hive table:
{code}
create external table type_test
(
  id1 int,
  id2 string,
  id3 timestamp,
  id4 double
)
STORED AS PARQUET LOCATION '/xxx/type_test';

INSERT OVERWRITE TABLE type_test SELECT * FROM type_test_csv;
{code}
4. Then query the Parquet file directly through the filesystem storage plugin:
{code}
> select * from dfs.`xxx/type_test`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unable to get value vector class for minor type [FIXEDBINARY] and mode [OPTIONAL]
Fragment 0:0
[Error Id: fccfe8b2-6427-46e5-8bfd-cac639e526e8 on h3.poc.com:31010] (state=,code=0)
{code}
5. If the sample data has only 1 row:
{code}
1,One,2015-01-01 00:01:00,1.0
{code}
then the error message becomes:
{code}
> select * from dfs.`xxx/type_test`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type:INT96
[Error Id: b52b5d46-63a8-4be6-a11d-999a1b46c7c2 on h3.poc.com:31010] (state=,code=0)
{code}
Querying through the Hive storage plugin works fine; this issue only applies to the filesystem storage plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
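For context on the INT96 error in step 5: Hive writes timestamps as 12-byte INT96 values (8 bytes of little-endian nanoseconds-of-day followed by a 4-byte little-endian Julian day), which is the fixed-length binary encoding the filesystem reader cannot map to a value vector here. A decoding sketch, illustrative only and not Drill code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative decoder for the Hive/Impala INT96 timestamp layout:
// bytes 0-7  = nanoseconds within the day (little-endian long)
// bytes 8-11 = Julian day number (little-endian int)
public class Int96Timestamp {
    static final long JULIAN_DAY_OF_EPOCH = 2440588L; // Julian day of 1970-01-01
    static final long MILLIS_PER_DAY = 86_400_000L;

    static long int96ToEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt() & 0xFFFFFFFFL; // treat as unsigned
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / 1_000_000L;
    }

    public static void main(String[] args) {
        // Midnight on the Unix epoch day: nanosOfDay = 0, julianDay = 2440588.
        byte[] epoch = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0L).putInt(2440588).array();
        System.out.println(int96ToEpochMillis(epoch)); // 0
    }
}
```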
[jira] [Comment Edited] (DRILL-1773) Issues when using JAVA code through Drill JDBC driver
[ https://issues.apache.org/jira/browse/DRILL-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609024#comment-14609024 ] Hao Zhu edited comment on DRILL-1773 at 6/30/15 8:42 PM: - I tested on Drill 1.0 and hit the same issue: "DEBUG" messages are still showing. However, the 2nd issue seems fixed, and I no longer need to press "ctrl-C". was (Author: haozhu): I tested on Drill 1.0 and the same issue that "DEBUG" messages are showing.
> Issues when using JAVA code through Drill JDBC driver
> -
>
> Key: DRILL-1773
> URL: https://issues.apache.org/jira/browse/DRILL-1773
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - JDBC
> Affects Versions: 0.6.0, 0.7.0
> Environment: Tested on 0.6R3
> Reporter: Hao Zhu
> Assignee: Daniel Barclay (Drill)
> Fix For: 1.2.0
>
> Attachments: DrillHandler.patch, DrillJdbcExample.java
>
>
> When executing the attached simple Java code through the Drill JDBC driver (0.6 R3), the query got executed and returned the correct result; however, there are 2 issues:
> 1. It keeps printing DEBUG information. Is this the default behavior, or is there a way to disable DEBUG?
> eg:
> {code}
> 13:30:44.702 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
> 13:30:44.706 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher
> 13:30:44.708 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher
> 13:30:44.717 [Client-1] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacity.default: 262144
> {code}
> 2. After the query finished, it does not seem to close the connection and does not return to the shell prompt. I have to manually issue "ctrl-C" to stop it.
> {code}
> 13:31:11.239 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x1497d1d0d040839 after 0ms
> 13:31:24.573 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x1497d1d0d040839 after 1ms
> 13:31:37.906 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x1497d1d0d040839 after 0ms
> ^CAdministrators-MacBook-Pro-40:xxx$
> {code}
>
> The DrillJdbcExample.java is attached. Command to run:
> {code}
> javac DrillJdbcExample.java
> java DrillJdbcExample
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
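Regarding issue 1: the Drill Java client logs through SLF4J, so if Logback is the bound backend (an assumption; check the client's classpath), the DEBUG chatter can usually be silenced with a `logback.xml` on the application classpath. A minimal sketch; adjust the level to taste:

```xml
<!-- logback.xml on the client classpath: raising the root level suppresses
     the Netty and ZooKeeper DEBUG lines shown above. -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="ERROR">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```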
[jira] [Created] (DRILL-3336) to_date(to_timestamp) with group-by in hbase/maprdb table fails with "java.lang.UnsupportedOperationException"
Hao Zhu created DRILL-3336: -- Summary: to_date(to_timestamp) with group-by in hbase/maprdb table fails with "java.lang.UnsupportedOperationException" Key: DRILL-3336 URL: https://issues.apache.org/jira/browse/DRILL-3336 Project: Apache Drill Issue Type: Bug Components: Execution - Flow, Functions - Drill Affects Versions: 1.0.0 Environment: 1.0 GA version Reporter: Hao Zhu Assignee: Chris Westin Priority: Critical 1. Create an hbase/maprdb table in the hbase shell:
{code}
create '/tables/esr52','cf'
put '/tables/esr52','1434998909','cf:c','abc'
> scan '/tables/esr52'
ROW          COLUMN+CELL
1434998909   column=cf:c, timestamp=1434998994785, value=abc
{code}
2. The SQL statements below work fine in Drill:
{code}
> select * from maprdb.esr52;
| row_key      | cf            |
| [B@5bafd971  | {"c":"YWJj"}  |
1 row selected (0.095 seconds)

> select to_date(to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int))) from maprdb.esr52 esrtable;
| EXPR$0      |
| 2015-06-22  |
1 row selected (0.127 seconds)
{code}
3. However, the SQL below with a group-by fails:
{code}
select to_date(to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int))),count(*) from maprdb.esr52 esrtable group by to_date(to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int)));
Error: SYSTEM ERROR: java.lang.UnsupportedOperationException: Failure finding function that runtime code generation expected. Signature: compare_to_nulls_high( VAR16CHAR:OPTIONAL, VAR16CHAR:OPTIONAL ) returns INT:REQUIRED
Fragment 3:0
[Error Id: 26003311-d40e-4a95-9d3c-68793459ad6d on h1.poc.com:31010]
(java.lang.UnsupportedOperationException) Failure finding function that runtime code generation expected.
Signature: compare_to_nulls_high( VAR16CHAR:OPTIONAL, VAR16CHAR:OPTIONAL ) returns INT:REQUIRED org.apache.drill.exec.expr.fn.FunctionGenerationHelper.getFunctionExpression():109 org.apache.drill.exec.expr.fn.FunctionGenerationHelper.getOrderingComparator():62 org.apache.drill.exec.expr.fn.FunctionGenerationHelper.getOrderingComparatorNullsHigh():79 org.apache.drill.exec.physical.impl.common.ChainedHashTable.setupIsKeyMatchInternal():257 org.apache.drill.exec.physical.impl.common.ChainedHashTable.createAndSetupHashTable():206 org.apache.drill.exec.test.generated.HashAggregatorGen1.setup():273 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal():240 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator():163 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():110 org.apache.drill.exec.record.AbstractRecordBatch.next():127 org.apache.drill.exec.record.AbstractRecordBatch.next():105 org.apache.drill.exec.record.AbstractRecordBatch.next():95 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 org.apache.drill.exec.record.AbstractRecordBatch.next():146 org.apache.drill.exec.physical.impl.BaseRootExec.next():83 org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():95 org.apache.drill.exec.physical.impl.BaseRootExec.next():73 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1566 org.apache.drill.exec.work.fragment.FragmentExecutor.run():253 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 (state=,code=0) 
{code} 4. If we remove to_date, and only group-by to_timestamp, it works fine: {code} select to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int)) from maprdb.esr52 esrtable; ++ | EXPR$0 | ++ | 2015-06-22 18:48:29.0 | ++ 1 row selected (0.084 seconds) select to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int)),count(*) from maprdb.esr52 esrtable group by to_timestamp(cast(convert_from(esrtable.row_key,'UTF8') as int)); ++-+ | EXPR$0 | EXPR$1 | ++-+ | 2015-06-22 18:48:29.0 | 1 | ++-+ 1 row selected (0.641 seconds) {code}
[jira] [Commented] (DRILL-3121) Hive partition pruning is not happening
[ https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547451#comment-14547451 ] Hao Zhu commented on DRILL-3121: DFS is working fine. {code} > explain plan for select * from dfs.drill.`part1` where dir0='2015' and (dir1 > >= '02' and dir1 <= '03'); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(*=[$0]) 00-02Project(*=[$0]) 00-03 Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/part1, numFiles=2, columns=[`*`], files=[maprfs:/drill/part1/2015/02/02.csv, maprfs:/drill/part1/2015/03/03.csv]]]) {code} > Hive partition pruning is not happening > --- > > Key: DRILL-3121 > URL: https://issues.apache.org/jira/browse/DRILL-3121 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.0.0 >Reporter: Hao Zhu >Assignee: Chris Westin > Fix For: 1.1.0 > > > Tested on 1.0.0 with below commit id, and hive 0.13. > {code} > > select * from sys.version; > +---+++--++ > | commit_id | > commit_message |commit_time | > build_email | build_time | > +---+++--++ > | d8b19759657698581cc0d01d7038797952888123 | DRILL-3100: > TestImpersonationDisabledWithMiniDFS fails on Windows | 15.05.2015 @ > 01:18:03 EDT | Unknown | 15.05.2015 @ 03:07:10 EDT | > +---+++--++ > 1 row selected (0.083 seconds) > {code} > How to reproduce: > 1. 
Use hive to create below partition table: > {code} > CREATE TABLE partition_table(id INT, username string) > PARTITIONED BY(year STRING, month STRING) > ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; > insert into table partition_table PARTITION(year='2014',month='11') select > 1,'u' from passwords limit 1; > insert into table partition_table PARTITION(year='2014',month='12') select > 2,'s' from passwords limit 1; > insert into table partition_table PARTITION(year='2015',month='01') select > 3,'e' from passwords limit 1; > insert into table partition_table PARTITION(year='2015',month='02') select > 4,'r' from passwords limit 1; > insert into table partition_table PARTITION(year='2015',month='03') select > 5,'n' from passwords limit 1; > {code} > 2. Hive query can do partition pruning for below 2 queries: > {code} > hive> explain EXTENDED select * from partition_table where year='2015' and > month in ( '02','03') ; > partition values: > month 02 > year 2015 > partition values: > month 03 > year 2015 > explain EXTENDED select * from partition_table where year='2015' and (month > >= '02' and month <= '03') ; > partition values: > month 02 > year 2015 > partition values: > month 03 > year 2015 > {code} > Hive only scans 2 partitions -- 2015/02 and 2015/03. > 3. 
Drill can not do partition pruning for below 2 queries: > {code} > > explain plan for select * from hive.partition_table where `year`='2015' and > > `month` in ('02','03'); > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) > 00-02SelectionVectorRemover > 00-03 Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, > '03')))]) > 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, > tableName:partition_table), > inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4, > maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, > maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], > columns=[`*`], partitions= [Partition(values:[2015, 01]), > Partition(values:[2015, 02]), Partition(values:[2015, 03])]]]) > > explain plan for select * from hive.partition_table where `year`='2015' and > > (`month` >= '02' and `month` <= '03' ); > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) > 00-02SelectionVectorRemover > 00-03 Filter(condition=[AND(=($2, '2015'), >=($3,
[jira] [Created] (DRILL-3121) Hive partition pruning is not happening
Hao Zhu created DRILL-3121: -- Summary: Hive partition pruning is not happening Key: DRILL-3121 URL: https://issues.apache.org/jira/browse/DRILL-3121 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Hao Zhu Assignee: Chris Westin Tested on 1.0.0 with the commit id below, on Hive 0.13:
{code}
> select * from sys.version;
| commit_id                                 | commit_message                                                     | commit_time               | build_email  | build_time                |
| d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 01:18:03 EDT | Unknown      | 15.05.2015 @ 03:07:10 EDT |
1 row selected (0.083 seconds)
{code}
How to reproduce: 1. Use Hive to create the partition table below:
{code}
CREATE TABLE partition_table(id INT, username string)
PARTITIONED BY(year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
insert into table partition_table PARTITION(year='2014',month='11') select 1,'u' from passwords limit 1;
insert into table partition_table PARTITION(year='2014',month='12') select 2,'s' from passwords limit 1;
insert into table partition_table PARTITION(year='2015',month='01') select 3,'e' from passwords limit 1;
insert into table partition_table PARTITION(year='2015',month='02') select 4,'r' from passwords limit 1;
insert into table partition_table PARTITION(year='2015',month='03') select 5,'n' from passwords limit 1;
{code}
2. Hive can do partition pruning for both queries below:
{code}
hive> explain EXTENDED select * from partition_table where year='2015' and month in ( '02','03') ;
partition values:
month 02
year 2015
partition values:
month 03
year 2015

explain EXTENDED select * from partition_table where year='2015' and (month >= '02' and month <= '03') ;
partition values:
month 02
year 2015
partition values:
month 03
year 2015
{code}
Hive only scans 2 partitions -- 2015/02 and 2015/03. 3.
Drill can not do partition pruning for below 2 queries: {code} > explain plan for select * from hive.partition_table where `year`='2015' and > `month` in ('02','03'); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) 00-02SelectionVectorRemover 00-03 Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, '03')))]) 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]]) > explain plan for select * from hive.partition_table where `year`='2015' and > (`month` >= '02' and `month` <= '03' ); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) 00-02SelectionVectorRemover 00-03 Filter(condition=[AND(=($2, '2015'), >=($3, '02'), <=($3, '03'))]) 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]]) {code} Drill scans 3 partitions -- 2015/01, 2015/02 and 2015/03. Note: if the inlist only has 1 value, Drill can do partition pruning well: {code} > explain plan for select * from hive.partition_table where `year`='2015' and > `month` in ('02'); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(id=[$0], username=[
[jira] [Created] (DRILL-3119) Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory
Hao Zhu created DRILL-3119: -- Summary: Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory Key: DRILL-3119 URL: https://issues.apache.org/jira/browse/DRILL-3119 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Hao Zhu Assignee: Chris Westin Tested in 1.0.0 with below commit id: {code} > select * from sys.version; +---+++--++ | commit_id | commit_message |commit_time | build_email | build_time | +---+++--++ | d8b19759657698581cc0d01d7038797952888123 | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows | 15.05.2015 @ 01:18:03 EDT | Unknown | 15.05.2015 @ 03:07:10 EDT | +---+++--++ 1 row selected (0.26 seconds) {code} How to reproduce: 1. Single node cluster. 2. Reduce DRILL_MAX_DIRECT_MEMORY="2G". 3. Run a hash join which is big enough to trigger OOM. eg: {code} select count(*) from ( select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, dfs.root.`user/hive/warehouse/passwords_csv_big` b where a.columns[1]=b.columns[1] ); {code} After that, drillbit.log shows OOM: {code} 2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO o.a.d.e.w.fragment.FragmentExecutor - 2aa866ba-8939-b184-0ba2-291734329f88:4:4: State change requested from RUNNING --> FINISHED for 2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO o.a.d.e.w.f.AbstractStatusReporter - State changed for 2aa866ba-8939-b184-0ba2-291734329f88:4:4. New state: FINISHED 2015-05-16 19:24:38,561 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.0.0.31:31012 <--> /10.0.0.31:41923 (data server). Closing connection. 
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] Caused by: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_45] at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) ~[na:1.8.0_45] at 
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_45] at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.a
[jira] [Commented] (DRILL-3118) "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column
[ https://issues.apache.org/jira/browse/DRILL-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546873#comment-14546873 ] Hao Zhu commented on DRILL-3118: Session level is working fine. Thanks Jacques. Could we correct the error so that it is more readable? > "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column > > > Key: DRILL-3118 > URL: https://issues.apache.org/jira/browse/DRILL-3118 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.0.0 >Reporter: Hao Zhu >Assignee: Chris Westin > > Tested on 1.0 with commit id: > {code} > select commit_id from sys.version; > +---+ > | commit_id | > +---+ > | d8b19759657698581cc0d01d7038797952888123 | > +---+ > 1 row selected (0.097 seconds) > {code} > When source data has column name like "dir0","dir1", the query may fail > with "java.lang.IndexOutOfBoundsException". > For example: > {code} > > select `dir999` from > > dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`; > Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 > (expected: range(0, 0)) > Fragment 0:0 > [Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010] > (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet > record reader. 
> Message: > Hadoop path: > /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet > Total records read: 0 > Mock records read: 0 > Records to read: 32768 > Row group index: 0 > Records in row group: 1 > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema { > optional int32 id; > optional binary dir999; > } > , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] > INT32 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] > BINARY [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]} > > org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339 > > org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441 > org.apache.drill.exec.physical.impl.ScanBatch.next():175 > org.apache.drill.exec.physical.impl.BaseRootExec.next():83 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80 > org.apache.drill.exec.physical.impl.BaseRootExec.next():73 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253 > java.security.AccessController.doPrivileged():-2 > optional int32 id; > optional binary dir999; > } > , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] > INT32 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] > BINARY [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]} > > org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339 > > org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441 > org.apache.drill.exec.physical.impl.ScanBatch.next():175 > org.apache.drill.exec.physical.impl.BaseRootExec.next():83 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80 > org.apache.drill.exec.physical.impl.BaseRootExec.next():73 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1469 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():253 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 > (expected: range(0, 0)) > io.netty.buffer.DrillBuf.checkIndexD():189 > io.netty.buffer.DrillBuf.chk():211 > io.netty.buffer.DrillBuf.getInt():491 > org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321 > org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481 > > org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408 > > org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513 > > org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78 > > org.apache.drill.exec.store.parquet.col
[jira] [Created] (DRILL-3118) "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column
Hao Zhu created DRILL-3118: -- Summary: "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column Key: DRILL-3118 URL: https://issues.apache.org/jira/browse/DRILL-3118 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Hao Zhu Assignee: Chris Westin Tested on 1.0 with commit id: {code} select commit_id from sys.version; +---+ | commit_id | +---+ | d8b19759657698581cc0d01d7038797952888123 | +---+ 1 row selected (0.097 seconds) {code} When source data has column name like "dir0","dir1", the query may fail with "java.lang.IndexOutOfBoundsException". For example: {code} > select `dir999` from > dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`; Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0)) Fragment 0:0 [Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010] (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader. 
Message: Hadoop path: /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet Total records read: 0 Mock records read: 0 Records to read: 32768 Row group index: 0 Records in row group: 1 Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema { optional int32 id; optional binary dir999; } , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]} org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339 org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441 org.apache.drill.exec.physical.impl.ScanBatch.next():175 org.apache.drill.exec.physical.impl.BaseRootExec.next():83 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80 org.apache.drill.exec.physical.impl.BaseRootExec.next():73 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253 java.security.AccessController.doPrivileged():-2 optional int32 id; optional binary dir999; } , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]} org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339 org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441 org.apache.drill.exec.physical.impl.ScanBatch.next():175 org.apache.drill.exec.physical.impl.BaseRootExec.next():83 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80 org.apache.drill.exec.physical.impl.BaseRootExec.next():73 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253 java.security.AccessController.doPrivileged():-2 
javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1469 org.apache.drill.exec.work.fragment.FragmentExecutor.run():253 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0)) io.netty.buffer.DrillBuf.checkIndexD():189 io.netty.buffer.DrillBuf.chk():211 io.netty.buffer.DrillBuf.getInt():491 org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321 org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481 org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408 org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513 org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78 org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425 org.apache.drill.exec.physical.impl.ScanBatch.next():175 org.apache.drill.exec.physical.impl.BaseRootExec.next():83 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80 org.apache.drill.exec.physical.impl.BaseRootExec.next():73 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject
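For illustration, a minimal sketch of the name collision behind this report. Drill injects implicit partition columns named dir0, dir1, ... for each directory level below the workspace root, so a file whose own schema contains such a name can clash with them. The column-generation logic below is assumed from that naming convention, not taken from Drill's source.

```python
def implicit_partition_columns(rel_path):
    # one implicit column per directory level between workspace root and the file
    dirs = rel_path.strip("/").split("/")[:-1]
    return [f"dir{i}" for i in range(len(dirs))]

file_columns = {"id", "dir0"}          # source data already has a "dir0" column
implicit = implicit_partition_columns("warehouse/testdir/data.parquet")
clash = sorted(set(implicit) & file_columns)
print(clash)  # ['dir0'] -- the real column shadows the implicit partition column
```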
[jira] [Commented] (DRILL-2100) Drill not deleting spooling files
[ https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546512#comment-14546512 ] Hao Zhu commented on DRILL-2100: Tested on Drill 1.0: when a query finishes successfully, the spill files are deleted but the spill directories remain. The minimal reproduction, on a single-node cluster: {code} alter system set `planner.memory.max_query_memory_per_node`=21474836; select count(*) from ( select columns[5] from dfs.root.`user/hive/warehouse/passwords_csv_middle` order by columns[0], columns[1],columns[2] ); {code} The table "passwords_csv_middle" is about 400MB. {code} [root@h1 spill]# ls -altr 2aa9600f-016a-5283-f98e-ef22942981c2/*/*/*/ 2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_5/operator_2/: total 8 drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 .. drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 . 2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_4/operator_2/: total 8 drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 .. drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 . 2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_3/operator_2/: total 8 drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 .. drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 . 2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_2/operator_2/: total 8 drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 .. drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 . 2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_1/operator_2/: total 8 drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 .. drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 . 2aa9600f-016a-5283-f98e-ef22942981c2/major_fragment_2/minor_fragment_0/operator_2/: total 8 drwxr-xr-x 3 mapr mapr 4096 May 16 01:40 .. drwxr-xr-x 2 mapr mapr 4096 May 16 01:41 . [root@h1 spill]# pwd /tmp/drill/spill {code} I would suggest that when a SQL statement finishes successfully, the whole directory for its query profile ID be removed. 
> Drill not deleting spooling files > - > > Key: DRILL-2100 > URL: https://issues.apache.org/jira/browse/DRILL-2100 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 0.8.0 >Reporter: Abhishek Girish >Assignee: Steven Phillips > Fix For: 1.1.0 > > > Currently, after forcing queries to use an external sort by switching off > hash join/agg causes spill-to-disk files accumulating. > This causes issues with disk space availability when the spill is configured > to be on the local file system (/tmp/drill). Also not optimal when configured > to use DFS (custom). > Drill must clean up all temporary files created after a query completes or > after a drillbit restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
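The suggested cleanup above can be sketched as follows, assuming a per-query spill layout like the ls output in the comment (query-id directory containing major_fragment_*/minor_fragment_*/operator_* subtrees). The cleanup function is hypothetical, not Drill's actual code.

```python
import os
import shutil
import tempfile

def cleanup_query_spill(spill_root, query_id):
    # Remove the whole /tmp/drill/spill/<query-id> tree, not just the spilled
    # files, so no empty fragment/operator directories are left behind.
    query_dir = os.path.join(spill_root, query_id)
    if os.path.isdir(query_dir):
        shutil.rmtree(query_dir)

# Simulate the layout from the report inside a throwaway directory.
spill_root = tempfile.mkdtemp()
qid = "2aa9600f-016a-5283-f98e-ef22942981c2"
os.makedirs(os.path.join(spill_root, qid,
                         "major_fragment_2", "minor_fragment_0", "operator_2"))
cleanup_query_spill(spill_root, qid)
print(os.listdir(spill_root))  # [] -- the empty directory tree is gone
```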
[jira] [Commented] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
[ https://issues.apache.org/jira/browse/DRILL-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546385#comment-14546385 ] Hao Zhu commented on DRILL-3110: This time this error is due to OOM of direct memory on one node: {code} 2015-05-15 23:29:14,590 [BitServer-7] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.0.0.28:31012 <--> /10.0.0.31:38972 (data server). Closing connection. io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:346) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] Caused by: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_45] at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) ~[na:1.8.0_45] at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_45] at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:140) ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:4.0.27.Final] at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:4.0.27.Final] at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:98) ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT] at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:106) ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT] at org.apache.drill.exec.rpc.ProtobufLengthDecoder.decode(ProtobufLengthDecoder.java:83) ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT] at org.apache.drill.exec.rpc.data.DataProtobufLengthDecoder$Server.decode(DataProtobufLengthDecoder.java:52) 
~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 12 common frames omitted {code} > org.apache.drill.exec.rpc.RpcException: Data not accepted downstream. > - > > Key: DRILL-3110 > URL: https://issues.apache.org/jira/browse/DRILL-3110 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.0.0 > Environment: > select commit_id from sys.version; > ++ > | commit_id | > ++ > | 5
[jira] [Commented] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
[ https://issues.apache.org/jira/browse/DRILL-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546381#comment-14546381 ] Hao Zhu commented on DRILL-3110: However after increasing drill.exec.buffer.size=1000, I again triggered this issue with a little different error: {code} 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, dfs.root.`user/hive/warehouse/passwords_csv_big` b . . . . . . . . . . . . . . . . . . . . . . .> where a.columns[1]=b.columns[1] limit 5; java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: java.lang.IllegalStateException: Failure while closing accountor. Expected private and shared pools to be set to initial values. However, one or more were not. Stats are zoneinitallocated delta private 100 100 0 shared 00 9998631712 368288. Fragment 2:0 [Error Id: 89bc66e8-b5ec-41fc-bcf2-08e330077138 on h3.poc.com:31010] (java.lang.IllegalStateException) Failure while closing accountor. Expected private and shared pools to be set to initial values. However, one or more were not. Stats are zoneinitallocated delta private 100 100 0 shared 00 9998631712 368288. 
org.apache.drill.exec.memory.AtomicRemainder.close():200 org.apache.drill.exec.memory.Accountor.close():386 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close():325 org.apache.drill.exec.ops.OperatorContextImpl.close():116 org.apache.drill.exec.ops.FragmentContext.suppressingClose():405 org.apache.drill.exec.ops.FragmentContext.close():394 org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():349 org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():175 org.apache.drill.exec.work.fragment.FragmentExecutor.run():293 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:77) at sqlline.TableOutputFormat.print(TableOutputFormat.java:106) at sqlline.SqlLine.print(SqlLine.java:1583) at sqlline.Commands.execute(Commands.java:852) at sqlline.Commands.sql(Commands.java:751) at sqlline.SqlLine.dispatch(SqlLine.java:738) at sqlline.SqlLine.begin(SqlLine.java:612) at sqlline.SqlLine.start(SqlLine.java:366) at sqlline.SqlLine.main(SqlLine.java:259) {code} > org.apache.drill.exec.rpc.RpcException: Data not accepted downstream. > - > > Key: DRILL-3110 > URL: https://issues.apache.org/jira/browse/DRILL-3110 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.0.0 > Environment: > select commit_id from sys.version; > ++ > | commit_id | > ++ > | 583ca4a95df2c45b5ba20b517cb1aeed48c7548e | > ++ > 1 row selected (0.098 seconds) >Reporter: Hao Zhu >Assignee: Chris Westin > > Joining two 1G CSV tables resulting in below error: > {code} > > select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, > > dfs.root.`user/hive/warehouse/passwords_csv_big` b > . . . . . . . . . . . . . . . . . . . . . . 
.> where > a.columns[1]=b.columns[1] limit 5; > ++ > | columns | > ++ > | ["1","787148","92921","158596","17776","896094","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: > org.apache.drill.exec.rpc.RpcException: Data not accepted downstream. > Fragment 5:15 > [Error Id: dd25cee9-1d1d-4658-9a83-cdefcafb7031 on h3.poc.com:31010] > (org.apache.drill.exec.rpc.RpcException) Data not accepted downstream. > org.apache.drill.exec.ops.StatusHandler.success():54 > org.apache.drill.exec.ops.StatusHandler.success():29 > org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():55 > org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():46 > > org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():133 > > org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():116 > org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.set():98 > org.apache.drill.exec.rpc.RpcBus$InboundHandler.de
[jira] [Commented] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
[ https://issues.apache.org/jira/browse/DRILL-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546362#comment-14546362 ] Hao Zhu commented on DRILL-3110: Seems this is already fixed per DRILL-3061. I tried below latest rpm build, and so far I have not seen this error. {code} > select commit_id from sys.version; +---+ | commit_id | +---+ | d8b19759657698581cc0d01d7038797952888123 | +---+ 1 row selected (0.06 seconds) {code} > org.apache.drill.exec.rpc.RpcException: Data not accepted downstream. > - > > Key: DRILL-3110 > URL: https://issues.apache.org/jira/browse/DRILL-3110 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.0.0 > Environment: > select commit_id from sys.version; > ++ > | commit_id | > ++ > | 583ca4a95df2c45b5ba20b517cb1aeed48c7548e | > ++ > 1 row selected (0.098 seconds) >Reporter: Hao Zhu >Assignee: Chris Westin > > Joining two 1G CSV tables resulting in below error: > {code} > > select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, > > dfs.root.`user/hive/warehouse/passwords_csv_big` b > . . . . . . . . . . . . . . . . . . . . . . .> where > a.columns[1]=b.columns[1] limit 5; > ++ > | columns | > ++ > | ["1","787148","92921","158596","17776","896094","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > | ["1","787148","10930","348699","534058","778852","2"] | > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: > org.apache.drill.exec.rpc.RpcException: Data not accepted downstream. > Fragment 5:15 > [Error Id: dd25cee9-1d1d-4658-9a83-cdefcafb7031 on h3.poc.com:31010] > (org.apache.drill.exec.rpc.RpcException) Data not accepted downstream. 
> org.apache.drill.exec.ops.StatusHandler.success():54 > org.apache.drill.exec.ops.StatusHandler.success():29 > org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():55 > org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():46 > > org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():133 > > org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():116 > org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.set():98 > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():243 > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():188 > io.netty.handler.codec.MessageToMessageDecoder.channelRead():89 > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 > io.netty.handler.timeout.IdleStateHandler.channelRead():254 > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 > io.netty.handler.codec.MessageToMessageDecoder.channelRead():103 > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 > io.netty.handler.codec.ByteToMessageDecoder.channelRead():242 > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 > io.netty.channel.ChannelInboundHandlerAdapter.channelRead():86 > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 > io.netty.channel.DefaultChannelPipeline.fireChannelRead():847 > > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady():618 > io.netty.channel.epoll.EpollEventLoop.processReady():329 > io.netty.channel.epoll.EpollEventLoop.run():250 > 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run():111 > java.lang.Thread.run():745 > at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) > at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) > at sqlline.SqlLine.print(SqlLine.java:1809) > at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) > at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) > at sqlline.SqlLine.dispatch(SqlLine.java:889) > at sqlline.SqlLine.begin(SqlLine.java:763) > at sqlline.SqlLine.start(SqlLine.java:498) > at sqlline.SqlLine.main(SqlLine.java:460) > {code} > It can be workarounded by cha
[jira] [Created] (DRILL-3110) org.apache.drill.exec.rpc.RpcException: Data not accepted downstream.
Hao Zhu created DRILL-3110: -- Summary: org.apache.drill.exec.rpc.RpcException: Data not accepted downstream. Key: DRILL-3110 URL: https://issues.apache.org/jira/browse/DRILL-3110 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Environment: > select commit_id from sys.version; ++ | commit_id | ++ | 583ca4a95df2c45b5ba20b517cb1aeed48c7548e | ++ 1 row selected (0.098 seconds) Reporter: Hao Zhu Assignee: Chris Westin Joining two 1 GB CSV tables results in the error below: {code} > select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, > dfs.root.`user/hive/warehouse/passwords_csv_big` b . . . . . . . . . . . . . . . . . . . . . . .> where a.columns[1]=b.columns[1] limit 5; ++ | columns | ++ | ["1","787148","92921","158596","17776","896094","2"] | | ["1","787148","10930","348699","534058","778852","2"] | | ["1","787148","10930","348699","534058","778852","2"] | | ["1","787148","10930","348699","534058","778852","2"] | | ["1","787148","10930","348699","534058","778852","2"] | java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: org.apache.drill.exec.rpc.RpcException: Data not accepted downstream. Fragment 5:15 [Error Id: dd25cee9-1d1d-4658-9a83-cdefcafb7031 on h3.poc.com:31010] (org.apache.drill.exec.rpc.RpcException) Data not accepted downstream. 
org.apache.drill.exec.ops.StatusHandler.success():54 org.apache.drill.exec.ops.StatusHandler.success():29 org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():55 org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.success():46 org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():133 org.apache.drill.exec.rpc.data.DataTunnel$ThrottlingOutcomeListener.success():116 org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.set():98 org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():243 org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode():188 io.netty.handler.codec.MessageToMessageDecoder.channelRead():89 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 io.netty.handler.timeout.IdleStateHandler.channelRead():254 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 io.netty.handler.codec.MessageToMessageDecoder.channelRead():103 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 io.netty.handler.codec.ByteToMessageDecoder.channelRead():242 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 io.netty.channel.ChannelInboundHandlerAdapter.channelRead():86 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead():339 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead():324 io.netty.channel.DefaultChannelPipeline.fireChannelRead():847 io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady():618 io.netty.channel.epoll.EpollEventLoop.processReady():329 io.netty.channel.epoll.EpollEventLoop.run():250 io.netty.util.concurrent.SingleThreadEventExecutor$2.run():111 java.lang.Thread.run():745 at 
sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) at sqlline.SqlLine.print(SqlLine.java:1809) at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) at sqlline.SqlLine.dispatch(SqlLine.java:889) at sqlline.SqlLine.begin(SqlLine.java:763) at sqlline.SqlLine.start(SqlLine.java:498) at sqlline.SqlLine.main(SqlLine.java:460) {code} It can be worked around by changing drill.exec.buffer.size. My understanding is that "drill.exec.buffer.size" should only affect performance; it should not cause the SQL to fail, right? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
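The "Data not accepted downstream" path in the stack trace (DataTunnel$ThrottlingOutcomeListener) suggests the sender's in-flight window filled up faster than the receiver drained it. A toy model of that flow-control idea, purely illustrative and not Drill's DataTunnel implementation:

```python
import queue

# A small bounded window stands in for the receiver-side buffer (cf. the
# drill.exec.buffer.size knob mentioned above). When the receiver cannot
# drain fast enough, further sends are refused -- the sketch's analogue of
# "Data not accepted downstream".
window = queue.Queue(maxsize=2)     # small in-flight window
accepted, refused = 0, 0
for batch in range(5):
    try:
        window.put_nowait(batch)    # receiver never drains in this sketch
        accepted += 1
    except queue.Full:
        refused += 1
print(accepted, refused)  # 2 3
```

A larger buffer only delays the refusal here; whether the real error should ever surface as a query failure is exactly the question the report raises.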
[jira] [Updated] (DRILL-2927) Pending query in resource queue starts after timeout
[ https://issues.apache.org/jira/browse/DRILL-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Zhu updated DRILL-2927: --- Attachment: Screen Shot 2015-04-30 at 11.07.21 AM.png Pic 2 > Pending query in resource queue starts after timeout > > > Key: DRILL-2927 > URL: https://issues.apache.org/jira/browse/DRILL-2927 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 0.8.0 > Environment: Drill 0.8 released version. >Reporter: Hao Zhu >Assignee: Chris Westin > Attachments: Screen Shot 2015-04-30 at 11.01.25 AM.png, Screen Shot > 2015-04-30 at 11.07.21 AM.png > > > I set small queue to allow only 1 concurrent query: > alter system set `exec.queue.enable`=TRUE; > alter system set `exec.queue.small`=1; > When running 2 small queries, one of them is pending which is expected. > (See pic 1) > After about 5mins(exec.queue.timeout_millis), the pending SQL starts. now we > have 2 queries running in small queue. > (See pic 2) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2927) Pending query in resource queue starts after timeout
[ https://issues.apache.org/jira/browse/DRILL-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Zhu updated DRILL-2927: --- Attachment: Screen Shot 2015-04-30 at 11.01.25 AM.png Pic 1 > Pending query in resource queue starts after timeout > > > Key: DRILL-2927 > URL: https://issues.apache.org/jira/browse/DRILL-2927 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 0.8.0 > Environment: Drill 0.8 released version. >Reporter: Hao Zhu >Assignee: Chris Westin > Attachments: Screen Shot 2015-04-30 at 11.01.25 AM.png > > > I set small queue to allow only 1 concurrent query: > alter system set `exec.queue.enable`=TRUE; > alter system set `exec.queue.small`=1; > When running 2 small queries, one of them is pending which is expected. > (See pic 1) > After about 5mins(exec.queue.timeout_millis), the pending SQL starts. now we > have 2 queries running in small queue. > (See pic 2) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2927) Pending query in resource queue starts after timeout
Hao Zhu created DRILL-2927: -- Summary: Pending query in resource queue starts after timeout Key: DRILL-2927 URL: https://issues.apache.org/jira/browse/DRILL-2927 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 0.8.0 Environment: Drill 0.8 released version. Reporter: Hao Zhu Assignee: Chris Westin Attachments: Screen Shot 2015-04-30 at 11.01.25 AM.png I set the small queue to allow only 1 concurrent query: alter system set `exec.queue.enable`=TRUE; alter system set `exec.queue.small`=1; When running 2 small queries, one of them is pending, which is expected (see pic 1). After about 5 minutes (exec.queue.timeout_millis), the pending SQL starts; now we have 2 queries running in the small queue (see pic 2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
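The reported behavior, a query admitted into a full queue once the timeout expires instead of being rejected, can be sketched with a semaphore. This is a toy model of the admission logic, not Drill's queueing code; only the option names come from the report.

```python
import threading

slots = threading.Semaphore(1)      # exec.queue.small = 1
TIMEOUT_S = 0.1                     # stands in for exec.queue.timeout_millis

def admit(name, admitted):
    if slots.acquire(timeout=TIMEOUT_S):
        admitted.append((name, "got slot"))
    else:
        # Reported behavior: the pending query starts after the timeout
        # anyway, so the queue limit is exceeded.
        admitted.append((name, "timed out but started"))

admitted = []
slots.acquire()                     # first query holds the only slot
admit("query-2", admitted)
print(admitted)                     # [('query-2', 'timed out but started')]
```

Failing the waiting query (or keeping it queued) on timeout would preserve the concurrency limit; starting it silently defeats the point of the queue.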
[jira] [Commented] (DRILL-1289) Creating storage plugin for "hdfs:///" failed with "Unable to create/ update plugin: myhdfs"
[ https://issues.apache.org/jira/browse/DRILL-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326353#comment-14326353 ] Hao Zhu commented on DRILL-1289: It is marked as fixed in Drill 0.5. Which version are you using, and which storage plugin? > Creating storage plugin for "hdfs:///" failed with "Unable to create/ update > plugin: myhdfs" > > > Key: DRILL-1289 > URL: https://issues.apache.org/jira/browse/DRILL-1289 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 0.4.0 > Environment: OS: Centos 6.4 > HDFS: CDH5.1 > Drill: 0.4.0 >Reporter: Hao Zhu > Fix For: 0.5.0 > > > In web GUI, I can successfully create a new storage plugin named "myhdfs" > using "file:///": > {code} > { > "type": "file", > "enabled": true, > "connection": "file:///", > "workspaces": { > "root": { > "location": "/", > "writable": false, > "storageformat": null > }, > "tmp": { > "location": "/tmp", > "writable": true, > "storageformat": "csv" > } > }, > "formats": { > "psv": { > "type": "text", > "extensions": [ > "tbl" > ], > "delimiter": "|" > }, > "csv": { > "type": "text", > "extensions": [ > "csv" > ], > "delimiter": "," > }, > "tsv": { > "type": "text", > "extensions": [ > "tsv" > ], > "delimiter": "\t" > }, > "parquet": { > "type": "parquet" > }, > "json": { > "type": "json" > } > } > } > {code} > However if I try to change "file:///" to "hdfs:///" to point to HDFS other > than local file system, drill log errors out "[qtp416200645-67] DEBUG > o.a.d.e.server.rest.StorageResources - Unable to create/ update plugin: > myhdfs". 
> {code} > { > "type": "file", > "enabled": true, > "connection": "hdfs:///", > "workspaces": { > "root": { > "location": "/", > "writable": false, > "storageformat": null > }, > "tmp": { > "location": "/tmp", > "writable": true, > "storageformat": "csv" > } > }, > "formats": { > "psv": { > "type": "text", > "extensions": [ > "tbl" > ], > "delimiter": "|" > }, > "csv": { > "type": "text", > "extensions": [ > "csv" > ], > "delimiter": "," > }, > "tsv": { > "type": "text", > "extensions": [ > "tsv" > ], > "delimiter": "\t" > }, > "parquet": { > "type": "parquet" > }, > "json": { > "type": "json" > } > } > } > {code} > On my cluster, I am using CDH5 hdfs, and it all client configurations are > valid. For example, on the drillbit server: > {code} > [root@hdm ~]# hdfs dfs -ls / > Found 3 items > drwxr-xr-x - hbase hbase 0 2014-08-04 22:55 /hbase > drwxrwxrwt - hdfs supergroup 0 2014-07-31 16:31 /tmp > drwxr-xr-x - hdfs supergroup 0 2014-07-11 12:06 /user > {code} > Is there anything wrong with the storage plugin syntax for HDFS? > If so, can drill log prints more debug info to show the reason why it failed? > Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1773) Issues when using JAVA code through Drill JDBC driver
[ https://issues.apache.org/jira/browse/DRILL-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305575#comment-14305575 ] Hao Zhu commented on DRILL-1773: Hi Oleg, Thanks for looking into it. So the default behavior in the JDBC driver is drill.exec.debug.error_on_leak=true? What do you think about disabling it by default? Thanks, Hao > Issues when using JAVA code through Drill JDBC driver > - > > Key: DRILL-1773 > URL: https://issues.apache.org/jira/browse/DRILL-1773 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 0.6.0, 0.7.0 > Environment: Tested on 0.6R3 >Reporter: Hao Zhu >Assignee: Daniel Barclay (Drill/MapR) > Fix For: 0.8.0 > > Attachments: DrillHandler.patch, DrillJdbcExample.java > > > When executing the attached simple Java code through the Drill JDBC driver (0.6 R3), > the query got executed and returned the correct result; however there are 2 > issues: > 1. It keeps printing DEBUG information. > Is this the default behavior, or is there any way to disable DEBUG? > eg: > {code} > 13:30:44.702 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator > - Generated: > io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher > 13:30:44.706 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator > - Generated: > io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher > 13:30:44.708 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator > - Generated: > io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher > 13:30:44.717 [Client-1] DEBUG io.netty.util.Recycler - > -Dio.netty.recycler.maxCapacity.default: 262144 > {code} > 2. After the query finished, it does not seem to close the connection and did not > return to the shell prompt. > I have to manually issue "ctrl-C" to stop it. 
> {code} > 13:31:11.239 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG > org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: > 0x1497d1d0d040839 after 0ms > 13:31:24.573 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG > org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: > 0x1497d1d0d040839 after 1ms > 13:31:37.906 [main-SendThread(xx.xx.xx.xx:5181)] DEBUG > org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: > 0x1497d1d0d040839 after 0ms > ^CAdministrators-MacBook-Pro-40:xxx$ > {code} > > The DrillJdbcExample.java is attached. > Command to run: > {code} > javac DrillJdbcExample.java > java DrillJdbcExample > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2055) Drill should error out for Invalid json file if it has the same map key names.
Hao Zhu created DRILL-2055: -- Summary: Drill should error out for Invalid json file if it has the same map key names. Key: DRILL-2055 URL: https://issues.apache.org/jira/browse/DRILL-2055 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.7.0 Reporter: Hao Zhu Assignee: Jinfeng Ni Priority: Minor For a JSON file with duplicate map key names: { "a" : "x", "a" : "y" } should we consider it invalid JSON and error out? Ref: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object#answer-23195243 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
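Many parsers, Python's json module included, silently keep the last value for a duplicated key, which is the behavior this report argues Drill should reject. A sketch of both the silent behavior and an explicit rejection; the hook function here is ours, not part of any Drill API.

```python
import json

def reject_duplicates(pairs):
    # object_pairs_hook receives every (key, value) pair, including duplicates,
    # before they are collapsed into a dict -- so duplicates are detectable.
    keys = [k for k, _ in pairs]
    dupes = {k for k in keys if keys.count(k) > 1}
    if dupes:
        raise ValueError(f"duplicate keys: {sorted(dupes)}")
    return dict(pairs)

doc = '{ "a" : "x", "a" : "y" }'
print(json.loads(doc))  # {'a': 'y'} -- the last key silently wins
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    err = None
except ValueError as e:
    err = str(e)
print(err)  # duplicate keys: ['a']
```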
[jira] [Commented] (DRILL-1794) Can not make files with extension "log" to be recognized as json format?
[ https://issues.apache.org/jira/browse/DRILL-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230444#comment-14230444 ] Hao Zhu commented on DRILL-1794: Hi Team, Yep I tried to set it but it failed to save the storage plugin. {code} "log": { "type": "json", "extensions": [ "log" ], }, {code} Or {code} "log": { "type": "json", "extensions": [ "log" ] }, {code} > Can not make files with extension "log" to be recognized as json format? > > > Key: DRILL-1794 > URL: https://issues.apache.org/jira/browse/DRILL-1794 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 0.6.0 > Environment: 0.6R3 >Reporter: Hao Zhu > > If we want to use ".log" as the file extension, and also want it to be > recognized as json format, I tried to use below storage engine , but failed > to read the .log file.. > {code} > "formats": { > "log": { > "type": "json" > }, > "csv": { > "type": "text", > "extensions": [ > "csv" > ], > "delimiter": "," > } > } > {code} > {code} > 0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from > logtest.`test.json`; > +++++ > | field1 | field2 | field3 | field4 | > +++++ > | data1 | 100.0 | more data1 | 123.001| > +++++ > 1 row selected (0.159 seconds) > 0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from > logtest.`test.log`; > Query failed: Failure while validating sql : > org.eigenbase.util.EigenbaseContextException: From line 1, column 16 to line > 1, column 22: Table 'logtest.test.log' not found > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} > Do we support above requirement? > If so, what is the storage plugin text? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
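One observation on the snippets above: the first variant has a trailing comma after the "extensions" array, which is invalid JSON and could by itself make the storage-plugin update fail to save; the second variant's trailing comma after the closing brace is likewise invalid if "log" is the last entry in "formats". A quick check, with the fragments reduced to just the format block:

```python
import json

bad  = '{ "log": { "type": "json", "extensions": [ "log" ], } }'
good = '{ "log": { "type": "json", "extensions": [ "log" ] } }'

def is_valid_json(s):
    # JSON (unlike some lenient parsers) forbids trailing commas.
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(bad), is_valid_json(good))  # False True
```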
[jira] [Created] (DRILL-1794) Can not make files with extension "log" to be recognized as json format?
Hao Zhu created DRILL-1794: -- Summary: Can not make files with extension "log" to be recognized as json format? Key: DRILL-1794 URL: https://issues.apache.org/jira/browse/DRILL-1794 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 0.6.0 Environment: 0.6R3 Reporter: Hao Zhu If we want to use ".log" as the file extension and also want it to be recognized as JSON format: I tried the storage plugin definition below, but failed to read the .log file. {code} "formats": { "log": { "type": "json" }, "csv": { "type": "text", "extensions": [ "csv" ], "delimiter": "," } } {code} {code} 0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from logtest.`test.json`; +++++ | field1 | field2 | field3 | field4 | +++++ | data1 | 100.0 | more data1 | 123.001| +++++ 1 row selected (0.159 seconds) 0: jdbc:drill:zk=n1a:5181,n2a:5181,n3a:5181> select * from logtest.`test.log`; Query failed: Failure while validating sql : org.eigenbase.util.EigenbaseContextException: From line 1, column 16 to line 1, column 22: Table 'logtest.test.log' not found Error: exception while executing query: Failure while executing query. (state=,code=0) {code} Do we support the above requirement? If so, what should the storage plugin text be? -- This message was sent by Atlassian JIRA (v6.3.4#6332)