[jira] [Commented] (DRILL-4310) Memory leak in hash partition sender when query is cancelled

2017-06-26 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064355#comment-16064355
 ] 

Khurram Faraaz commented on DRILL-4310:
---

[~RomanKulyk] There's a repro for a similar leak that occurs in HashJoinPOP; 
see DRILL-5564 for the steps to reproduce the memory leak.

> Memory leak in hash partition sender when query is cancelled
> 
>
> Key: DRILL-4310
> URL: https://issues.apache.org/jira/browse/DRILL-4310
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.5.0
>Reporter: Victoria Markman
> Attachments: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill, 
> drillbit.log.133, drillbit.log.134, drillbit.log.135, drillbit.log.136
>
>
> Query got cancelled (still investigating what caused cancellation).
> Here is an excerpt from drillbit.log
> {code}
> 2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers 
> allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> Fragment 2:2
> [Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with 
> outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> {code}
> Reproduced twice by running: ./run.sh -s Advanced/tpcds/tpcds_sf100/original 
> -g smoke -t 600 -n 10 -i 100 -m
> Cluster configuration: vanilla, 48GB of memory, 4GB heap.
> Attaching query profile and logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064034#comment-16064034
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/858
  
@laurentgo  Updated based on the review comments. The Timeout Executor 
Service is now maintained for the lifetime of a connection and closed during 
shutdown. Also refactored by introducing `isTimedOut()` and 
`cancelDueToTimeout()`.
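The lifecycle described above can be sketched roughly as follows. This is a hypothetical illustration based only on the comment, not Drill's actual code: the class name and `armTimeout` are assumptions, standing in for the `TimeoutTrigger`/`isTimedOut()`/`cancelDueToTimeout()` pieces mentioned.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Rough sketch: one scheduled executor owned by the connection, armed per
// statement, shut down with the connection. Names are illustrative only.
class ConnectionTimeoutSketch implements AutoCloseable {
  private final ScheduledExecutorService timeoutService =
      Executors.newSingleThreadScheduledExecutor();
  private volatile boolean timedOut = false;

  // Analogue of arming a TimeoutTrigger for a statement: after 'millis',
  // mark the statement timed out and run the cancel action (the
  // cancelDueToTimeout() counterpart).
  ScheduledFuture<?> armTimeout(long millis, Runnable cancelAction) {
    return timeoutService.schedule(() -> {
      timedOut = true;
      cancelAction.run();
    }, millis, TimeUnit.MILLISECONDS);
  }

  // Analogue of isTimedOut(): lets the statement tell a timeout
  // cancellation apart from a user-initiated cancellation.
  boolean isTimedOut() {
    return timedOut;
  }

  // Called once, when the connection shuts down.
  @Override
  public void close() {
    timeoutService.shutdownNow();
  }
}
```

Keeping the executor alive for the whole connection avoids paying thread-creation cost on every statement while still giving a single, well-defined shutdown point.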



> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> It would be nice if we had this implemented. Runaway queries could be 
> automatically canceled by setting a timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)





[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064002#comment-16064002
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/851
  
Rebased on latest master.


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.
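The first enhancement above, a shared setup/shutdown base class, might look roughly like this. The names and shape are guesses from the description (plain static methods stand in for JUnit's `@BeforeClass`/`@AfterClass`):

```java
// Hypothetical sketch of a SubOperatorTest-style base class: routine
// fixture setup and shutdown live here once, so each sub-operator test
// class inherits them instead of repeating the boilerplate.
abstract class SubOperatorTestSketch {

  // Stand-in for the operator test fixture (allocator, options, etc.).
  static class OperatorFixture implements AutoCloseable {
    boolean open = true;

    @Override
    public void close() {
      open = false;
    }
  }

  protected static OperatorFixture fixture;

  // In a real JUnit class these would be @BeforeClass / @AfterClass.
  public static void classSetUp() {
    fixture = new OperatorFixture();
  }

  public static void classTearDown() {
    fixture.close();
  }
}
```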





[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064001#comment-16064001
 ] 

ASF GitHub Bot commented on DRILL-5517:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/840
  
Rebased on master. Resolved merge conflict.


> Provide size-aware set operations in value vectors
> --
>
> Key: DRILL-5517
> URL: https://issues.apache.org/jira/browse/DRILL-5517
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is 
> to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") 
> Effort starts by providing "size-aware" set operations in value vectors which:
> * Operate as {{setSafe()}} while vectors are below 16 MB.
> * Throw a new, specific exception ({{VectorOverflowException}}) if setting 
> the value (and growing the vector) would exceed the vector limit.
> The methods in value vectors then become the foundation on which we can 
> construct size-aware record batch "writers."
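The contract described above can be sketched as follows. The 16 MB cap and the `VectorOverflowException` name come from the issue; the vector itself is reduced to a plain byte budget here for illustration, not Drill's real ValueVector API.

```java
// Sketch of a size-aware set operation: behaves like setSafe() while the
// vector stays under the 16 MB slab limit, and throws instead of growing
// past it. A stand-in for illustration, not Drill's vector code.
class SizeAwareVectorSketch {
  // Netty allocates memory in 16 MB slabs; vectors are capped to one slab.
  static final int MAX_VECTOR_BYTES = 16 * 1024 * 1024;

  static class VectorOverflowException extends Exception {}

  private int usedBytes = 0;

  // Accept the write if it fits under the cap; otherwise signal overflow
  // so a batch "writer" above can finish the batch and start a new one.
  void setBytesSafe(int valueWidth) throws VectorOverflowException {
    if (usedBytes + valueWidth > MAX_VECTOR_BYTES) {
      throw new VectorOverflowException();
    }
    usedBytes += valueWidth;  // real code would also copy the value
  }

  int usedBytes() {
    return usedBytes;
  }
}
```

A checked exception forces every caller to decide what "full" means at its level, which is what makes the operations a usable foundation for the batch writers.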





[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063455#comment-16063455
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/858#discussion_r124068854
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/TimeoutTrigger.java ---
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.jdbc.impl;
+
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.calcite.avatica.AvaticaStatement;
+import org.apache.drill.exec.rpc.NamedThreadFactory;
+import org.apache.drill.jdbc.SqlTimeoutException;
+
+/**
+ * Timeout Trigger required for canceling of running queries
+ */
+class TimeoutTrigger implements Callable {
+  // Not using the DrillbitContext's ExecutorService: this thread pool is 
light-weight (threads wake up to cancel tasks) but needs a low response time
+  private static ExecutorService timeoutService;
--- End diff --

The Executor has only sleeping threads. If the Drillbit is shut down, I'm 
assuming that running queries are cancelled (which implicitly terminates any 
timeout-related threads in this executor), so the Executor will automatically 
be void of any active threads. However, I see your point about having a clean 
way to shut it down. I'll make this a part of the DrillbitContext, since the 
Executor actually services a running query and, IMO, shouldn't be bound to a 
connection. At the DrillbitContext level, I should be able to do a clean 
shutdown without the need for any complex checks. 
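The clean-shutdown point can be illustrated with a small sketch (hypothetical, not Drill code): because the pool's threads only sleep while waiting for a timeout, `shutdownNow()` interrupts them and the pool terminates quickly without extra bookkeeping.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A pool whose tasks only sleep while waiting for a timeout can be shut
// down cleanly: shutdownNow() interrupts the sleeps and drains the queue.
class TimeoutPoolShutdownSketch {
  static boolean shutDownCleanly(ExecutorService pool, long waitMillis)
      throws InterruptedException {
    pool.shutdownNow();  // interrupts sleeping timeout tasks
    return pool.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(() -> {
      try {
        Thread.sleep(60_000);  // stands in for a pending timeout task
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // woken by shutdownNow()
      }
    });
    System.out.println(shutDownCleanly(pool, 5_000));
  }
}
```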


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> It would be nice if we had this implemented. Runaway queries could be 
> automatically canceled by setting a timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)





[jira] [Commented] (DRILL-4079) Hive: Filter with a trailing space is not working

2017-06-26 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063438#comment-16063438
 ] 

Rahul Challapalli commented on DRILL-4079:
--

Thanks for looking into this [~vvysotskyi]. I also couldn't reproduce this 
issue on the latest master (commit_id = 
90f43bff7a01eaaee6c8861137759b05367dfcf3).

> Hive: Filter with a trailing space is not working
> -
>
> Key: DRILL-4079
> URL: https://issues.apache.org/jira/browse/DRILL-4079
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Rahul Challapalli
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
>
> The below query does not return the right result
> {code}
> select * from hive.empty_lengthy_p2 where varchar_col=' ';
> +--+--+
> | int_col  | varchar_col  |
> +--+--+
> +--+--+
> No rows selected (0.393 seconds)
> {code}
> Data : 
> {code}
> 1|dhfawriuueiq dshfjklhfiue eiufhwelfhleiruhj ejfwekjlf hsjdkgfhsdjk  hjd 
> hdfkh sdhg dkj hsdhg jds gsdlgd sd hjk sdjhkjdhgsdhg
> 2|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 3|dfg
> 4|sdjklhkhjdfgjhdfgkjhdfkjldfsgjdsfkjhdfmnb,cv
> 5|dfg
> 6|
> 7|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 8|?
> 9|
> 10|?
> {code}
> Hive DDL :
> {code}
> DROP TABLE IF EXISTS empty_lengthy;
> CREATE EXTERNAL TABLE empty_lengthy (
> int_col INT,
> varchar_col STRING
>)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions.tbl";
> DROP TABLE IF EXISTS empty_lengthy_p2;
> CREATE TABLE empty_lengthy_p2 (
> int_col INT
>)
> PARTITIONED BY (varchar_col STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions_p2"
> TBLPROPERTIES ("serialization.null.format"="?");
> SET hive.exec.dynamic.partition.mode=true;
> insert overwrite table empty_lengthy_p2 partition (varchar_col)
> select int_col, varchar_col from empty_lengthy;
> {code}





[jira] [Assigned] (DRILL-4970) Wrong results when casting double to bigint or int

2017-06-26 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4970:
---

Assignee: Volodymyr Vysotskyi

> Wrong results when casting double to bigint or int
> --
>
> Key: DRILL-4970
> URL: https://issues.apache.org/jira/browse/DRILL-4970
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.8.0
>Reporter: Robert Hou
>Assignee: Volodymyr Vysotskyi
> Attachments: test_table
>
>
> This query returns the wrong result
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from 
> test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as 
> bigint) >= -255 and double_id <= -5);
> +-+
> | EXPR$0  |
> +-+
> | 2769|
> +-+
> Without the cast, it returns the correct result:
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from 
> test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 
> and double_id <= -5);
> +-+
> | EXPR$0  |
> +-+
> | 3020|
> +-+
> By itself, the result is also correct:
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from 
> test_table where (cast(double_id as bigint) >= -255 and double_id <= -5);
> +-+
> | EXPR$0  |
> +-+
> | 251 |
> +-+
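For reference, a narrowing double-to-long conversion truncates toward zero in Java (SQL CAST semantics can vary by engine), so the cast predicate can legitimately admit boundary rows the raw predicate rejects; since the issue states that each query is correct on its own, it is the combined OR query's count that is wrong. A small illustration (a stand-in, not Drill's evaluation code):

```java
// Boundary behavior only: Java's narrowing conversion truncates toward
// zero, so (long) -255.9 == -255, and a row with double_id = -255.9
// satisfies the cast predicate but not the raw one.
class CastBoundarySketch {
  static boolean rawPredicate(double d) {
    return d >= -255 && d <= -5;
  }

  static boolean castPredicate(double d) {
    return (long) d >= -255 && d <= -5;
  }
}
```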





[jira] [Commented] (DRILL-4310) Memory leak in hash partition sender when query is cancelled

2017-06-26 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063153#comment-16063153
 ] 

Roman commented on DRILL-4310:
--

Can't reproduce this issue on a local cluster.

[~vicky] could you please check whether this reproduces on the latest Drill 
version? If it does, could you please provide more information about the 
reproduction scenario?

> Memory leak in hash partition sender when query is cancelled
> 
>
> Key: DRILL-4310
> URL: https://issues.apache.org/jira/browse/DRILL-4310
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.5.0
>Reporter: Victoria Markman
> Attachments: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill, 
> drillbit.log.133, drillbit.log.134, drillbit.log.135, drillbit.log.136
>
>
> Query got cancelled (still investigating what caused cancellation).
> Here is an excerpt from drillbit.log
> {code}
> 2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers 
> allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> Fragment 2:2
> [Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with 
> outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> {code}
> Reproduced twice by running: ./run.sh -s Advanced/tpcds/tpcds_sf100/original 
> -g smoke -t 600 -n 10 -i 100 -m
> Cluster configuration: vanilla, 48GB of memory, 4GB heap.
> Attaching query profile and logs. 





[jira] [Commented] (DRILL-4595) FragmentExecutor.fail() should interrupt the fragment thread to avoid possible query hangs

2017-06-26 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063113#comment-16063113
 ] 

Roman commented on DRILL-4595:
--

I tried the steps from the previous comment on Drill after the DRILL-5599 fix, 
and it seems the problem is solved.

After step 2) I got:
{code}
Error: DATA_READ ERROR: Error reading page data

File:  /drill/testdata/tpcds_sf100/parquet/web_sales/0_0_7.parquet
Column:  ws_ship_hdemo_sk
Row Group Start:  7836969
Fragment 1:1

[Error Id: a8ca60e9-5ef7-42b7-b93a-fd7c69c06aef on node1:31010] (state=,code=0)
{code}

The drillbit kept running (it did not go down) and the table files were cleaned 
up. Also, as I can see in the UI, the query correctly finished in the FAILED 
state. Information from the logs:

{code}
2017-06-26 13:35:30,518 [26aef278-acd0-2649-502d-636b78c58f66:frag:1:1] INFO  
o.a.d.e.s.p.c.AsyncPageReader - User Error Occurred: Error reading page data 
(Failure allocating buffer.)
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Error 
reading page data

File:  /drill/testdata/tpcds_sf100/parquet/web_sales/0_0_7.parquet
Column:  ws_ship_hdemo_sk
Row Group Start:  7836969

[Error Id: a8ca60e9-5ef7-42b7-b93a-fd7c69c06aef ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:185)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.nextInternal(AsyncPageReader.java:273)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:307)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:69)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFieldsSerial(BatchReader.java:63)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFields(BatchReader.java:56)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader$FixedWidthReader.readRecords(BatchReader.java:141)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:42)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:297)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:180) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT

[jira] [Comment Edited] (DRILL-5256) Exception in thread "main" java.lang.ExceptionInInitializerError

2017-06-26 Thread N Campbell (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061413#comment-16061413
 ] 

N Campbell edited comment on DRILL-5256 at 6/26/17 1:38 PM:


The Apache Drill JDBC driver will load if one explicitly sets 
-Doadd.io.netty.allocator.numDirectArenas= as a workaround.

This workaround does not help the MapR JDBC driver, where you have to use 
-Dio.netty.allocator.numDirectArenas instead.

https://netty.io/4.1/api/io/netty/buffer/PooledByteBufAllocator.html




was (Author: the6campbells):
The Apache JDBC drill will load if one explicitly sets 
-Doadd.io.netty.allocator.numDirectArenas= as a workaround.

This work around does not help the MAPR JDBC driver

https://netty.io/4.1/api/io/netty/buffer/PooledByteBufAllocator.html



> Exception in thread "main" java.lang.ExceptionInInitializerError
> 
>
> Key: DRILL-5256
> URL: https://issues.apache.org/jira/browse/DRILL-5256
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.7.0, 1.8.0
> Environment: Windows 7, IBM SDK 1.6 and IBM SDK 1.7
>Reporter: Vasu
>  Labels: jvm
>
> Below error while connecting to Drill Server 
> Exception in thread "main" java.lang.ExceptionInInitializerError
> at java.lang.J9VMInternals.initialize(J9VMInternals.java:257)
> at 
> oadd.org.apache.drill.exec.memory.BaseAllocator.(BaseAllocator.java:44)
> at java.lang.J9VMInternals.initializeImpl(Native Method)
> at java.lang.J9VMInternals.initialize(J9VMInternals.java:235)
> at java.lang.J9VMInternals.initialize(J9VMInternals.java:202)
> at 
> oadd.org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:38)
> at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnectionImpl.java:143)
> at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:64)
> at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69)
> at 
> oadd.net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126)
> at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
> at java.sql.DriverManager.getConnection(DriverManager.java:583)
> at java.sql.DriverManager.getConnection(DriverManager.java:245)
> at com.trianz.drill.ApacheDrillDemo.main(ApacheDrillDemo.java:13)
> Caused by: java.lang.NullPointerException
> at 
> oadd.io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.(PooledByteBufAllocatorL.java:93)
> at 
> oadd.io.netty.buffer.PooledByteBufAllocatorL.(PooledByteBufAllocatorL.java:56)
> at 
> oadd.org.apache.drill.exec.memory.AllocationManager.(AllocationManager.java:60)
> at java.lang.J9VMInternals.initializeImpl(Native Method)
> at java.lang.J9VMInternals.initialize(J9VMInternals.java:235)
> ... 13 more
> When I tried to debug into the source code, the following is the place where 
> we get the NullPointerException:
>  
> drill/exec/memory/base/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java
>  
> Line: 93: this.chunkSize = directArenas[0].chunkSize;
> Below is the code snapshot.
>  public InnerAllocator() {
>   super(true);
>   try {
> Field f = 
> PooledByteBufAllocator.class.getDeclaredField("directArenas");
> f.setAccessible(true);
> this.directArenas = (PoolArena[]) f.get(this);
>   } catch (Exception e) {
> throw new RuntimeException("Failure while initializing allocator.  
> Unable to retrieve direct arenas field.", e);
>   }
>   this.chunkSize = directArenas[0].chunkSize;
> if (memoryLogger.isTraceEnabled()) {
> Can anyone please help on this? Thanks in advance.
> Thanks,
> Vasu T





[jira] [Commented] (DRILL-4595) FragmentExecutor.fail() should interrupt the fragment thread to avoid possible query hangs

2017-06-26 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063080#comment-16063080
 ] 

Roman commented on DRILL-4595:
--

I tried to provoke some issues with CTAS queries and found something.

Steps:
1) Set a small DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY in drill-env.sh
{code}
Example:
export DRILL_HEAP=${DRILL_HEAP:-"1G"}
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"1G"}
{code}
2) run long CTAS query
{code:sql}
Example:
CREATE TABLE dfs.tmp.table3 AS SELECT * FROM 
dfs.tpcds_sf1_parquet_views.web_sales;
{code}

After that the drillbit fails (the process is killed) with the following error:
{code}
Error: CONNECTION ERROR: Connection /192.168.121.7:47697 <--> 
node1/192.168.121.7:31010 (user client) closed unexpectedly. Drillbit down?


[Error Id: 3de27393-8f21-4869-acd3-c4a14d01ed44 ] (state=,code=0)
{code}
Information from drillbit.log:
{code}
2017-06-26 13:02:53,062 [26aefa29-490b-e807-d093-548607458d28:frag:1:0] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in 
FragmentExecutor.
java.lang.OutOfMemoryError: Java heap space
at java.util.AbstractList.iterator(AbstractList.java:288) 
~[na:1.8.0_131]
at 
org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:263)
 ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.bytes.BytesInput.toByteArray(BytesInput.java:174) 
~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.bytes.ConcatenatingByteArrayCollector.collect(ConcatenatingByteArrayCollector.java:33)
 ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:118)
 ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:154)
 ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.column.impl.ColumnWriterV1.accountForValueWritten(ColumnWriterV1.java:115)
 ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:187) 
~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addDouble(MessageColumnIO.java:483)
 ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.drill.exec.store.ParquetOutputRecordWriter$NullableFloat8ParquetConverter.writeField(ParquetOutputRecordWriter.java:970)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:65)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:106)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
~[na:1.8.0_131]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_131]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
 ~[hadoop-common-2.7.0-mapr-1607.jar:na]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
2017-06-26 13:02:53,924 [26aefa29-490b-e807-d093-548607458d28:frag:1:1

[jira] [Comment Edited] (DRILL-4079) Hive: Filter with a trailing space is not working

2017-06-26 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062857#comment-16062857
 ] 

Volodymyr Vysotskyi edited comment on DRILL-4079 at 6/26/17 10:40 AM:
--

[~rkins] I don't see those spaces in the second column of the data. Without 
them, Drill returns the same result as Hive:
{noformat}
hive> select * from empty_lengthy_p2 where varchar_col=' ';
OK
Time taken: 0.329 seconds
{noformat}
When I insert the spaces into the data manually, Drill returns the correct result:
{noformat}
+--+--+
| int_col  | varchar_col  |
+--+--+
| 6|  |
| 9|  |
+--+--+
{noformat}
Data with the spaces:
{noformat}
1|dhfawriuueiq dshfjklhfiue eiufhwelfhleiruhj ejfwekjlf $
2|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug$
3|dfg
4|sdjklhkhjdfgjhdfgkjhdfkjldfsgjdsfkjhdfmnb,cv
5|dfg
6| 
7|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug$
8|?
9| 
10|?
{noformat}
My steps:
1. Create the empty_lengthy table;
2. Put the data file into the folder 
/drill/testdata/partition_pruning/hive/empty_lengthy_partitions.tbl;
3. Create the empty_lengthy_p2 table;
4. SET hive.exec.dynamic.partition.mode=true;
5. Insert the data from the empty_lengthy table into empty_lengthy_p2.

Could you please check that the data is displayed correctly in the Jira and 
that this bug still reproduces? 


was (Author: vvysotskyi):
[~rkins] I don't see those spaces in the second column of the data. Without 
them, Drill returns the same result as Hive. When I insert the spaces into the 
data manually, Drill returns the correct result. 

Could you please check that the data is displayed correctly in the Jira and 
that this bug still reproduces?

> Hive: Filter with a trailing space is not working
> -
>
> Key: DRILL-4079
> URL: https://issues.apache.org/jira/browse/DRILL-4079
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Rahul Challapalli
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
>
> The below query does not return the right result
> {code}
> select * from hive.empty_lengthy_p2 where varchar_col=' ';
> +--+--+
> | int_col  | varchar_col  |
> +--+--+
> +--+--+
> No rows selected (0.393 seconds)
> {code}
> Data : 
> {code}
> 1|dhfawriuueiq dshfjklhfiue eiufhwelfhleiruhj ejfwekjlf hsjdkgfhsdjk  hjd 
> hdfkh sdhg dkj hsdhg jds gsdlgd sd hjk sdjhkjdhgsdhg
> 2|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 3|dfg
> 4|sdjklhkhjdfgjhdfgkjhdfkjldfsgjdsfkjhdfmnb,cv
> 5|dfg
> 6|
> 7|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 8|?
> 9|
> 10|?
> {code}
> Hive DDL :
> {code}
> DROP TABLE IF EXISTS empty_lengthy;
> CREATE EXTERNAL TABLE empty_lengthy (
> int_col INT,
> varchar_col STRING
>)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions.tbl";
> DROP TABLE IF EXISTS empty_lengthy_p2;
> CREATE TABLE empty_lengthy_p2 (
> int_col INT
>)
> PARTITIONED BY (varchar_col STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions_p2"
> TBLPROPERTIES ("serialization.null.format"="?");
> SET hive.exec.dynamic.partition.mode=true;
> insert overwrite table empty_lengthy_p2 partition (varchar_col)
> select int_col, varchar_col from empty_lengthy;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4720) MINDIR() and IMINDIR() functions return no results with metadata cache

2017-06-26 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4720:
---

Assignee: Arina Ielchiieva

> MINDIR() and IMINDIR() functions return no results with metadata cache
> --
>
> Key: DRILL-4720
> URL: https://issues.apache.org/jira/browse/DRILL-4720
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.7.0
>Reporter: Krystal
>Assignee: Arina Ielchiieva
>
> Parquet directories with meta data cache return 0 rows for MINDIR and IMINDIR 
> functions.
> hadoop fs -ls /tmp/querylogs_4
> Found 6 items
> -rwxr-xr-x   3 mapr mapr  15406 2016-06-13 10:18 
> /tmp/querylogs_4/.drill.parquet_metadata
> drwxr-xr-x   - root root  4 2016-06-13 10:18 /tmp/querylogs_4/1985
> drwxr-xr-x   - root root  3 2016-06-13 10:18 /tmp/querylogs_4/1999
> drwxr-xr-x   - root root  3 2016-06-13 10:18 /tmp/querylogs_4/2005
> drwxr-xr-x   - root root  4 2016-06-13 10:18 /tmp/querylogs_4/2014
> drwxr-xr-x   - root root  6 2016-06-13 10:18 /tmp/querylogs_4/2016
> hadoop fs -ls /tmp/querylogs_4/1985
> Found 4 items
> -rwxr-xr-x   3 mapr mapr   3634 2016-06-13 10:18 
> /tmp/querylogs_4/1985/.drill.parquet_metadata
> drwxr-xr-x   - root root  2 2016-06-13 10:18 /tmp/querylogs_4/1985/Feb
> drwxr-xr-x   - root root  2 2016-06-13 10:18 /tmp/querylogs_4/1985/apr
> drwxr-xr-x   - root root  2 2016-06-13 10:18 
> /tmp/querylogs_4/1985/jan 
> SELECT * FROM `dfs.tmp`.`querylogs_4` WHERE dir0 = 
> MINDIR('dfs.tmp','querylogs_4');
> +---+---+--+---++++---+---+---+
> | voter_id  | name  | age  | registration  | contributions  | voterzone  | 
> date_time  | dir0  | dir1  | dir2  |
> +---+---+--+---++++---+---+---+
> +---+---+--+---++++---+---+---+
> No rows selected (0.803 seconds)
> If the meta cache is removed, expected data is returned.
> Here is the physical plan:
> {code}
> 00-00Screen : rowType = RecordType(ANY *): rowcount = 3.75, cumulative 
> cost = {54.125 rows, 169.125 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
> 664191
> 00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 3.75, 
> cumulative cost = {53.75 rows, 168.75 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
> id = 664190
> 00-02Project(T51¦¦*=[$0]) : rowType = RecordType(ANY T51¦¦*): 
> rowcount = 3.75, cumulative cost = {53.75 rows, 168.75 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 664189
> 00-03  SelectionVectorRemover : rowType = RecordType(ANY T51¦¦*, ANY 
> dir0): rowcount = 3.75, cumulative cost = {53.75 rows, 168.75 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 664188
> 00-04Filter(condition=[=($1, '.drill.parquet_metadata')]) : 
> rowType = RecordType(ANY T51¦¦*, ANY dir0): rowcount = 3.75, cumulative cost 
> = {50.0 rows, 165.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664187
> 00-05  Project(T51¦¦*=[$0], dir0=[$1]) : rowType = RecordType(ANY 
> T51¦¦*, ANY dir0): rowcount = 25.0, cumulative cost = {25.0 rows, 50.0 cpu, 
> 0.0 io, 0.0 network, 0.0 memory}, id = 664186
> 00-06Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=/tmp/querylogs_4/2005/May/voter25.parquet/0_0_0.parquet]], 
> selectionRoot=/tmp/querylogs_4, numFiles=1, usedMetadataFile=true, 
> columns=[`*`]]]) : rowType = (DrillRecordRow[*, dir0]): rowcount = 25.0, 
> cumulative cost = {25.0 rows, 50.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 664185
> {code}
> Here is the plan for the same query against the same directory structure 
> without meta data cache:
> {code}
> 00-00Screen : rowType = RecordType(ANY *): rowcount = 75.0, cumulative 
> cost = {82.5 rows, 157.5 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 664312
> 00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 75.0, 
> cumulative cost = {75.0 rows, 150.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 664311
> 00-02Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 75.0, 
> cumulative cost = {75.0 rows, 150.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 664310
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/querylogs_1/1985/Feb/voter10.parquet/0_0_0.parquet], 
> ReadEntryWithPath 
> [path=maprfs:///tmp/querylogs_1/1985/jan/voter5.parquet/0_0_0.parquet], 
> ReadEntryWithPath 
> [path=maprfs:///tmp/querylogs_1/1985/apr/voter65.parquet/0_0_0.parquet]], 
> selectionRoot=maprfs:/tmp/querylogs_1, numFiles=3, usedMetadataFile=fa

[jira] [Assigned] (DRILL-4722) date_add() function returns incorrect result with interval hour, minute and second

2017-06-26 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4722:
---

Assignee: Volodymyr Vysotskyi

> date_add() function returns incorrect result with interval hour, minute and 
> second  
> 
>
> Key: DRILL-4722
> URL: https://issues.apache.org/jira/browse/DRILL-4722
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Krystal
>Assignee: Volodymyr Vysotskyi
> Fix For: Future
>
>
> The following query returns the same data for the second column as the first:
> select date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '3' 
> HOUR), date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '5' 
> HOUR) from (values(1));
> +++
> | EXPR$0 | EXPR$1 |
> +++
> | 2015-01-24 10:27:05.0  | 2015-01-24 10:27:05.0  |
> +++
> If each column is run separately, then it produces correct result:
> select date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '3' 
> HOUR) from (values(1));
> ++
> | EXPR$0 |
> ++
> | 2015-01-24 10:27:05.0  |
> ++
> select date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '5' 
> HOUR) from (values(1));
> ++
> | EXPR$0 |
> ++
> | 2015-01-24 12:27:05.0  |
> ++
> Same problem is seen for interval of minute and second:
> select date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '50' 
> MINUTE), date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '40' 
> MINUTE) from (values(1));
> +++
> | EXPR$0 | EXPR$1 |
> +++
> | 2015-01-24 08:17:05.0  | 2015-01-24 08:17:05.0  |
> +++
> select date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '50' 
> second), date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '40' 
> second) from (values(1));
> +++
> | EXPR$0 | EXPR$1 |
> +++
> | 2015-01-24 07:27:55.0  | 2015-01-24 07:27:55.0  |
> +++
> select date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '3' 
> HOUR), date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '50' 
> MINUTE), date_add(cast('2015-01-24 07:27:05.0' as timestamp), interval '50' 
> second) from (values(1));
> ++++
> | EXPR$0 | EXPR$1 | EXPR$2 |
> ++++
> | 2015-01-24 10:27:05.0  | 2015-01-24 10:27:05.0  | 2015-01-24 10:27:05.0  |
> ++++
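As an independent sanity check of the expected values in the report above, the same interval arithmetic can be done with plain java.time (this is just a reference calculation, not Drill code):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Computes, outside Drill, what each date_add call above should return
// for the input timestamp 2015-01-24 07:27:05.0.
public class DateAddCheck {
    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.parse("2015-01-24T07:27:05");
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");
        System.out.println(ts.plusHours(3).format(f));    // interval '3' HOUR
        System.out.println(ts.plusHours(5).format(f));    // interval '5' HOUR
        System.out.println(ts.plusMinutes(50).format(f)); // interval '50' MINUTE
        System.out.println(ts.plusSeconds(50).format(f)); // interval '50' second
        System.out.println(ts.plusSeconds(40).format(f)); // interval '40' second
    }
}
```

Each column of the multi-column queries should differ; the identical values in the report are the bug.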





[jira] [Assigned] (DRILL-4079) Hive: Filter with a trailing space is not working

2017-06-26 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4079:
---

Assignee: Volodymyr Vysotskyi

> Hive: Filter with a trailing space is not working
> -
>
> Key: DRILL-4079
> URL: https://issues.apache.org/jira/browse/DRILL-4079
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Rahul Challapalli
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
>
> The below query does not return the right result
> {code}
> select * from hive.empty_lengthy_p2 where varchar_col=' ';
> +--+--+
> | int_col  | varchar_col  |
> +--+--+
> +--+--+
> No rows selected (0.393 seconds)
> {code}
> Data : 
> {code}
> 1|dhfawriuueiq dshfjklhfiue eiufhwelfhleiruhj ejfwekjlf hsjdkgfhsdjk  hjd 
> hdfkh sdhg dkj hsdhg jds gsdlgd sd hjk sdjhkjdhgsdhg
> 2|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 3|dfg
> 4|sdjklhkhjdfgjhdfgkjhdfkjldfsgjdsfkjhdfmnb,cv
> 5|dfg
> 6|
> 7|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 8|?
> 9|
> 10|?
> {code}
> Hive DDL :
> {code}
> DROP TABLE IF EXISTS empty_lengthy;
> CREATE EXTERNAL TABLE empty_lengthy (
> int_col INT,
> varchar_col STRING
>)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions.tbl";
> DROP TABLE IF EXISTS empty_lengthy_p2;
> CREATE TABLE empty_lengthy_p2 (
> int_col INT
>)
> PARTITIONED BY (varchar_col STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions_p2"
> TBLPROPERTIES ("serialization.null.format"="?");
> SET hive.exec.dynamic.partition.mode=true;
> insert overwrite table empty_lengthy_p2 partition (varchar_col)
> select int_col, varchar_col from empty_lengthy;
> {code}





[jira] [Commented] (DRILL-4079) Hive: Filter with a trailing space is not working

2017-06-26 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062857#comment-16062857
 ] 

Volodymyr Vysotskyi commented on DRILL-4079:


[~rkins] I don't see those spaces in the second column of the data. Without 
them, Drill returns the same result as Hive. When I insert the spaces into the 
data manually, Drill returns the correct result. 

Could you please check that the data is displayed correctly in the Jira and 
that this bug still reproduces?

> Hive: Filter with a trailing space is not working
> -
>
> Key: DRILL-4079
> URL: https://issues.apache.org/jira/browse/DRILL-4079
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Rahul Challapalli
>Priority: Critical
>
> The below query does not return the right result
> {code}
> select * from hive.empty_lengthy_p2 where varchar_col=' ';
> +--+--+
> | int_col  | varchar_col  |
> +--+--+
> +--+--+
> No rows selected (0.393 seconds)
> {code}
> Data : 
> {code}
> 1|dhfawriuueiq dshfjklhfiue eiufhwelfhleiruhj ejfwekjlf hsjdkgfhsdjk  hjd 
> hdfkh sdhg dkj hsdhg jds gsdlgd sd hjk sdjhkjdhgsdhg
> 2|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 3|dfg
> 4|sdjklhkhjdfgjhdfgkjhdfkjldfsgjdsfkjhdfmnb,cv
> 5|dfg
> 6|
> 7|jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u irjfoiej0930j 
> pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
> vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
> 8|?
> 9|
> 10|?
> {code}
> Hive DDL :
> {code}
> DROP TABLE IF EXISTS empty_lengthy;
> CREATE EXTERNAL TABLE empty_lengthy (
> int_col INT,
> varchar_col STRING
>)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions.tbl";
> DROP TABLE IF EXISTS empty_lengthy_p2;
> CREATE TABLE empty_lengthy_p2 (
> int_col INT
>)
> PARTITIONED BY (varchar_col STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> STORED AS TEXTFILE LOCATION 
> "/drill/testdata/partition_pruning/hive/empty_lengthy_partitions_p2"
> TBLPROPERTIES ("serialization.null.format"="?");
> SET hive.exec.dynamic.partition.mode=true;
> insert overwrite table empty_lengthy_p2 partition (varchar_col)
> select int_col, varchar_col from empty_lengthy;
> {code}





[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-06-26 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062855#comment-16062855
 ] 

Arina Ielchiieva commented on DRILL-4735:
-

From the discussion with [~jni]:
{quote}
Returning -1 for implicit columns would solve this problem, but it would 
regress for "select count(nonExistCol)". Basically, there are 4 types of column 
count statistics:
1) the column exists and the metadata has the statistic: return the correct stat;
2) the column exists but there is no metadata: return -1;
3) the column does not exist, so count(nonExistCol) = 0: return 0;
4) implicit columns: the parquet metadata does not have such a column, but the 
column does exist; we currently return an incorrect 0 and should return -1.

The ideal solution is to differentiate cases 3 and 4. If we cannot find an 
ideal solution, we will have no choice but to consider something else.
{quote}

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Assignee: Jinfeng Ni
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 
> rows.
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.





[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-06-26 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062851#comment-16062851
 ] 

Arina Ielchiieva commented on DRILL-4735:
-

Looks like the problem is in the {{ConvertCountToDirectScan}} rule when we [check 
the number of null values in a 
column|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java#L140
 ]. {{oldGrpScan.getColumnValueCount(SchemaPath.getSimplePath(columnName))}} 
will [return 0 if the column does not 
exist|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L1052].
It will also return 0 if the column contains only null values. File system 
partition columns such as dir0 and implicit columns are not present in the 
{{columnValueCounts}} map at all.
Converting to a direct scan when {{oldGrpScan.getColumnValueCount}} returns 0 is 
a good idea, since the count will return 0 anyway and we avoid reading all the 
table files.
We might return -1 when the column is not found and then read all the table 
files. That works fine for file system partition and implicit columns, but if 
the column genuinely does not exist we would read all the table files in vain.
Unfortunately, we can't determine whether a column is a file system partition or 
implicit column inside {{ConvertCountToDirectScan}}, since we don't have access 
to the session {{OptionManager}} where the current file system partition and 
implicit column names are stored (they can be changed at runtime). In 
{{ParquetGroupScan}} we do have access to an {{OptionManager}} via 
[{{formatPlugin.getContext().getOptionManager()}}|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L203],
 but that is the system option manager and it does not hold session options 
(the current file system partition and implicit column names can be changed at 
the session level).
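The four statistics cases discussed above can be sketched as a small standalone decision function (all names here are illustrative, not Drill's actual API):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the column-count cases: stats holds per-column
// non-null value counts from parquet metadata; UNKNOWN (-1) means the
// planner must fall back to reading the table files.
public class ColumnCountSketch {
    static final long UNKNOWN = -1L;

    static long columnValueCount(Map<String, Long> stats,
                                 Set<String> tableColumns,
                                 Set<String> implicitCols,
                                 String col) {
        Long stat = stats.get(col);
        if (stat != null) return stat;                  // case 1: stat available
        if (implicitCols.contains(col)) return UNKNOWN; // case 4: dir0 etc.
        if (tableColumns.contains(col)) return UNKNOWN; // case 2: exists, no stat
        return 0L;                                      // case 3: no such column
    }

    public static void main(String[] args) {
        Map<String, Long> stats = Map.of("voter_id", 600L);
        Set<String> cols = Set.of("voter_id", "name");
        Set<String> implicit = Set.of("dir0", "dir1");
        System.out.println(columnValueCount(stats, cols, implicit, "voter_id"));
        System.out.println(columnValueCount(stats, cols, implicit, "dir0"));
        System.out.println(columnValueCount(stats, cols, implicit, "bogus"));
    }
}
```

The hard part described in the comment is that, inside the rule, there is no reliable equivalent of the {{implicitCols}} set, since those names live in the session options.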


> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Assignee: Jinfeng Ni
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 
> rows.
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/test

[jira] [Updated] (DRILL-5609) Resources leak on parquet table when the query hangs with CANCELLATION_REQUESTED state

2017-06-26 Thread Roman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman updated DRILL-5609:
-
Attachment: Cancellation_requested_parquet_1.jpg
Cancellation_requested_parquet_2.jpg
ConcurrencyTest.java
drillbit.log
jstack.log

> Resources leak on parquet table when the query hangs with 
> CANCELLATION_REQUESTED state
> --
>
> Key: DRILL-5609
> URL: https://issues.apache.org/jira/browse/DRILL-5609
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Roman
> Attachments: Cancellation_requested_parquet_1.jpg, 
> Cancellation_requested_parquet_2.jpg, ConcurrencyTest.java, drillbit.log, 
> jstack.log
>
>
> I tried to run tpcds_sf100-query2 on a parquet table with 10 concurrent 
> threads on a single-node drillbit cluster (I used Drill with the DRILL-5599 
> fix) and caught a resource leak. The query hung in the CANCELLATION_REQUESTED 
> state.
> Steps to reproduce:
> 1) Start ConcurrencyTest.java with tpcds_sf100-query2 on parquet table (in 
> attachment);
> 2) Wait 3-5 seconds and press Ctrl+C to kill the client.
> 3) Retry step 2) several times until you get "CANCELLATION_REQUESTED" on some 
> queries.
> Queries will hang until the drillbit is restarted. Running "top" shows that 
> the drillbit still uses CPU.
> Jstack example:
> {code:xml}
> "26af36b2-7a44-5af8-e0c3-95a4f132fc7a:frag:14:1" #1268 daemon prio=10 
> os_prio=0 tid=0x7f25a5afa800 nid=0x16f2 runnable [0x7f2535a5a000]
>java.lang.Thread.State: RUNNABLE
>   at java.lang.Throwable.fillInStackTrace(Native Method)
>   at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
>   - locked <0x000728ca82b0> (a java.lang.InterruptedException)
>   at java.lang.Throwable.(Throwable.java:250)
>   at java.lang.Exception.(Exception.java:54)
>   at java.lang.InterruptedException.(InterruptedException.java:57)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:439)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.clear(AsyncPageReader.java:301)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.clear(ColumnReader.java:147)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.ReadState.close(ReadState.java:179)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.close(ParquetRecordReader.java:318)
>   at 
> org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:209)
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
>   at 
> org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext(BroadcastSenderRootExec.java:95)
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> I added drillbit.log and full jstack log in attachments.





[jira] [Created] (DRILL-5609) Resources leak on parquet table when the query hangs with CANCELLATION_REQUESTED state

2017-06-26 Thread Roman (JIRA)
Roman created DRILL-5609:


 Summary: Resources leak on parquet table when the query hangs with 
CANCELLATION_REQUESTED state
 Key: DRILL-5609
 URL: https://issues.apache.org/jira/browse/DRILL-5609
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
Reporter: Roman


I tried to run tpcds_sf100-query2 on a parquet table with 10 concurrent threads 
on a single-node drillbit cluster (I used Drill with the DRILL-5599 fix) and 
caught a resource leak. The query hung in the CANCELLATION_REQUESTED state.

Steps to reproduce:

1) Start ConcurrencyTest.java with tpcds_sf100-query2 on the parquet table (in 
attachment);
2) Wait 3-5 seconds and press Ctrl+C to kill the client;
3) Repeat steps 1) and 2) several times until some queries get stuck in 
"CANCELLATION_REQUESTED".

The queries will hang until the drillbit is restarted. Running "top" shows that the 
drillbit is still consuming CPU.
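
The client-kill step above amounts to interrupting in-flight query threads. A minimal, self-contained Java sketch of that pattern (a hypothetical stand-in, not Drill's actual ConcurrencyTest; the sleep simulates a running query fragment):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CancelSketch {

    // Submits 10 long-running "queries", then cancels them the way a killed
    // client does: by interrupting the worker threads via shutdownNow().
    static int runAndCancel() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        AtomicInteger cancelled = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(60_000);   // stand-in for query execution
                } catch (InterruptedException e) {
                    // A fragment that reacts here can release its buffers;
                    // the leak occurs when the cleanup path itself blocks.
                    cancelled.incrementAndGet();
                }
            });
        }
        Thread.sleep(200);          // let the "queries" start running
        pool.shutdownNow();         // simulates Ctrl+C killing the client
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return cancelled.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancelled=" + runAndCancel());
    }
}
```

A fragment that observes the interrupt and cleans up exits promptly; the hang reported here happens when the cleanup path itself blocks on a queue take, as the jstack below shows.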

Jstack example:
{code:xml}
"26af36b2-7a44-5af8-e0c3-95a4f132fc7a:frag:14:1" #1268 daemon prio=10 os_prio=0 
tid=0x7f25a5afa800 nid=0x16f2 runnable [0x7f2535a5a000]
   java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0x000728ca82b0> (a java.lang.InterruptedException)
at java.lang.Throwable.<init>(Throwable.java:250)
at java.lang.Exception.<init>(Exception.java:54)
at java.lang.InterruptedException.<init>(InterruptedException.java:57)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at 
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:439)
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.clear(AsyncPageReader.java:301)
at 
org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.clear(ColumnReader.java:147)
at 
org.apache.drill.exec.store.parquet.columnreaders.ReadState.close(ReadState.java:179)
at 
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.close(ParquetRecordReader.java:318)
at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:209)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
at 
org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext(BroadcastSenderRootExec.java:95)
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

I added drillbit.log and the full jstack log as attachments.





[jira] [Commented] (DRILL-5130) UNION ALL difference in results

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062820#comment-16062820
 ] 

ASF GitHub Bot commented on DRILL-5130:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/853
  
Merged into master with commit id 33682be99e719dce9cb326e2835ebc4ae434104a


> UNION ALL difference in results
> ---
>
> Key: DRILL-5130
> URL: https://issues.apache.org/jira/browse/DRILL-5130
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Drill 1.9.0 git commit ID: 51246693
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(1,2,3,4,5,6) union all 
> values(7,8,9,10,11,12);
> +-+-+-+-+-+-+
> | EXPR$0  | EXPR$1  | EXPR$2  | EXPR$3  | EXPR$4  | EXPR$5  |
> +-+-+-+-+-+-+
> | 7   | 8   | 9   | 10  | 11  | 12  |
> | 7   | 8   | 9   | 10  | 11  | 12  |
> +-+-+-+-+-+-+
> 2 rows selected (0.209 seconds)
> {noformat}
> Postgres 9.3
> {noformat}
> postgres=# values(1,2,3,4,5,6) union all values(7,8,9,10,11,12);
>  column1 | column2 | column3 | column4 | column5 | column6 
> -+-+-+-+-+-
>1 |   2 |   3 |   4 |   5 |   6
>7 |   8 |   9 |  10 |  11 |  12
> (2 rows)
> {noformat}





[jira] [Commented] (DRILL-5130) UNION ALL difference in results

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062821#comment-16062821
 ] 

ASF GitHub Bot commented on DRILL-5130:
---

Github user arina-ielchiieva closed the pull request at:

https://github.com/apache/drill/pull/853







[jira] [Commented] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open

2017-06-26 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062817#comment-16062817
 ] 

Arina Ielchiieva commented on DRILL-5599:
-

Merged into master with commit id 7e6571aa5d4c58185dbfa131de99354ea7dc6b4e

> Notify StatusHandlerListener that batch sending has failed even if channel is 
> still open 
> -
>
> Key: DRILL-5599
> URL: https://issues.apache.org/jira/browse/DRILL-5599
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
> Attachments: sample.json
>
>
> *Issue*
> Queries stay in the CANCELLATION_REQUESTED state after the connection with the 
> client was interrupted. Jstack shows that the threads of such queries are blocked, 
> waiting for a semaphore to be released.
> {noformat}
> "26b70318-ddde-9ead-eee2-0828da97b59f:frag:0:0" daemon prio=10 
> tid=0x7f56dc3c9000 nid=0x25fd waiting on condition [0x7f56b31dc000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0006f4688ab0> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
>   at 
> org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
>   - locked <0x0006f4688a78> (a 
> org.apache.drill.exec.ops.SendingAccountor)
>   at 
> org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:486)
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:134)
>   at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.close(ScreenCreator.java:141)
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:313)
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:  - <0x00073f800b68> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> *Reproduce*
> Run the modified ConcurrencyTest.java referenced in DRILL-4338 and cancel after 
> 2-3 seconds. ConcurrencyTest.java should be modified as follows: use 
> {{ExecutorService executor = Executors.newFixedThreadPool(10);}} and execute 200 
> queries: {{for (int i = 1; i <= 200; i++)}}.
> Query: {{select * from dfs.`sample.json`}}; the data set is attached.
> *Problem description*
> The problem seems to occur when the server has sent data to the client and is 
> waiting for the client's confirmation that the data was received. In this case 
> [{{ChannelListenerWithCoordinationId}}|https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java#L118]
>  is used for tracking. {{ChannelListenerWithCoordinationId}} contains a 
> {{StatusHandler}} which keeps track of sent batches: it updates the 
> {{SendingAccountor}} with how many batches were sent and how many have reached 
> the client (successfully or not).
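> The accounting described above can be sketched as a tiny stand-alone class (a hypothetical illustration of the pattern, not Drill's actual {{SendingAccountor}}): each sent batch adds to a count, each completion notification (success or failure) releases a permit, and waiting for send completion blocks until the two balance out, which is why a send failure that never notifies the handler leaves the fragment stuck.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the batch-accounting pattern described above.
class BatchAccountor {
    private final AtomicInteger batchesSent = new AtomicInteger();
    private final Semaphore acks = new Semaphore(0);

    // Called once per outgoing batch.
    void batchSent() {
        batchesSent.incrementAndGet();
    }

    // Must be called for every batch outcome, success or failure;
    // a failed send that skips this call blocks the waiter forever.
    void batchCompleted() {
        acks.release();
    }

    // Blocks until every sent batch has been accounted for.
    void waitForSendComplete() throws InterruptedException {
        acks.acquire(batchesSent.getAndSet(0));
    }
}
```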
> When the send operation completes (successfully or not), 
> {{operationComplete(ChannelFuture future)}} is called. The given future carries 
> information on whether the send succeeded, the failure cause, the channel status, 
> etc. If the send was successful we do nothing: in that case the client sends us an 
> acknowledgment, and on receiving it we notify the {{StatusHandlerListener}} that 
> the batch was received. But if the send failed, we need to notify the 
> {{StatusHandler}} that the send was unsuccessful.
> {{operationComplete(ChannelFuture future)}} code:
> {code}
>   if (!future.isSuccess()) {
> removeFromMap(coordinationId);
> if (future.channel().isActive()) {
>   throw new RpcException("Fu