[jira] [Resolved] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved DRILL-4146.
---
Resolution: Fixed

Fixed in 53e7a696f

> Concurrent queries hang in planner in ReflectiveRelMetadataProvider
> ---
>
> Key: DRILL-4146
> URL: https://issues.apache.org/jira/browse/DRILL-4146
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>
> At concurrency levels of 30 or more for certain workloads we have seen 
> queries hang in the planning phase in Calcite.  The top of the jstack is 
> shown below: 
> {noformat}
> "29b47a17-6ef3-4b7f-98e7-a7c1a702c32f:foreman" daemon prio=10 
> tid=0x7f55484a1800 nid=0x289a runnable [0x7f54b4369000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.getEntry(HashMap.java:465)
> at java.util.HashMap.get(HashMap.java:417)
> at 
> org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider.apply(ReflectiveRelMetadataProvider.java:251)
> at 
> org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
> at 
> org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
> {noformat}
>  
> After some investigations, we found that this issue was actually addressed by 
> CALCITE-874 (ReflectiveRelMetadataProvider is not thread-safe).   This JIRA 
> is a placeholder to merge that Calcite fix since Drill is currently not 
> up-to-date with Calcite and there is an immediate need for running queries in 
> a high concurrency environment. 
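The stack trace points at an unsynchronized java.util.HashMap being read by many planner threads at once; on older JDKs a get() racing a resize can spin forever in getEntry(). A minimal, hypothetical sketch of the shape of the CALCITE-874 fix (class and method names here are illustrative, not Calcite's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a reflective-handler cache. Before CALCITE-874 the
// provider cached handlers in a plain HashMap; a concurrent get() during a
// resize can loop forever on older JDKs (the HashMap.getEntry frame in the
// stack above). A ConcurrentHashMap makes lookups safe for concurrent
// planner threads.
public class MetadataHandlerCache {
  private final Map<Class<?>, Function<Object, Object>> handlers =
      new ConcurrentHashMap<>();

  public Function<Object, Object> handlerFor(Class<?> relClass) {
    // computeIfAbsent is atomic per key, so no external locking is needed.
    return handlers.computeIfAbsent(relClass, c -> obj -> obj);
  }
}
```

The same instance can then be shared across foreman threads without additional synchronization.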



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033262#comment-15033262
 ] 

ASF GitHub Bot commented on DRILL-4146:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/285







[jira] [Commented] (DRILL-4109) NPE in RecordIterator

2015-11-30 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033173#comment-15033173
 ] 

Victoria Markman commented on DRILL-4109:
-

[~aah] The NPE is gone; however, a bunch of queries now fail with the error 
below.

{code}
0: jdbc:drill:drillbit=localhost> SELECT ca_zip,
. . . . . . . . . . . . . . . . >Sum(cs_sales_price)
. . . . . . . . . . . . . . . . > FROM   catalog_sales,
. . . . . . . . . . . . . . . . >customer,
. . . . . . . . . . . . . . . . >customer_address,
. . . . . . . . . . . . . . . . >date_dim
. . . . . . . . . . . . . . . . > WHERE  cs_bill_customer_sk = c_customer_sk
. . . . . . . . . . . . . . . . >AND c_current_addr_sk = ca_address_sk
. . . . . . . . . . . . . . . . >AND ( Substr(ca_zip, 1, 5) IN ( 
'85669', '86197', '88274', '83405',
. . . . . . . . . . . . . . . . >
'86475', '85392', '85460', '80348',
. . . . . . . . . . . . . . . . >
'81792' )
. . . . . . . . . . . . . . . . >   OR ca_state IN ( 'CA', 'WA', 
'GA' )
. . . . . . . . . . . . . . . . >   OR cs_sales_price > 500 )
. . . . . . . . . . . . . . . . >AND cs_sold_date_sk = d_date_sk
. . . . . . . . . . . . . . . . >AND d_qoy = 1
. . . . . . . . . . . . . . . . >AND d_year = 1998
. . . . . . . . . . . . . . . . > GROUP  BY ca_zip
. . . . . . . . . . . . . . . . > ORDER  BY ca_zip
. . . . . . . . . . . . . . . . > LIMIT 100;
Error: SYSTEM ERROR: IllegalArgumentException: innerPosition:2184, 
outerPosition:63336, innerRecordCount:2184, totalRecordCount:65520
Fragment 3:1
[Error Id: 687f2bee-f8f7-4a80-b211-fd50c4216342 on atsqa4-133.qa.lab:31010] 
(state=,code=0)
{code}

This is query15.sql; I'm attaching the debug log.
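The IllegalArgumentException encodes a batch-boundary invariant. A minimal sketch of that invariant, assuming the four reported counters mean position within the current batch, overall position, rows in the current batch, and total rows (an interpretation, not Drill's actual code):

```java
// Hypothetical sketch of the bounds check the message reports: the cursor's
// position inside the current batch must stay strictly below that batch's
// record count. In the failure above, innerPosition (2184) equals
// innerRecordCount (2184), i.e. the iterator stepped one past the last row
// of a batch instead of moving on to the next batch.
public class BatchCursor {
  static boolean validPosition(int innerPosition, int innerRecordCount,
                               int outerPosition, int totalRecordCount) {
    return innerPosition >= 0 && innerPosition < innerRecordCount
        && outerPosition >= 0 && outerPosition < totalRecordCount;
  }
}
```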

> NPE in RecordIterator
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Assignee: amit hadke
>Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: 29ac6c1b-9b33-3457-8bc8-9e2dff6ad438.sys.drill, 
> 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill, drillbit.log, 
> drillbit.log.debug
>
>
> 4 node cluster
> 36GB of direct memory
> 4GB heap memory
> planner.memory.max_query_memory_per_node=2GB (default)
> planner.enable_hashjoin = false
> Spill directory has 6.4T of memory available:
> {noformat}
> [Tue Nov 17 18:23:18 /tmp/drill ] # df -H .
> Filesystem   Size  Used Avail Use% Mounted on
> localhost:/mapr  7.7T  1.4T  6.4T  18% /mapr
> {noformat}
> Run query below: 
> framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> drillbit.log
> {code}
> 2015-11-18 02:22:12,639 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:9] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_9/operator_17/7
> 2015-11-18 02:22:12,770 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:5] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_5/operator_17/7
> 2015-11-18 02:22:13,345 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:17] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_17/operator_17/7
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_13/operator_16/1
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] WARN 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 34 batch groups. 
> Current allocated memory: 2252186
> 2015-11-18 02:22:13,363 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested RUNNING --> 
> FAILED
> 2015-11-18 02:22:13,370 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested FAILED --> 
> FINISHED
> 2015-11-18 02:22:13,371 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on 

[jira] [Created] (DRILL-4147) Union All operator runs in a single fragment

2015-11-30 Thread amit hadke (JIRA)
amit hadke created DRILL-4147:
-

 Summary: Union All operator runs in a single fragment
 Key: DRILL-4147
 URL: https://issues.apache.org/jira/browse/DRILL-4147
 Project: Apache Drill
  Issue Type: Bug
Reporter: amit hadke


A user noticed that running a select over a single directory is much faster 
than a union all over two directories.
(https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267)
 

It seems the UNION ALL operator doesn't parallelize sub scans (it's using 
SINGLETON as the distribution type). Everything runs in a single fragment.

We may have to use SubsetTransformer in UnionAllPrule.
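The single-fragment versus parallel difference can be illustrated in plain Java (a conceptual sketch, not Drill planner code; the scan here is just a stand-in for a directory sub scan):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic illustration of the reported behavior: a UNION ALL planned with
// SINGLETON distribution scans its two inputs one after the other in a
// single fragment, while a parallelized plan can scan both inputs at once.
public class UnionAllSketch {
  // Stand-in for a sub scan over one directory.
  static List<Integer> scan(List<Integer> dir) {
    return new ArrayList<>(dir);
  }

  // "Single fragment": scan dir1, then dir2, serially.
  static List<Integer> unionAllSerial(List<Integer> d1, List<Integer> d2) {
    List<Integer> out = scan(d1);
    out.addAll(scan(d2));
    return out;
  }

  // Parallelized: each scan runs as its own task; results are concatenated.
  static List<Integer> unionAllParallel(List<Integer> d1, List<Integer> d2) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<List<Integer>> f1 = pool.submit(() -> scan(d1));
      Future<List<Integer>> f2 = pool.submit(() -> scan(d2));
      List<Integer> out = new ArrayList<>(f1.get());
      out.addAll(f2.get());
      return out;
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Both variants produce the same concatenation; only the wall-clock cost of the two scans differs.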





[jira] [Comment Edited] (DRILL-2618) BasicFormatMatcher calls getFirstPath(...) without checking # of paths is not zero

2015-11-30 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032987#comment-15032987
 ] 

Khurram Faraaz edited comment on DRILL-2618 at 12/1/15 3:04 AM:


Verified the fix on Drill 1.4.0, commit ID ff76078b.
Querying an empty directory now returns a "Table not found" message.
The directory "empty_DIR" used in the query below is empty; it contains no 
files.

{code}
0: jdbc:drill:schema=dfs.tmp> select * from empty_DIR;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 23: Table 
'empty_DIR' not found


[Error Id: 2e07938e-145b-4c00-917d-9cd4c7ad54d1 on centos-01.qa.lab:31010] 
(state=,code=0)
{code}


was (Author: khfaraaz):
Verified Fix on Drill 1.4.0 commit ID : ff76078b
Querying an empty directory now returns, Table  not found, message.

{code}
0: jdbc:drill:schema=dfs.tmp> select * from empty_DIR;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 23: Table 
'empty_DIR' not found


[Error Id: 2e07938e-145b-4c00-917d-9cd4c7ad54d1 on centos-01.qa.lab:31010] 
(state=,code=0)
{code}

> BasicFormatMatcher calls getFirstPath(...) without checking # of paths is not 
> zero
> --
>
> Key: DRILL-2618
> URL: https://issues.apache.org/jira/browse/DRILL-2618
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Daniel Barclay (Drill)
>Assignee: Hanifi Gunes
> Fix For: 1.4.0
>
>
> {{BasicFormatMatcher.isReadable(...)}} calls {{getFirstPath(...)}} without 
> checking that there is at least one path.  This can cause an 
> IndexOutOfBoundsException.
> To reproduce, create an empty directory {{/tmp/CaseInsensitiveColumnNames}} 
> and run 
> {{exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java}}.
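The fix amounts to an emptiness guard before the first-path lookup. A minimal sketch under that assumption (the class and method names are hypothetical, not Drill's actual code):

```java
import java.util.List;

// Hypothetical sketch of the missing guard: check that the file list is
// non-empty before asking for the first path, so an empty directory is
// reported as "not readable by this format" instead of throwing
// IndexOutOfBoundsException.
public class FormatMatcherSketch {
  static String firstPathOrNull(List<String> paths) {
    if (paths == null || paths.isEmpty()) {
      return null;  // caller treats null as "no match"
    }
    return paths.get(0);
  }
}
```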





[jira] [Commented] (DRILL-2618) BasicFormatMatcher calls getFirstPath(...) without checking # of paths is not zero

2015-11-30 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032987#comment-15032987
 ] 

Khurram Faraaz commented on DRILL-2618:
---

Verified the fix on Drill 1.4.0, commit ID ff76078b.
Querying an empty directory now returns a "Table not found" message.

{code}
0: jdbc:drill:schema=dfs.tmp> select * from empty_DIR;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 23: Table 
'empty_DIR' not found


[Error Id: 2e07938e-145b-4c00-917d-9cd4c7ad54d1 on centos-01.qa.lab:31010] 
(state=,code=0)
{code}






[jira] [Commented] (DRILL-3938) Hive: Failure reading from a partition when a new column is added to the table after the partition creation

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032881#comment-15032881
 ] 

ASF GitHub Bot commented on DRILL-3938:
---

Github user mehant commented on the pull request:

https://github.com/apache/drill/pull/211#issuecomment-160822016
  
Other than the minor comment, lgtm. +1


> Hive: Failure reading from a partition when a new column is added to the 
> table after the partition creation
> ---
>
> Key: DRILL-3938
> URL: https://issues.apache.org/jira/browse/DRILL-3938
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 0.4.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.4.0
>
>
> Repro:
> From Hive:
> {code}
> CREATE TABLE kv(key INT, value STRING);
> LOAD DATA LOCAL INPATH 
> '/Users/hadoop/apache-repos/hive-install/apache-hive-1.0.0-bin/examples/files/kv1.txt'
>  INTO TABLE kv;
> CREATE TABLE kv_p(key INT, value STRING, part1 STRING);
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.max.dynamic.partitions=1;
> set hive.exec.max.dynamic.partitions.pernode=1;
> INSERT INTO TABLE kv_p PARTITION (part1) SELECT key, value, value as s FROM 
> kv;
> ALTER TABLE kv_p ADD COLUMNS (newcol STRING);
> {code}
> From Drill:
> {code}
> USE hive;
> DESCRIBE kv_p;
> SELECT newcol FROM kv_p;
> throws column 'newcol' not found error in HiveRecordReader while selecting 
> only the projected columns.
> {code}
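The repro and the associated pull request suggest the planner rule needs to compare the table schema against each partition's schema. A minimal sketch under that assumption (names are hypothetical, mirroring the intent of the diff rather than Drill's actual code):

```java
import java.util.List;

// Illustrative sketch of the check the fix adds: compare the table's column
// list against a partition's storage-descriptor columns and skip the
// native-Parquet rewrite when they diverge, which is what happens after
// ALTER TABLE ... ADD COLUMNS on a table with existing partitions.
public class PartitionSchemaCheck {
  static boolean schemasMatch(List<String> tableCols, List<String> partitionCols) {
    // A column added after the partition was created appears only in
    // tableCols, so the two lists are no longer equal.
    return tableCols.equals(partitionCols);
  }
}
```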





[jira] [Commented] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032884#comment-15032884
 ] 

Aman Sinha commented on DRILL-4146:
---

[~jni] This just updates the Calcite version to 1.4.0-drill-r10. Could you 
please review?






[jira] [Commented] (DRILL-3938) Hive: Failure reading from a partition when a new column is added to the table after the partition creation

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032878#comment-15032878
 ] 

ASF GitHub Bot commented on DRILL-3938:
---

Github user mehant commented on a diff in the pull request:

https://github.com/apache/drill/pull/211#discussion_r46231527
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 ---
@@ -101,12 +101,24 @@ public boolean matches(RelOptRuleCall call) {
   return true;
 }
 
+final List tableSchema = hiveTable.getSd().getCols();
 // Make sure all partitions have the same input format as the table 
input format
 for (HivePartition partition : partitions) {
-  Class inputFormat = 
getInputFormatFromSD(hiveTable, partition.getPartition().getSd());
+  final StorageDescriptor partitionSD = 
partition.getPartition().getSd();
+  Class inputFormat = 
getInputFormatFromSD(hiveTable, partitionSD);
   if (inputFormat == null || !inputFormat.equals(tableInputFormat)) {
 return false;
   }
+
+  // Make sure the schema of the table and schema of the partition 
matches. If not return false. Currently native
--- End diff --

Could you add a minor comment indicating that schema differences between a 
partition and its table can arise from "alter table" statements, and that we 
would need the converter functions in case a column's type has been changed 
via alter table.







[jira] [Commented] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032876#comment-15032876
 ] 

ASF GitHub Bot commented on DRILL-4146:
---

GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/285

DRILL-4146: Concurrent queries hang in planner.  Fix is in Calcite (C…

…ALCITE-874).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amansinha100/incubator-drill hashfunc2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/285.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #285


commit 39a11b8ae1e103b9a1e9c141918023ead5361e46
Author: Aman Sinha 
Date:   2015-12-01T01:48:46Z

DRILL-4146: Concurrent queries hang in planner.  Fix is in Calcite 
(CALCITE-874).









[jira] [Updated] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4146:
--
Fix Version/s: 1.4.0






[jira] [Commented] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032862#comment-15032862
 ] 

Aman Sinha commented on DRILL-4146:
---

I have verified on a private branch that applying the Calcite fix resolves the 
issue, so I will go ahead and merge it.  Marking this for 1.4.0. 






[jira] [Updated] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4146:
--
Description: 
At concurrency levels of 30 or more for certain workloads we have seen queries 
hang in the planning phase in Calcite.  The top of the jstack is shown below: 

{noformat}
"29b47a17-6ef3-4b7f-98e7-a7c1a702c32f:foreman" daemon prio=10 
tid=0x7f55484a1800 nid=0x289a runnable [0x7f54b4369000]
   java.lang.Thread.State: RUNNABLE
at java.util.HashMap.getEntry(HashMap.java:465)
at java.util.HashMap.get(HashMap.java:417)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider.apply(ReflectiveRelMetadataProvider.java:251)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
{noformat}
 
After some investigations, we found that this issue was actually addressed by 
CALCITE-874 (ReflectiveRelMetadataProvider is not thread-safe).   This JIRA is 
a placeholder to merge that Calcite fix since Drill is currently not up-to-date 
with Calcite and there is an immediate need for running queries in a high 
concurrency environment. 

  was:
At concurrency levels of 30 or more for certain workloads we have seen queries 
hang in the planning phase in Calcite.  The top of the jstack is shown below: 

{noformat}
"29b47a17-6ef3-4b7f-98e7-a7c1a702c32f:foreman" daemon prio=10 
tid=0x7f55484a1800 nid=0x289a runnable [0x7f54b4369000]
   java.lang.Thread.State: RUNNABLE
at java.util.HashMap.getEntry(HashMap.java:465)
at java.util.HashMap.get(HashMap.java:417)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider.apply(ReflectiveRelMetadataProvider.java:251)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
{code}
 
After some investigations, we found that this issue was actually addressed by 
CALCITE-874 (ReflectiveRelMetadataProvider is not thread-safe).   This JIRA is 
a placeholder to merge that Calcite fix since Drill is currently not up-to-date 
with Calcite and there is an immediate need for running queries in a high 
concurrency environment. 







[jira] [Created] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-11-30 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-4146:
-

 Summary: Concurrent queries hang in planner in 
ReflectiveRelMetadataProvider
 Key: DRILL-4146
 URL: https://issues.apache.org/jira/browse/DRILL-4146
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.3.0
Reporter: Aman Sinha
Assignee: Aman Sinha


At concurrency levels of 30 or more for certain workloads we have seen queries 
hang in the planning phase in Calcite.  The top of the jstack is shown below: 

{noformat}
"29b47a17-6ef3-4b7f-98e7-a7c1a702c32f:foreman" daemon prio=10 
tid=0x7f55484a1800 nid=0x289a runnable [0x7f54b4369000]
   java.lang.Thread.State: RUNNABLE
at java.util.HashMap.getEntry(HashMap.java:465)
at java.util.HashMap.get(HashMap.java:417)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider.apply(ReflectiveRelMetadataProvider.java:251)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
{code}
 
After some investigations, we found that this issue was actually addressed by 
CALCITE-874 (ReflectiveRelMetadataProvider is not thread-safe).   This JIRA is 
a placeholder to merge that Calcite fix since Drill is currently not up-to-date 
with Calcite and there is an immediate need for running queries in a high 
concurrency environment. 





[jira] [Updated] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-11-30 Thread Peter McTaggart (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter McTaggart updated DRILL-4145:
---
Environment: 
Drill 1.3.0 on a 3 node distributed-mode cluster on AWS.
Data files on S3.

S3 storage plugin configuration:
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://",
  "workspaces": {
"root": {
  "location": "/",
  "writable": false,
  "defaultInputFormat": null
},
"views": {
  "location": "/processed",
  "writable": true,
  "defaultInputFormat": null
},
"tmp": {
  "location": "/tmp",
  "writable": true,
  "defaultInputFormat": null
}
  },
  "formats": {
"psv": {
  "type": "text",
  "extensions": [
"tbl"
  ],
  "delimiter": "|"
},
"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "extractHeader": true,
  "delimiter": ","
},
"tsv": {
  "type": "text",
  "extensions": [
"tsv"
  ],
  "delimiter": "\t"
},
"parquet": {
  "type": "parquet"
},
"json": {
  "type": "json"
},
"avro": {
  "type": "avro"
},
"sequencefile": {
  "type": "sequencefile",
  "extensions": [
"seq"
  ]
},
"csvh": {
  "type": "text",
  "extensions": [
"csvh",
"csv"
  ],
  "extractHeader": true,
  "delimiter": ","
}
  }
}


  was:
Drill 1.3.0 on a 3 node distriubted cluster on AWS.
Data files on S3.

S3 storage plugin configuration:
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://",
  "workspaces": {
"root": {
  "location": "/",
  "writable": false,
  "defaultInputFormat": null
},
"views": {
  "location": "/processed",
  "writable": true,
  "defaultInputFormat": null
},
"tmp": {
  "location": "/tmp",
  "writable": true,
  "defaultInputFormat": null
}
  },
  "formats": {
"psv": {
  "type": "text",
  "extensions": [
"tbl"
  ],
  "delimiter": "|"
},
"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "extractHeader": true,
  "delimiter": ","
},
"tsv": {
  "type": "text",
  "extensions": [
"tsv"
  ],
  "delimiter": "\t"
},
"parquet": {
  "type": "parquet"
},
"json": {
  "type": "json"
},
"avro": {
  "type": "avro"
},
"sequencefile": {
  "type": "sequencefile",
  "extensions": [
"seq"
  ]
},
"csvh": {
  "type": "text",
  "extensions": [
"csvh",
"csv"
  ],
  "extractHeader": true,
  "delimiter": ","
}
  }
}



> IndexOutOfBoundsException raised during select * query on S3 csv file
> -
>
> Key: DRILL-4145
> URL: https://issues.apache.org/jira/browse/DRILL-4145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
> Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS.
> Data files on S3.
> S3 storage plugin configuration:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> },
> "views": {
>   "location": "/processed",
>   "writable": true,
>   "defaultInputFormat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> },
> "avro": {
>   "type": "avro"
> },
> "sequencefile": {
>   "type": "sequencefile",
>   "extensions": [
> "seq"
>   ]
> },
> "csvh": {
>   "type": "text",
>   "extensions": [
> "csvh",
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
>   }
> }
>Reporter: Peter McTaggart
> Attachments: apps1-bad.csv, apps1.csv
>
>
> When trying to query (via sqlline or WebUI) a .csv file I am getting an 
> IndexOutOfBoundsException:
> {noformat} 0: jdbc:drill:> select * from 
> s3data.root.`staging/data/apps1-bad.csv` limit 1;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
> (expec

[jira] [Comment Edited] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-11-30 Thread Peter McTaggart (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032860#comment-15032860
 ] 

Peter McTaggart edited comment on DRILL-4145 at 12/1/15 1:28 AM:
-

a working (apps1.csv) and non-working file (apps1-bad.csv) -- there is only one 
line difference between these (apps1-bad.csv has one extra line).


was (Author: pdmct):
a working and non-working file -- there is only one line difference between 
these (apps1-bad.csv has one extra line).

> IndexOutOfBoundsException raised during select * query on S3 csv file
> -
>
> Key: DRILL-4145
> URL: https://issues.apache.org/jira/browse/DRILL-4145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
> Environment: Drill 1.3.0 on a 3 node distributed cluster on AWS.
> Data files on S3.
> S3 storage plugin configuration:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> },
> "views": {
>   "location": "/processed",
>   "writable": true,
>   "defaultInputFormat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> },
> "avro": {
>   "type": "avro"
> },
> "sequencefile": {
>   "type": "sequencefile",
>   "extensions": [
> "seq"
>   ]
> },
> "csvh": {
>   "type": "text",
>   "extensions": [
> "csvh",
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
>   }
> }
>Reporter: Peter McTaggart
> Attachments: apps1-bad.csv, apps1.csv
>
>
> When trying to query (via sqlline or WebUI) a .csv file I am getting an 
> IndexOutOfBoundsException:
> {noformat} 0: jdbc:drill:> select * from 
> s3data.root.`staging/data/apps1-bad.csv` limit 1;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
> (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
> ip-X.compute.internal:31010] (state=,code=0)
> 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6 
>   | FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   
> FIELD_12   | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | 
> FIELD_18  | FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | 
> FIELD_23  | FIELD_24  | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | 
> FIELD_29  | FIELD_30  | FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | 
> FIELD_35  |
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | 489517   | 27/10/2015 02:05:27  | 261  | 1130232  | 0| 
> 925630488  | 0| 925630488  | -1   | 19531580547  |   | 
> 27/10/2015 02:00:00  |   | 30| 300   | 0 | 0  
>|   |   | 27/10/2015 02:05:27  | 0 | 1 | 0 
> | 35.0  |   |   |   | 505   | 872.0   
>   |   | aBc   |   |   |   |   |
> +--+--+--+--+--++--++--+

[jira] [Updated] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-11-30 Thread Peter McTaggart (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter McTaggart updated DRILL-4145:
---
Attachment: apps1.csv
apps1-bad.csv

a working and non-working file -- there is only one line difference between 
these (apps1-bad.csv has one extra line).

> IndexOutOfBoundsException raised during select * query on S3 csv file
> -
>
> Key: DRILL-4145
> URL: https://issues.apache.org/jira/browse/DRILL-4145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
> Environment: Drill 1.3.0 on a 3 node distributed cluster on AWS.
> Data files on S3.
> S3 storage plugin configuration:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> },
> "views": {
>   "location": "/processed",
>   "writable": true,
>   "defaultInputFormat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> },
> "avro": {
>   "type": "avro"
> },
> "sequencefile": {
>   "type": "sequencefile",
>   "extensions": [
> "seq"
>   ]
> },
> "csvh": {
>   "type": "text",
>   "extensions": [
> "csvh",
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
>   }
> }
>Reporter: Peter McTaggart
> Attachments: apps1-bad.csv, apps1.csv
>
>
> When trying to query (via sqlline or WebUI) a .csv file I am getting an 
> IndexOutOfBoundsException:
> {noformat} 0: jdbc:drill:> select * from 
> s3data.root.`staging/data/apps1-bad.csv` limit 1;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
> (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
> ip-X.compute.internal:31010] (state=,code=0)
> 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6 
>   | FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   
> FIELD_12   | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | 
> FIELD_18  | FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | 
> FIELD_23  | FIELD_24  | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | 
> FIELD_29  | FIELD_30  | FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | 
> FIELD_35  |
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | 489517   | 27/10/2015 02:05:27  | 261  | 1130232  | 0| 
> 925630488  | 0| 925630488  | -1   | 19531580547  |   | 
> 27/10/2015 02:00:00  |   | 30| 300   | 0 | 0  
>|   |   | 27/10/2015 02:05:27  | 0 | 1 | 0 
> | 35.0  |   |   |   | 505   | 872.0   
>   |   | aBc   |   |   |   |   |
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---

[jira] [Updated] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-11-30 Thread Peter McTaggart (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter McTaggart updated DRILL-4145:
---
Description: 
When trying to query (via sqlline or WebUI) a .csv file I am getting an 
IndexOutOfBoundsException:
{noformat} 0: jdbc:drill:> select * from 
s3data.root.`staging/data/apps1-bad.csv` limit 1;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
(expected: range(0, 16384))

Fragment 0:0

[Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
ip-X.compute.internal:31010] (state=,code=0)
0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6   
| FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   FIELD_12  
 | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | FIELD_18  | 
FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | FIELD_23  | FIELD_24 
 | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | FIELD_29  | FIELD_30  | 
FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | FIELD_35  |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 489517   | 27/10/2015 02:05:27  | 261  | 1130232  | 0| 925630488  
| 0| 925630488  | -1   | 19531580547  |   | 27/10/2015 
02:00:00  |   | 30| 300   | 0 | 0 | 
  |   | 27/10/2015 02:05:27  | 0 | 1 | 0
 | 35.0  |   |   |   | 505   | 872.0 |  
 | aBc   |   |   |   |   |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1 row selected (1.094 seconds)
0: jdbc:drill:>  {noformat}

Good file: apps1.csv, and 
Bad file: apps1-bad.csv  attached.


  was:
When trying to query (via sqlline or WebUI) a .csv file I am getting an 
IndexOutOfBoundsException:
{noformat} 0: jdbc:drill:> select * from 
s3data.root.`staging/data/apps1-bad.csv` limit 1;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
(expected: range(0, 16384))

Fragment 0:0

[Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
ip-X.compute.internal:31010] (state=,code=0)
0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6   
| FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   FIELD_12  
 | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | FIELD_18  | 
FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | FIELD_23  | FIELD_24 
 | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | FIELD_29  | FIELD_30  | 
FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | FIELD_35  |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 489517   | 27/10/2015 02:05:27 

[jira] [Updated] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-11-30 Thread Peter McTaggart (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter McTaggart updated DRILL-4145:
---
Description: 
When trying to query (via sqlline or WebUI) a .csv file I am getting an 
IndexOutOfBoundsException:
{noformat} 0: jdbc:drill:> select * from 
s3data.root.`staging/data/apps1-bad.csv` limit 1;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
(expected: range(0, 16384))

Fragment 0:0

[Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
ip-X.compute.internal:31010] (state=,code=0)
0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6   
| FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   FIELD_12  
 | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | FIELD_18  | 
FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | FIELD_23  | FIELD_24 
 | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | FIELD_29  | FIELD_30  | 
FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | FIELD_35  |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 489517   | 27/10/2015 02:05:27  | 261  | 1130232  | 0| 925630488  
| 0| 925630488  | -1   | 19531580547  |   | 27/10/2015 
02:00:00  |   | 30| 300   | 0 | 0 | 
  |   | 27/10/2015 02:05:27  | 0 | 1 | 0
 | 35.0  |   |   |   | 505   | 872.0 |  
 | aBc   |   |   |   |   |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1 row selected (1.094 seconds)
0: jdbc:drill:>  {noformat}

Good file: apps1.csv, and 
Bad file: apps1-bad.csv  attached.


  was:
When trying to query (via sqlline or WebUI) a .csv file I am getting an 
IndexOutOfBoundsException:
{{ 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1-bad.csv` limit 
1;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
(expected: range(0, 16384))

Fragment 0:0

[Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
ip-X.compute.internal:31010] (state=,code=0)
0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6   
| FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   FIELD_12  
 | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | FIELD_18  | 
FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | FIELD_23  | FIELD_24 
 | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | FIELD_29  | FIELD_30  | 
FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | FIELD_35  |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 489517   | 27/10/2015 02:05:27  | 261  

[jira] [Created] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-11-30 Thread Peter McTaggart (JIRA)
Peter McTaggart created DRILL-4145:
--

 Summary: IndexOutOfBoundsException raised during select * query on 
S3 csv file
 Key: DRILL-4145
 URL: https://issues.apache.org/jira/browse/DRILL-4145
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.3.0
 Environment: Drill 1.3.0 on a 3 node distributed cluster on AWS.
Data files on S3.

S3 storage plugin configuration:
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://",
  "workspaces": {
"root": {
  "location": "/",
  "writable": false,
  "defaultInputFormat": null
},
"views": {
  "location": "/processed",
  "writable": true,
  "defaultInputFormat": null
},
"tmp": {
  "location": "/tmp",
  "writable": true,
  "defaultInputFormat": null
}
  },
  "formats": {
"psv": {
  "type": "text",
  "extensions": [
"tbl"
  ],
  "delimiter": "|"
},
"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "extractHeader": true,
  "delimiter": ","
},
"tsv": {
  "type": "text",
  "extensions": [
"tsv"
  ],
  "delimiter": "\t"
},
"parquet": {
  "type": "parquet"
},
"json": {
  "type": "json"
},
"avro": {
  "type": "avro"
},
"sequencefile": {
  "type": "sequencefile",
  "extensions": [
"seq"
  ]
},
"csvh": {
  "type": "text",
  "extensions": [
"csvh",
"csv"
  ],
  "extractHeader": true,
  "delimiter": ","
}
  }
}

Reporter: Peter McTaggart


When trying to query (via sqlline or WebUI) a .csv file I am getting an 
IndexOutOfBoundsException:
{{ 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1-bad.csv` limit 
1;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
(expected: range(0, 16384))

Fragment 0:0

[Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
ip-X.compute.internal:31010] (state=,code=0)
0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6   
| FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   FIELD_12  
 | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | FIELD_18  | 
FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | FIELD_23  | FIELD_24 
 | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | FIELD_29  | FIELD_30  | 
FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | FIELD_35  |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 489517   | 27/10/2015 02:05:27  | 261  | 1130232  | 0| 925630488  
| 0| 925630488  | -1   | 19531580547  |   | 27/10/2015 
02:00:00  |   | 30| 300   | 0 | 0 | 
  |   | 27/10/2015 02:05:27  | 0 | 1 | 0
 | 35.0  |   |   |   | 505   | 872.0 |  
 | aBc   |   |   |   |   |
+--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1 row selected (1.094 seconds)
0: jdbc:drill:>  }}

Good file: apps1.csv, and 
Bad file: apps1-bad.csv  attached.






[jira] [Resolved] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data

2015-11-30 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved DRILL-4119.
---
Resolution: Fixed

Fixed in ff76078b6 . 

> Skew in hash distribution for varchar (and possibly other) types of data
> 
>
> Key: DRILL-4119
> URL: https://issues.apache.org/jira/browse/DRILL-4119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>
> We are seeing substantial skew for an Id column that contains varchar data of 
> length 32.   It is easily reproducible by a group-by query: 
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02  HashAgg(group=[{0}])
> 01-03Project(SomeId=[$0])
> 01-04  HashToRandomExchange(dist0=[[$0]])
> 02-01UnorderedMuxExchange
> 03-01  Project(SomeId=[$0], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02HashAgg(group=[{0}])
> 03-03  Project(SomeId=[$0])
> {noformat}
> The string id happens to be of the following type: 
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}
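A hypothetical, self-contained illustration of the skew mechanism (not Drill's actual hash64 implementation): when a hash draws entropy from only part of a fixed-format value, such as these 32-character hex ids, a modulo-based exchange concentrates rows on a small subset of receiving fragments. All names below (weakHash, hexId, the 64-fragment count) are invented for the demo.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not Drill's hash64 code): a hash that draws entropy
// from only part of a 32-char hex id skews a modulo-based exchange, because
// at most 16 of the 64 receiving fragments can ever be chosen.
class HashSkewDemo {

  // Deliberately weak hash: only the first character contributes.
  static int weakHash(String id) {
    return Character.digit(id.charAt(0), 16);
  }

  // Deterministic 32-character hex id generator for the demo.
  static String hexId(int seed) {
    StringBuilder sb = new StringBuilder(32);
    int x = seed;
    for (int i = 0; i < 32; i++) {
      x = x * 31 + 17;
      sb.append(Character.forDigit(x >>> 28 & 0xF, 16));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Set<Integer> buckets = new HashSet<>();
    for (int i = 0; i < 1000; i++) {
      buckets.add(weakHash(hexId(i)) % 64); // 64 receiving fragments
    }
    // A well-mixed hash would cover nearly all 64 buckets; this one cannot
    // exceed 16, so most fragments receive no rows at all.
    System.out.println("distinct buckets used: " + buckets.size());
  }
}
```

The same diagnostic idea applies to the real system: bucket the hash values of a sample of the skewed column and compare the occupancy counts across fragments.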





[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032824#comment-15032824
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46227624
  
--- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
@@ -186,42 +137,110 @@ private final void checkIndexD(int index, int 
fieldLength) {
* @param start The starting position of the bytes to be read.
* @param end The exclusive endpoint of the bytes to be read.
*/
-  public void checkBytes(int start, int end){
-if (BOUNDS_CHECKING_ENABLED) {
+  public void checkBytes(int start, int end) {
+if (BoundsChecking.BOUNDS_CHECKING_ENABLED) {
   checkIndexD(start, end - start);
 }
   }
 
   private void chk(int index, int width) {
-if (BOUNDS_CHECKING_ENABLED) {
+if (BoundsChecking.BOUNDS_CHECKING_ENABLED) {
   checkIndexD(index, width);
 }
   }
 
-  private void chk(int index) {
-if (BOUNDS_CHECKING_ENABLED) {
-  checkIndexD(index);
+  private void ensure(int width) {
+if (BoundsChecking.BOUNDS_CHECKING_ENABLED) {
+  ensureWritable(width);
 }
   }
 
-  private void ensure(int width) {
-if (BOUNDS_CHECKING_ENABLED) {
-  ensureWritable(width);
+  /**
+   * Create a new DrillBuf that is associated with an alternative 
allocator for the purposes of memory ownership and
+   * accounting. This has no impact on the reference counting for this 
allocator.
+   *
+   * This operation has no impact on the reference count of this DrillBuf. 
The newly created DrillBuf will either have a
+   * reference count of 1 (in the case that this is the first time this 
memory is being associated with the new
+   * allocator) or the current value of the reference count + 1 for the 
other AllocatorManager/BufferLedger combination
+   * in the case that the provided allocator already had an association to 
this underlying memory.
--- End diff --

"This operation has no impact on the reference count of this DrillBuf."
Is this true only if the provided allocator is different from the current 
one?


> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032819#comment-15032819
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46226910
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/data/DataServer.java ---
@@ -167,11 +169,29 @@ private void send(final FragmentRecordBatch 
fragmentBatch, final DrillBuf body,
 }
 
 final BufferAllocator allocator = 
manager.getFragmentContext().getAllocator();
-final Pointer out = new Pointer<>();
-
 final boolean withinMemoryEnvelope;
-
-withinMemoryEnvelope = allocator.takeOwnership(body, out);
+final DrillBuf transferredBuffer;
+try {
+  TransferResult result = body.transferOwnership(allocator);
+  withinMemoryEnvelope = result.allocationFit;
+  transferredBuffer = result.buffer;
+} catch(final AllocatorClosedException e) {
+  /*
+   * It can happen that between the time we get the fragment manager 
and we
+   * try to transfer this buffer to it, the fragment may have been 
cancelled
+   * and closed. When that happens, the allocator will be closed when 
we
+   * attempt this. That just means we can drop this data on the floor, 
since
+   * the receiver no longer exists (and no longer needs it).
+   *
+   * Note that checking manager.isCancelled() before we attempt this 
isn't enough,
+   * because of timing: it may still be cancelled between that check 
and
+   * the attempt to do the memory transfer. To double check ourselves, 
we
+   * do check manager.isCancelled() here, after the fact; it shouldn't
+   * change again after its allocator has been closed.
+   */
+  assert manager.isCancelled();
--- End diff --

I would use a regular exception here:
```
if (!manager.isCancelled()) {
   throw new ...
}
```
that way you can have a better message than just AssertionError and chain e 
to the exception.
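A minimal sketch of the pattern the reviewer suggests (the class and method names below are illustrative, not the actual Drill code): check the condition explicitly and throw a descriptive exception that chains the original cause, so the failure stays diagnosable even in production, where asserts are typically disabled.

```java
// Illustrative only: an explicit check that preserves the triggering
// exception as the cause, unlike a bare `assert`, which is usually compiled
// out at runtime and carries neither a message nor a cause.
class CancellationGuard {
  static void checkCancelled(boolean cancelled, Exception cause) {
    if (!cancelled) {
      throw new IllegalStateException(
          "Allocator was closed but the fragment was not cancelled", cause);
    }
  }
}
```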


> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032808#comment-15032808
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46225897
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationReservation.java
 ---
@@ -0,0 +1,155 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.memory;
+
+import io.netty.buffer.DrillBuf;
+
+import com.google.common.base.Preconditions;
+
+/**
+ * Supports cumulative allocation reservation. Clients may increase the 
size of the reservation repeatedly until they
+ * call for an allocation of the current total size. The reservation can 
only be used once, and will throw an exception
+ * if it is used more than once.
+ * 
+ * For the purposes of airtight memory accounting, the reservation must be 
close()d whether it is used or not.
+ * This is not threadsafe.
+ */
+public abstract class AllocationReservation implements AutoCloseable {
+  private int nBytes = 0;
+  private boolean used = false;
+  private boolean closed = false;
+
+  /**
+   * Constructor. Prevent construction except by derived classes.
+   * The expectation is that the derived class will be a non-static 
inner
+   * class in an allocator.
+   */
+  AllocationReservation() {
+  }
+
+  /**
+   * Add to the current reservation.
+   *
+   * Adding may fail if the allocator is not allowed to consume any 
more space.
+   *
+   * @param nBytes the number of bytes to add
+   * @return true if the addition is possible, false otherwise
+   * @throws IllegalStateException if called after buffer() is used to 
allocate the reservation
+   */
+  public boolean add(final int nBytes) {
+Preconditions.checkArgument(nBytes >= 0, "nBytes(%d) < 0", nBytes);
+Preconditions.checkState(!closed, "Attempt to increase reservation 
after reservation has been closed");
+Preconditions.checkState(!used, "Attempt to increase reservation after 
reservation has been used");
+
+// we round up to next power of two since all reservations are done in 
powers of two. This may overestimate the
+// preallocation since someone may perceive additions to be power of 
two. If this becomes a problem, we can look at
+// modifying this behavior so that we maintain what we reserve and 
what the user asked for and make sure to only
+// round to power of two as necessary.
+final int nBytesTwo = BaseAllocator.nextPowerOfTwo(nBytes);
+if (!reserve(nBytesTwo)) {
+  return false;
+}
+
+this.nBytes += nBytesTwo;
+return true;
+  }
+
+  /**
+   * Requests a reservation of additional space.
+   *
+   * The implementation of the allocator's inner class provides this.
+   *
+   * @param nBytes the amount to reserve
+   * @return true if the reservation can be satisfied, false otherwise
+   */
+  abstract boolean reserve(int nBytes);
+
+  /**
+   * Allocate a buffer whose size is the total of all the add()s made.
+   *
+   * The allocation request can still fail, even if the amount of space
+   * requested is available, if the allocation cannot be made contiguously.
+   *
+   * @return the buffer, or null, if the request cannot be satisfied
+   * @throws IllegalStateException if called more than once
+   */
+  public DrillBuf buffer() {
--- End diff --

method names should be verbs. How about allocateBuffer()?
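The rounding rule described in add() above can be exercised in isolation. This is a hypothetical standalone sketch; nextPowerOfTwo here is a local helper written for the demo, not Drill's BaseAllocator method.

```java
// Illustrative sketch of the reservation-rounding rule in add(): each request
// is rounded up to the next power of two before reserving, so cumulative
// reservations can overestimate what the caller actually asked for.
class ReservationMath {
  static int nextPowerOfTwo(int v) {
    // For v <= 1 the answer is 1; otherwise take the highest set bit of
    // (v - 1) and shift left once, which lands on the next power of two.
    return v <= 1 ? 1 : Integer.highestOneBit(v - 1) << 1;
  }
}
```

For example, three add() calls of 5 bytes each would reserve 3 * 8 = 24 bytes rather than 15, which is the overestimation the code comment warns about.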


> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Ap

[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032803#comment-15032803
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46225593
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ---
@@ -80,7 +79,12 @@
*/
   private boolean closed = false;
 
+  @Deprecated
--- End diff --

If we can, we should remove it altogether.


> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032796#comment-15032796
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46225495
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java ---
@@ -465,6 +441,14 @@ public Executor getExecutor(){
 return context.getExecutor();
   }
 
+  public long getFragmentMemoryLimit() {
+return allocator.getLimit();
+  }
+
+  public void setFragmentMemoryLimit(long value) {
--- End diff --

getFragmentMemoryLimit and setFragmentMemoryLimit don't look like they are 
used anywhere. Let's just remove them.


> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032790#comment-15032790
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user julienledem commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46225385
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java ---
@@ -314,11 +285,16 @@ public BufferAllocator getAllocator() {
 return allocator;
   }
 
-  public BufferAllocator getNewChildAllocator(final long initialReservation,
-  final long maximumReservation,
-  final boolean applyFragmentLimit) throws OutOfMemoryException {
-return allocator.getChildAllocator(new AsLimitConsumer(), initialReservation, maximumReservation,
-applyFragmentLimit);
+  public BufferAllocator getNewChildAllocator(final String operatorName,
+  final int operatorId,
+  final long initialReservation,
+  final long maximumReservation,
+  final boolean applyFragmentLimit) throws OutOfMemoryException {
+return allocator.newChildAllocator(
"op:" + QueryIdHelper.getFragmentId(fragment.getHandle()) + ":" + operatorId + ":" + operatorName,
+initialReservation,
+maximumReservation
+);
--- End diff --

applyFragmentLimit is not used?


> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032791#comment-15032791
 ] 

ASF GitHub Bot commented on DRILL-4119:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/279


> Skew in hash distribution for varchar (and possibly other) types of data
> 
>
> Key: DRILL-4119
> URL: https://issues.apache.org/jira/browse/DRILL-4119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>
> We are seeing substantial skew for an Id column that contains varchar data of 
> length 32.   It is easily reproducible by a group-by query: 
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02  HashAgg(group=[{0}])
> 01-03Project(SomeId=[$0])
> 01-04  HashToRandomExchange(dist0=[[$0]])
> 02-01UnorderedMuxExchange
> 03-01  Project(SomeId=[$0], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02HashAgg(group=[{0}])
> 03-03  Project(SomeId=[$0])
> {noformat}
> The string id happens to be of the following type: 
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}





[jira] [Commented] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032755#comment-15032755
 ] 

ASF GitHub Bot commented on DRILL-4124:
---

Github user julienledem commented on the pull request:

https://github.com/apache/drill/pull/281#issuecomment-160803787
  
Thanks for the heads up. I will fix this.
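The pattern this issue's title proposes — attaching secondary close() failures to the primary exception via addSuppressed instead of logging each one separately — looks roughly like the sketch below. This is a hedged illustration, not Drill's actual AutoCloseables utility:

```java
import java.util.Arrays;
import java.util.List;

public class Closer {
    // Close every resource; the first failure becomes the thrown exception
    // and later failures are attached to it via addSuppressed, so logs show
    // one error with its suppressed causes instead of separate noise.
    static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
        Exception first = null;
        for (AutoCloseable c : resources) {
            try {
                if (c != null) {
                    c.close();
                }
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws Exception {
        // Null entries and successful closes are tolerated silently.
        closeAll(Arrays.asList((AutoCloseable) () -> {}, null));
        System.out.println("closed");
    }
}
```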


> Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise 
> in logs
> ---
>
> Key: DRILL-4124
> URL: https://issues.apache.org/jira/browse/DRILL-4124
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Commented] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032725#comment-15032725
 ] 

ASF GitHub Bot commented on DRILL-4124:
---

Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/281#issuecomment-160799401
  
I put it on my merge branch but it actually caused two small compilation 
issues in other modules that were using the utility method you refactored. I 
forgot to update this with a comment. 


> Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise 
> in logs
> ---
>
> Key: DRILL-4124
> URL: https://issues.apache.org/jira/browse/DRILL-4124
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Updated] (DRILL-4137) Metadata Cache not being leveraged

2015-11-30 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-4137:
-
Affects Version/s: 1.4.0
   1.3.0

> Metadata Cache not being leveraged
> --
>
> Key: DRILL-4137
> URL: https://issues.apache.org/jira/browse/DRILL-4137
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: fewtypes.parquet
>
>
> git.commit.id.abbrev=367d74a
> The below query is not leveraging the metadata
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> explain plan for  select * from fewtypes;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(*=[$0])
> 00-02Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/metadata_caching/fewtypes/fewtypes.parquet]], 
> selectionRoot=/drill/testdata/metadata_caching/fewtypes/fewtypes.parquet, 
> numFiles=1, usedMetadataFile=false, columns=[`*`]]])
> {code}
> I attached the data set used





[jira] [Updated] (DRILL-4082) Better error message when multiple versions of the same function are found by the classpath scanner

2015-11-30 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4082:

Fix Version/s: 1.4.0

> Better error message when multiple versions of the same function are found by 
> the classpath scanner
> ---
>
> Key: DRILL-4082
> URL: https://issues.apache.org/jira/browse/DRILL-4082
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 1.4.0
>
>
> PR:
> https://github.com/apache/drill/pull/252





[jira] [Created] (DRILL-4144) AssertionError : Internal error: Error while applying rule HivePushPartitionFilterIntoScan:Filter_On_Project_Hive

2015-11-30 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4144:
-

 Summary: AssertionError : Internal error: Error while applying 
rule HivePushPartitionFilterIntoScan:Filter_On_Project_Hive
 Key: DRILL-4144
 URL: https://issues.apache.org/jira/browse/DRILL-4144
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.3.0
 Environment: 4 node cluster
Reporter: Khurram Faraaz


AssertionError seen on Drill 1.3 version d61bb83a on a 4 node cluster, as part 
of Functional test run on JDK 8. Note that assertions were enabled as part of 
test execution.

{code}
[root@centos-01 bin]# java -version
openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
{code}

Failing test case : 
Functional/interpreted_partition_pruning/hive/text/hier_intint/plan/4.q

{code}
query => explain plan for select l_orderkey, l_partkey, l_quantity, l_shipdate, 
l_shipinstruct from hive.lineitem_text_partitioned_hive_hier_intint where case 
when `month` > 11 then 2 else null end is not null and `year` = 1991;
{code}

{noformat}

[Error Id: c0e23293-2592-4421-9953-bc7d6488398f on centos-03.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError


[Error Id: c0e23293-2592-4421-9953-bc7d6488398f on centos-03.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
 ~[drill-common-1.3.0.jar:1.3.0]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742)
 [drill-java-exec-1.3.0.jar:1.3.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
 [drill-java-exec-1.3.0.jar:1.3.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
 [drill-java-exec-1.3.0.jar:1.3.0]
at 
org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
[drill-common-1.3.0.jar:1.3.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
 [drill-java-exec-1.3.0.jar:1.3.0]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) 
[drill-java-exec-1.3.0.jar:1.3.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) 
[drill-java-exec-1.3.0.jar:1.3.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_65]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Internal error: Error while applying 
rule HivePushPartitionFilterIntoScan:Filter_On_Project_Hive, args 
[rel#879659:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#879658:Subset#8.LOGICAL.ANY([]).[],condition=AND(IS
 NOT NULL(CASE(>($6, 11), 2, null)), =($5, 1991))), 
rel#879657:DrillProjectRel.LOGICAL.ANY([]).[](input=rel#879656:Subset#7.LOGICAL.ANY([]).[],l_orderkey=$6,l_partkey=$3,l_quantity=$0,l_shipdate=$5,l_shipinstruct=$1,year=$2,month=$4),
 rel#879639:DrillScanRel.LOGICAL.ANY([]).[](table=[hive, 
lineitem_text_partitioned_hive_hier_intint],groupscan=HiveScan 
[table=Table(dbName:default, 
tableName:lineitem_text_partitioned_hive_hier_intint), 
inputSplits=[maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/1/lineitemaa.tbl:0+106992,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/10/lineitemaj.tbl:0+106646,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/11/lineitemak.tbl:0+106900,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/12/lineitemal.tbl:0+11926,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/2/lineitemab.tbl:0+106663,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/3/lineitemac.tbl:0+106980,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/4/lineitemad.tbl:0+106276,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/5/lineitemae.tbl:0+107315,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/6/lineitemaf.tbl:0+106592,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/7/lineitemag.tbl:0+107400,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/8/lineitemah.tbl:0+106951,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/9/lineitemai.tbl:0+106872,
 
maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical

[jira] [Commented] (DRILL-4109) NPE in RecordIterator

2015-11-30 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032705#comment-15032705
 ] 

Victoria Markman commented on DRILL-4109:
-

[~aah] sure, I will give it a try. Stay tuned.

> NPE in RecordIterator
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Assignee: amit hadke
>Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: 29ac6c1b-9b33-3457-8bc8-9e2dff6ad438.sys.drill, 
> 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill, drillbit.log, 
> drillbit.log.debug
>
>
> 4 node cluster
> 36GB of direct memory
> 4GB heap memory
> planner.memory.max_query_memory_per_node=2GB (default)
> planner.enable_hashjoin = false
> Spill directory has 6.4T of memory available:
> {noformat}
> [Tue Nov 17 18:23:18 /tmp/drill ] # df -H .
> Filesystem   Size  Used Avail Use% Mounted on
> localhost:/mapr  7.7T  1.4T  6.4T  18% /mapr
> {noformat}
> Run query below: 
> framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> drillbit.log
> {code}
> 2015-11-18 02:22:12,639 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:9] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_9/operator_17/7
> 2015-11-18 02:22:12,770 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:5] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_5/operator_17/7
> 2015-11-18 02:22:13,345 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:17] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_17/operator_17/7
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_13/operator_16/1
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] WARN 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 34 batch groups. 
> Current allocated memory: 2252186
> 2015-11-18 02:22:13,363 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested RUNNING --> 
> FAILED
> 2015-11-18 02:22:13,370 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested FAILED --> 
> FINISHED
> 2015-11-18 02:22:13,371 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> java.lang.NullPointerException: null
> {code}





[jira] [Updated] (DRILL-4083) Drill not using HiveDrillNativeParquetScan if no column is needed to be read from HIVE

2015-11-30 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4083:

Fix Version/s: 1.4.0

> Drill not using HiveDrillNativeParquetScan if no column is needed to be read 
> from HIVE
> --
>
> Key: DRILL-4083
> URL: https://issues.apache.org/jira/browse/DRILL-4083
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.4.0
>
>
> For example, for a query such as:
> {code}
> Select count(*) from hive.parquetTable
> {code}
> would not use HiveDrillNativeParquetScan. However, the following query will 
> use:
> {code}
> Select count(*) from hive.parquetTable where column > 0
> {code}
> Ideally, both should use the same HiveDrillNativeParquetScan





[jira] [Commented] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032685#comment-15032685
 ] 

ASF GitHub Bot commented on DRILL-4124:
---

Github user julienledem commented on the pull request:

https://github.com/apache/drill/pull/281#issuecomment-160790637
  
Thanks @jaltekruse ! Did you mean to commit this?


> Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise 
> in logs
> ---
>
> Key: DRILL-4124
> URL: https://issues.apache.org/jira/browse/DRILL-4124
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Commented] (DRILL-4111) turn tests off in travis as they don't work there

2015-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032682#comment-15032682
 ] 

ASF GitHub Bot commented on DRILL-4111:
---

Github user julienledem commented on the pull request:

https://github.com/apache/drill/pull/267#issuecomment-160790479
  
Thanks @sudheeshkatkam. Did you mean to commit this?


> turn tests off in travis as they don't work there
> -
>
> Key: DRILL-4111
> URL: https://issues.apache.org/jira/browse/DRILL-4111
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> Since the travis build always fails, we should just turn it off for now.





[jira] [Commented] (DRILL-4109) NPE in RecordIterator

2015-11-30 Thread amit hadke (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032386#comment-15032386
 ] 

amit hadke commented on DRILL-4109:
---

[~vicky] Could you please try to reproduce DRILL-4109 and DRILL-4125 with my 
change above? 
Repo: https://github.com/amithadke/drill
Branch: DRILL-4109


> NPE in RecordIterator
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Assignee: amit hadke
>Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: 29ac6c1b-9b33-3457-8bc8-9e2dff6ad438.sys.drill, 
> 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill, drillbit.log, 
> drillbit.log.debug
>
>





[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data

2015-11-30 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032229#comment-15032229
 ] 

Mehant Baid commented on DRILL-4119:


I think it makes sense to address that as a separate issue. Patch looks good 
otherwise. +1. 

> Skew in hash distribution for varchar (and possibly other) types of data
> 
>
> Key: DRILL-4119
> URL: https://issues.apache.org/jira/browse/DRILL-4119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>





[jira] [Closed] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In

2015-11-30 Thread Eric Roma (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Roma closed DRILL-3992.


resolved in 1.3 update

> Unable to query Oracle DB using JDBC Storage Plug-In
> 
>
> Key: DRILL-3992
> URL: https://issues.apache.org/jira/browse/DRILL-3992
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
> Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
>Reporter: Eric Roma
>Assignee: Jacques Nadeau
>Priority: Minor
>  Labels: newbie
> Fix For: 1.3.0
>
>
> *See External Issue URL for Stack Overflow Post*
> *Appears to be similar issue at 
> http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*
> Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
> 10.2.0.4.0 - 64bit in embedded mode.
> I'm curious if anyone has had any success connecting Apache Drill to an 
> Oracle DB. I've updated the drill-override.conf with the following 
> configurations (per documents):
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   drill.exec.sys.store.provider.local.path = "/mypath"
> }
> and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
> successfully create the storage plug-in:
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.driver.OracleDriver",
>   "url": "jdbc:oracle:thin:@::",
>   "username": "USERNAME",
>   "password": "PASSWORD",
>   "enabled": true
> }
> but when I issue a query such as:
> select * from ..`dual`; 
> I get the following error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
> From line 1, column 15 to line 1, column 20: Table 
> '..dual' not found [Error Id: 
> 57a4153c-6378-4026-b90c-9bb727e131ae on :].
> I've tried to query other schema/tables and get a similar result. I've also 
> tried connecting to Teradata and get the same error.





[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data

2015-11-30 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032062#comment-15032062
 ] 

Aman Sinha commented on DRILL-4119:
---

[~mehant] would it make sense to open a separate JIRA for the underlying 
XXHash.hash64 implementation? I feel that for hash32, we would still want to 
avoid downcasting and instead use the mixing as proposed in this JIRA. If you 
agree, I can merge in my patch. 
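The downcasting concern can be illustrated with a small sketch: truncating a 64-bit hash to 32 bits discards the high word entirely, while XOR-folding mixes it in. This is illustrative only; Drill's actual hash functions live in the XXHash implementation discussed above:

```java
public class HashFold {
    // Plain truncation keeps only the low 32 bits of the 64-bit hash.
    static int truncate(long h) {
        return (int) h;
    }

    // XOR-folding mixes the high word into the low word before narrowing,
    // the kind of mixing proposed for hash32 (illustrative sketch).
    static int fold(long h) {
        return (int) (h ^ (h >>> 32));
    }

    public static void main(String[] args) {
        // Two inputs that differ only in their high words collide under
        // truncation but not under folding -- the source of the skew.
        long a = 0x0000000100000001L;
        long b = 0x0000000200000001L;
        System.out.println(truncate(a) == truncate(b)); // true -- collision
        System.out.println(fold(a) == fold(b));         // false
    }
}
```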

> Skew in hash distribution for varchar (and possibly other) types of data
> 
>
> Key: DRILL-4119
> URL: https://issues.apache.org/jira/browse/DRILL-4119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>





[jira] [Created] (DRILL-4143) REFRESH TABLE METADATA - Permission Issues with metadata files

2015-11-30 Thread John Omernik (JIRA)
John Omernik created DRILL-4143:
---

 Summary: REFRESH TABLE METADATA - Permission Issues with metadata 
files
 Key: DRILL-4143
 URL: https://issues.apache.org/jira/browse/DRILL-4143
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.3.0
Reporter: John Omernik
 Fix For: Future


Summary of Refresh Metadata Issues confirmed by two different users on Drill 
User Mailing list. (Title: REFRESH TABLE METADATA - Access Denied)

This issue pertains to table METADATA and revolves around user authentication. 

Basically, when the drill bits are running as one user, and the data is owned 
by another user, there can be access denied issues on subsequent queries after 
issuing a REFRESH TABLE METADATA command. 


To troubleshoot what is actually happening, I turned on MapR Auditing (this is 
a handy feature) and found that when I run a query that gives me access denied 
(my query is select count(1) from testtable), per MapR the user I am logged in 
as (dataowner) is trying to do a create operation on the 
.drill.parquet_metadata file and it is failing with status: 17. Per Keys at 
MapR, "status 17 means errno 17 which means EEXIST. Looks like Drill is trying 
to create a file that already exists." This seems to indicate that Drill is 
perhaps trying to re-create the .drill.parquet_metadata on each select as the 
dataowner user, but the permissions (as seen below) don't allow it. 
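The EEXIST behavior described above can be illustrated in isolation. This is a hypothetical sketch, not Drill's actual code; only the metadata file name is borrowed from the report. An exclusive create of a file that already exists fails with errno 17.

```python
# Illustration of "status: 17": creating a file that already exists with
# O_CREAT | O_EXCL raises OSError with errno 17 (EEXIST). Paths are stand-ins.
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    meta = os.path.join(d, ".drill.parquet_metadata")
    # First exclusive create succeeds, as when the metadata cache is written.
    os.close(os.open(meta, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
    # A second exclusive create of the same path fails with EEXIST.
    try:
        os.open(meta, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        err = None
    except OSError as e:
        err = e.errno

print(err, errno.EEXIST)  # both 17 on Linux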


Here are the steps to reproduce:

Enable Authentication. 

Run all drillbits in the cluster as "drillbituser", then have the files owned 
by "dataowner". Note that the root of the table's permissions are drwxrwxr-x, 
but as Drill loads each partition it creates them as drwxr-xr-x (all with 
dataowner:dataowner ownership). That may be a factor too: the default 
permissions when creating a table? Another note: in my setup, drillbituser is 
in the group for dataowner, so it should always have read access. 
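A minimal sketch of the permission gap those steps describe, using stand-in temporary directories rather than the actual table layout: a group-writable table root (0775) versus partition directories at 0755, where a group member (the drillbituser-to-dataowner relationship here) can read but cannot create files.

```python
# Hypothetical reproduction of the mode mismatch: table root at 0775 is
# group-writable, partition directories at 0755 are not, so a group member
# can list and read them but cannot create files (e.g. a metadata cache) inside.
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.chmod(root, 0o775)              # table root: drwxrwxr-x
    part = os.path.join(root, "2015-11-12")
    os.mkdir(part)
    os.chmod(part, 0o755)              # partition dir: drwxr-xr-x

    root_mode = stat.S_IMODE(os.stat(root).st_mode)
    part_mode = stat.S_IMODE(os.stat(part).st_mode)
    group_can_write = bool(part_mode & stat.S_IWGRP)

print(oct(root_mode), oct(part_mode), group_can_write)
```

With this layout, any process writing into a partition directory must run as the owner; a group member hits "Permission denied" on create, consistent with the error shown below.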


# Authenticated as dataowner (this should have full permissions to all the data)

Enter username for jdbc:drill:zk=zknode1:5181: dataowner
Enter password for jdbc:drill:zk=zknode1:5181: **
0: jdbc:drill:zk=zknode1> use dfs.dev;
+---+--+
|  ok   |   summary|
+---+--+
| true  | Default schema changed to [dfs.dev]  |
+---+--+
1 row selected (0.307 seconds)

# The query works fine with no table metadata

0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
+---+
|  EXPR$0   |
+---+
| 24565203  |
+---+
1 row selected (3.392 seconds)

# Refresh of metadata works with no errors
0: jdbc:drill:zk=zknode1> refresh table metadata `testtable`;
+---+---+
|  ok   |summary|
+---+---+
| true  | Successfully updated metadata for table testtable.  |
+---+---+
1 row selected (5.767 seconds)
 
# Running the same query again returns an access denied error. 
0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
Error: SYSTEM ERROR: IOException: 2127.7646.2950962 
/data/dev/testtable/2015-11-12/.drill.parquet_metadata (Permission denied)
 
 
[Error Id: 7bfce2e7-f78d-4fba-b047-f4c85b471de4 on node1:31010] (state=,code=0)




 
 
# Note how all the files are owned by drillbituser. Per discussion on the 
list, this is normal. 
 
$ find ./ -type f -name ".drill.parquet_metadata" -exec ls -ls {} \;
726 -rwxr-xr-x 1 drillbituser drillbituser 742837 Nov 30 14:27 
./2015-11-12/.drill.parquet_metadata
583 -rwxr-xr-x 1 drillbituser drillbituser 596146 Nov 30 14:27 
./2015-11-29/.drill.parquet_metadata
756 -rwxr-xr-x 1 drillbituser drillbituser 773811 Nov 30 14:27 
./2015-11-11/.drill.parquet_metadata
763 -rwxr-xr-x 1 drillbituser drillbituser 780829 Nov 30 14:27 
./2015-11-04/.drill.parquet_metadata
632 -rwxr-xr-x 1 drillbituser drillbituser 646851 Nov 30 14:27 
./2015-11-08/.drill.parquet_metadata
845 -rwxr-xr-x 1 drillbituser drillbituser 864421 Nov 30 14:27 
./2015-11-05/.drill.parquet_metadata
771 -rwxr-xr-x 1 drillbituser drillbituser 788823 Nov 30 14:27 
./2015-11-28/.drill.parquet_metadata
1273 -rwxr-xr-x 1 drillbituser drillbituser 1303168 Nov 30 14:27 
./2015-11-10/.drill.parquet_metadata
645 -rwxr-xr-x 1 drillbituser drillbituser 660028 Nov 30 14:27 
./2015-11-22/.drill.parquet_metadata
1017 -rwxr-xr-x 1 drillbituser drillbituser 1040469 Nov 30 14:27 
./2015-11-13/.drill.parquet_metadata
1280 -rwxr-xr-x 1 drillbituser drillbituser 1310552 Nov 30 14:27 
./2015-11-03/.drill.parquet_metadata
585 -rwxr-xr-x 1 drillbituser drillbituser 598973 Nov 30 14:27 
./2015-11-07/.drill.parquet_m