[jira] [Commented] (DRILL-2975) Extended Json : Time type reporting data which is dependent on the system on which it ran

2017-06-19 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055238#comment-16055238
 ] 

Vitalii Diravka commented on DRILL-2975:


[~rkins] If these two queries are run on machines in different time zones, 
the results will differ as well. 

> Extended Json : Time type reporting data which is dependent on the system on 
> which it ran
> -
>
> Key: DRILL-2975
> URL: https://issues.apache.org/jira/browse/DRILL-2975
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.2.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: Future
>
>
> git.commit.id.abbrev=3b19076
> Data :
> {code}
> {
>   "int_col" : {"$numberLong": 1},
>   "date_col" : {"$dateDay": "2012-05-22"},
>   "time_col"  : {"$time": "19:20:30.45Z"}
> }
> {code}
> System 1 :
> {code}
> 0: jdbc:drill:schema=dfs_eea> select time_col from `extended_json/data1.json` 
> d;
> +---------------+
> |   time_col    |
> +---------------+
> | 19:20:30.450  |
> +---------------+
> {code}
> System 2 :
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexP> select time_col from 
> `temp.json`;
> +---------------+
> |   time_col    |
> +---------------+
> | 11:20:30.450  |
> +---------------+
> {code}
> The above results are inconsistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055209#comment-16055209
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua closed the pull request at:

https://github.com/apache/drill/pull/856


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> It would be nice to have this implemented. Runaway queries could then be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5598) AllocationHelper.allocateNew ignores maps, arrays

2017-06-19 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5598:
--

 Summary: AllocationHelper.allocateNew ignores maps, arrays
 Key: DRILL-5598
 URL: https://issues.apache.org/jira/browse/DRILL-5598
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.11.0


The method {{VectorAccessibleUtilities.allocateVectors()}} is used to allocate 
vectors when the external sort creates a spill batch, as well as in various 
other places.

This method does not allocate space for repeated vectors or for vectors 
contained in maps, so those vectors start life very small. This causes 
repeated doublings as data is loaded into the vectors:

{code}
BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: 
[32768] -> [65536]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[16384] -> [32768]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[16384] -> [32768]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] 
-> [8192]
...
{code}

Maps can be handled by iterating over the contained vectors. Arrays and 
VarChars are harder because the code needs some hint about data size. We have 
hard-coded hints available (the assumptions that VarChar columns are 50 
characters wide and that arrays have 10 elements). Better would be to pass in 
size metadata, extracted from previously seen batches, to the operator that 
allocates a new batch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5597) Incorrect "bits" vector allocation in nullable vectors allocateNew()

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5597:
---
Description: 
Consider the following code in a generated nullable vector, such as 
{{NullableBigIntVector}}:

{code}
  public void allocateNew(int valueCount) {
try {
  values.allocateNew(valueCount);
  bits.allocateNew(valueCount+1);
{code}

There are as many "bits" entries as data entries, so there is no need to 
allocate an extra one. When {{valueCount}} is a power of two, the error causes 
the allocation of a vector twice as large as necessary (128K, say, instead of 
64K, since 64K+1 rounds up to the next power of two, 128K).

The +1 correction is needed for offset vectors, but the "bits" vector is not 
an offset vector.

By contrast, another variation of the same method is correct:

{code}
  public void allocateNew(int totalBytes, int valueCount) {
try {
  values.allocateNew(totalBytes, valueCount);
  bits.allocateNew(valueCount);
{code}

  was:
Consider the following code in a generated nullable vector, such as 
{{NullableBigIntVector}}:

{code}
  public void allocateNew(int valueCount) {
try {
  values.allocateNew(valueCount);
  bits.allocateNew(valueCount+1);
{code}

There are as many "bits" entries as data entries, so there is no need to 
allocate an extra one. When {{valueCount}} is a power of two, the error causes 
the allocation of a vector twice as large as necessary (128K, say, instead of 
64K, since 64K+1 rounds up to the next power of two, 128K).

The +1 correction is needed for offset vectors, but the "bits" vector is not 
an offset vector.


> Incorrect "bits" vector allocation in nullable vectors allocateNew()
> 
>
> Key: DRILL-5597
> URL: https://issues.apache.org/jira/browse/DRILL-5597
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Consider the following code in a generated nullable vector, such as 
> {{NullableBigIntVector}}:
> {code}
>   public void allocateNew(int valueCount) {
> try {
>   values.allocateNew(valueCount);
>   bits.allocateNew(valueCount+1);
> {code}
> There are as many "bits" entries as data entries, so there is no need to 
> allocate an extra one. When {{valueCount}} is a power of two, the error 
> causes the allocation of a vector twice as large as necessary (128K, say, 
> instead of 64K, since 64K+1 rounds up to the next power of two, 128K).
> The +1 correction is needed for offset vectors, but the "bits" vector is not 
> an offset vector.
> By contrast, another variation of the same method is correct:
> {code}
>   public void allocateNew(int totalBytes, int valueCount) {
> try {
>   values.allocateNew(totalBytes, valueCount);
>   bits.allocateNew(valueCount);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5597) Incorrect "bits" vector allocation in nullable vectors allocateNew()

2017-06-19 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5597:
--

 Summary: Incorrect "bits" vector allocation in nullable vectors 
allocateNew()
 Key: DRILL-5597
 URL: https://issues.apache.org/jira/browse/DRILL-5597
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
 Fix For: 1.11.0


Consider the following code in a generated nullable vector, such as 
{{NullableBigIntVector}}:

{code}
  public void allocateNew(int valueCount) {
try {
  values.allocateNew(valueCount);
  bits.allocateNew(valueCount+1);
{code}

There are as many "bits" entries as data entries, so there is no need to 
allocate an extra one. When {{valueCount}} is a power of two, the error causes 
the allocation of a vector twice as large as necessary (128K, say, instead of 
64K, since 64K+1 rounds up to the next power of two, 128K).

The +1 correction is needed for offset vectors, but the "bits" vector is not 
an offset vector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-06-19 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5457:

Labels: ready-to-commit  (was: )

> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Support gradually spilling memory to disk as the available memory becomes 
> too small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5513) Managed External Sort : OOM error during the merge phase

2017-06-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055044#comment-16055044
 ] 

Paul Rogers commented on DRILL-5513:


Revised the memory calculations to consider the worst-case memory size of 
spilled batches, and fixed the spill-related error. With both changes, the 
test now passes: the sort is forced to do many spill/merge/spill cycles, but 
completes.

Changes will be checked in after DRILL-5325 is complete.

> Managed External Sort : OOM error during the merge phase
> 
>
> Key: DRILL-5513
> URL: https://issues.apache.org/jira/browse/DRILL-5513
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 26e5f7b8-71e8-afca-e72e-fad7be2b2416.sys.drill, 
> drillbit.log
>
>
> git.commit.id.abbrev=1e0a14c
> No of nodes in cluster : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_query` = 100;
> alter session set `planner.memory.max_query_memory_per_node` = 652428800;
> select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from 
> (select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
> order by s1.rms.mapid);
> {code}
> Exception from the logs
> {code}
> 2017-05-15 12:58:46,646 [BitServer-4] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 26e5f7b8-71e8-afca-e72e-fad7be2b2416: 
> State change requested RUNNING --> FAILED
> org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One 
> or more nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current 
> allocation: 19791880
> Fragment 5:2
> [Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
> buffer of size 2097152 due to memory limit. Current allocation: 19791880
> org.apache.drill.exec.memory.BaseAllocator.buffer():220
> org.apache.drill.exec.memory.BaseAllocator.buffer():195
> org.apache.drill.exec.vector.BigIntVector.reAlloc():212
> org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324
> org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367
> 
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328
> 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360
> 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220
> org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
> 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76
> 
> org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>

[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055038#comment-16055038
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r122863264
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -149,14 +149,24 @@ public IterOutcome innerNext() {
   if ( aggOut == HashAggregator.AggIterOutcome.AGG_OK ) { return 
IterOutcome.OK; }
   // if RESTART - continue below with doWork() - read some spilled 
partition, just like reading incoming
   incoming = aggregator.getNewIncoming(); // Restart - incoming was 
just changed
-  if ( wasKilled ) { // if kill() was called before, then finish up
-aggregator.cleanup();
-incoming.kill(false);
-return IterOutcome.NONE;
-  }
 }
 
-AggOutcome out = aggregator.doWork();
+if ( wasKilled ) { // if kill() was called before, then finish up
--- End diff --

Done


> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradually spilling memory to disk as the available memory becomes 
> too small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055037#comment-16055037
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r122863244
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -149,14 +149,24 @@ public IterOutcome innerNext() {
   if ( aggOut == HashAggregator.AggIterOutcome.AGG_OK ) { return 
IterOutcome.OK; }
   // if RESTART - continue below with doWork() - read some spilled 
partition, just like reading incoming
   incoming = aggregator.getNewIncoming(); // Restart - incoming was 
just changed
-  if ( wasKilled ) { // if kill() was called before, then finish up
-aggregator.cleanup();
-incoming.kill(false);
-return IterOutcome.NONE;
-  }
 }
 
-AggOutcome out = aggregator.doWork();
+if ( wasKilled ) { // if kill() was called before, then finish up
+  aggregator.cleanup();
+  incoming.kill(false);
+  return IterOutcome.NONE;
+}
+
+// Read and aggregate records
+// ( may need to run again if the spilled partition that was read
+//   generated new partitions that were all spilled )
+AggOutcome out = AggOutcome.CALL_WORK_AGAIN;
+while ( out == AggOutcome.CALL_WORK_AGAIN) {
+  //
+  //  Read incoming batches and process their records
+  //
+  out = aggregator.doWork();
+}
--- End diff --

Done ( do while ...)



> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradually spilling memory to disk as the available memory becomes 
> too small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055022#comment-16055022
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/856
  
Existing negative unit tests are failing. It looks like we never had proper 
unit tests. I will add them and update the PR.


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> It would be nice to have this implemented. Runaway queries could then be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055009#comment-16055009
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r122858748
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -149,14 +149,24 @@ public IterOutcome innerNext() {
   if ( aggOut == HashAggregator.AggIterOutcome.AGG_OK ) { return 
IterOutcome.OK; }
   // if RESTART - continue below with doWork() - read some spilled 
partition, just like reading incoming
   incoming = aggregator.getNewIncoming(); // Restart - incoming was 
just changed
-  if ( wasKilled ) { // if kill() was called before, then finish up
-aggregator.cleanup();
-incoming.kill(false);
-return IterOutcome.NONE;
-  }
 }
 
-AggOutcome out = aggregator.doWork();
+if ( wasKilled ) { // if kill() was called before, then finish up
+  aggregator.cleanup();
+  incoming.kill(false);
+  return IterOutcome.NONE;
+}
+
+// Read and aggregate records
+// ( may need to run again if the spilled partition that was read
+//   generated new partitions that were all spilled )
+AggOutcome out = AggOutcome.CALL_WORK_AGAIN;
+while ( out == AggOutcome.CALL_WORK_AGAIN) {
+  //
+  //  Read incoming batches and process their records
+  //
+  out = aggregator.doWork();
+}
--- End diff --

Scratch that, I see you need the value of "out". So:
```
  AggOutcome out;
  do {
  //
  //  Read incoming batches and process their records
  //
  out = aggregator.doWork();
  } while (out == AggOutcome.CALL_WORK_AGAIN);
```

Or even:
```
  //  Read incoming batches and process their records
  AggOutcome out;
  while ((out = aggregator.doWork()) == AggOutcome.CALL_WORK_AGAIN) {
// Nothing to do
  }
```



> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradually spilling memory to disk as the available memory becomes 
> too small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054994#comment-16054994
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r122858065
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -149,14 +149,24 @@ public IterOutcome innerNext() {
   if ( aggOut == HashAggregator.AggIterOutcome.AGG_OK ) { return 
IterOutcome.OK; }
   // if RESTART - continue below with doWork() - read some spilled 
partition, just like reading incoming
   incoming = aggregator.getNewIncoming(); // Restart - incoming was 
just changed
-  if ( wasKilled ) { // if kill() was called before, then finish up
-aggregator.cleanup();
-incoming.kill(false);
-return IterOutcome.NONE;
-  }
 }
 
-AggOutcome out = aggregator.doWork();
+if ( wasKilled ) { // if kill() was called before, then finish up
+  aggregator.cleanup();
+  incoming.kill(false);
+  return IterOutcome.NONE;
+}
+
+// Read and aggregate records
+// ( may need to run again if the spilled partition that was read
+//   generated new partitions that were all spilled )
+AggOutcome out = AggOutcome.CALL_WORK_AGAIN;
+while ( out == AggOutcome.CALL_WORK_AGAIN) {
+  //
+  //  Read incoming batches and process their records
+  //
+  out = aggregator.doWork();
+}
--- End diff --

```
while (aggregator.doWork() == AggOutcome.CALL_WORK_AGAIN) {
  // Nothing to do
}
```
?

In one of your reviews you said you didn't like empty loops, but sometimes 
they are handy...


> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradually spilling memory to disk as the available memory becomes 
> too small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054998#comment-16054998
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122858130
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java 
---
@@ -309,13 +317,38 @@ private void testDoubleRW() {
 assertEquals(0, reader.column(0).getDouble(), 0.01);
 assertTrue(reader.next());
 assertEquals(Double.MAX_VALUE, reader.column(0).getDouble(), 0.01);
+assertEquals(Double.MAX_VALUE, (double) reader.column(0).getObject(), 
0.01);
--- End diff --

I don't think explicit casting is required here, since `getObject --> 
getDouble` already returns the `double` type.


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055003#comment-16055003
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122854874
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -53,6 +53,47 @@
 public class SchemaBuilder {
 
   /**
+   * Build a column schema (AKA "materialized field") based on name and a
+   * variety of schema options. Every column needs a name and (minor) type,
+   * some may need a mode other than required, may need a width, may
+   * need scale and precision, and so on.
+   */
+
+  // TODO: Add map methods
+
+  public static class ColumnBuilder {
+private String name;
+private MajorType.Builder typeBuilder;
--- End diff --

private `final`


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054997#comment-16054997
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122858155
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java 
---
@@ -289,13 +295,15 @@ private void testFloatRW() {
 assertEquals(0, reader.column(0).getDouble(), 0.01);
 assertTrue(reader.next());
 assertEquals(Float.MAX_VALUE, reader.column(0).getDouble(), 0.01);
+assertEquals((double) Float.MAX_VALUE, (double) 
reader.column(0).getObject(), 0.01);
--- End diff --

Same as below for second cast


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054999#comment-16054999
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122855768
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -96,27 +137,57 @@ public SchemaBuilder withSVMode(SelectionVectorMode 
svMode) {
 
   public SchemaBuilder() { }
 
+  public SchemaBuilder(BatchSchema baseSchema) {
+for (MaterializedField field : baseSchema) {
+  columns.add(field);
+}
+  }
+
   public SchemaBuilder add(String pathName, MajorType type) {
-MaterializedField col = MaterializedField.create(pathName, type);
+return add(MaterializedField.create(pathName, type));
+  }
+
+  public SchemaBuilder add(MaterializedField col) {
 columns.add(col);
 return this;
   }
 
+  public static MaterializedField columnSchema(String pathName, MinorType 
type, DataMode mode) {
+return MaterializedField.create(pathName,
+MajorType.newBuilder()
--- End diff --

Why not use `SchemaBuilder.ColumnBuilder` here as well?


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055001#comment-16055001
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122857402
  
--- Diff: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/TupleReaderImpl.java
 ---
@@ -101,8 +88,61 @@ public String getAsString(int colIndex) {
   return "\"" + colReader.getString() + "\"";
 case DECIMAL:
   return colReader.getDecimal().toPlainString();
+case ARRAY:
+  return getArrayAsString(colReader.array());
 default:
   throw new IllegalArgumentException("Unsupported type " + 
colReader.valueType());
 }
   }
+
+  private String bytesToString(byte[] value) {
+StringBuilder buf = new StringBuilder()
+.append("[");
+int len = Math.min(value.length, 20);
+for (int i = 0; i < len;  i++) {
+  if (i > 0) {
+buf.append(", ");
+  }
+  buf.append((int) value[i]);
+}
+if (value.length > len) {
+  buf.append("...");
+}
+buf.append("]");
+return buf.toString();
+  }
+
+  private String getArrayAsString(ArrayReader array) {
+StringBuilder buf = new StringBuilder();
+buf.append("[");
+for (int i = 0; i < array.size(); i++) {
+  if (i > 0) {
+buf.append( ", " );
+  }
+  switch (array.valueType()) {
--- End diff --

For valueType `MAP` and `ARRAY` we should throw 
`UnsupportedOperationException`.


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055000#comment-16055000
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122855065
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -96,27 +137,57 @@ public SchemaBuilder withSVMode(SelectionVectorMode 
svMode) {
 
   public SchemaBuilder() { }
 
+  public SchemaBuilder(BatchSchema baseSchema) {
+for (MaterializedField field : baseSchema) {
+  columns.add(field);
--- End diff --

just call `add` method of `SchemaBuilder` instead of `columns.add(field)`


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054995#comment-16054995
 ] 

ASF GitHub Bot commented on DRILL-5457:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/822#discussion_r122858048
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -149,14 +149,24 @@ public IterOutcome innerNext() {
   if ( aggOut == HashAggregator.AggIterOutcome.AGG_OK ) { return 
IterOutcome.OK; }
   // if RESTART - continue below with doWork() - read some spilled 
partition, just like reading incoming
   incoming = aggregator.getNewIncoming(); // Restart - incoming was 
just changed
-  if ( wasKilled ) { // if kill() was called before, then finish up
-aggregator.cleanup();
-incoming.kill(false);
-return IterOutcome.NONE;
-  }
 }
 
-AggOutcome out = aggregator.doWork();
+if ( wasKilled ) { // if kill() was called before, then finish up
--- End diff --

Spaces, here and below.


> Support Spill to Disk for the Hash Aggregate Operator
> -
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradually spilling memory to disk as the available memory becomes 
> too small to allow in-memory work for the Hash Aggregate Operator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055002#comment-16055002
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122854686
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/SubOperatorTest.java ---
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ 
**/
--- End diff --

Replace with just */. Not sure if `checkstyle` plugin is fine with this 
style.


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054996#comment-16054996
 ] 

ASF GitHub Bot commented on DRILL-5518:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/851#discussion_r122854555
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/SubOperatorTest.java ---
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ 
**/
+package org.apache.drill.test;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+
+public class SubOperatorTest extends DrillTest {
+
+  protected static OperatorFixture fixture;
+
+  @BeforeClass
+  public static void setUpBeforeClass() throws Exception {
+fixture = OperatorFixture.standardFixture();
+  }
+
+  @AfterClass
+  public static void tearDownAfterClass() throws Exception {
--- End diff --

How about just `setup` and `tearDown`? The @BeforeClass and @AfterClass 
annotations make the other part clear.


> Roll-up of a number of test framework enhancements
> --
>
> Key: DRILL-5518
> URL: https://issues.apache.org/jira/browse/DRILL-5518
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent development work identified a number of minor enhancements to the 
> "sub-operator" unit tests:
> * Create a {{SubOperatorTest}} base class to do routine setup and shutdown.
> * Additional methods to simplify creating complex schemas with field widths.
> * Define a test workspace with plugin-specific options (as for the CSV 
> storage plugin)
> * When verifying row sets, add methods to verify and release just the 
> "actual" batch in addition to the existing method for verify and free both 
> the actual and expected batches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4955) Log Parser for Drill

2017-06-19 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054985#comment-16054985
 ] 

Kunal Khatua commented on DRILL-4955:
-

[~cgivre] is there a PR for this? 

I've experimented with LogStash / GrokParser and it did a pretty decent job. 
That would probably be more powerful for general log parsing (including 
Drill's logs). 

Personally, I think it would be neat if we could use Drill to parse and 
analyze its own logs, but I didn't pursue this much further with the ELK combo 
because, after a point, I felt that a better design would be to expose the 
logs themselves as a system table. 

> Log Parser for Drill
> 
>
> Key: DRILL-4955
> URL: https://issues.apache.org/jira/browse/DRILL-4955
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Text & CSV
>Affects Versions: 1.9.0
>Reporter: Charles Givre
>  Labels: features
> Fix For: Future
>
>
> I've been experimenting with a generic log parser for Drill.  The basic 
> concept is that if you wanted Drill to ingest log files such as this MySQL 
> log:
> {code}
> 070823 21:00:32   1 Connect root@localhost on test1
> 070823 21:00:48   1 Query   show tables
> 070823 21:00:56   1 Query   select * from category
> 070917 16:29:01  21 Query   select * from location
> 070917 16:29:12  21 Query   select * from location where id = 1 LIMIT 
> 1
> {code}
> You could probably do it with the various string-manipulation methods such 
> as split, substring, etc., but you'd end up with some ugly and very complex 
> queries.
> The extension I've built allows you to supply Drill with a regex for the 
> formatting and a list of fields as shown below.
> {code}
> "log": {
>   "type": "log",
>   "extensions": [
> "log"
>   ],
>   "fieldNames": [
> "date",
> "time",
> "pid",
> "action",
> "query"
>   ],
>   "pattern": 
> "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s(\\w+)\\s+(.+)"
> }
> {code}
> You can then query log files in this format in Drill. I'd like to submit 
> this for inclusion in Drill if there is interest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-5568:
-
Labels: ready-to-commit  (was: )

> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
>
> With Sasl support in 1.10, authentication using username/password was moved 
> to the Plain mechanism of the Sasl framework. There are a couple of Hadoop 
> classes, such as Configuration.java and UserGroupInformation.java, defined 
> in the hadoop-common package that are used in DrillClient for security 
> mechanisms like Plain/Kerberos. Because of this we need to add the hadoop 
> dependency inside _drill-jdbc-all.jar_; without it, an application using 
> this driver will fail to connect to Drill with authentication enabled.
> Today this jar (the JDBC driver for Drill) already has lots of other 
> dependencies that DrillClient relies on, such as Netty. But we add these 
> dependencies under the *oadd* namespace so that an application using this 
> driver won't conflict with its own versions of the same dependencies. This 
> JIRA will include the hadoop-common dependencies under the same namespace, 
> allowing an application using this driver to connect to Drill with security 
> enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054964#comment-16054964
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/856
  
@ppadma / @parthchandra ... can you review this?


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> It would be nice to have this implemented. Runaway queries could then be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-19 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-3640:

Fix Version/s: (was: Future)
   1.11.0

> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> It would be nice to have this implemented. Runaway queries could then be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054954#comment-16054954
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

GitHub user kkhatua opened a pull request:

https://github.com/apache/drill/pull/856

DRILL-3640: Support JDBC Statement.setQueryTimeout(int)

Allow for queries to be cancelled if they don't complete within the 
stipulated time.
We submit a timeout-managing task that sleeps for the stipulated period 
before trying to cancel the query (JDBC Statement). 
It might be worth having a similar feature as a System/Session variable so 
that the same can be achieved via SQLLine or REST APIs, but that is beyond the 
scope of this JIRA's PR.  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-3640

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #856


commit e3c5c0f69b8f433dd159a37557c1bc4410ba537e
Author: Kunal Khatua 
Date:   2017-06-19T23:26:52Z

DRILL-3640: Support JDBC Statement.setQueryTimeout(int)

Allow for queries to be cancelled if they don't complete within the 
stipulated time.




> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: Future
>
>
> It would be nice to have this implemented. Runaway queries could then be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054924#comment-16054924
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122847120
  
--- Diff: exec/jdbc-all/pom.xml ---
@@ -483,6 +518,269 @@
 
   
 
+  default
--- End diff --

Yes, I didn't find any other way to set the default value.


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10, authentication using username/password was moved 
> to the Plain mechanism of the Sasl framework. There are a couple of Hadoop 
> classes, such as Configuration.java and UserGroupInformation.java, defined 
> in the hadoop-common package that are used in DrillClient for security 
> mechanisms like Plain/Kerberos. Because of this we need to add the hadoop 
> dependency inside _drill-jdbc-all.jar_; without it, an application using 
> this driver will fail to connect to Drill with authentication enabled.
> Today this jar (the JDBC driver for Drill) already has lots of other 
> dependencies that DrillClient relies on, such as Netty. But we add these 
> dependencies under the *oadd* namespace so that an application using this 
> driver won't conflict with its own versions of the same dependencies. This 
> JIRA will include the hadoop-common dependencies under the same namespace, 
> allowing an application using this driver to connect to Drill with security 
> enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054921#comment-16054921
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122847012
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/SecurityConfiguration.java
 ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc.security;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+
+
+public class SecurityConfiguration extends Configuration {
+  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(SecurityConfiguration.class);
+
+  public SecurityConfiguration() {
+super();
+updateGroupMapping();
+  }
+
+  /**
+   * Update the Group Mapping class name to add namespace prefix retrieved 
from System Property. This is needed since
+   * in drill-jdbc-all jar we are packaging hadoop dependencies under that 
namespace. This will help application
+   * using this jar as driver to avoid conflict with it's own hadoop 
dependency if any.
+   */
+  private void updateGroupMapping() {
+final String originalClassName = 
get(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING);
+final String profilePrefix = System.getProperty("namespacePrefix");
+final String updatedClassName = (profilePrefix != null) ? 
(profilePrefix + originalClassName)
+: 
originalClassName;
+set(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING, 
updatedClassName);
--- End diff --

Changed to use `Strings.isNullOrEmpty`. I am already trimming the value of 
the prefix before setting it in the system property.


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10, authentication using username/password was moved 
> to the Plain mechanism of the Sasl framework. There are a couple of Hadoop 
> classes, such as Configuration.java and UserGroupInformation.java, defined 
> in the hadoop-common package that are used in DrillClient for security 
> mechanisms like Plain/Kerberos. Because of this we need to add the hadoop 
> dependency inside _drill-jdbc-all.jar_; without it, an application using 
> this driver will fail to connect to Drill with authentication enabled.
> Today this jar (the JDBC driver for Drill) already has lots of other 
> dependencies that DrillClient relies on, such as Netty. But we add these 
> dependencies under the *oadd* namespace so that an application using this 
> driver won't conflict with its own versions of the same dependencies. This 
> JIRA will include the hadoop-common dependencies under the same namespace, 
> allowing an application using this driver to connect to Drill with security 
> enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054922#comment-16054922
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122847079
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillFactory.java ---
@@ -37,6 +38,24 @@
   protected final int major;
   protected final int minor;
 
+  static {
+Properties prop = new Properties();
--- End diff --

Moved it to SecurityConfiguration.java. All unit tests are also passing and 
the application is also running fine.
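
Roughly, the relocated block could look like the sketch below inside `SecurityConfiguration` (an assumption-laden illustration: the resource name `profile.props`, the property keys, and the `java.io`/`java.util` imports are not taken from the patch):

```
static {
  // Load the optional descriptor packaged into drill-jdbc-all; in a normal
  // build the resource is absent and no system property is set.
  Properties prop = new Properties();
  try (InputStream in = SecurityConfiguration.class.getResourceAsStream("/profile.props")) {
    if (in != null) {
      prop.load(in);
      String prefix = prop.getProperty("namespacePrefix");
      if (prefix != null) {
        System.setProperty("drill.security.namespacePrefix", prefix.trim());
      }
    }
  } catch (IOException e) {
    // Ignore: fall back to un-relocated Hadoop class names.
  }
}
```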


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054918#comment-16054918
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122846916
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/SecurityConfiguration.java
 ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc.security;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+
+
+public class SecurityConfiguration extends Configuration {
+  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(SecurityConfiguration.class);
+
+  public SecurityConfiguration() {
+super();
+updateGroupMapping();
+  }
+
+  /**
+   * Update the Group Mapping class name to add namespace prefix retrieved 
from System Property. This is needed since
+   * in drill-jdbc-all jar we are packaging hadoop dependencies under that 
namespace. This will help application
+   * using this jar as driver to avoid conflict with it's own hadoop 
dependency if any.
--- End diff --

Fixed


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054919#comment-16054919
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122846935
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/SecurityConfiguration.java
 ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc.security;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+
+
+public class SecurityConfiguration extends Configuration {
+  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(SecurityConfiguration.class);
+
+  public SecurityConfiguration() {
+super();
+updateGroupMapping();
+  }
+
+  /**
+   * Update the Group Mapping class name to add namespace prefix retrieved 
from System Property. This is needed since
+   * in drill-jdbc-all jar we are packaging hadoop dependencies under that 
namespace. This will help application
+   * using this jar as driver to avoid conflict with it's own hadoop 
dependency if any.
+   */
+  private void updateGroupMapping() {
+final String originalClassName = get(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING);
+final String profilePrefix = System.getProperty("namespacePrefix");
--- End diff --

Updated


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5513) Managed External Sort : OOM error during the merge phase

2017-06-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054733#comment-16054733
 ] 

Paul Rogers commented on DRILL-5513:


Actually, in this bug, the problem is more specific:

{code}
Spilled 2 output batches, each of 7536640 bytes, 35460 records
...
Merging 31 on-disk runs, alloc. memory = 0, avail. memory = 1332
13:50:02.766 [26b7d472-c3fe-5f83-5ae6-34f7b6995d2b:frag:5:0] TRACE Read 35460 
records in 3367595957 us; size = 9437184, memory = 9437192
{code}

The code has a flat-out bug. Memory available for merge is only 13 MB, and each 
spill batch is expected to take 6.6 MB. Merging can work with only two batches at 
a time, yet the code is trying to merge 31.

Further, the spill batch sizes are too large given the power-of-two rounding 
that occurs when reading. Even with two batches of 9 MB each, there is not 
sufficient memory to merge them.
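
To make the merge-width arithmetic concrete (a rough sketch; all figures are the approximations quoted above):

{code}
// Rough merge-width check implied by the log (illustrative only):
long mergeMemory   = 13L * 1024 * 1024;        // ~13 MB available for the merge
long estBatchSize  = 6_600_000L;               // ~6.6 MB estimated per spill batch
long realBatchSize = 9L * 1024 * 1024;         // ~9 MB actual, after power-of-two rounding
long estWidth  = mergeMemory / estBatchSize;   // = 2 runs fit by the estimate
long realWidth = mergeMemory / realBatchSize;  // = 1 run fits in practice
// Attempting a 31-way merge when at most 2 runs fit is the flat-out bug.
{code}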

> Managed External Sort : OOM error during the merge phase
> 
>
> Key: DRILL-5513
> URL: https://issues.apache.org/jira/browse/DRILL-5513
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 26e5f7b8-71e8-afca-e72e-fad7be2b2416.sys.drill, 
> drillbit.log
>
>
> git.commit.id.abbrev=1e0a14c
> No of nodes in cluster : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_query` = 100;
> alter session set `planner.memory.max_query_memory_per_node` = 652428800;
> select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from 
> (select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
> order by s1.rms.mapid);
> {code}
> Exception from the logs
> {code}
> 2017-05-15 12:58:46,646 [BitServer-4] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 26e5f7b8-71e8-afca-e72e-fad7be2b2416: 
> State change requested RUNNING --> FAILED
> org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One 
> or more nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current 
> allocation: 19791880
> Fragment 5:2
> [Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
> buffer of size 2097152 due to memory limit. Current allocation: 19791880
> org.apache.drill.exec.memory.BaseAllocator.buffer():220
> org.apache.drill.exec.memory.BaseAllocator.buffer():195
> org.apache.drill.exec.vector.BigIntVector.reAlloc():212
> org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324
> org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367
> 
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328
> 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360
> 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220
> org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
> 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76
> 
> org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.s

[jira] [Commented] (DRILL-5513) Managed External Sort : OOM error during the merge phase

2017-06-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054729#comment-16054729
 ] 

Paul Rogers commented on DRILL-5513:


See [this 
post|https://github.com/paul-rogers/drill/wiki/Drill-Spill-File-Format] for an 
explanation of the memory issue with the spill file format.

The existing sort code in {{PriorityQueueCopierTemplate}} loads a fixed number 
of record batches, which requires that the code be able to predict the memory 
use of those batches. The prior discussion shows that, in general, the code 
cannot correctly predict the memory size, due to the change in vector storage 
format.

The solution, then, is to modify {{PriorityQueueCopierTemplate}} to load up to 
a given number of batches, *or* up to a given memory limit.

However, in the worst case, a failure will occur if memory does not allow at 
least two runs to be loaded.
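
A sketch of the proposed loading loop ({{SpilledRun}}, {{load()}} and the field names are illustrative, not the actual {{PriorityQueueCopierTemplate}} API):

{code}
// Load spilled runs until either the batch-count limit or the memory budget is hit.
int loaded = 0;
long usedMemory = 0;
for (SpilledRun run : spilledRuns) {
  long batchSize = run.estimatedBatchSize();  // includes power-of-two rounding
  if (loaded >= maxBatches
      || (loaded >= 2 && usedMemory + batchSize > memoryLimit)) {
    break;  // merge what fits; remaining runs go to the next merge pass
  }
  load(run);
  usedMemory += batchSize;
  loaded++;
}
// Worst case: if even two runs exceed memoryLimit, the merge must still load
// both and may fail -- exactly the failure mode noted above.
{code}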

> Managed External Sort : OOM error during the merge phase
> 
>
> Key: DRILL-5513
> URL: https://issues.apache.org/jira/browse/DRILL-5513
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 26e5f7b8-71e8-afca-e72e-fad7be2b2416.sys.drill, 
> drillbit.log
>
>
> git.commit.id.abbrev=1e0a14c
> No of nodes in cluster : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_query` = 100;
> alter session set `planner.memory.max_query_memory_per_node` = 652428800;
> select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from 
> (select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
> order by s1.rms.mapid);
> {code}
> Exception from the logs
> {code}
> 2017-05-15 12:58:46,646 [BitServer-4] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 26e5f7b8-71e8-afca-e72e-fad7be2b2416: 
> State change requested RUNNING --> FAILED
> org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One 
> or more nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current 
> allocation: 19791880
> Fragment 5:2
> [Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
> buffer of size 2097152 due to memory limit. Current allocation: 19791880
> org.apache.drill.exec.memory.BaseAllocator.buffer():220
> org.apache.drill.exec.memory.BaseAllocator.buffer():195
> org.apache.drill.exec.vector.BigIntVector.reAlloc():212
> org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324
> org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367
> 
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328
> 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360
> 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220
> org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
> 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76
> 
> org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.secu

[jira] [Assigned] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-06-19 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-3640:
---

Assignee: Kunal Khatua

> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: Future
>
>
> It would be nice if we have this implemented. Run away queries can be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)
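
Once implemented, usage would follow the standard JDBC pattern (a sketch against the plain java.sql API; the connection URL and query are illustrative):

{code}
// Cancel any query that runs longer than 60 seconds.
try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
     Statement stmt = conn.createStatement()) {
  stmt.setQueryTimeout(60);  // seconds; a timed-out query raises SQLTimeoutException
  ResultSet rs = stmt.executeQuery("SELECT * FROM cp.`employee.json`");
  while (rs.next()) { /* consume rows */ }
}
{code}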



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4824) Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.

2017-06-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054642#comment-16054642
 ] 

Paul Rogers commented on DRILL-4824:


Turns out this issue has a long history. Added links to the many other tickets 
related to this topic.

> Null maps / lists and non-provided state support for JSON fields. Numeric 
> types promotion.
> --
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.0.0
>Reporter: Roman
>Assignee: Volodymyr Vysotskyi
>
> There is incorrect output in case of JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---+
> |  Field1   |
> +---+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2" 
> {"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}
> There is no need to output missing fields. In the case of a deeply nested 
> structure we will get an unreadable result for the user.
> _Correct result:_
> {code:none}
> +--+
> | Field1   |
> +--+
> |{} 
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4149) Escape Character Not Used for TSVs

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4149:
--

Assignee: Paul Rogers

> Escape Character Not Used for TSVs
> --
>
> Key: DRILL-4149
> URL: https://issues.apache.org/jira/browse/DRILL-4149
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.3.0
>Reporter: Matt Welsh
>Assignee: Paul Rogers
>Priority: Minor
>
> Escape Character does not escape tabs in TSVs
> For instance, the query:
> select * from dfs.`bug.tsv`;
> With Storage Format configured as:
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "escape": "\\",
>   "delimiter": "\t"
> },
> bug.tsv file:
> testval   1   2   3   sometext
> testval   4   5   6   some text with a tab between here\  
> here
> This returns 5 columns for the first row and 6 for the second. It should be 5 for both.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4266:
--

Assignee: Paul Rogers

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Paul Rogers
> Attachments: drill.log.2016-01-12-16, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> memComsumption.txt, test.tar, WebUI_500_iterations.txt
>
>
> I have executed 5 tests from Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and 
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of 
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration : memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box. 
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like 
> '%CHANGED%';
> +---+--+-+--+--+-+---++
> |   name|   kind   |  type   |  status  | num_val 
>  | string_val  | bool_val  | float_val  |
> +---+--+-+--+--+-+---++
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null
>  | null| true  | null   |
> +---+--+-+--+--+-+---++
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into Functional/test directory 
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with 
> window functions. Now that the new allocator is in place, we reran this test and 
> see similar things, and the allocator does not seem to think that we have 
> a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect 
> on memory allocation (speculating again ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4286) Have an ability to put server in quiescent mode of operation

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4286:
--

Assignee: Venkata Jyothsna Donapati  (was: Paul Rogers)

> Have an ability to put server in quiescent mode of operation
> 
>
> Key: DRILL-4286
> URL: https://issues.apache.org/jira/browse/DRILL-4286
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Flow
>Reporter: Victoria Markman
>Assignee: Venkata Jyothsna Donapati
>
> I think Drill will benefit from a mode of operation that is called "quiescent" 
> in some databases. 
> From IBM Informix server documentation:
> {code}
> Change gracefully from online to quiescent mode
> Take the database server gracefully from online mode to quiescent mode to 
> restrict access to the database server without interrupting current 
> processing. After you perform this task, the database server sets a flag that 
> prevents new sessions from gaining access to the database server. The current 
> sessions are allowed to finish processing. After you initiate the mode 
> change, it cannot be canceled. During the mode change from online to 
> quiescent, the database server is considered to be in Shutdown mode.
> {code}
> This is different from shutdown, when processes are terminated. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5596) Cannot connect to some AWS Regions with S3

2017-06-19 Thread Jack Ingoldsby (JIRA)
Jack Ingoldsby created DRILL-5596:
-

 Summary: Cannot connect to some AWS Regions with S3
 Key: DRILL-5596
 URL: https://issues.apache.org/jira/browse/DRILL-5596
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.10.0
 Environment: Windows Embedded
Reporter: Jack Ingoldsby


Hi,
I created a bucket in Ohio, but I could not connect. Setting up the same bucket 
configuration in Northern Virginia worked.

Appears to be a known issue:

http://drill-user.incubator.apache.narkive.com/Ue0zF3kp/s3-storage-plugin-not-working-for-signature-v4-regions
also 
https://issues.apache.org/jira/browse/DRILL-5317

The concern is that, to use Drill with S3 as part of our architecture, we need 
to be sure it works with all regions consistently.
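
A possible workaround sketch, assuming Drill's S3 plugin rides on hadoop-aws's s3a connector: pin the connector to the region's own endpoint so V4 request signing is used (the property name comes from hadoop-aws; the value below is for Ohio and is illustrative):

{code}
<!-- core-site.xml -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.us-east-2.amazonaws.com</value>
</property>
{code}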



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4286) Have an ability to put server in quiescent mode of operation

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4286:
--

Assignee: Paul Rogers

> Have an ability to put server in quiescent mode of operation
> 
>
> Key: DRILL-4286
> URL: https://issues.apache.org/jira/browse/DRILL-4286
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Flow
>Reporter: Victoria Markman
>Assignee: Paul Rogers
>
> I think Drill will benefit from a mode of operation that is called "quiescent" 
> in some databases. 
> From IBM Informix server documentation:
> {code}
> Change gracefully from online to quiescent mode
> Take the database server gracefully from online mode to quiescent mode to 
> restrict access to the database server without interrupting current 
> processing. After you perform this task, the database server sets a flag that 
> prevents new sessions from gaining access to the database server. The current 
> sessions are allowed to finish processing. After you initiate the mode 
> change, it cannot be canceled. During the mode change from online to 
> quiescent, the database server is considered to be in Shutdown mode.
> {code}
> This is different from shutdown, when processes are terminated. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4371) Enhance scan to report file and column name on failure.

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4371:
--

Assignee: Paul Rogers

> Enhance scan to report file and column name on failure.
> ---
>
> Key: DRILL-4371
> URL: https://issues.apache.org/jira/browse/DRILL-4371
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Hanifi Gunes
>Assignee: Paul Rogers
>
> ScanBatch does not seem to report file and column name for some failure 
> scenarios. One such case was pointed out by John on the user list in this 
> [thread|https://mail-archives.apache.org/mod_mbox/drill-user/201602.mbox/%3CCAKOFcwqLy%3D26LVKokm7EWizoZdYXafqH0RMXK-oYrpQkq5BELQ%40mail.gmail.com%3E].
>  We should improve upon failure cases so as to provide more context.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4608) Csv with Headers reader is not case insensitive

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4608:
--

Assignee: Paul Rogers

> Csv with Headers reader is not case insensitive
> ---
>
> Key: DRILL-4608
> URL: https://issues.apache.org/jira/browse/DRILL-4608
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Reporter: Jacques Nadeau
>Assignee: Paul Rogers
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054624#comment-16054624
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122799104
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/SecurityConfiguration.java
 ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc.security;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+
+
+public class SecurityConfiguration extends Configuration {
+  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(SecurityConfiguration.class);
+
+  public SecurityConfiguration() {
+super();
+updateGroupMapping();
+  }
+
+  /**
+   * Update the Group Mapping class name to add namespace prefix retrieved 
from System Property. This is needed since
+   * in drill-jdbc-all jar we are packaging hadoop dependencies under that 
namespace. This will help application
+   * using this jar as driver to avoid conflict with it's own hadoop 
dependency if any.
--- End diff --

Maybe add a comment that the property is needed only when Hadoop classes 
are relocated. In a normal build, the property is not set and Hadoop classes 
are used normally.


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054625#comment-16054625
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122797316
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/SecurityConfiguration.java
 ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc.security;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+
+
+public class SecurityConfiguration extends Configuration {
+  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(SecurityConfiguration.class);
+
+  public SecurityConfiguration() {
+super();
+updateGroupMapping();
+  }
+
+  /**
+   * Update the Group Mapping class name to add namespace prefix retrieved 
from System Property. This is needed since
+   * in drill-jdbc-all jar we are packaging hadoop dependencies under that 
namespace. This will help application
+   * using this jar as driver to avoid conflict with it's own hadoop 
dependency if any.
+   */
+  private void updateGroupMapping() {
+final String originalClassName = get(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING);
+final String profilePrefix = System.getProperty("namespacePrefix");
+final String updatedClassName = (profilePrefix != null) ? (profilePrefix + originalClassName)
+    : originalClassName;
+set(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING, updatedClassName);
--- End diff --

```
if (! Strings.isNullOrEmpty(profilePrefix)) {
  set(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING, profilePrefix.trim() + originalClassName);
}
```
?
Handles the case where the prefix is missing, or exists, but is empty. Or, 
perhaps I'm being overly cautious about stray spaces, empty properties, and not 
changing properties when not needed...


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054622#comment-16054622
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122800022
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillFactory.java ---
@@ -37,6 +38,24 @@
   protected final int major;
   protected final int minor;
 
+  static {
+Properties prop = new Properties();
--- End diff --

Actually, I think this code can be moved into `SecurityConfiguration.java`.

Java handles resources by looking in the class path, which is formed from 
all jars. As long as the resource file is somewhere in the top level of some 
jar, this code will find it. So, despite what I said in person the other day, 
the code need not be here to work.

That said, the property file DOES have to be in the JDBC-all package to 
avoid having multiple files of the same name in the class path.

Please try it out to see if the code works in `SecurityConfiguration.java`.


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054626#comment-16054626
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122796433
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/SecurityConfiguration.java
 ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.rpc.security;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+
+
+public class SecurityConfiguration extends Configuration {
+  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(SecurityConfiguration.class);
+
+  public SecurityConfiguration() {
+super();
+updateGroupMapping();
+  }
+
+  /**
+   * Update the Group Mapping class name to add namespace prefix retrieved 
from System Property. This is needed since
+   * in drill-jdbc-all jar we are packaging hadoop dependencies under that 
namespace. This will help application
+   * using this jar as driver to avoid conflict with it's own hadoop 
dependency if any.
+   */
+  private void updateGroupMapping() {
+final String originalClassName = get(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING);
+final String profilePrefix = System.getProperty("namespacePrefix");
--- End diff --

Would recommend a name that is a bit less generic. Maybe 
"drill.security.namespacePrefix"?


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054623#comment-16054623
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/849#discussion_r122799235
  
--- Diff: exec/jdbc-all/pom.xml ---
@@ -483,6 +518,269 @@
 
   
 
+  default
--- End diff --

Not really happy about the amount of redundancy. But I suppose Maven offers 
no alternative...


> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With Sasl support in 1.10 the authentication using username/password was 
> moved to Plain Mechanism of Sasl Framework. There are couple of Hadoop 
> classes like Configuration.java and UserGroupInformation.java defined in 
> hadoop-common package which were used in DrillClient for security mechanisms 
> like Plain/Kerberos mechanisms. Due to this we need to add hadoop dependency 
> inside _drill-jdbc-all.jar_  Without it the application using this driver 
> will fail to connect to Drill with authentication enabled.
> Today this jar (which is JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on like Netty, etc. But the way we add 
> these dependencies are under *oadd* namespace so that the application using 
> this driver won't end up in conflict with it's own version of same 
> dependencies. As part of this JIRA it will include hadoop-common dependencies 
> under same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4843) Trailing spaces in CSV column headers cause IndexOutOfBoundsException

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4843:
--

Assignee: Paul Rogers

> Trailing spaces in CSV column headers cause IndexOutOfBoundsException
> -
>
> Key: DRILL-4843
> URL: https://issues.apache.org/jira/browse/DRILL-4843
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.6.0, 1.7.0
> Environment: MapR Community cluster on CentOS 7.2
>Reporter: Matt Keranen
>Assignee: Paul Rogers
>
> When a CSV file with a header row has spaces after commas, an IOBE is thrown 
> when trying to reference column names. For example, this will cause the 
> exception:
> {{col1, col2, col3}}
> where this will not:
> {{col1,col2,col3}}
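
The likely fix is to trim each header token before creating columns (a minimal sketch, not the actual text-reader code):

{code}
// Normalize headers so "col1, col2, col3" and "col1,col2,col3" yield the same columns.
String[] headers = headerLine.split(",");
for (int i = 0; i < headers.length; i++) {
  headers[i] = headers[i].trim();
}
{code}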



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4845) Malformed CSV throws IllegalArgumentException

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4845:
--

Assignee: Paul Rogers

> Malformed CSV throws IllegalArgumentException
> -
>
> Key: DRILL-4845
> URL: https://issues.apache.org/jira/browse/DRILL-4845
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.6.0, 1.7.0
> Environment: CentOS 7
>Reporter: Matt Keranen
>Assignee: Paul Rogers
>
> When reading CSV data, if lines are malformed, often an error such as the 
> following is reported:
> Error: SYSTEM ERROR: IllegalArgumentException: length: -15150 (expected: 
> >= 0)
> Neither the error message reported to the client (with verbose errors on) 
> nor the drillbit log describes what or where the problem is.
> Perhaps the CSV parser can throw exceptions that would help locate what 
> in the source data fails to parse?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4960) Wrong columns after scanning Json files where some files have missing columns

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-4960:
--

Assignee: Paul Rogers

> Wrong columns after scanning Json files where some files have missing columns
> -
>
> Key: DRILL-4960
> URL: https://issues.apache.org/jira/browse/DRILL-4960
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.8.0
> Environment: Mac
>Reporter: Boaz Ben-Zvi
>Assignee: Paul Rogers
>
> (This problem may be more general than just Json)
> To recreate: scan two small Json files (e.g., copy 
> contrib/storage-mongo/src/test/resources/emp.json twice) where in one of the files 
> a whole column was eliminated (e.g., "last_name"). 
> A "normal" scan (the missing column shows up as nulls):
> 0: jdbc:drill:zk=local> select * from `drill/data/emp`;
> +--+-+-+--+--+-+++
> | employee_id  |  full_name  | first_name  |  last_name   | 
> position_id  | rating  |  position  | isFTE  |
> +--+-+-+--+--+-+++
> | 1101 | Steve Eurich| Steve   | Eurich   | 16
>| 23.0| Store T| true   |
> | 1102 | Mary Pierson| Mary| Pierson  | 16
>| 45.6| Store T| true   |
> | 1103 | Leo Jones   | Leo | Jones| 16
>| 85.94   | Store Tem  | true   |
> | 1104 | Nancy Beatty| Nancy   | Beatty   | 16
>| 97.16   | Store T| false  |
> | 1105 | Clara McNight   | Clara   | McNight  | 16
>| 81.25   | Store  | true   |
> | 1106 | null| Marcella| Isaacs   | 17
>| 67.86   | Stor   | false  |
> | 1107 | Charlotte Yonce | Charlotte   | Yonce| 17
>| 52.17   | Stor   | true   |
> | 1108 | Benjamin Foster | Benjamin| Foster   | 17
>| 89.8| Stor   | false  |
> | 1109 | John Reed   | John| Reed | 17
>| 12.9| Store Per  | false  |
> | 1110 | Lynn Kwiatkowski| Lynn| Kwiatkowski  | 17
>| 25.76   | St | true   |
> | 1111 | Donald Vann | Donald  | Vann | 17
>| 34.86   | Store Per  | false  |
> | 1112 | null| William | Smith| null  
>| 79.06   | St | true   |
> | 1113 | Amy Hensley | Amy | Hensley  | 17
>| 82.96   | Store Pe   | false  |
> | 1114 | Judy Owens  | Judy| Owens| 17
>| 24.6| Store Per  | true   |
> | 1115 | Frederick Castillo  | Frederick   | Castillo | 17
>| 82.36   | S  | false  |
> | 1116 | Phil Munoz  | Phil| Munoz| 17
>| 97.63   | Store Per  | false  |
> | 1117 | Lori Lightfoot  | Lori| Lightfoot| 17
>| 39.16   | Store  | true   |
> | 1| Kumar   | Anil| B| 19
>| 45.45   | Store  | true   |
> | 2| Kamesh  | Bh  | Venkata  | null  
>| 32.89   | Store  | true   |
> | 1101 | Steve Eurich| Steve   | null | 16
>| 23.0| Store T| true   |
> | 1102 | Mary Pierson| Mary| null | 16
>| 45.6| Store T| true   |
> | 1103 | Leo Jones   | Leo | null | 16
>| 85.94   | Store Tem  | true   |
> | 1104 | Nancy Beatty| Nancy   | null | 16
>| 97.16   | Store T| false  |
> | 1105 | Clara McNight   | Clara   | null | 16
>| 81.25   | Store  | true   |
> | 1106 | null| Marcella| null | 17
>| 67.86   | Stor   | false  |
> | 1107 | Charlotte Yonce | Charlotte   | null | 17
>| 52.17   | Stor   | true   |
> | 1108 | Benjamin Foster | Benjamin| null | 17
>| 89.8| Stor   | false  |
> | 1109 | John Reed   | John| null | 17
>| 12.9| Store Per  | false  |
> | 1110 | Lynn Kwiatkowski| Lynn| null | 17
>| 25.76   | St | true   |
> | 1111 | Donald Vann | Donald  | null | 17
>|

[jira] [Assigned] (DRILL-5016) Config param drill.exec.sort.purge.threshold is misnamed

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5016:
--

Assignee: (was: Paul Rogers)

> Config param drill.exec.sort.purge.threshold is misnamed
> 
>
> Key: DRILL-5016
> URL: https://issues.apache.org/jira/browse/DRILL-5016
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> The Drill config system provides a property called 
> {{drill.exec.sort.purge.threshold}}. The name suggests that this is a 
> parameter related to sorting. Perhaps it controls something having to do with 
> when we purge buffered batches from memory in the ExternalSortBatch?
> In fact, this is actually {{drill.exec.topn.purge-threshold}} - it affects 
> only the Top-N operator, not sort.
> To make this change, rename the config attribute in {{ExecConstants}} from
> {code}
>   String BATCH_PURGE_THRESHOLD = "drill.exec.sort.purge.threshold";
> {code}
> to:
> {code}
>   String TOP_N_PURGE_THRESHOLD = "drill.exec.topn.purge-threshold";
> {code}
> To permit backward compatibility, modify the use in TopNBatch to check the 
> old value, use it if set, else use the new value.
> {code}
> // Check pre x.y config parameter for backward compatibility.
> if ( ! context.getConfig( ).isEmpty( "drill.exec.sort.purge.threshold" ) 
> ) {
>   batchPurgeThreshold = 
> context.getConfig().getInt("drill.exec.sort.purge.threshold");
> } else {
>   batchPurgeThreshold = context.getConfig().getInt(ExecConstants. 
> TOP_N_PURGE_THRESHOLD);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5021) ExternalSortBatch redundantly redefines the batch schema

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5021:
--

Assignee: Paul Rogers

> ExternalSortBatch redundantly redefines the batch schema
> 
>
> Key: DRILL-5021
> URL: https://issues.apache.org/jira/browse/DRILL-5021
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Much code in the {{ExternalSortBatch}} (ESB) deals with building vector 
> batches and schemas. However, ESB cannot handle schema changes. The only 
> valid schema difference is the same field path in a different position in the 
> vector array. Given this restriction, the code can be simplified (and sped 
> up) by exploiting the fact that all batches are required to have the same 
> conceptual schema (same set of fields, but perhaps in different vector order) 
> and most probably, the same physical schema (same fields and same vector 
> order.) Note that, because of the way that the {{getValueVectorId()}} method 
> works, each lookup of a value vector is an O(n) operation, so that each 
> remapping of vectors is O(n^2).
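
The complexity point is easy to see in miniature. Below is a hedged sketch in plain Java of the usual remedy: build a name-to-position map once for the (fixed) schema so each lookup is O(1), making a full remap of n vectors O(n) rather than O(n^2). The {{NamedVector}}, {{build()}}, and {{position()}} names are illustrative, not Drill's API.

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VectorIndex {

  // Minimal stand-in for a value vector; Drill's real classes carry far more.
  public interface NamedVector {
    String name();
  }

  // Build the name-to-position index once per (unchanging) schema: O(n).
  public static Map<String, Integer> build(List<NamedVector> vectors) {
    Map<String, Integer> index = new HashMap<>();
    for (int i = 0; i < vectors.size(); i++) {
      index.put(vectors.get(i).name(), i);
    }
    return index;
  }

  // Each subsequent lookup is O(1), so remapping n vectors is O(n) overall,
  // not the O(n^2) produced by n repeated linear scans.
  public static int position(Map<String, Integer> index, String name) {
    Integer pos = index.get(name);
    if (pos == null) {
      throw new IllegalArgumentException("no vector named " + name);
    }
    return pos;
  }
}
{code}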



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5016) Config param drill.exec.sort.purge.threshold is misnamed

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5016:
--

Assignee: Paul Rogers

> Config param drill.exec.sort.purge.threshold is misnamed
> 
>
> Key: DRILL-5016
> URL: https://issues.apache.org/jira/browse/DRILL-5016
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The Drill config system provides a property called 
> {{drill.exec.sort.purge.threshold}}. The name suggests that this is a 
> parameter related to sorting. Perhaps it controls something having to do with 
> when we purge buffered batches from memory in the ExternalSortBatch?
> In fact, this is actually {{drill.exec.topn.purge-threshold}} - it affects 
> only the Top-N operator, not sort.
> To make this change, rename the config attribute in {{ExecConstants}} from
> {code}
>   String BATCH_PURGE_THRESHOLD = "drill.exec.sort.purge.threshold";
> {code}
> to:
> {code}
>   String TOP_N_PURGE_THRESHOLD = "drill.exec.topn.purge-threshold";
> {code}
> To permit backward compatibility, modify the use in TopNBatch to check the 
> old value, use it if set, else use the new value.
> {code}
> // Check the pre x.y config parameter for backward compatibility.
> if (context.getConfig().hasPath("drill.exec.sort.purge.threshold")) {
>   batchPurgeThreshold = context.getConfig().getInt("drill.exec.sort.purge.threshold");
> } else {
>   batchPurgeThreshold = context.getConfig().getInt(ExecConstants.TOP_N_PURGE_THRESHOLD);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5060) External Sort does not recognize query cancellation during long events

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5060:
--

Assignee: Paul Rogers

> External Sort does not recognize query cancellation during long events
> --
>
> Key: DRILL-5060
> URL: https://issues.apache.org/jira/browse/DRILL-5060
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The external sort operator performs in-memory or on-disk sorting. Several 
> functions within external sort can take an extended time. During this time, 
> the operator does not check if a query cancellation has occurred. These long 
> events are:
> * Spilling a group of batches to disk
> * In-memory sort
> Note that "re-spilling" of spill files is covered by the first item above: 
> re-spill uses the same code to write the new file as is used for the original 
> "first generation" spill files.
> Also, the final merge is handled because the downstream operator will check 
> for cancellation prior to fetching each batch from the sort operator.
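
For illustration, a minimal sketch of the fix shape, assuming some cancellation flag can be polled between units of work; {{SpillableBatch}} and {{spillAll()}} are hypothetical names, not Drill's API:

{code}
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.function.BooleanSupplier;

public class CancellableSpill {

  // Hypothetical stand-in for a buffered batch that knows how to spill itself.
  public interface SpillableBatch {
    void writeToSpillFile() throws IOException;
  }

  // Poll for cancellation between batches so a long spill notices a cancelled
  // query promptly, instead of only after the whole group has been written.
  public static void spillAll(List<SpillableBatch> batches,
                              BooleanSupplier shouldContinue) throws IOException {
    for (SpillableBatch batch : batches) {
      if (!shouldContinue.getAsBoolean()) {
        throw new CancellationException("query cancelled during spill");
      }
      batch.writeToSpillFile();
    }
  }
}
{code}

Checking once per batch keeps the overhead negligible relative to the I/O it guards.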



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5100) External Sort does not manage memory requirements of a schema change

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5100:
--

Assignee: (was: Paul Rogers)

> External Sort does not manage memory requirements of a schema change
> 
>
> Key: DRILL-5100
> URL: https://issues.apache.org/jira/browse/DRILL-5100
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>
> The external sort is given a fixed amount of memory to hold buffered 
> in-memory batches prior to spilling. External sort also handles certain 
> schema changes when union vectors are enabled. When a schema change occurs, 
> existing vectors are coerced into the new schema format, perhaps replacing an 
> existing vector with a new union vector.
> This conversion requires (direct) memory. When done when the external sort 
> has already almost filled its in-memory buffer, the conversion process can 
> cause memory overflow and failure.
> The following shows the allocated memory before and after schema changes in 
> the unit test {{TestExternalSort.testNumericTypes}}:
> {code}
> Before: 134144
> After: 150528
> Before: 150528
> After: 166912
> {code}
> Union vectors appear to be larger than the original BIGINT vectors. External 
> sort must anticipate this and perhaps spill to ensure sufficient room exists 
> for the new, larger vectors.
> Further, the conversion process itself requires that two copies of each 
> vector be in memory: the original and the new, converted one. The external 
> sort does not check to ensure this much working memory is available, leading 
> to potential OOM errors during each vector conversion.
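
A minimal sketch of the kind of guard this implies, with purely illustrative names and arithmetic (the factor of two reflects the two coexisting copies noted above; real code would add slack for the larger union vectors):

{code}
public class SchemaChangeGuard {

  // Illustrative only; not Drill's memory-manager API. The conversion briefly
  // holds both the original and the converted vectors, so budget roughly
  // twice the batch size before starting it.
  public static boolean mustSpillBeforeConversion(long batchBytes,
                                                  long memoryLimit,
                                                  long memoryInUse) {
    long conversionCost = 2 * batchBytes;
    return (memoryLimit - memoryInUse) < conversionCost;
  }
}
{code}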



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5146) Unnecessary spilling to disk by sort when we only have 5000 rows with one column

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5146:
--

Assignee: Paul Rogers

> Unnecessary spilling to disk by sort when we only have 5000 rows with one 
> column
> 
>
> Key: DRILL-5146
> URL: https://issues.apache.org/jira/browse/DRILL-5146
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 27a52efb-0ce6-f2ad-7216-aef007926649.sys.drill, 
> data.tgz, spill.log
>
>
> git.commit.id.abbrev=cf2b7c7
> The below query spills to disk for the sort. The dataset contains 5000 files 
> and each file contains a single record. 
> {code}
> select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by 
> columns[1];
> {code}
> Environment:
> {code}
> DRILL_MAX_DIRECT_MEMORY="16G"
> DRILL_MAX_HEAP="4G"
> {code}
> I attached the dataset, logs and the profile



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5171) TestSort unit test tests the external sort, not the in-memory SortBatch

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5171:
--

Assignee: Paul Rogers

> TestSort unit test tests the external sort, not the in-memory SortBatch
> ---
>
> Key: DRILL-5171
> URL: https://issues.apache.org/jira/browse/DRILL-5171
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>
> Drill provides two sort operators: an in-memory {{SortBatch}} and a spillable 
> {{ExternalSortBatch}}. The external sort is adaptive: it sorts in memory when 
> it can, and spills when necessary. Perhaps for this reason, the in-memory 
> sort appears to be deprecated (but is not marked as such.)
> The in-memory sort has associated test case: {{TestSort}} and 
> {{TestSimpleSort}}. When run, {{TestSort}} actually uses the external sort. 
> {{TestSimpleSort}} has a single test case which is disabled.
> The result is that no tests exist for the in-memory sort. That operator 
> should be marked as deprecated, or the test cases adjusted to actually 
> exercise the in-memory sort operator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5163) External sort on Mac creates a separate child process per spill via HDFS FS

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5163:
--

Assignee: Paul Rogers

> External sort on Mac creates a separate child process per spill via HDFS FS
> ---
>
> Key: DRILL-5163
> URL: https://issues.apache.org/jira/browse/DRILL-5163
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The external sort operator spills to disk. Spill files are created and 
> written using the HDFS file system. For performance, HDFS uses native 
> libraries to access the file system. These native libraries are not available 
> on the Mac. As a result, some operations are implemented using a slower, 
> Java-only path. One of these operations (need details) is implemented by 
> forking a child process.
> When run in a debugger on the Mac, the behavior shows up as the furious 
> creation and deletion of threads to manage the child processes: one per 
> spill. Because of this behavior, performance of external sort is slow. Of 
> course, no production code uses Drill on a Mac, so this is more of a nuisance 
> than a real bug, which is why it is marked as an improvement.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5209) Standardize Drill's batch size

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5209:
--

Assignee: Paul Rogers

> Standardize Drill's batch size
> --
>
> Key: DRILL-5209
> URL: https://issues.apache.org/jira/browse/DRILL-5209
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill is columnar, implemented as a set of value vectors. Value vectors 
> consume memory, which is a fixed resource on each Drillbit. Effective 
> resource management requires the ability to control (or at least predict) 
> resource usage.
> Most data consists of more than one column. A collection of columns (or rows, 
> depending on your perspective) is a record batch.
> Many parts of Drill use 64K rows as the target size of a record batch. The 
> Flatten operator targets batch sizes of 512 MB. The text scan operator 
> appears to target batch sizes of 128 MB. Other operators may use other sizes.
> Operators that target 64K rows use, essentially, unknown and potentially 
> unlimited amounts of memory. While 64K rows of an integer each is fine, 64K 
> rows of Varchar columns of 50K each leads to a batch of 3.2 GB in size, which 
> is rather large.
> This ticket requests three improvements.
> 1. Define a preferred batch size which is a balance between various needs: 
> memory use, network efficiency, benefits of vector operations, etc.
> 2. Provide a reliable way to learn the size of each row as it is added to a 
> batch.
> 3. Use the above to limit batches to the preferred batch size.
> The above will go a long way to easing the task of managing memory because 
> the planner will have some hope of understanding how much memory to allocate 
> to various operations.
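
As a rough illustration of items 1-3 above, a batch can be capped by bytes as well as by rows; everything here is a hypothetical sketch, not Drill code:

{code}
public class BatchSizer {

  // Hard physical limit: row indexes must fit a two-byte selection vector.
  public static final int MAX_ROWS = 65536;

  // Hypothetical sketch: given an observed average row width, the row target
  // is whichever limit binds first, the byte budget or the row-count ceiling.
  public static int targetRowCount(int targetBatchBytes, int avgRowWidthBytes) {
    if (avgRowWidthBytes <= 0) {
      return MAX_ROWS;
    }
    return Math.min(MAX_ROWS, Math.max(1, targetBatchBytes / avgRowWidthBytes));
  }
}
{code}

With an 8 MB target and rows dominated by a single 50K Varchar, {{targetRowCount(8 * 1024 * 1024, 50 * 1024)}} yields 163 rows rather than 64K.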



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5282) Rationalize record batch sizes in all readers and operators

2017-06-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5282:
--

Assignee: Paul Rogers

> Rationalize record batch sizes in all readers and operators
> ---
>
> Key: DRILL-5282
> URL: https://issues.apache.org/jira/browse/DRILL-5282
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>
> Drill uses record batches to process data. A record batch consists of a 
> "bundle" of vectors that, combined, hold the data for some number of records.
> The key consideration for a record batch is memory consumed. Various 
> operators and readers have vastly different ideas of the size of a batch. The 
> text reader can produce batches of 100s of K, while the flatten operator 
> produces batches of half a GB. Other operators are randomly in between. Some 
> readers produce batches of unlimited size driven by average row width.
> Another key consideration is record count. Batches have a hard physical limit 
> of 64K (the number indexed by a two-byte selection vector.) Some operators 
> produce this much, others far less. In one case, we saw a reader that 
> produced 64K+1 records.
> A final consideration is the size of individual vectors. Drill incurs severe 
> memory fragmentation when vectors grow above 16 MB.
> In some cases, operators (such as the Parquet reader) allocate large batches, 
> but only partially fill them, creating a large amount of wasted space. That 
> space adds up when we must buffer it during a sort.
> This ticket asks to research an optimal batch size. Create a framework to 
> build such batches. Retrofit all operators that produce batches to use that 
> framework to produce uniform batches.
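
A hedged sketch of the sizing rule implied by the 16 MB and 64K limits (illustrative names only; real code must also track actual bytes written for variable-width columns):

{code}
public class BatchLimits {

  private static final int VECTOR_LIMIT_BYTES = 16 * 1024 * 1024; // fragmentation threshold
  private static final int MAX_ROWS = 65536;                      // two-byte selection vector limit

  // Illustrative sketch: the row cap for a batch is the tightest per-column
  // cap, so that no column's vector grows past the 16 MB limit.
  public static int rowCap(int[] columnWidthsBytes) {
    int cap = MAX_ROWS;
    for (int width : columnWidthsBytes) {
      cap = Math.min(cap, VECTOR_LIMIT_BYTES / Math.max(1, width));
    }
    return cap;
  }
}
{code}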



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DRILL-3968) JDBC driver seems to be not compatible with previous versions of apache drill.

2017-06-19 Thread Nicolas GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054251#comment-16054251
 ] 

Nicolas GERARD edited comment on DRILL-3968 at 6/19/17 4:03 PM:


I also got the same error when trying to connect directly through JDBC with the 
URL "jdbc:drill:drillbit=localhost".

After some searching and trying, I found that the problem was related to this issue:
https://issues.apache.org/jira/browse/DRILL-5101

I performed the steps as advised:
mkdir %userprofile%\drill
mkdir %userprofile%\drill\udf
mkdir %userprofile%\drill\udf\registry
mkdir %userprofile%\drill\udf\tmp
mkdir %userprofile%\drill\udf\staging
takeown /R /F %userprofile%\drill 

And my error was gone.

This worked on version 1.10.0.

I could see this problem in all versions above 1.1.0.


was (Author: gerardnico):
I got also the same error when trying to connect directly through JDBC with the 
URL "jdbc:drill:drillbit=localhost".

After some search and try, I found that the problem was related with this issue:
https://issues.apache.org/jira/browse/DRILL-5101

I performed as advised:
mkdir %userprofile%\drill
mkdir %userprofile%\drill\udf
mkdir %userprofile%\drill\udf\registry
mkdir %userprofile%\drill\udf\tmp
mkdir %userprofile%\drill\udf\staging
takeown /R /F %userprofile%\drill 

And my error was gone.

Work on version 1.10.0

> JDBC driver seems to be not compatible with previous versions of apache drill.
> --
>
> Key: DRILL-3968
> URL: https://issues.apache.org/jira/browse/DRILL-3968
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Dmitriy
>
> When trying to execute a query using jdbc-driver-all version 1.2.0 against an 
> environment with drill 1.0.0 or 1.1.0, I get the following exception
> Exception in thread "main" java.sql.SQLException: Unexpected 
> RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:261)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1359)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:74)
>   at 
> oadd.net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> oadd.net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> oadd.net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:97)
>   at TestNewDriver.test(TestNewDriver.java:24)
>   at TestNewDriver.main(TestNewDriver.java:17)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:4454)
>   at 
> oadd.org.apache.drill.exec.proto.UserBitShared$SerializedField.getChild(UserBitShared.java:8390)
>   at 
> oadd.org.apache.drill.exec.vector.NullableVarCharVector.load(NullableVarCharVector.java:258)
>   at 
> oadd.org.apache.drill.exec.record.RecordBatchLoader.load(RecordBatchLoader.java:102)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:223)
>   ... 14 more



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DRILL-3968) JDBC driver seems to be not compatible with previous versions of apache drill.

2017-06-19 Thread Nicolas GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054251#comment-16054251
 ] 

Nicolas GERARD edited comment on DRILL-3968 at 6/19/17 4:02 PM:


I also got the same error when trying to connect directly through JDBC with the 
URL "jdbc:drill:drillbit=localhost".

After some searching and trying, I found that the problem was related to this issue:
https://issues.apache.org/jira/browse/DRILL-5101

I performed the steps as advised:
mkdir %userprofile%\drill
mkdir %userprofile%\drill\udf
mkdir %userprofile%\drill\udf\registry
mkdir %userprofile%\drill\udf\tmp
mkdir %userprofile%\drill\udf\staging
takeown /R /F %userprofile%\drill 

And my error was gone.

This worked on version 1.10.0.


was (Author: gerardnico):
I got also the same error when trying to connect directly through JDBC with the 
URL "jdbc:drill:drillbit=localhost".

After some search and try, I found that the problem was related with this issue:
https://issues.apache.org/jira/browse/DRILL-5101

I performed as advised:
mkdir %userprofile%\drill
mkdir %userprofile%\drill\udf
mkdir %userprofile%\drill\udf\registry
mkdir %userprofile%\drill\udf\tmp
mkdir %userprofile%\drill\udf\staging
takeown /R /F %userprofile%\drill 

And my error was gone.

> JDBC driver seems to be not compatible with previous versions of apache drill.
> --
>
> Key: DRILL-3968
> URL: https://issues.apache.org/jira/browse/DRILL-3968
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Dmitriy
>
> When trying to execute a query using jdbc-driver-all version 1.2.0 against an 
> environment with drill 1.0.0 or 1.1.0, I get the following exception
> Exception in thread "main" java.sql.SQLException: Unexpected 
> RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:261)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1359)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:74)
>   at 
> oadd.net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> oadd.net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> oadd.net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:97)
>   at TestNewDriver.test(TestNewDriver.java:24)
>   at TestNewDriver.main(TestNewDriver.java:17)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:4454)
>   at 
> oadd.org.apache.drill.exec.proto.UserBitShared$SerializedField.getChild(UserBitShared.java:8390)
>   at 
> oadd.org.apache.drill.exec.vector.NullableVarCharVector.load(NullableVarCharVector.java:258)
>   at 
> oadd.org.apache.drill.exec.record.RecordBatchLoader.load(RecordBatchLoader.java:102)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:223)
>   ... 14 more



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3968) JDBC driver seems to be not compatible with previous versions of apache drill.

2017-06-19 Thread Nicolas GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054251#comment-16054251
 ] 

Nicolas GERARD commented on DRILL-3968:
---

I also got the same error when trying to connect directly through JDBC with the 
URL "jdbc:drill:drillbit=localhost".

After some searching and trying, I found that the problem was related to this issue:
https://issues.apache.org/jira/browse/DRILL-5101

I performed the steps as advised:
mkdir %userprofile%\drill
mkdir %userprofile%\drill\udf
mkdir %userprofile%\drill\udf\registry
mkdir %userprofile%\drill\udf\tmp
mkdir %userprofile%\drill\udf\staging
takeown /R /F %userprofile%\drill 

And my error was gone.

> JDBC driver seems to be not compatible with previous versions of apache drill.
> --
>
> Key: DRILL-3968
> URL: https://issues.apache.org/jira/browse/DRILL-3968
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Dmitriy
>
> When trying to execute a query using jdbc-driver-all version 1.2.0 against an 
> environment with drill 1.0.0 or 1.1.0, I get the following exception
> Exception in thread "main" java.sql.SQLException: Unexpected 
> RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:261)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1359)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:74)
>   at 
> oadd.net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> oadd.net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> oadd.net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:97)
>   at TestNewDriver.test(TestNewDriver.java:24)
>   at TestNewDriver.main(TestNewDriver.java:17)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:4454)
>   at 
> oadd.org.apache.drill.exec.proto.UserBitShared$SerializedField.getChild(UserBitShared.java:8390)
>   at 
> oadd.org.apache.drill.exec.vector.NullableVarCharVector.load(NullableVarCharVector.java:258)
>   at 
> oadd.org.apache.drill.exec.record.RecordBatchLoader.load(RecordBatchLoader.java:102)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:223)
>   ... 14 more



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054181#comment-16054181
 ] 

ASF GitHub Bot commented on DRILL-5517:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/840#discussion_r118806486
  
--- Diff: exec/memory/base/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java ---
@@ -174,6 +175,40 @@ public ByteBuf setDouble(int index, double value) {
 return this;
   }
 
+  // Clone of the super class checkIndex, but this version returns a boolean rather
+  // than throwing an exception.
+
+  protected boolean hasCapacity(int index, int fieldLength) {
+    if (fieldLength < 0) {
--- End diff --

change this to an assertion as per our discussion?


> Provide size-aware set operations in value vectors
> --
>
> Key: DRILL-5517
> URL: https://issues.apache.org/jira/browse/DRILL-5517
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is 
> to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") 
> Effort starts by providing "size-aware" set operations in value vectors which:
> * Operate as {{setSafe()}} while vectors are below 16 MB.
> * Throw a new, specific exception ({{VectorOverflowException}}) if setting 
> the value (and growing the vector) would exceed the vector limit.
> The methods in value vectors then become the foundation on which we can 
> construct size-aware record batch "writers."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054183#comment-16054183
 ] 

ASF GitHub Bot commented on DRILL-5517:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/840#discussion_r122526465
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -806,10 +998,32 @@ public void generateTestDataAlt(int size) {
 }
 
<#-- type.width -->
+/**
+ * Backfill missing offsets from the given last written position to the
+ * given current write position. Used by the "new" size-safe column
+ * writers to allow skipping values. The set() and setSafe()
+ * do not fill empties. See DRILL-5529 and DRILL-.
+ * @param lastWrite the position of the last valid write: the offset
+ * to be copied forward
+ * @param index the current write position filling occurs up to,
+ * but not including, this position
+ */
+
+public void fillEmptiesBounded(int lastWrite, int index)
+    throws VectorOverflowException {
+  for (int i = lastWrite; i < index; i++) {
+<#if type.width <= 8>
--- End diff --

You can avoid an extra op in the loop by adjusting the bounds of the 
induction variable. The compiler's induction variable analysis might 
automatically figure this one out.

public void fillEmptiesBounded(int lastWrite, int index)
    throws VectorOverflowException {
  for (int i = lastWrite + 1; i <= index; i++) {
    setSafe(i, (int) 0);  <-- one less addition in the loop and less register pressure
  }
}


> Provide size-aware set operations in value vectors
> --
>
> Key: DRILL-5517
> URL: https://issues.apache.org/jira/browse/DRILL-5517
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is 
> to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") 
> Effort starts by providing "size-aware" set operations in value vectors which:
> * Operate as {{setSafe()}} while vectors are below 16 MB.
> * Throw a new, specific exception ({{VectorOverflowException}}) if setting 
> the value (and growing the vector) would exceed the vector limit.
> The methods in value vectors then become the foundation on which we can 
> construct size-aware record batch "writers."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054180#comment-16054180
 ] 

ASF GitHub Bot commented on DRILL-5517:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/840#discussion_r122533121
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
@@ -548,6 +567,23 @@ public void setSafe(int index, ByteBuffer bytes, int start, int length) {
   }
 }
 
+public void setScalar(int index, DrillBuf bytes, int start, int length) throws VectorOverflowException {
+  assert index >= 0;
+
+  if (index >= MAX_ROW_COUNT) {
+    throw new VectorOverflowException();
+  }
+  int currentOffset = offsetVector.getAccessor().get(index);
+  final int newSize = currentOffset + length;
+  if (newSize > MAX_BUFFER_SIZE) {
+    throw new VectorOverflowException();
+  }
+  }
+  while (! data.setBytesBounded(currentOffset, bytes, start, length)) {
--- End diff --

indentation


> Provide size-aware set operations in value vectors
> --
>
> Key: DRILL-5517
> URL: https://issues.apache.org/jira/browse/DRILL-5517
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is 
> to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") 
> Effort starts by providing "size-aware" set operations in value vectors which:
> * Operate as {{setSafe()}} while vectors are below 16 MB.
> * Throw a new, specific exception ({{VectorOverflowException}}) if setting 
> the value (and growing the vector) would exceed the vector limit.
> The methods in value vectors then become the foundation on which we can 
> construct size-aware record batch "writers."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054182#comment-16054182
 ] 

ASF GitHub Bot commented on DRILL-5517:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/840#discussion_r122541355
  
--- Diff: exec/memory/base/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java ---
@@ -174,6 +175,40 @@ public ByteBuf setDouble(int index, double value) {
 return this;
   }
 
+  // Clone of the super class checkIndex, but this version returns a boolean rather
+  // than throwing an exception.
+
+  protected boolean hasCapacity(int index, int fieldLength) {
+    if (fieldLength < 0) {
+      throw new IllegalArgumentException("length: " + fieldLength + " (expected: >= 0)");
+    }
+    return (! (index < 0 || index > capacity() - fieldLength));
+  }
+
+  // Clone of the super class setBytes(), but with bounds checking done as a boolean,
+  // not assertion.
+
+  public boolean setBytesBounded(int index, byte[] src, int srcIndex, int length) {
+    if (! hasCapacity(index, length)) {
+      return false;
+    }
+    if (length != 0) {
--- End diff --

Remove length check


> Provide size-aware set operations in value vectors
> --
>
> Key: DRILL-5517
> URL: https://issues.apache.org/jira/browse/DRILL-5517
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is 
> to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") 
> Effort starts by providing "size-aware" set operations in value vectors which:
> * Operate as {{setSafe()}} while vectors are below 16 MB.
> * Throw a new, specific exception ({{VectorOverflowException}}) if setting 
> the value (and growing the vector) would exceed the vector limit.
> The methods in value vectors then become the foundation on which we can 
> construct size-aware record batch "writers."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5589) JDBC client crashes after successful authentication if trace logging is enabled.

2017-06-19 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5589:

Fix Version/s: 1.11.0

> JDBC client crashes after successful authentication if trace logging is 
> enabled.
> 
>
> Key: DRILL-5589
> URL: https://issues.apache.org/jira/browse/DRILL-5589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> When authentication is completed, the latest changes [dispose the 
> saslClient instance | 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/AuthenticationOutcomeListener.java#L295]
>  if encryption is not enabled. Later, the caller tries to [log the 
> mechanism name | 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/AuthenticationOutcomeListener.java#L136]
>  using the saslClient instance when trace-level logging is enabled. This 
> causes the client to crash, since the saslClient instance has already been 
> disposed before logging.
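
One safe ordering, sketched with the standard {{javax.security.sasl.SaslClient}} API but an illustrative wrapper (this is a shape, not the project's actual patch), is to capture the mechanism name before disposal:

{code}
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import org.slf4j.Logger;

public class SaslCleanup {

  // Sketch only: read everything logging needs before disposing, so the
  // trace statement never touches an already-disposed SaslClient.
  static void finishAuth(SaslClient saslClient, Logger logger) throws SaslException {
    final String mechanism = saslClient.getMechanismName();
    saslClient.dispose();
    logger.trace("Authenticated using mechanism {}", mechanism);
  }
}
{code}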



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5544) Out of heap running CTAS against text delimited

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053782#comment-16053782
 ] 

ASF GitHub Bot commented on DRILL-5544:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/846


> Out of heap running CTAS against text delimited
> ---
>
> Key: DRILL-5544
> URL: https://issues.apache.org/jira/browse/DRILL-5544
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
> Environment: - 2 or 4 nodes cluster
> - 4G or 8G of Java heap and more than 8G of direct memory
> - planner.width.max_per_node = 40
> - store.parquet.compression = none
> To generate lineitem.tbl file unzip dbgen.tgz archive and run:
> {code}dbgen -TL -s 500{code}
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
> Attachments: dbgen.tgz
>
>
> This query causes the drillbit to hang:
> {code}
> create table xyz as
> select
> cast(columns[0] as bigint) l_orderkey,
> cast(columns[1] as integer) l_poartkey,
> cast(columns[2] as integer) l_suppkey,
> cast(columns[3] as integer) l_linenumber,
> cast(columns[4] as double) l_quantity,
> cast(columns[5] as double) l_extendedprice,
> cast(columns[6] as double) l_discount,
> cast(columns[7] as double) l_tax,
> cast(columns[8] as char(1)) l_returnflag,
> cast(columns[9] as char(1)) l_linestatus,
> cast(columns[10] as date) l_shipdate,
> cast(columns[11] as date) l_commitdate,
> cast(columns[12] as date) l_receiptdate,
> cast(columns[13] as char(25)) l_shipinstruct,
> cast(columns[14] as char(10)) l_shipmode,
> cast(columns[15] as varchar(44)) l_comment
> from
> `lineitem.tbl`;
> {code}
> OOM "Java heap space" from the drillbit.log:
> {code:title=drillbit.log|borderStyle=solid}
> ...
> 2017-02-07 22:38:11,031 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:53] 
> DEBUG o.a.d.e.s.p.ParquetDirectByteBufferAllocator - 
> ParquetDirectByteBufferAllocator: Allocated 209715 bytes. Allocated 
> ByteBuffer id: 1563631814
> 2017-02-07 22:38:16,478 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:1] ERROR 
> o.a.d.exec.server.BootStrapContext - 
> org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> java.lang.OutOfMemoryError: Java heap space
> 2017-02-07 22:38:17,391 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:13] 
> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> FragmentExecutor.
> ...
> {code}
> To reproduce the issue please see environment details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053781#comment-16053781
 ] 

ASF GitHub Bot commented on DRILL-5514:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/837


> Enhance VectorContainer to merge two row sets
> -
>
> Key: DRILL-5514
> URL: https://issues.apache.org/jira/browse/DRILL-5514
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Consider the concept of a "record batch" in Drill. On the one hand, one can 
> envision a record batch as a stack of records:
> {code}
> | a1 | b1 | c1 |
> 
> | a2 | b2 | c2 |
> {code}
> But, Drill is columnar. So a record batch is really a "bundle" of vectors:
> {code}
> | a1 || b1 || c1 |
> | a2 || b2 || c2 |
> {code}
> There are times when it is handy to build up a record batch as a merge of two 
> different vector bundles:
> {code}
> -- bundle 1 ---- bundle 2 --
> | a1 || b1 || c1 |
> | a2 || b2 || c2 |
> {code}
> For example, consider a reader. The reader implementation might read columns 
> (a, b) from a file, say. Then, the "{{ScanBatch}}" might add (c) as an 
> implicit vector (the file name, say.) The merged set of vectors comprises the 
> final schema: (a, b, c).
> This ticket asks for the code to do the merge:
> * Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
> * Merge two vector containers C1 and C2 to create a new container, C3, that 
> holds the merger of the vectors from the first two.
> Clearly, the merge only makes sense if:
> * The two input containers have the same row count, and
> * The columns in each input container are distinct.
> Because this feature is also useful for tests, add the merge to the "row set" 
> tools also.
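
A toy sketch of the requested merge semantics, using plain Java lists in place of Drill's containers ({{Column}} and {{merge()}} are illustrative only):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RowSetMerge {

  // Minimal stand-in for a named column of values; Drill's VectorContainer
  // and BatchSchema carry far more state than this.
  public static class Column {
    final String name;
    final List<Object> values;

    public Column(String name, List<Object> values) {
      this.name = name;
      this.values = values;
    }
  }

  // Merge two same-row-count "containers" whose column names are distinct.
  public static List<Column> merge(List<Column> left, List<Column> right) {
    if (!left.isEmpty() && !right.isEmpty()
        && left.get(0).values.size() != right.get(0).values.size()) {
      throw new IllegalArgumentException("row counts differ");
    }
    Set<String> names = new LinkedHashSet<>();
    List<Column> merged = new ArrayList<>();
    for (List<Column> side : Arrays.asList(left, right)) {
      for (Column col : side) {
        if (!names.add(col.name)) {
          throw new IllegalArgumentException("duplicate column: " + col.name);
        }
        merged.add(col);
      }
    }
    return merged;
  }
}
{code}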



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5589) JDBC client crashes after successful authentication if trace logging is enabled.

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053783#comment-16053783
 ] 

ASF GitHub Bot commented on DRILL-5589:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/854


> JDBC client crashes after successful authentication if trace logging is 
> enabled.
> 
>
> Key: DRILL-5589
> URL: https://issues.apache.org/jira/browse/DRILL-5589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
>
> When authentication is completed then with latest changes we [dispose the 
> saslClient instance | 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/AuthenticationOutcomeListener.java#L295]
>  if encryption is not enabled. Then later in caller we try to [log the 
> mechanism name | 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/AuthenticationOutcomeListener.java#L136]
>  using saslClient instance with trace level logging. This will cause the 
> client to crash since the saslClient instance is already disposed before 
> logging. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5325) Implement sub-operator unit tests for managed external sort

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053735#comment-16053735
 ] 

ASF GitHub Bot commented on DRILL-5325:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/808
  
Revoke +1 since there are failures in unit tests:

> Failed tests: 
>   TestExternalSortExec.testSortSpec:175 expected:<100> but was:<2000>



> Implement sub-operator unit tests for managed external sort
> ---
>
> Key: DRILL-5325
> URL: https://issues.apache.org/jira/browse/DRILL-5325
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Validate the proposed sub-operator test framework, by creating low-level unit 
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are 
> quite superficial; the "managed sort" project found many bugs. The managed 
> sort itself was tested with ad-hoc system-level tests created using the new 
> "cluster fixture" framework. But, again, such tests could not reach deep 
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify 
> specific code issues.
> Using the sub-operator unit test framework, we can instead test each bit of 
> functionality at the unit test level.
> If doing so works, and is practical, it can serve as a model for other 
> operator testing projects.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5325) Implement sub-operator unit tests for managed external sort

2017-06-19 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5325:

Labels:   (was: ready-to-commit)

> Implement sub-operator unit tests for managed external sort
> ---
>
> Key: DRILL-5325
> URL: https://issues.apache.org/jira/browse/DRILL-5325
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Validate the proposed sub-operator test framework, by creating low-level unit 
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are 
> quite superficial; the "managed sort" project found many bugs. The managed 
> sort itself was tested with ad-hoc system-level tests created using the new 
> "cluster fixture" framework. But, again, such tests could not reach deep 
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify 
> specific code issues.
> Using the sub-operator unit test framework, we can instead test each bit of 
> functionality at the unit test level.
> If doing so works, and is practical, it can serve as a model for other 
> operator testing projects.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5325) Implement sub-operator unit tests for managed external sort

2017-06-19 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5325:

Labels: ready-to-commit  (was: )

> Implement sub-operator unit tests for managed external sort
> ---
>
> Key: DRILL-5325
> URL: https://issues.apache.org/jira/browse/DRILL-5325
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Validate the proposed sub-operator test framework, by creating low-level unit 
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are 
> quite superficial; the "managed sort" project found many bugs. The managed 
> sort itself was tested with ad-hoc system-level tests created using the new 
> "cluster fixture" framework. But, again, such tests could not reach deep 
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify 
> specific code issues.
> Using the sub-opeator unit test framework, we can instead test each bit of 
> functionality at the unit test level.
> If doing so works, and is practical, it can serve as a model for other 
> operator testing projects.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5325) Implement sub-operator unit tests for managed external sort

2017-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053710#comment-16053710
 ] 

ASF GitHub Bot commented on DRILL-5325:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/808
  
+1


> Implement sub-operator unit tests for managed external sort
> ---
>
> Key: DRILL-5325
> URL: https://issues.apache.org/jira/browse/DRILL-5325
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Validate the proposed sub-operator test framework, by creating low-level unit 
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are 
> quite superficial; the "managed sort" project found many bugs. The managed 
> sort itself was tested with ad-hoc system-level tests created using the new 
> "cluster fixture" framework. But, again, such tests could not reach deep 
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify 
> specific code issues.
> Using the sub-operator unit test framework, we can instead test each bit of 
> functionality at the unit test level.
> If doing so works, and is practical, it can serve as a model for other 
> operator testing projects.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5513) Managed External Sort : OOM error during the merge phase

2017-06-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053554#comment-16053554
 ] 

Paul Rogers commented on DRILL-5513:


The problem now is that the amount of memory needed to read a spilled batch is 
significantly larger than the amount of memory the batch consumed when written. 
From the (revised) logs:

{code}
ExternalSortBatch - Merge batch size: net = 6666666 bytes,
   gross = 8888888 bytes, records = 35460;
   spill file size: 268435456 bytes
...
PriorityQueueCopierWrapper - Merged 35460 records,
   consuming 7536640 bytes of memory
...
BatchGroup - Read 35460 records; size = 9437184
{code}

The sort decided a spill batch should contain 6,666,666 bytes of data. It 
assumed vectors are 75% full (25% internal fragmentation) for a memory target 
of 8,888,888 allocated bytes.

Actual spill was close: actual allocated memory was 7,536,640 bytes.

But, then the assumptions fell apart on re-reading this same data. The read 
needed 9,437,184 bytes of memory. Since this is much more than was expected, we 
got whacked by the operator memory allocator killing us due to an excessive 
allocation. (Sigh... Is this really a constructive use of time?)

Furthermore, since we have only 20 MB of memory, rereading two spilled batches 
consumes 18,874,368 bytes, leaving only 1,125,632 to hold an output batch that 
is supposed to be 6,666,668 bytes in size (net) or 8,888,888 bytes gross.

Have to come up with some way to anticipate the unexpected excessive memory use 
and reduce the spill batch size accordingly. (Or, have to track down the reason 
for the excessive use and fix it...)
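
One possible shape for that anticipation, sketched with illustrative names: scale the planned spill-batch size down by the observed read amplification.

{code}
public class SpillBatchSizer {

  // Illustrative sketch: shrink the planned spill-batch size by the observed
  // read/write ratio, so re-read batches plus the output batch fit the budget.
  public static long adjustedSpillBatchBytes(long plannedBytes,
                                             long observedWriteBytes,
                                             long observedReadBytes) {
    double readFactor = (double) observedReadBytes / Math.max(1L, observedWriteBytes);
    return (long) (plannedBytes / Math.max(1.0, readFactor));
  }
}
{code}

With the numbers above (7,536,640 bytes written, 9,437,184 read back) the factor is about 1.25, shrinking the 8,888,888-byte target to roughly 7,100,000 bytes.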

> Managed External Sort : OOM error during the merge phase
> 
>
> Key: DRILL-5513
> URL: https://issues.apache.org/jira/browse/DRILL-5513
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 26e5f7b8-71e8-afca-e72e-fad7be2b2416.sys.drill, 
> drillbit.log
>
>
> git.commit.id.abbrev=1e0a14c
> No of nodes in cluster : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_query` = 100;
> alter session set `planner.memory.max_query_memory_per_node` = 652428800;
> select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from 
> (select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
> order by s1.rms.mapid);
> {code}
> Exception from the logs
> {code}
> 2017-05-15 12:58:46,646 [BitServer-4] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 26e5f7b8-71e8-afca-e72e-fad7be2b2416: 
> State change requested RUNNING --> FAILED
> org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One 
> or more nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current 
> allocation: 19791880
> Fragment 5:2
> [Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
> buffer of size 2097152 due to memory limit. Current allocation: 19791880
> org.apache.drill.exec.memory.BaseAllocator.buffer():220
> org.apache.drill.exec.memory.BaseAllocator.buffer():195
> org.apache.drill.exec.vector.BigIntVector.reAlloc():212
> org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324
> org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367
> 
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328
> 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360
> 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220
> org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
> 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76
> 
> org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.recor