[jira] [Updated] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open
[ https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5599: Fix Version/s: 1.11.0 > Notify StatusHandlerListener that batch sending has failed even if channel is > still open > - > > Key: DRILL-5599 > URL: https://issues.apache.org/jira/browse/DRILL-5599 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: ready-to-commit > Fix For: 1.11.0 > > Attachments: sample.json > > > *Issue* > Queries stay in CANCELLATION_REQUESTED state after connection with client was > interrupted. Jstack shows that threads for such queries are blocked and > waiting to semaphore to be released. > {noformat} > "26b70318-ddde-9ead-eee2-0828da97b59f:frag:0:0" daemon prio=10 > tid=0x7f56dc3c9000 nid=0x25fd waiting on condition [0x7f56b31dc000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0006f4688ab0> (a > java.util.concurrent.Semaphore$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) > at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) > at > org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48) > - locked <0x0006f4688a78> (a > org.apache.drill.exec.ops.SendingAccountor) > at > org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:486) > at > org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:134) > at > 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.close(ScreenCreator.java:141) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:313) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: - <0x00073f800b68> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > {noformat} > *Reproduce* > Ran modified ConcurrencyTest.java referenced in DRILL-4338 and cancel after > 2-3 seconds. ConcurrencyTest.java should be modified as follows: > {{ExecutorService executor = Executors.newFixedThreadPool(10);}} and execute > 200 queries {{for (int i = 1; i <= 200; i++)}}. > Query: {{select * from dfs.`sample.json`}}, data set is attached. > *Problem description* > Looks like the problem occurs when the server has sent data to the client and > waiting from the client confirmation that data was received. In this case > [{{ChannelListenerWithCoordinationId}}|https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java#L118] > is used for tracking. {{ChannelListenerWithCoordinationId}} contains > {{StatusHandler}} which keeps track of sent batches. It updates > {{SendingAccountor}} with information about how many batches were sent and > how many batches have reached the client (successfully or not). > When sent operation is complete (successfully or not) > {{operationComplete(ChannelFuture future)}} is called. 
The given future contains > information on whether the send operation was successful, the failure cause, channel > status, etc. If the send operation was successful we do nothing, since in this case the > client sent us an acknowledgment and, when we received it, we notified > {{StatusHandlerListener}} that the batch was received. But if the send operation has > failed, we need to notify {{StatusHandler}} that the send was unsuccessful. > {{operationComplete(ChannelFuture future)}} code: > {code} > if (!future.isSuccess()) { > removeFromMap(coordinationId); > if (future.channel().isActive()) { > throw new RpcException("Future failed"); > } else { > setException(new ChannelClosedException()); > } > } > {code}
[jira] [Updated] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open
[ https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5599: Description: *Issue* Queries stay in CANCELLATION_REQUESTED state after connection with client was interrupted. Jstack shows that threads for such queries are blocked and waiting to semaphore to be released. {noformat} "26b70318-ddde-9ead-eee2-0828da97b59f:frag:0:0" daemon prio=10 tid=0x7f56dc3c9000 nid=0x25fd waiting on condition [0x7f56b31dc000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0006f4688ab0> (a java.util.concurrent.Semaphore$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48) - locked <0x0006f4688a78> (a org.apache.drill.exec.ops.SendingAccountor) at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:486) at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:134) at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.close(ScreenCreator.java:141) at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:313) at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155) at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers:- <0x00073f800b68> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} *Reproduce* Ran modified ConcurrencyTest.java referenced in DRILL-4338 and cancel after 2-3 seconds. ConcurrencyTest.java should be modified as follows: {{ExecutorService executor = Executors.newFixedThreadPool(10);}} and execute 200 queries {{for (int i = 1; i <= 200; i++)}}. Query: {{select * from dfs.`sample.json`}}, data set is attached. *Problem description* Looks like the problem occurs when the server has sent data to the client and waiting from the client confirmation that data was received. In this case [{{ChannelListenerWithCoordinationId}}|https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java#L118] is used for tracking. {{ChannelListenerWithCoordinationId}} contains {{StatusHandler}} which keeps track of sent batches. It updates {{SendingAccountor}} with information about how many batches were sent and how many batches have reached the client (successfully or not). When sent operation is complete (successfully or not) {{operationComplete(ChannelFuture future)}} is called. Given future contains information if sent operation was successful or not, failure cause, channel status etc. If sent operation was successful we do nothing since in this case client sent us acknowledgment and when we received it, we notified {{StatusHandlerListener}} that batch was received. But if sent operation has failed, we need to notify {{StatusHandler}} that sent was unsuccessful. 
{{operationComplete(ChannelFuture future)}} code: {code} if (!future.isSuccess()) { removeFromMap(coordinationId); if (future.channel().isActive()) { throw new RpcException("Future failed"); } else { setException(new ChannelClosedException()); } } } {code} The {{setException}} method notifies {{StatusHandler}} that the batch send has failed, but it is only called when the channel is closed. When the channel is still open we just throw an {{RpcException}}. This is where the problem occurs: {{operationComplete(ChannelFuture future)}} is called via Netty's {{DefaultPromise.notifyListener0}} method, which catches {{Throwable}} and just logs it. So even if we throw an exception, nobody is notified about it, in particular not {{StatusHandler}}. *Fix* Use {{setException}} even if the channel is still open, instead of throwing an exception. This problem was also raised in [PR-463|https://github.com/apache/drill/pull/463] but was decided to be fixed in the sco
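To make the described fix concrete, here is a minimal, dependency-free sketch of the listener pattern. The class and method names are illustrative stand-ins, not Drill's actual {{RequestIdMap}}/{{StatusHandler}} code: the point is that on failure the listener is now always notified via {{setException}}, regardless of channel state, because Netty's promise notifier swallows anything thrown from {{operationComplete}}.

```java
// Hypothetical, self-contained model of the fix; real code lives in Drill's
// RequestIdMap / StatusHandler classes and uses Netty's ChannelFuture.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ListenerFixSketch {
  interface RpcOutcomeListener { void setException(Exception e); }

  static final Map<Integer, RpcOutcomeListener> pending = new ConcurrentHashMap<>();

  // Stand-in for ChannelFutureListener.operationComplete: the two booleans
  // model future.isSuccess() and future.channel().isActive().
  static void operationComplete(int coordinationId, boolean success, boolean channelActive) {
    if (!success) {
      RpcOutcomeListener listener = pending.remove(coordinationId);
      if (listener != null) {
        // Fixed behavior: notify the listener whether or not the channel is
        // still open. Before the fix, the channelActive branch threw an
        // exception that Netty silently logged, leaving SendingAccountor stuck.
        listener.setException(channelActive
            ? new Exception("Future failed")
            : new Exception("Channel closed"));
      }
    }
  }

  public static void main(String[] args) {
    final boolean[] notified = {false};
    pending.put(1, e -> notified[0] = true);
    operationComplete(1, false, true);  // failure while the channel is still open
    System.out.println(notified[0]);    // true: the accountor can now be decremented
  }
}
```

With the old behavior, the `channelActive == true` path would leave `notified[0]` false, which is exactly the hang described in the jstack above.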
[jira] [Commented] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open
[ https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057326#comment-16057326 ] ASF GitHub Bot commented on DRILL-5599: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/857 Reworded error message. @paul-rogers and @ppadma thanks for code review!
[jira] [Updated] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5130: Fix Version/s: 1.11.0 > UNION ALL difference in results > --- > > Key: DRILL-5130 > URL: https://issues.apache.org/jira/browse/DRILL-5130 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow, Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Khurram Faraaz >Assignee: Arina Ielchiieva > Fix For: 1.11.0 > > > Drill 1.9.0 git commit ID: 51246693 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> values(1,2,3,4,5,6) union all > values(7,8,9,10,11,12); > +-+-+-+-+-+-+ > | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5 | > +-+-+-+-+-+-+ > | 7 | 8 | 9 | 10 | 11 | 12 | > | 7 | 8 | 9 | 10 | 11 | 12 | > +-+-+-+-+-+-+ > 2 rows selected (0.209 seconds) > {noformat} > Postgres 9.3 > {noformat} > postgres=# values(1,2,3,4,5,6) union all values(7,8,9,10,11,12); > column1 | column2 | column3 | column4 | column5 | column6 > -+-+-+-+-+- >1 | 2 | 3 | 4 | 5 | 6 >7 | 8 | 9 | 10 | 11 | 12 > (2 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050618#comment-16050618 ] Arina Ielchiieva edited comment on DRILL-5130 at 6/21/17 11:06 AM: --- The problem is with an incorrectly overridden explainTerms method. This method is responsible for describing the inputs and attributes of the relational expression. In DrillValuesRel this method was incorrectly overridden; in ValuesPrel it was not overridden at all. Thus two Values nodes with the same row type and row count were considered to be the same, even though their values were different. During planning Calcite discarded the duplicated DrillValuesRel and ValuesPrel (duplicates are found by comparing the string representations of two relational expressions; explainTerms is used to generate such a representation) and used the same node for both Values expressions. Query: {noformat} values('a') union all values('b') {noformat} Plan: {noformat} 00-00 Screen 00-01 Project(EXPR$0=[$0]) 00-02 UnionAll(all=[true]) 00-04 Values 00-03 Values {noformat} was (Author: arina): The problem is with an incorrectly overridden explainTerms method. This method is responsible for describing the inputs and attributes of the relational expression. In DrillValuesRel this method was incorrectly overridden; in ValuesPrel it was not overridden at all. Thus two Values nodes with the same row type and row count were considered to be the same, even though their values were different. During planning Calcite discarded the duplicated DrillValuesRel and ValuesPrel (duplicates are found by comparing the string representations of two relational expressions; explainTerms is used to generate such a representation) and used the same node for both Values expressions.
Query: {noformat} values('a') union all values('b') {noformat} Plan: {noformat} 00-00 Screen 00-01 Project(EXPR$0=[$0]) 00-02 UnionAll(all=[true]) 00-04 Values 00-03 Values {noformat}
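The dedup behaviour described in the comment above can be illustrated with a small, dependency-free sketch (method and type names here are illustrative; Calcite's real mechanism builds a per-node string digest from explainTerms): when the digest omits the literal tuples, the two Values nodes compare equal and the planner keeps only one; once the tuples are included, the nodes stay distinct.

```java
// Hypothetical model of digest-based RelNode dedup; not Calcite's actual API.
import java.util.List;

public class DigestSketch {
  // Buggy digest: row type only, so VALUES('a') and VALUES('b') look identical.
  static String buggyDigest(String rowType, List<String> tuples) {
    return "Values(type=" + rowType + ")";
  }

  // Fixed digest: include the tuples, as the corrected explainTerms does.
  static String fixedDigest(String rowType, List<String> tuples) {
    return "Values(type=" + rowType + ", tuples=" + tuples + ")";
  }

  public static void main(String[] args) {
    List<String> a = List.of("'a'");
    List<String> b = List.of("'b'");
    // true -> planner would wrongly treat the nodes as duplicates
    System.out.println(buggyDigest("CHAR(1)", a).equals(buggyDigest("CHAR(1)", b)));
    // false -> the two Values nodes survive planning independently
    System.out.println(fixedDigest("CHAR(1)", a).equals(fixedDigest("CHAR(1)", b)));
  }
}
```

This is why the plan above showed two separate Values operators (00-03 and 00-04) only after the fix; with identical digests, both UNION ALL inputs resolved to the same node.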
[jira] [Commented] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057331#comment-16057331 ] ASF GitHub Bot commented on DRILL-5130: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/853#discussion_r123218383 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillValuesRelBase.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.common; + + +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelTraitSet; +import org.apache.calcite.rel.AbstractRelNode; +import org.apache.calcite.rel.RelWriter; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.drill.common.JSONOptions; + +public abstract class DrillValuesRelBase extends AbstractRelNode { --- End diff -- Done. 
[jira] [Closed] (DRILL-2975) Extended Json : Time type reporting data which is dependent on the system on which it ran
[ https://issues.apache.org/jira/browse/DRILL-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka closed DRILL-2975. -- Resolution: Not A Bug Fix Version/s: (was: Future) According to the conversation in [DRILL-4116|https://issues.apache.org/jira/browse/DRILL-4116] it seems that drillbits on these two machines work with different timezones. Therefore Drill behaviour described in the jira is expected and jira can be closed. [~rkins] If something is missed, please reopen the jira. > Extended Json : Time type reporting data which is dependent on the system on > which it ran > - > > Key: DRILL-2975 > URL: https://issues.apache.org/jira/browse/DRILL-2975 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Vitalii Diravka >Priority: Critical > > git.commit.id.abbrev=3b19076 > Data : > {code} > { > "int_col" : {"$numberLong": 1}, > "date_col" : {"$dateDay": "2012-05-22"}, > "time_col" : {"$time": "19:20:30.45Z"} > } > {code} > System 1 : > {code} > 0: jdbc:drill:schema=dfs_eea> select time_col from `extended_json/data1.json` > d; > ++ > | time_col | > ++ > | 19:20:30.450 | > ++ > {code} > System 2 : > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexP> select time_col from > `temp.json`; > ++ > | time_col | > ++ > | 11:20:30.450 | > ++ > {code} > The above results are inconsistent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057749#comment-16057749 ] Laurent Goujon commented on DRILL-3640: --- There are two PRs linked to that JIRA; should both of them be applied? Also, according to the JDBC spec, this method applies to the {{execute*}} methods and might apply to the ResultSet methods, but this is not required. Also, for executeBatch it is up to the driver to decide whether the timeout is per query or total, but that should be in the javadoc I guess. > Drill JDBC driver support Statement.setQueryTimeout(int) > > > Key: DRILL-3640 > URL: https://issues.apache.org/jira/browse/DRILL-3640 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC >Affects Versions: 1.2.0 >Reporter: Chun Chang >Assignee: Kunal Khatua > Fix For: 1.11.0 > > > It would be nice if we have this implemented. Runaway queries can be > automatically canceled by setting the timeout. > java.sql.SQLFeatureNotSupportedException: Setting network timeout is not > supported. > at > org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057772#comment-16057772 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123294241 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java --- @@ -38,8 +44,12 @@ // methods for compatibility.) class DrillStatementImpl extends AvaticaStatement implements DrillStatement, DrillRemoteStatement { + //Not using the DrillbitContext's ExecutorService as this is threadPool is light-weight (threads wake up to cancel tasks) but needs a low response time + private static ExecutorService queryTimeoutTaskPool = Executors.newCachedThreadPool(new NamedThreadFactory("q-timeout-")); --- End diff -- I believe this is unnecessary: DrillClient provides an asynchronous API, which is used by the JDBC driver, so all the timeout logic could be done without the use of thread pool. You might want to look at DrillCursor which is where all the magic happens I believe.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057773#comment-16057773 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123292200 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/SqlTimeoutException.java --- @@ -23,12 +23,17 @@ * Indicates that an operation timed out. This is not an error; you can * retry the operation. */ -public class SqlTimeoutException -extends SQLException -{ +public class SqlTimeoutException extends SQLException { + + private static final long serialVersionUID = 2017_06_20L; --- End diff -- maybe use -1 or use your IDE serialVersionUID generator (I know there's an algorithm for it...), but I don't think Drill is making any effort to make this class serializable so it could be a `@SuppressedWarning` annotation as well.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057774#comment-16057774 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123294533 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java --- @@ -497,14 +594,64 @@ public boolean isPoolable() throws SQLException { @Override public void closeOnCompletion() throws SQLException { -throwIfClosed(); +throwIfTimedOutOrClosed(); super.closeOnCompletion(); } @Override public boolean isCloseOnCompletion() throws SQLException { -throwIfClosed(); +throwIfTimedOutOrClosed(); return super.isCloseOnCompletion(); } } + +/** + * Timeout Trigger required for canceling of running queries + */ +class TimeoutTrigger implements Callable { + private int timeoutInSeconds; + + /** + * Get Timeout period in seconds + */ + public int getTimeoutInSeconds() { +return timeoutInSeconds; + } + + private DrillStatementImpl statementHandle; + + //Default Constructor is Invalid + @SuppressWarnings("unused") + private TimeoutTrigger() {} + + /** + * Timeout Constructor + * @param stmtContext Statement Handle + * @param timeoutInSec Timeout defined in seconds + */ + TimeoutTrigger(DrillStatementImpl stmtContext, int timeoutInSec) { +timeoutInSeconds = timeoutInSec; +statementHandle = stmtContext; + } + + @Override + public Boolean call() throws Exception { +try { + Thread.sleep(timeoutInSeconds*1000L); --- End diff -- `TimeUnit.SECONDS.sleep(timeoutInSeconds)`
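A sketch of the direction this review feedback points at (illustrative names, not Drill's actual classes): rather than parking a pooled thread in {{Thread.sleep}}, a single {{ScheduledExecutorService}} can arm a one-shot cancel task using {{TimeUnit}} semantics directly, and the trigger can be cancelled if the query finishes before the deadline.

```java
// Hypothetical timeout-trigger sketch; DrillStatementImpl is modeled by a
// tiny Cancellable interface so the example is self-contained.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutSketch {
  interface Cancellable { void cancel(); }

  static final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // Arms a one-shot cancel; the caller cancels the returned trigger
  // if the statement completes before the timeout fires.
  static ScheduledFuture<?> armTimeout(Cancellable stmt, int timeoutSeconds) {
    return scheduler.schedule(stmt::cancel, timeoutSeconds, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch cancelled = new CountDownLatch(1);
    ScheduledFuture<?> trigger = armTimeout(cancelled::countDown, 1);
    // The "query" outlives the 1-second timeout, so the cancel fires.
    System.out.println(cancelled.await(5, TimeUnit.SECONDS)); // true
    trigger.cancel(false);
    scheduler.shutdown();
  }
}
```

One scheduler thread serves all statements, which addresses the reviewer's concern about a cached pool of threads that exist only to sleep.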
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057778#comment-16057778 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123295256 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -64,13 +65,17 @@ org.slf4j.LoggerFactory.getLogger(DrillResultSetImpl.class); private final DrillConnectionImpl connection; + private DrillStatementImpl drillStatement = null; private volatile boolean hasPendingCancelationNotification = false; DrillResultSetImpl(AvaticaStatement statement, Meta.Signature signature, ResultSetMetaData resultSetMetaData, TimeZone timeZone, Meta.Frame firstFrame) { super(statement, signature, resultSetMetaData, timeZone, firstFrame); connection = (DrillConnectionImpl) statement.getConnection(); +if (statement instanceof DrillStatementImpl) { --- End diff -- what about prepared statement?
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057771#comment-16057771 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123293197 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java --- @@ -159,24 +230,25 @@ public void cleanUp() { public int getQueryTimeout() throws AlreadyClosedSqlException { throwIfClosed(); -return 0; // (No no timeout.) --- End diff -- might want to check what was Avatica's behavior before we overrode it...
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057775#comment-16057775 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123295947

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -125,7 +154,7 @@ protected void cancel() {  // (Not delegated.)
       @Override
       public boolean next() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    If we chose to support `queryTimeout` for `ResultSet`, shouldn't we interrupt `next()` too
    if the operation is taking too long? As per your code, it seems the exception would be
    thrown after the fact...
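[Editor's note: laurentgo's point above, that `next()` should be unblocked when the timeout elapses rather than a flag being checked after the fact, can be sketched with a bounded wait on the batch queue. This is an illustrative sketch only, not Drill's code; `TimedFetch`, `pollUntilDeadline`, and the queue of string "batches" are hypothetical stand-ins for DrillCursor's internal batch queue.]

```java
import java.sql.SQLTimeoutException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: a fetch helper that enforces the remaining query timeout on each
// blocking wait, so next() fails at the deadline instead of after the fact.
public class TimedFetch {

    // Waits for the next batch, but never past deadlineMillis (epoch millis).
    // Returns the batch, or throws SQLTimeoutException once the deadline passes.
    static <T> T pollUntilDeadline(BlockingQueue<T> batches, long deadlineMillis)
            throws InterruptedException, SQLTimeoutException {
        long remaining = deadlineMillis - System.currentTimeMillis();
        T batch = (remaining > 0)
            ? batches.poll(remaining, TimeUnit.MILLISECONDS)
            : null;
        if (batch == null) {
            throw new SQLTimeoutException("query exceeded timeout");
        }
        return batch;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.add("batch-1");
        // Batch already available: returned well before the deadline.
        System.out.println(pollUntilDeadline(q, System.currentTimeMillis() + 1000));
        // Empty queue with a 100 ms deadline: the wait is cut short.
        try {
            pollUntilDeadline(q, System.currentTimeMillis() + 100);
        } catch (SQLTimeoutException e) {
            System.out.println("timed out");  // prints "timed out"
        }
    }
}
```

The key difference from the flag-check approach under review is that the timeout bounds the wait itself, so the caller is released as soon as the deadline passes.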
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1605#comment-1605 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123296355

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -1384,7 +1402,7 @@ public void updateRowId( String columnLabel, RowId x ) throws SQLException {
       @Override
       public int getHoldability() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    Not sure it makes sense (and is correct per spec) to throw SqlTimeoutException here...
    (and in similar methods)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057776#comment-16057776 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123293533

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -204,7 +276,7 @@ public boolean isClosed() {
       @Override
       public int getMaxFieldSize() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    This method cannot throw SqlTimeoutException... only `execute*` methods can, and
    possibly some ResultSet methods (but that part is optional per the JDBC standard).
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057824#comment-16057824 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123303854

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -422,6 +507,9 @@ public ResultSet getGeneratedKeys() throws SQLException {
       public int executeUpdate(String sql, int autoGeneratedKeys) throws SQLException {
         throwIfClosed();
         try {
    +      if (timeoutTrigger != null) {
    --- End diff --

    So the trigger is created, but if super.executeUpdate takes more than queryTimeout
    seconds, the method is not interrupted. I don't believe this conforms to the JDBC
    spec, or is in any way useful to the end user (if the query takes 2 min but the
    timeout is 60 s, the exception should be thrown after 60 s, not 120 s...).
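[Editor's note: one common way to get the behavior laurentgo asks for, releasing the caller at the deadline rather than when the blocking call finally returns, is to run the blocking call on a worker and bound the wait with `Future.get`. A minimal sketch follows; `BoundedExecute` and `boundedExecute` are hypothetical names, not Drill APIs, and a real driver would also cancel the server-side query, not just interrupt the worker thread.]

```java
import java.sql.SQLTimeoutException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound a blocking execute call with Future.get so the caller sees
// SQLTimeoutException after timeoutSeconds, even if the call keeps running.
public class BoundedExecute {

    static int boundedExecute(ExecutorService pool, Callable<Integer> runQuery,
                              int timeoutSeconds) throws Exception {
        Future<Integer> result = pool.submit(runQuery);
        try {
            return result.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);  // interrupt the worker; a real driver would
                                  // also cancel the query on the server
            throw new SQLTimeoutException("query exceeded " + timeoutSeconds + "s");
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Fast "query": completes within the bound.
        System.out.println(boundedExecute(pool, () -> 42, 5));  // prints 42
        // Slow "query": would run 10 s, but the caller is released after ~1 s.
        try {
            boundedExecute(pool, () -> { Thread.sleep(10_000); return 0; }, 1);
        } catch (SQLTimeoutException e) {
            System.out.println("timed out");  // prints "timed out"
        }
        pool.shutdownNow();
    }
}
```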
[jira] [Commented] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057869#comment-16057869 ]

ASF GitHub Bot commented on DRILL-5130:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/853#discussion_r123312311

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillValuesRelBase.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.common;
    +
    +import org.apache.calcite.plan.RelOptCluster;
    +import org.apache.calcite.plan.RelTraitSet;
    +import org.apache.calcite.rel.AbstractRelNode;
    +import org.apache.calcite.rel.RelWriter;
    +import org.apache.calcite.rel.type.RelDataType;
    +import org.apache.drill.common.JSONOptions;
    +
    +/**
    + * Base class for logical and physical Values implemented in Drill.
    + */
    +public abstract class DrillValuesRelBase extends AbstractRelNode {
    --- End diff --

    Calcite has an abstract class Values. Neither DrillValueRel nor ValuePrel extends
    from that Calcite class. Would it help solve the problem if we extended from
    Calcite's class?

    https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Values.java

> UNION ALL difference in results
> -------------------------------
>
>                 Key: DRILL-5130
>                 URL: https://issues.apache.org/jira/browse/DRILL-5130
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow, Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Khurram Faraaz
>            Assignee: Arina Ielchiieva
>             Fix For: 1.11.0
>
> Drill 1.9.0 git commit ID: 51246693
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(1,2,3,4,5,6) union all values(7,8,9,10,11,12);
> +---------+---------+---------+---------+---------+---------+
> | EXPR$0  | EXPR$1  | EXPR$2  | EXPR$3  | EXPR$4  | EXPR$5  |
> +---------+---------+---------+---------+---------+---------+
> | 7       | 8       | 9       | 10      | 11      | 12      |
> | 7       | 8       | 9       | 10      | 11      | 12      |
> +---------+---------+---------+---------+---------+---------+
> 2 rows selected (0.209 seconds)
> {noformat}
> Postgres 9.3
> {noformat}
> postgres=# values(1,2,3,4,5,6) union all values(7,8,9,10,11,12);
>  column1 | column2 | column3 | column4 | column5 | column6
> ---------+---------+---------+---------+---------+---------
>        1 |       2 |       3 |       4 |       5 |       6
>        7 |       8 |       9 |      10 |      11 |      12
> (2 rows)
> {noformat}
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057943#comment-16057943 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123324366

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -64,13 +65,17 @@
           org.slf4j.LoggerFactory.getLogger(DrillResultSetImpl.class);

       private final DrillConnectionImpl connection;
    +  private DrillStatementImpl drillStatement = null;
       private volatile boolean hasPendingCancelationNotification = false;

       DrillResultSetImpl(AvaticaStatement statement, Meta.Signature signature,
                          ResultSetMetaData resultSetMetaData, TimeZone timeZone,
                          Meta.Frame firstFrame) {
         super(statement, signature, resultSetMetaData, timeZone, firstFrame);
         connection = (DrillConnectionImpl) statement.getConnection();
    +    if (statement instanceof DrillStatementImpl) {
    --- End diff --

    The original DrillStatement threw an exception for setting the query timeout, but I
    don't see the same for DrillPreparedStatement, which tries to call Avatica's
    implementation. My testing indicates that this is effectively a no-op. I can extend
    it to that as well, but I was hoping to hear back from you first.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057945#comment-16057945 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123325052

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/SqlTimeoutException.java ---
    @@ -23,12 +23,17 @@
      * Indicates that an operation timed out. This is not an error; you can
      * retry the operation.
      */
    -public class SqlTimeoutException
    -    extends SQLException
    -{
    +public class SqlTimeoutException extends SQLException {
    +
    +  private static final long serialVersionUID = 2017_06_20L;
    --- End diff --

    Fair enough... will set it to -1. I saw a similar assignment for
    _InvalidParameterSqlException_, which was set to a timestamp, so I just tried to
    follow the convention.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057951#comment-16057951 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123325595

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -38,8 +44,12 @@
     // methods for compatibility.)
     class DrillStatementImpl extends AvaticaStatement
         implements DrillStatement, DrillRemoteStatement {
    +  // Not using the DrillbitContext's ExecutorService as this thread pool is
    +  // light-weight (threads wake up to cancel tasks) but needs a low response time
    +  private static ExecutorService queryTimeoutTaskPool =
    +      Executors.newCachedThreadPool(new NamedThreadFactory("q-timeout-"));
    --- End diff --

    I'm not sure if there is a way for me to reference back to the Statement object...
    since the objective is simply to have a sleeping thread in this pool time out and
    **cancel** the query. Let me look around.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057963#comment-16057963 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123327453

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -38,8 +44,12 @@
     // methods for compatibility.)
     class DrillStatementImpl extends AvaticaStatement
         implements DrillStatement, DrillRemoteStatement {
    +  // Not using the DrillbitContext's ExecutorService as this thread pool is
    +  // light-weight (threads wake up to cancel tasks) but needs a low response time
    +  private static ExecutorService queryTimeoutTaskPool =
    +      Executors.newCachedThreadPool(new NamedThreadFactory("q-timeout-"));
    --- End diff --

    DrillCursor is handling all the logic of executing queries and waiting for results.
    It has access to the connection and the statement, so you would know the timeout
    (if set). In the cursor, we are using a lock for the first message and a blocking
    queue for the batches, but when waiting on those, there's no timeout set. Instead
    we could use the query timeout (or the time remaining since the beginning of the
    execution) and throw SqlTimeoutException when the locks themselves throw
    TimeoutException. In that scenario, no thread pool is involved (except the one for
    I/O, but that already existed).
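[Editor's note: the deadline bookkeeping laurentgo describes, recording when execution started and deriving the remaining wait for each lock or queue operation from the single query timeout, can be sketched as a small helper. `QueryDeadline` is a hypothetical name for illustration, not a Drill class.]

```java
// Sketch: track one deadline across multiple blocking waits. Each wait asks
// for the time remaining since the start of execution, so the total blocked
// time across all waits never exceeds the query timeout.
public class QueryDeadline {
    private final long startNanos;
    private final long timeoutNanos;  // 0 means "no timeout"

    QueryDeadline(long timeoutSeconds) {
        this.startNanos = System.nanoTime();
        this.timeoutNanos = timeoutSeconds * 1_000_000_000L;
    }

    // Remaining nanos to pass to poll()/await(); never negative.
    long remainingNanos() {
        if (timeoutNanos == 0) {
            return Long.MAX_VALUE;  // unbounded wait
        }
        long left = timeoutNanos - (System.nanoTime() - startNanos);
        return Math.max(left, 0);
    }

    boolean expired() {
        return timeoutNanos != 0 && remainingNanos() == 0;
    }

    public static void main(String[] args) {
        QueryDeadline d = new QueryDeadline(30);
        System.out.println(d.expired());  // prints false right after creation
        // A zero timeout disables the deadline entirely.
        System.out.println(new QueryDeadline(0).remainingNanos() == Long.MAX_VALUE);
    }
}
```

Because the remaining time shrinks across successive waits, a cursor that first waits for the query-start message and then polls a batch queue stays within one overall budget, which is the scenario described above.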
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057965#comment-16057965 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123327737

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -159,24 +230,25 @@ public void cleanUp() {
       public int getQueryTimeout() throws AlreadyClosedSqlException {
         throwIfClosed();
    -    return 0;  // (No no timeout.)
    --- End diff --

    Interestingly, AvaticaStatement returns the timeout value that was set... but does
    not honour it! :) Originally the setter would throw a NotSupported exception, and
    the explicit return was the default 0. Now that we're able to support the timeout,
    I can read Avatica's value directly.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057978#comment-16057978 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329050

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -204,7 +276,7 @@ public boolean isClosed() {
       @Override
       public int getMaxFieldSize() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    The throwIfTimedOutOrClosed() call is basically a wrapper around a sequential
    check: first the timed-out state, then the closed state. A timed-out query (i.e.
    statement/result set) is already in a closed state, but we need to throw the
    correct exception (in this case, the timeout), which is why it was done like that.
    My understanding was that any execute and data-fetch operations can throw timeout
    exceptions. Are you suggesting that such '_getter_' methods should only throw an
    AlreadyClosed exception and never time out?
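[Editor's note: the check ordering kkhatua describes, a timed-out statement being also closed, so the timeout flag must be consulted first or callers would only ever see "already closed", can be sketched in a few lines. `StatementState` and its methods are hypothetical names for illustration, not Drill's actual classes.]

```java
import java.sql.SQLException;
import java.sql.SQLTimeoutException;

// Sketch: why the timed-out check must precede the closed check. Timing out
// closes the statement as a side effect, so checking "closed" first would
// mask the more specific timeout exception.
public class StatementState {
    private boolean closed;
    private boolean timedOut;

    void timeOut() { timedOut = true; closed = true; }  // timeout also closes
    void close()   { closed = true; }

    void throwIfTimedOutOrClosed() throws SQLException {
        if (timedOut) {
            throw new SQLTimeoutException("statement timed out");
        }
        if (closed) {
            throw new SQLException("statement already closed");
        }
    }

    public static void main(String[] args) {
        StatementState s = new StatementState();
        s.timeOut();
        try {
            s.throwIfTimedOutOrClosed();
        } catch (SQLException e) {
            // The timeout surfaces even though the statement is closed too.
            System.out.println(e.getClass().getSimpleName());  // prints SQLTimeoutException
        }
    }
}
```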
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057979#comment-16057979 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329155

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -497,14 +594,64 @@ public boolean isPoolable() throws SQLException {
       @Override
       public void closeOnCompletion() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
         super.closeOnCompletion();
       }

       @Override
       public boolean isCloseOnCompletion() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
         return super.isCloseOnCompletion();
       }
     }
    +
    +/**
    + * Timeout trigger required for canceling running queries
    + */
    +class TimeoutTrigger implements Callable<Boolean> {
    +  private int timeoutInSeconds;
    +
    +  /**
    +   * Get timeout period in seconds
    +   */
    +  public int getTimeoutInSeconds() {
    +    return timeoutInSeconds;
    +  }
    +
    +  private DrillStatementImpl statementHandle;
    +
    +  // Default constructor is invalid
    +  @SuppressWarnings("unused")
    +  private TimeoutTrigger() {}
    +
    +  /**
    +   * Timeout constructor
    +   * @param stmtContext  statement handle
    +   * @param timeoutInSec timeout defined in seconds
    +   */
    +  TimeoutTrigger(DrillStatementImpl stmtContext, int timeoutInSec) {
    +    timeoutInSeconds = timeoutInSec;
    +    statementHandle = stmtContext;
    +  }
    +
    +  @Override
    +  public Boolean call() throws Exception {
    +    try {
    +      Thread.sleep(timeoutInSeconds * 1000L);
    --- End diff --

    +1 Will make the change.
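[Editor's note: an alternative to the sleeping `TimeoutTrigger` callable in the diff above is a `ScheduledExecutorService`, which fires the cancellation at the deadline without parking a thread for the whole timeout, and whose schedule can be dropped when the query finishes early. This is an illustrative sketch; `ScheduledTrigger` and `armTimeout` are hypothetical names, and the `Runnable` stands in for the statement's cancel logic.]

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: schedule the cancellation instead of sleeping until it. One daemon
// thread serves all statements, and completed queries disarm their trigger.
public class ScheduledTrigger {
    private static final ScheduledExecutorService SCHEDULER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "q-timeout");
            t.setDaemon(true);  // let the JVM exit even if a trigger is armed
            return t;
        });

    // Arms a timeout; the returned future lets the caller disarm it.
    static ScheduledFuture<?> armTimeout(Runnable cancelQuery, int timeoutSeconds) {
        return SCHEDULER.schedule(cancelQuery, timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        // Query finishes in time: the trigger is disarmed and never fires.
        ScheduledFuture<?> trigger =
            armTimeout(() -> System.out.println("cancelled"), 60);
        trigger.cancel(false);  // statement completed; drop the trigger
        System.out.println("completed before timeout");
    }
}
```

Compared to a cached pool of sleeping callables, this keeps at most one idle thread regardless of how many statements have timeouts armed.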
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057981#comment-16057981 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329682

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -204,7 +276,7 @@ public boolean isClosed() {
       @Override
       public int getMaxFieldSize() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    Yes, execute and (optionally) data-fetch operations. Other methods are not impacted.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057983#comment-16057983 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329889

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -159,24 +230,25 @@ public void cleanUp() {
       public int getQueryTimeout() throws AlreadyClosedSqlException {
         throwIfClosed();
    -    return 0;  // (No no timeout.)
    --- End diff --

    It's an optional thing, but the spec doesn't say what happens if you set it but
    it's actually not used...
[jira] [Commented] (DRILL-5325) Implement sub-operator unit tests for managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057984#comment-16057984 ]

ASF GitHub Bot commented on DRILL-5325:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/808

> Implement sub-operator unit tests for managed external sort
> -----------------------------------------------------------
>
>                 Key: DRILL-5325
>                 URL: https://issues.apache.org/jira/browse/DRILL-5325
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Tools, Build & Test
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>
> Validate the proposed sub-operator test framework by creating low-level unit
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are
> quite superficial; the "managed sort" project found many bugs. The managed
> sort itself was tested with ad-hoc system-level tests created using the new
> "cluster fixture" framework. But, again, such tests could not reach deep
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify
> specific code issues.
> Using the sub-operator unit test framework, we can instead test each bit of
> functionality at the unit-test level.
> If doing so works, and is practical, it can serve as a model for other
> operator testing projects.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057985#comment-16057985 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123330075

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/SqlTimeoutException.java ---
    @@ -23,12 +23,17 @@
      * Indicates that an operation timed out. This is not an error; you can
      * retry the operation.
      */
    -public class SqlTimeoutException
    -    extends SQLException
    -{
    +public class SqlTimeoutException extends SQLException {
    +
    +  private static final long serialVersionUID = 2017_06_20L;
    --- End diff --

    Missed this one. Honestly not super important (feel free to ignore my initial
    comment).
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057991#comment-16057991 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123330684

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -422,6 +507,9 @@ public ResultSet getGeneratedKeys() throws SQLException {
       public int executeUpdate(String sql, int autoGeneratedKeys) throws SQLException {
         throwIfClosed();
         try {
    +      if (timeoutTrigger != null) {
    --- End diff --

    I'm submitting the timeout trigger to the pool and counting on that trigger
    performing a query cancellation. I don't think Drill supports executeUpdate, but as
    long as a query cancellation for updates rolls back the transaction, this should
    suffice. This worked well for large queries where the execute###() call took longer
    than the timeout period and allowed the cancellation to do the interrupt.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057995#comment-16057995 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123331635

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -125,7 +154,7 @@ protected void cancel() {  // (Not delegated.)
       @Override
       public boolean next() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    The query cancellation should take care of it. It will be hard to have a unit test
    specifically for this, but I'll try.
[jira] [Created] (DRILL-5601) Rollup of External Sort memory management fixes
Paul Rogers created DRILL-5601: -- Summary: Rollup of External Sort memory management fixes Key: DRILL-5601 URL: https://issues.apache.org/jira/browse/DRILL-5601 Project: Apache Drill Issue Type: Task Affects Versions: 1.11.0 Reporter: Paul Rogers Assignee: Paul Rogers Fix For: 1.11.0 Rollup of a set of specific JIRA entries that all relate to the very difficult problem of managing memory within Drill in order for the external sort to stay within a memory budget. In general, the fixes relate to better estimating memory used by the three ways that Drill allocates vector memory (see DRILL-5522) and to predicting the size of vectors that the sort will create, to avoid repeated realloc-copy cycles (see DRILL-5594). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058073#comment-16058073 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123316880 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -179,10 +182,18 @@ private Metadata(FileSystem fs, ParquetFormatConfig formatConfig) { for (final FileStatus file : fs.listStatus(p, new DrillPathFilter())) { if (file.isDirectory()) { +String subdirectoryName = file.getPath().getName(); ParquetTableMetadata_v3 subTableMetadata = (createMetaFilesRecursively(file.getPath().toString())).getLeft(); -metaDataList.addAll(subTableMetadata.files); -directoryList.addAll(subTableMetadata.directories); -directoryList.add(file.getPath().toString()); +for (ParquetFileMetadata_v3 pfm_v3 : subTableMetadata.files) { + // Construction of the relative file path by adding subdirectory name and inner relative file path + String relativePath = Joiner.on("/").join(subdirectoryName, pfm_v3.getPath()); --- End diff -- Regarding the `paths` I answered in the general comment. I decided against merging path names recursively. Instead, I've implemented a new `MetadataPathUtils.createMetadataWithRelativePaths()` method that converts absolute paths to relative ones and creates new metadata for the cache files. > Store relative paths in metadata file > - > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Vitalii Diravka > Fix For: Future > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. 
Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.4.14#64029)
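The relative-path construction in the diff above uses Guava's `Joiner.on("/")` to prepend the subdirectory name to a file path that is already relative to that subdirectory. A stdlib-only sketch of the same step (`String.join` behaves identically for this case; the sample path values are made up):

```java
// Sketch of the relative-path construction from the diff above.
public class RelativePathSketch {
  // Prepend the subdirectory name to a file path that is already relative
  // to that subdirectory, producing a path relative to the parent table.
  static String toRelativePath(String subdirectoryName, String innerRelativePath) {
    return String.join("/", subdirectoryName, innerRelativePath);
  }

  public static void main(String[] args) {
    String p = toRelativePath("2006", "1/0_0_0.parquet");
    if (!p.equals("2006/1/0_0_0.parquet")) throw new AssertionError(p);
  }
}
```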
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058074#comment-16058074 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123329475 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -264,15 +275,18 @@ private ParquetTableMetadata_v3 getParquetTableMetadata(List fileSta /** * Get a list of file metadata for a list of parquet files * - * @param fileStatuses - * @return + * @param parquetTableMetadata_v3 can store column schema info from all the files and row groups + * @param fileStatuses list of the parquet files statuses + * @param absolutePathInMetadata true if result metadata files should contain absolute paths, false for relative paths. + * Relative paths in the metadata are only necessary while creating meta cache files. + * @return list of the parquet file metadata (parquet metadata for every file) * @throws IOException */ - private List getParquetFileMetadata_v3( - ParquetTableMetadata_v3 parquetTableMetadata_v3, List fileStatuses) throws IOException { + private List getParquetFileMetadata_v3(ParquetTableMetadata_v3 parquetTableMetadata_v3, + List fileStatuses, boolean absolutePathInMetadata) throws IOException { --- End diff -- The boolean flag has been removed. For now we create and gather metadata only with absolute paths, but before writing, new metadata with relative paths is created based on the old metadata. Agreed — it makes sense to check every path while converting it. Done.
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058076#comment-16058076 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123342097 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java --- @@ -398,6 +398,23 @@ public void testDrill4877() throws Exception { } + @Test // DRILL-3867 + public void testMoveCache() throws Exception { +String tableName = "nation_move"; +String newTableName = "nation_moved"; +test("use dfs_test.tmp"); +test("create table `%s/t1` as select * from cp.`tpch/nation.parquet`", tableName); +test("create table `%s/t2` as select * from cp.`tpch/nation.parquet`", tableName); +test(String.format("refresh table metadata %s", tableName)); +checkForMetadataFile(tableName); +File srcFile = new File(getDfsTestTmpSchemaLocation(), tableName); +File dstFile = new File(getDfsTestTmpSchemaLocation(), newTableName); +FileUtils.moveDirectory(srcFile, dstFile); +Assert.assertFalse("Cache file was not moved successfully", srcFile.exists()); +int rowCount = testSql(String.format("select * from %s", newTableName)); +Assert.assertEquals(50, rowCount); + } + --- End diff -- There is no requirement for them to use absolute paths. After this fix they can be upgraded to use relative paths. (I'm going to open a separate jira for it.) Therefore a new test case for metadata cache files with absolute paths was added.
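The directory-move step in the test above (`FileUtils.moveDirectory` plus the `srcFile.exists()` assertion) can be reproduced with only the JDK, as a minimal stand-in. The cache-file name below is just a placeholder file created for the demonstration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal stand-in for the move-and-verify step in testMoveCache():
// move a directory, then verify the source is gone and the destination exists.
public class MoveCacheSketch {
  static void moveAndCheck(Path src, Path dst) throws IOException {
    Files.move(src, dst);  // same filesystem: implemented as a rename
    if (Files.exists(src)) throw new AssertionError("source still exists");
    if (!Files.exists(dst)) throw new AssertionError("destination missing");
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempDirectory("nation_move");
    Files.createFile(src.resolve(".drill.parquet_metadata"));  // placeholder cache file
    Path dst = src.resolveSibling(src.getFileName() + "_moved");
    moveAndCheck(src, dst);
  }
}
```

The point the test exercises is that, with relative paths in the cache file, the moved table remains queryable; the sketch only covers the filesystem half of that.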
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058075#comment-16058075 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123339909 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) { return directories; } +/** If directories list contains relative paths, update it to absolute ones + * @param baseDir base parent directory + */ +@JsonIgnore public void updateRelativePaths(Path baseDir) { + if (!directories.isEmpty()) { +// It is enough to check the first path to decide if updating needed +if (!new Path(directories.get(0)).isAbsolute()) { --- End diff -- It is possible to replace String with Path for the directory paths by implementing a custom `JsonSerializer` and `JsonDeserializer`. But then every `Path` in those lists would have to be converted back into a `String`, because String paths are used in a lot of places: `FileSelection`, `Metadata`, `ParquetGroupScan`, `ReadEntryWithPath`, `FileWork`, `FormatSelection`, `FormatPlugin`, `PartitionLocation` and so on. I totally agree with the requirement to replace `String` with `Path`, but it should be done not only for parquet, and in the context of a separate jira. I am going to create it.
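The `updateRelativePaths` logic in the diff above can be sketched with `java.nio.file.Path` standing in for Hadoop's `Path`: if the first stored directory path is relative (the cache file holds either all relative or all absolute paths, per the inline comment), resolve every entry against the metadata file's parent directory. This is an illustrative sketch, not Drill's `Metadata.java`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of updateRelativePaths(): convert stored relative directory paths
// to absolute ones using the cache file's base directory.
public class UpdateRelativePathsSketch {
  static List<String> updateRelativePaths(List<String> directories, Path baseDir) {
    // Checking the first entry is enough: the cache file stores either all
    // relative or all absolute paths, never a mix.
    if (directories.isEmpty() || Paths.get(directories.get(0)).isAbsolute()) {
      return directories;  // already absolute, or nothing to do
    }
    return directories.stream()
        .map(dir -> baseDir.resolve(dir).toString())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> updated = updateRelativePaths(
        Arrays.asList("2006/1", "2006/2"), Paths.get("/drill/lineitem"));
    if (!updated.get(0).equals("/drill/lineitem/2006/1")) {
      throw new AssertionError(updated.toString());
    }
  }
}
```

Resolving immediately after deserialization, as the comments in this thread argue, means the rest of the code path never has to distinguish the two path forms.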
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058071#comment-16058071 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123331277 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) { return directories; } +/** If directories list contains relative paths, update it to absolute ones --- End diff -- Yes, we do: internally we use absolute paths (for `FileSelection`, `FileStatus`, `ReadEntryWithPath`). It would also be possible to convert paths to absolute ones just before retrieving them, but converting immediately after deserializing has advantages: it avoids keeping the metadata together with the corresponding `baseDir`, avoids repeatedly checking the type of each path, and avoids converting the same paths more than once (when data is retrieved several times from one metadata object).
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058072#comment-16058072 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123340606 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -1413,6 +1452,31 @@ public ColumnTypeMetadata_v3 getColumnTypeInfo(String[] name) { return directories; } +/** If directories list and file metadata list contain relative paths, update it to absolute ones + * @param baseDir base parent directory + */ +@JsonIgnore public void updateRelativePaths(Path baseDir) { --- End diff -- I combined the common code of these two methods and created separate helper methods.
[jira] [Commented] (DRILL-5256) Exception in thread "main" java.lang.ExceptionInInitializerError
[ https://issues.apache.org/jira/browse/DRILL-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058393#comment-16058393 ] N Campbell commented on DRILL-5256: --- Is anyone looking at this issue? Drill 1.10, SQL Squirrel, IBM JRE. Caused by: java.lang.NullPointerException at oadd.io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.<init>(PooledByteBufAllocatorL.java:93) at oadd.io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:56) at oadd.org.apache.drill.exec.memory.AllocationManager.<clinit>(AllocationManager.java:60) ... 16 more > Exception in thread "main" java.lang.ExceptionInInitializerError > > > Key: DRILL-5256 > URL: https://issues.apache.org/jira/browse/DRILL-5256 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.7.0, 1.8.0 > Environment: Windows 7, IBM SDK 1.6 and IBM SDK 1.7 >Reporter: Vasu > Labels: jvm > > Below is the error seen while connecting to the Drill server: > Exception in thread "main" java.lang.ExceptionInInitializerError > at java.lang.J9VMInternals.initialize(J9VMInternals.java:257) > at > oadd.org.apache.drill.exec.memory.BaseAllocator.<clinit>(BaseAllocator.java:44) > at java.lang.J9VMInternals.initializeImpl(Native Method) > at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) > at java.lang.J9VMInternals.initialize(J9VMInternals.java:202) > at > oadd.org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:38) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:143) > at > org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:64) > at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69) > at > oadd.net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126) > at org.apache.drill.jdbc.Driver.connect(Driver.java:72) > at java.sql.DriverManager.getConnection(DriverManager.java:583) > at > java.sql.DriverManager.getConnection(DriverManager.java:245) > at com.trianz.drill.ApacheDrillDemo.main(ApacheDrillDemo.java:13) > Caused by: java.lang.NullPointerException > at > oadd.io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.<init>(PooledByteBufAllocatorL.java:93) > at > oadd.io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:56) > at > oadd.org.apache.drill.exec.memory.AllocationManager.<clinit>(AllocationManager.java:60) > at java.lang.J9VMInternals.initializeImpl(Native Method) > at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) > ... 13 more > When I debugged into the source code, the following is where we get the > NullPointerException: > > drill/exec/memory/base/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java > > Line 93: this.chunkSize = directArenas[0].chunkSize; > Below is the code snapshot. > public InnerAllocator() { > super(true); > try { > Field f = > PooledByteBufAllocator.class.getDeclaredField("directArenas"); > f.setAccessible(true); > this.directArenas = (PoolArena[]) f.get(this); > } catch (Exception e) { > throw new RuntimeException("Failure while initializing allocator. > Unable to retrieve direct arenas field.", e); > } > this.chunkSize = directArenas[0].chunkSize; > if (memoryLogger.isTraceEnabled()) { > Can anyone please help on this? Thanks in advance. > Thanks, > Vasu T -- This message was sent by Atlassian JIRA (v6.4.14#64029)
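The NPE in the snapshot above comes from dereferencing a reflectively-read private field (`directArenas`) without a null check: on the IBM JDK the field can legitimately be null, so `directArenas[0]` throws. A defensive version of that pattern looks like the following sketch — it uses a toy class, not Netty's actual `PooledByteBufAllocator`.

```java
import java.lang.reflect.Field;

// Reading a private field reflectively and guarding against null before
// dereferencing, instead of letting the raw NullPointerException escape.
public class ReflectionNullGuardSketch {
  static class Allocator {
    private int[] directArenas = null;  // may legitimately be null on some JVMs
  }

  static int readFirstChunkSize(Allocator alloc) throws Exception {
    Field f = Allocator.class.getDeclaredField("directArenas");
    f.setAccessible(true);
    int[] arenas = (int[]) f.get(alloc);
    if (arenas == null || arenas.length == 0) {
      // Fail with a diagnosable error rather than an NPE deep in a static initializer.
      throw new IllegalStateException(
          "direct arenas unavailable on this JVM; cannot read chunk size");
    }
    return arenas[0];
  }

  public static void main(String[] args) throws Exception {
    try {
      readFirstChunkSize(new Allocator());
      throw new AssertionError("expected IllegalStateException");
    } catch (IllegalStateException expected) { /* null was guarded, no NPE */ }
  }
}
```

Because the original failure happens inside a class's static initialization, the raw NPE surfaces as an opaque `ExceptionInInitializerError`; an explicit guard like this would at least name the real cause.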
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058445#comment-16058445 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123138037 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/SubOperatorTest.java --- @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + **/ --- End diff -- Checkstyle does not care. In fact, earlier headers used to use a line of stars across the top as well, but the first three characters, /**, caused the comment to look like Javadoc, so we've been deprecating that style. Still, fixed this one as well. 
> Roll-up of a number of test framework enhancements > -- > > Key: DRILL-5518 > URL: https://issues.apache.org/jira/browse/DRILL-5518 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.11.0 > > > Recent development work identified a number of minor enhancements to the > "sub-operator" unit tests: > * Create a {{SubOperatorTest}} base class to do routine setup and shutdown. > * Additional methods to simplify creating complex schemas with field widths. > * Define a test workspace with plugin-specific options (as for the CSV > storage plugin) > * When verifying row sets, add methods to verify and release just the > "actual" batch in addition to the existing method for verify and free both > the actual and expected batches. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058447#comment-16058447 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123140150 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java --- @@ -96,27 +137,57 @@ public SchemaBuilder withSVMode(SelectionVectorMode svMode) { public SchemaBuilder() { } + public SchemaBuilder(BatchSchema baseSchema) { +for (MaterializedField field : baseSchema) { + columns.add(field); +} + } + public SchemaBuilder add(String pathName, MajorType type) { -MaterializedField col = MaterializedField.create(pathName, type); +return add(MaterializedField.create(pathName, type)); + } + + public SchemaBuilder add(MaterializedField col) { columns.add(col); return this; } + public static MaterializedField columnSchema(String pathName, MinorType type, DataMode mode) { +return MaterializedField.create(pathName, +MajorType.newBuilder() --- End diff -- Just saving an unnecessary object creation. The schema builder is for cases where we set more than the "basic three" properties. This method handles the vast majority of the cases in which we use just the "basic three".
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058448#comment-16058448 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123138411 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/SubOperatorTest.java --- @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + **/ +package org.apache.drill.test; + +import org.junit.AfterClass; +import org.junit.BeforeClass; + +public class SubOperatorTest extends DrillTest { + + protected static OperatorFixture fixture; + + @BeforeClass + public static void setUpBeforeClass() throws Exception { +fixture = OperatorFixture.standardFixture(); + } + + @AfterClass + public static void tearDownAfterClass() throws Exception { --- End diff -- There is a reason... Junit also allows a \@Before and \@After tag that are per-test actions. Those are often called `setup()` and `teardown()`. The static per-class versions are often called with the names used here. Still, renamed them to be a bit clearer. 
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058449#comment-16058449 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123140560 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java --- @@ -309,13 +317,38 @@ private void testDoubleRW() { assertEquals(0, reader.column(0).getDouble(), 0.01); assertTrue(reader.next()); assertEquals(Double.MAX_VALUE, reader.column(0).getDouble(), 0.01); +assertEquals(Double.MAX_VALUE, (double) reader.column(0).getObject(), 0.01); --- End diff -- Casting is required because `assertEquals` does not know what to do with an `Object` as its second argument. The cast to `double` forces a cast to `Double`, then unboxing.
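The cast-then-unbox behavior the comment describes can be shown in isolation: `getObject()` returns `Object`, so without the `(double)` cast the call would not bind to the `assertEquals(double, double, delta)` overload. The stand-in method below is illustrative, not the actual row-set reader API.

```java
// Demonstrates why the (double) cast in the review comment is needed:
// casting an Object to double first casts to Double, then unboxes.
public class UnboxingCastSketch {
  static Object getObject() {        // stand-in for reader.column(0).getObject()
    return Double.MAX_VALUE;         // autoboxed to a Double
  }

  public static void main(String[] args) {
    double unboxed = (double) getObject();  // cast to Double, then auto-unbox
    if (Math.abs(Double.MAX_VALUE - unboxed) > 0.01) {
      throw new AssertionError("values differ");
    }
  }
}
```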
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058452#comment-16058452 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123140827 --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/TupleReaderImpl.java --- @@ -101,8 +88,61 @@ public String getAsString(int colIndex) { return "\"" + colReader.getString() + "\""; case DECIMAL: return colReader.getDecimal().toPlainString(); +case ARRAY: + return getArrayAsString(colReader.array()); default: throw new IllegalArgumentException("Unsupported type " + colReader.valueType()); } } + + private String bytesToString(byte[] value) { +StringBuilder buf = new StringBuilder() +.append("["); +int len = Math.min(value.length, 20); +for (int i = 0; i < len; i++) { + if (i > 0) { +buf.append(", "); + } + buf.append((int) value[i]); +} +if (value.length > len) { + buf.append("..."); +} +buf.append("]"); +return buf.toString(); + } + + private String getArrayAsString(ArrayReader array) { +StringBuilder buf = new StringBuilder(); +buf.append("["); +for (int i = 0; i < array.size(); i++) { + if (i > 0) { +buf.append( ", " ); + } + switch (array.valueType()) { --- End diff -- Handled via the `default` clause? > Roll-up of a number of test framework enhancements > -- > > Key: DRILL-5518 > URL: https://issues.apache.org/jira/browse/DRILL-5518 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.11.0 > > > Recent development work identified a number of minor enhancements to the > "sub-operator" unit tests: > * Create a {{SubOperatorTest}} base class to do routine setup and shutdown. > * Additional methods to simplify creating complex schemas with field widths. 
> * Define a test workspace with plugin-specific options (as for the CSV > storage plugin) > * When verifying row sets, add methods to verify and release just the > "actual" batch in addition to the existing method for verify and free both > the actual and expected batches. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
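The `bytesToString()` helper in the diff above can be lifted into a standalone sketch to show the truncation behavior under review: long byte arrays are cut to the first 20 elements with a trailing "...". The class name `ByteFormat` is mine, not part of the patch.

```java
// Standalone sketch of the bytesToString() helper from the diff above.
// Prints at most 20 byte values, then "..." if the array is longer, so
// test output stays readable for large binary columns.
public class ByteFormat {
  static String bytesToString(byte[] value) {
    StringBuilder buf = new StringBuilder().append("[");
    int len = Math.min(value.length, 20);
    for (int i = 0; i < len; i++) {
      if (i > 0) {
        buf.append(", ");
      }
      buf.append((int) value[i]);  // print bytes as signed ints
    }
    if (value.length > len) {
      buf.append("...");  // mark truncation
    }
    buf.append("]");
    return buf.toString();
  }

  public static void main(String[] args) {
    System.out.println(bytesToString(new byte[] {1, 2, 3}));  // [1, 2, 3]
  }
}
```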
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058450#comment-16058450 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123139571

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -53,6 +53,47 @@ public class SchemaBuilder {

   /**
+   * Build a column schema (AKA "materialized field") based on name and a
+   * variety of schema options. Every column needs a name and (minor) type,
+   * some may need a mode other than required, may need a width, may
+   * need scale and precision, and so on.
+   */
+
+  // TODO: Add map methods
+
+  public static class ColumnBuilder {
+    private String name;
+    private MajorType.Builder typeBuilder;
--- End diff --

Fixed.
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058451#comment-16058451 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123140434

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java ---
@@ -289,13 +295,15 @@ private void testFloatRW() {
     assertEquals(0, reader.column(0).getDouble(), 0.01);
     assertTrue(reader.next());
     assertEquals(Float.MAX_VALUE, reader.column(0).getDouble(), 0.01);
+    assertEquals((double) Float.MAX_VALUE, (double) reader.column(0).getObject(), 0.01);
--- End diff --

Fixed.
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058446#comment-16058446 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123139892

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -96,27 +137,57 @@ public SchemaBuilder withSVMode(SelectionVectorMode svMode) {

   public SchemaBuilder() { }

+  public SchemaBuilder(BatchSchema baseSchema) {
+    for (MaterializedField field : baseSchema) {
+      columns.add(field);
--- End diff --

Fixed.
[jira] [Commented] (DRILL-5496) Must restart drillbits whenever a secure Hive metastore is restarted
[ https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058455#comment-16058455 ] ASF GitHub Bot commented on DRILL-5496: --- Github user paul-rogers closed the pull request at: https://github.com/apache/drill/pull/833 > Must restart drillbits whenever a secure Hive metastore is restarted > > > Key: DRILL-5496 > URL: https://issues.apache.org/jira/browse/DRILL-5496 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-4964: "Drill fails to connect to hive metastore after hive metastore is > restarted unless drillbits are restarted also" attempted to fix a bug in > Drill in which Drill hangs if Hive is restarted. Now, we see that all > subsequent "show schemas" queries fail. > Steps to repro: > 1. Build a secure cluster (we used MapR) > 2. Install Hive and Drill services > 3. Configure drill impersonation and authentication > 4. Restart hivemeta service > 5. Connect to drill and execute query involving hive storage, issue occurs > 6. Restart the drill-bits services and execute the query, issue is no longer > hit > The problem occurs in the same place as the earlier fix, but might represent > a slightly different use case: in this case the connection is secure. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5496) Must restart drillbits whenever a secure Hive metastore is restarted
[ https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058454#comment-16058454 ] ASF GitHub Bot commented on DRILL-5496: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/833 Closing PR as change was merged into master (but without the magic commit message that closes this automagically.)
[jira] [Commented] (DRILL-2478) Validating values assigned to SYSTEM/SESSION configuration parameters
[ https://issues.apache.org/jira/browse/DRILL-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058470#comment-16058470 ] ASF GitHub Bot commented on DRILL-2478: --- GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/859 DRILL-2478: Validating values assigned to SYSTEM/SESSION configuratio… …n parameters You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppadma/drill DRILL-2478 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/859.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #859 commit 2f10ea3edf14aefff767db5dc8990c3374dc8553 Author: Padma Penumarthy Date: 2017-06-21T23:14:26Z DRILL-2478: Validating values assigned to SYSTEM/SESSION configuration parameters > Validating values assigned to SYSTEM/SESSION configuration parameters > - > > Key: DRILL-2478 > URL: https://issues.apache.org/jira/browse/DRILL-2478 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.11.0 > Environment: {code} > 0: jdbc:drill:> select * from sys.version; > +++-+-++ > | commit_id | commit_message | commit_time | build_email | build_time | > +++-+-++ > | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert > ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown | > 09.03.2015 @ 04:50:05 EDT | > +++-+-++ > 1 row selected (0.046 seconds) > {code} >Reporter: Khurram Faraaz > Fix For: Future > > > Values that are assigned to configuration parameters of type SYSTEM and > SESSION must be validated. Currently any value can be assigned to some of the > SYSTEM/SESSION type parameters. > Here are two examples where assignment of invalid values to store.format does > not result in any error. 
> {code} > 0: jdbc:drill:> alter session set `store.format`='1'; > +++ > | ok | summary | > +++ > | true | store.format updated. | > +++ > 1 row selected (0.02 seconds) > {code} > {code} > 0: jdbc:drill:> alter session set `store.format`='foo'; > +++ > | ok | summary | > +++ > | true | store.format updated. | > +++ > 1 row selected (0.039 seconds) > {code} > In some cases values to some of the configuration parameters are validated, > like in this example, where trying to assign an invalid value to parameter > store.parquet.compression results in an error, which is correct. However, > this kind of validation is not performed for every configuration parameter of > SYSTEM/SESSION type. These values that are assigned to parameters must be > validated, and report errors if incorrect values are assigned by users. > {code} > 0: jdbc:drill:> alter session set `store.parquet.compression`='anything'; > Query failed: ExpressionParsingException: Option store.parquet.compression > must be one of: [snappy, gzip, none] > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
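The `store.parquet.compression` check quoted above shows the enumerated-value validation the report asks to apply everywhere. A minimal standalone sketch of that pattern follows; the `EnumOption` class and its methods are illustrative, not Drill's actual `OptionValidator` API.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of an enumerated-value option: reject any value not in the
// allowed list, mirroring the store.parquet.compression error quoted above.
public class EnumOption {
  private final String name;
  private final List<String> allowed;

  EnumOption(String name, String... allowed) {
    this.name = name;
    this.allowed = Arrays.asList(allowed);
  }

  // Throws if the value is not one of the allowed choices.
  void set(String value) {
    if (!allowed.contains(value)) {
      throw new IllegalArgumentException(
          "Option " + name + " must be one of: " + allowed);
    }
  }

  public static void main(String[] args) {
    EnumOption compression =
        new EnumOption("store.parquet.compression", "snappy", "gzip", "none");
    compression.set("gzip");         // accepted silently
    try {
      compression.set("anything");   // rejected, like the error in the report
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```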
[jira] [Commented] (DRILL-2478) Validating values assigned to SYSTEM/SESSION configuration parameters
[ https://issues.apache.org/jira/browse/DRILL-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058507#comment-16058507 ] ASF GitHub Bot commented on DRILL-2478: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/859#discussion_r123396761 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -166,7 +166,7 @@ String DEFAULT_TEMPORARY_WORKSPACE = "drill.exec.default_temporary_workspace"; String OUTPUT_FORMAT_OPTION = "store.format"; - OptionValidator OUTPUT_FORMAT_VALIDATOR = new StringValidator(OUTPUT_FORMAT_OPTION, "parquet"); + OptionValidator OUTPUT_FORMAT_VALIDATOR = new EnumeratedStringValidator(OUTPUT_FORMAT_OPTION, "parquet", "parquet", "json", "psv", "csv", "tsv", "csvh"); --- End diff -- Is this the right approach? This validator means that anyone who adds an output file format must modify this file and rebuild all of Drill in order to use that format by default. Not sure we want such tight binding. Seems a better idea would be to register all known output formats at runtime, validate the option against that list, warn if the format is unknown, and default to Parquet. 
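The reviewer's alternative (register known output formats at runtime, warn on unknown values, and default to Parquet) could look roughly like this. `FormatRegistry` and its methods are hypothetical names for the sketch, not Drill APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of runtime-registered format validation: plugins register their
// format names at startup, so adding a format needs no core rebuild.
public class FormatRegistry {
  private static final String DEFAULT_FORMAT = "parquet";
  private final Set<String> formats = new HashSet<>();

  public void register(String name) {
    formats.add(name);
  }

  // Returns the requested format if registered; otherwise warns and
  // falls back to the default, as the review comment suggests.
  public String validate(String requested) {
    if (formats.contains(requested)) {
      return requested;
    }
    System.err.println("Unknown output format '" + requested
        + "', defaulting to " + DEFAULT_FORMAT);
    return DEFAULT_FORMAT;
  }

  public static void main(String[] args) {
    FormatRegistry registry = new FormatRegistry();
    registry.register("parquet");
    registry.register("json");
    System.out.println(registry.validate("json"));  // json
    System.out.println(registry.validate("foo"));   // parquet (after warning)
  }
}
```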
[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files
[ https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058525#comment-16058525 ] ASF GitHub Bot commented on DRILL-5432: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/831#discussion_r123397439

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java ---
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.dto.ColumnDto;
+import org.apache.drill.exec.store.pcap.schema.PcapTypes;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.NullableBigIntVector;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableTimeStampVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.drill.exec.store.pcap.Utils.parseBytesToASCII;
+
+public class PcapRecordReader extends AbstractRecordReader {
+  private static final Logger logger = LoggerFactory.getLogger(PcapRecordReader.class);
+
+  private static final int BATCH_SIZE = 40_000;
+
+  private OutputMutator output;
+
+  private PacketDecoder decoder;
+  private ImmutableList<ProjectedColumnInfo> projectedCols;
+
+  private byte[] buffer;
+  private int offset = 0;
+  private InputStream in;
+  private int validBytes;
+
+  private String inputPath;
+  private List<SchemaPath> projectedColumns;
+
+  private static final Map<PcapTypes, MinorType> TYPES;
+
+  private static class ProjectedColumnInfo {
+    ValueVector vv;
+    ColumnDto pcapColumn;
+  }
+
+  static {
+    TYPES = ImmutableMap.<PcapTypes, MinorType>builder()
+        .put(PcapTypes.STRING, MinorType.VARCHAR)
+        .put(PcapTypes.INTEGER, MinorType.INT)
+        .put(PcapTypes.LONG, MinorType.BIGINT)
+        .put(PcapTypes.TIMESTAMP, MinorType.TIMESTAMP)
+        .build();
+  }
+
+  public PcapRecordReader(final String inputPath,
+                          final List<SchemaPath> projectedColumns) {
+    this.inputPath = inputPath;
+    this.projectedColumns = projectedColumns;
+  }
+
+  @Override
+  public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
+    try {
+
+      this.output = output;
+      this.buffer = new byte[10];
+      this.in = new FileInputStream(inputPath);
+      this.decoder = new PacketDecoder(in);
+      this.validBytes = in.re
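The reader setup above is cut off mid-statement, but its fields (`buffer`, `validBytes`, `in`) suggest a standard buffered-read pattern: fill a byte array from an InputStream and track how many bytes are actually valid. A hedged, self-contained sketch of that general pattern, not the actual PcapRecordReader logic:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: fill a buffer from a stream, returning the count of valid bytes.
// read() may return fewer bytes than requested, so we loop until the buffer
// is full or end-of-stream is reached.
public class BufferedRead {
  static int fill(InputStream in, byte[] buffer) throws IOException {
    int validBytes = 0;
    while (validBytes < buffer.length) {
      int n = in.read(buffer, validBytes, buffer.length - validBytes);
      if (n < 0) {
        break;  // end of stream; buffer is only partially valid
      }
      validBytes += n;
    }
    return validBytes;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {1, 2, 3, 4, 5};
    int valid = fill(new ByteArrayInputStream(data), new byte[10]);
    System.out.println(valid);  // 5
  }
}
```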
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058523#comment-16058523 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user sohami commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123397036

--- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/TupleReaderImpl.java ---
@@ -101,8 +88,61 @@ public String getAsString(int colIndex) {
       return "\"" + colReader.getString() + "\"";
     case DECIMAL:
       return colReader.getDecimal().toPlainString();
+    case ARRAY:
+      return getArrayAsString(colReader.array());
     default:
       throw new IllegalArgumentException("Unsupported type " + colReader.valueType());
     }
   }
+
+  private String bytesToString(byte[] value) {
+    StringBuilder buf = new StringBuilder()
+        .append("[");
+    int len = Math.min(value.length, 20);
+    for (int i = 0; i < len; i++) {
+      if (i > 0) {
+        buf.append(", ");
+      }
+      buf.append((int) value[i]);
+    }
+    if (value.length > len) {
+      buf.append("...");
+    }
+    buf.append("]");
+    return buf.toString();
+  }
+
+  private String getArrayAsString(ArrayReader array) {
+    StringBuilder buf = new StringBuilder();
+    buf.append("[");
+    for (int i = 0; i < array.size(); i++) {
+      if (i > 0) {
+        buf.append(", ");
+      }
+      switch (array.valueType()) {
--- End diff --

For the `getObject()` implementation it's throwing specifically `UnsupportedOperationException` for Map and Array type whereas for default type it throws `IllegalArgumentException`. I thought the idea was Map and Array are valid ArgumentType but not supported for now, hence the comment.
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058524#comment-16058524 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user sohami commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123397308

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java ---
@@ -309,13 +317,38 @@ private void testDoubleRW() {
     assertEquals(0, reader.column(0).getDouble(), 0.01);
     assertTrue(reader.next());
     assertEquals(Double.MAX_VALUE, reader.column(0).getDouble(), 0.01);
+    assertEquals(Double.MAX_VALUE, (double) reader.column(0).getObject(), 0.01);
--- End diff --

Yes, but my understanding is that `RowSetReader.column(0)` will return `ColumnReader`, and you implemented `getObject` in `AbstractColumnReader`, which is handling the type there. So `getObject` will return the correct type of value?
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058540#comment-16058540 ] ASF GitHub Bot commented on DRILL-3867: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/824#discussion_r123398393

--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
+   *
+   * There is no way to generate a metadata cache file with absolute paths that
+   * will be guaranteed to be available on an arbitrary test machine.
+   *
--- End diff --

Very small suggestion: Javadoc is HTML-formatted. Insert a <p> between paragraphs.

> Store relative paths in metadata file
> -------------------------------------
>
> Key: DRILL-3867
> URL: https://issues.apache.org/jira/browse/DRILL-3867
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.2.0
> Reporter: Rahul Challapalli
> Assignee: Vitalii Diravka
> Fix For: Future
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1.
Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.4.14#64029)
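The `replaceWorkingPathInString()` helper in the diff above relies on two easily missed details: `Pattern.quote` keeps `[WORKING_PATH]` from being parsed as a regex character class, and `Matcher.quoteReplacement` protects any `$` or `\` in the substituted path. A standalone usage sketch, where the working-path value is a stand-in rather than `TestTools.getWorkingPath()`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the marker-replacement helper: substitute the literal marker
// "[WORKING_PATH]" with a concrete path, with both sides regex-escaped.
public class PathReplace {
  static String replaceWorkingPath(String orig, String workingPath) {
    // Pattern.quote: treat "[WORKING_PATH]" literally, not as a char class.
    // Matcher.quoteReplacement: keep '$' and '\' in the path from being
    // interpreted as backreferences in the replacement.
    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"),
        Matcher.quoteReplacement(workingPath));
  }

  public static void main(String[] args) {
    String result = replaceWorkingPath("[WORKING_PATH]/src/test/resources",
        "/home/user/drill");
    System.out.println(result);  // /home/user/drill/src/test/resources
  }
}
```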
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058539#comment-16058539 ] ASF GitHub Bot commented on DRILL-3867: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/824#discussion_r123398337

--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
--- End diff --

For older files with the marker, should we just replace the marker to be relative and take advantage of this improvement? Can that be done without having to edit the old files?
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058538#comment-16058538 ] ASF GitHub Bot commented on DRILL-3867: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/824#discussion_r123397635

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) {
     return directories;
   }
+
+  /** If directories list contains relative paths, update it to absolute ones
--- End diff --

Thanks for the explanation.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.4.14#64029)
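The direction of the fix is to store paths relative to the table root in the cache file and resolve them against the table's current location at read time, so the cache survives a directory move. A minimal sketch of that idea using plain `java.nio` (not actual Drill code; the class and method names are illustrative):

```java
import java.nio.file.Paths;

public class RelativeMetadataPaths {
  // On REFRESH TABLE METADATA: strip the table root so the cached
  // entry stays valid even if the table directory is later moved.
  static String relativize(String tableRoot, String absoluteFile) {
    return Paths.get(tableRoot).relativize(Paths.get(absoluteFile)).toString();
  }

  // On read: resolve the stored relative path against wherever the
  // table lives now, not where it lived when the cache was written.
  static String resolve(String currentTableRoot, String relativeFile) {
    return Paths.get(currentTableRoot).resolve(relativeFile).toString();
  }

  public static void main(String[] args) {
    String rel = relativize("/drill/testdata/metadata_caching/lineitem",
        "/drill/testdata/metadata_caching/lineitem/2006/1");
    System.out.println(rel);                              // 2006/1
    System.out.println(resolve("/drill/lineitem", rel));  // /drill/lineitem/2006/1
  }
}
```

With relative paths cached, the `hadoop fs -mv` step in the reproduction would no longer break the query, because resolution happens against `/drill/lineitem` rather than the stale absolute prefix.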
[jira] [Created] (DRILL-5602) Vector corruption when allocating a repeated map vector
Paul Rogers created DRILL-5602:
----------------------------------

             Summary: Vector corruption when allocating a repeated map vector
                 Key: DRILL-5602
                 URL: https://issues.apache.org/jira/browse/DRILL-5602
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.11.0


The query in DRILL-5513 highlighted a problem described in DRILL-5594: the external sort did not properly allocate its spill batch vectors, and instead allowed them to grow by doubling. While fixing that issue, a new issue became clear: the method that allocates a repeated map vector has a serious bug. As described in DRILL-5530, value vectors do not zero-fill the first allocation for a vector (though subsequent reallocs are zero-filled).

If the code worked correctly, here is the behavior when writing to the first element of the list:

* Access the offset vector at offset 0. It should be 0.
* Write the new value at that offset. Since the first offset is 0, the first value is written at position 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains the garbage value 16 million. Now:

* Access the offset vector at offset 0. The value is 16 million.
* Write the new value at that offset, i.e. at position 16 million. This requires growing the value vector from its present size to 16 MB.

The problem is here in {{RepeatedMapVector}}:

{code}
public void allocateOffsetsNew(int groupCount) {
  offsets.allocateNew(groupCount + 1);
}
{code}

Notice that there is no code to set the value at offset 0. Then, in {{UInt4Vector}}:

{code}
public void allocateNew(final int valueCount) {
  allocateBytes(valueCount * 4);
}

private void allocateBytes(final long size) {
  ...
  data = allocator.buffer(curSize);
  ...
}
{code}

The above eventually calls the Netty memory allocator, which explicitly states that, for performance reasons, it does not zero-fill its buffers. The code works in small tests only because the new buffer comes from freshly allocated Java direct memory, which *does* zero-fill the buffer.
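The failure mode can be shown with a minimal, self-contained simulation (plain `java.nio`, not Drill vector code; all names here are illustrative). An offset buffer handed back with stale contents sends the first write to a huge position unless slot 0 is explicitly zeroed:

```java
import java.nio.IntBuffer;

public class OffsetVectorDemo {
  // Simulated offset vector: slot i holds the start position of element i,
  // slot i+1 holds its end. The buffer may arrive with stale contents,
  // mimicking an allocator that does not zero-fill.
  static int firstWritePosition(IntBuffer offsets, boolean zeroFirstSlot) {
    if (zeroFirstSlot) {
      offsets.put(0, 0);          // the missing initialization: first offset must be 0
    }
    int start = offsets.get(0);   // where element 0's data would be written
    int valueLength = 4;          // pretend we append a 4-byte value
    offsets.put(1, start + valueLength);
    return start;
  }

  public static void main(String[] args) {
    IntBuffer recycled = IntBuffer.allocate(8);
    recycled.put(0, 16_000_000);  // garbage left over from a previous use

    // Without initialization the first write lands at position 16,000,000,
    // forcing the value vector to grow to ~16 MB for a single small value.
    System.out.println(firstWritePosition(recycled, false)); // 16000000

    recycled.put(0, 16_000_000);  // dirty again
    System.out.println(firstWritePosition(recycled, true));  // 0
  }
}
```

The same logic explains why small tests pass: a freshly allocated direct buffer happens to read as all zeros, so `offsets.get(0)` returns 0 by luck rather than by contract.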
[jira] [Updated] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Summary: Vector corruption when allocating a repeated, variable-width vector  (was: Vector corruption when allocating a repeated map vector)
[jira] [Commented] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058700#comment-16058700 ]

Paul Rogers commented on DRILL-5602:
------------------------------------

It appears that other vectors have the same issue:

* Repeated map vector (discussed above)
* Variable-width vector (see below)
* All repeated value vectors (see below)

The {{ListVector}} does not have the problem because it does not have the {{allocateNew(int valueCount)}} method. This is its own bug...

The following is code from the {{VarCharVector}}:

{code}
@Override
public void allocateNew(int totalBytes, int valueCount) {
  ...
  offsetVector.allocateNew(valueCount + 1);
  ...
  data.readerIndex(0);
  allocationSizeInBytes = totalBytes;
  offsetVector.zeroVector();
}
{code}

Notice that the above does not set the initial offset to zero.

Typical repeated vector code (from {{RepeatedIntVector}}):

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  ...
  offsets.allocateNew(valueCount + 1);
  values.allocateNew(innerValueCount);
  ...
  offsets.zeroVector();
  mutator.reset();
}
{code}

For {{RepeatedListVector}}:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}
[jira] [Commented] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058715#comment-16058715 ]

Paul Rogers commented on DRILL-5602:
------------------------------------

The problem also exists in the other forms of {{allocateNew}}. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector();
  return true;
}
{code}

Again, the offsets buffer is not initialized. Perhaps code that uses this form does the required initialization. It would be better to do it in the vector, rather than in each bit of code that allocates vectors...
[jira] [Commented] (DRILL-5513) Managed External Sort : OOM error during the merge phase
[ https://issues.apache.org/jira/browse/DRILL-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058729#comment-16058729 ] Paul Rogers commented on DRILL-5513: After the fixes described here, in DRILL-5594 and DRILL-5602, the query now runs correctly: {code} End of sort. Total write bytes: 291402473, Total read bytes: 291402473 Results: 1 records, 2 batches, ... {code} > Managed External Sort : OOM error during the merge phase > > > Key: DRILL-5513 > URL: https://issues.apache.org/jira/browse/DRILL-5513 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 26e5f7b8-71e8-afca-e72e-fad7be2b2416.sys.drill, > drillbit.log > > > git.commit.id.abbrev=1e0a14c > No of nodes in cluster : 1 > DRILL_MAX_DIRECT_MEMORY="32G" > DRILL_MAX_HEAP="4G" > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_query` = 100; > alter session set `planner.memory.max_query_memory_per_node` = 652428800; > select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from > (select d.type type, d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 > order by s1.rms.mapid); > {code} > Exception from the logs > {code} > 2017-05-15 12:58:46,646 [BitServer-4] DEBUG > o.a.drill.exec.work.foreman.Foreman - 26e5f7b8-71e8-afca-e72e-fad7be2b2416: > State change requested RUNNING --> FAILED > org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One > or more nodes ran out of memory while executing the query. > Unable to allocate buffer of size 2097152 due to memory limit. 
Current > allocation: 19791880 > Fragment 5:2 > [Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010] > (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate > buffer of size 2097152 due to memory limit. Current allocation: 19791880 > org.apache.drill.exec.memory.BaseAllocator.buffer():220 > org.apache.drill.exec.memory.BaseAllocator.buffer():195 > org.apache.drill.exec.vector.BigIntVector.reAlloc():212 > org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324 > org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367 > > org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328 > > org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360 > > org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220 > org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82 > > org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34 > org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76 > > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > > 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():415 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():227 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > at > org.apache.drill.exec.work.foreman.QueryManager$1.stat
[jira] [Comment Edited] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058700#comment-16058700 ]

Paul Rogers edited comment on DRILL-5602 at 6/22/17 4:50 AM:
-------------------------------------------------------------

It appears that other vectors have an expensive solution:

* Variable-width vector (see below)
* All repeated value vectors (see below)

The following is code from the {{VarCharVector}}:

{code}
@Override
public void allocateNew(int totalBytes, int valueCount) {
  ...
  offsetVector.allocateNew(valueCount + 1);
  ...
  data.readerIndex(0);
  allocationSizeInBytes = totalBytes;
  offsetVector.zeroVector();
}
{code}

Notice that the above calls {{zeroVector()}}, which (unnecessarily) zeros the entire offsets vector. This results in the first position being set to zero.

Typical repeated vector code (from {{RepeatedIntVector}}):

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  ...
  offsets.allocateNew(valueCount + 1);
  values.allocateNew(innerValueCount);
  ...
  offsets.zeroVector();
  mutator.reset();
}
{code}

The above also calls {{zeroVector()}}.


was (Author: paul-rogers):
It appears that other vectors have the same issue:

* Repeated map vector (discussed above)
* Variable-width vector (see below)
* All repeated value vectors (see below)

The {{ListVector}} does not have the problem because it does not have the {{allocateNew(int valueCount)}} method. This is its own bug...

The following is code from the {{VarCharVector}}:

{code}
@Override
public void allocateNew(int totalBytes, int valueCount) {
  ...
  offsetVector.allocateNew(valueCount + 1);
  ...
  data.readerIndex(0);
  allocationSizeInBytes = totalBytes;
  offsetVector.zeroVector();
}
{code}

Notice that the above does not set the initial offset to zero.

Typical repeated vector code (from {{RepeatedIntVector}}):

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  ...
  offsets.allocateNew(valueCount + 1);
  values.allocateNew(innerValueCount);
  ...
  offsets.zeroVector();
  mutator.reset();
}
{code}

For {{RepeatedListVector}}:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}
[jira] [Commented] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058743#comment-16058743 ]

Paul Rogers commented on DRILL-5602:
------------------------------------

{{RepeatedListVector}} does have the offset corruption bug:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}

Notice there is no call to {{zeroVector()}} and no explicit code to set position 0 to 0.
[jira] [Comment Edited] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058715#comment-16058715 ]

Paul Rogers edited comment on DRILL-5602 at 6/22/17 4:52 AM:
-------------------------------------------------------------

The problem does not occur in the other forms of {{allocateNew}}, which is why the problem has not often been seen. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector(); // <-- Zeros whole vector
  return true;
}
{code}


was (Author: paul-rogers):
The problem also exists in the other forms of {{allocateNew}}. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector();
  return true;
}
{code}

Again, the offsets buffer is not initialized. Perhaps code that uses this form does the required initialization. It would be better to do it in the vector, rather than in each bit of code that allocates vectors...
[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Summary: Repeated List Vector fails to initialize the offset vector  (was: Vector corruption when allocating a repeated, variable-width vector)
[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Description:

The code that allocates a new {{RepeatedListVector}} does not initialize the first offset to zero as required:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}

Since Netty does not zero-fill vectors, the result is vector corruption.

If the code worked correctly, here is the behavior when writing to the first element of the list:

* Access the offset vector at offset 0. It should be 0.
* Write the new value at that offset. Since the first offset is 0, the first value is written at position 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains the value 16 million. Now:

* Access the offset vector at offset 0. The value is 16 million.
* Write the new value at that offset, that is, at position 16 million. This requires growing the value vector from its present size to 16 MB.

    was:

The query in DRILL-5513 highlighted a problem described in DRILL-5594: the external sort did not properly allocate its spill batch vectors, and instead allowed them to grow by doubling. While fixing that issue, a new issue became clear. The method to allocate a repeated map vector, however, has a serious bug, as described in DRILL-5530: value vectors do not zero-fill the first allocation for a vector (though subsequent reallocs are zero-filled).

If the code worked correctly, here is the behavior when writing to the first element of the list:

* Access the offset vector at offset 0. It should be 0.
* Write the new value at that offset. Since the first offset is 0, the first value is written at position 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains the value 16 million. Now:

* Access the offset vector at offset 0. The value is 16 million.
* Write the new value at that offset, that is, at position 16 million. This requires growing the value vector from its present size to 16 MB.

The problem is here in {{RepeatedMapVector}}:

{code}
public void allocateOffsetsNew(int groupCount) {
  offsets.allocateNew(groupCount + 1);
}
{code}

Notice that there is no code to set the value at offset 0. Then, in {{UInt4Vector}}:

{code}
public void allocateNew(final int valueCount) {
  allocateBytes(valueCount * 4);
}

private void allocateBytes(final long size) {
  ...
  data = allocator.buffer(curSize);
  ...
{code}

The above eventually calls the Netty memory allocator, which explicitly states that, for performance reasons, it does not zero-fill its buffers. The code works in small tests because the new buffer comes from Java direct memory, which *does* zero-fill the buffer.

> Repeated List Vector fails to initialize the offset vector
> ----------------------------------------------------------
>
>                 Key: DRILL-5602
>                 URL: https://issues.apache.org/jira/browse/DRILL-5602
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
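The remedy implied by the description above is to set the first offset slot to zero immediately after allocating the offset vector. The following minimal Java sketch is an illustration, not the actual Drill patch: the {{allocateUnzeroed}} helper is hypothetical and merely simulates an allocator that, like Netty's, leaves stale data in a new buffer.

```java
import java.util.Arrays;

public class OffsetVectorSketch {
    // Hypothetical helper: simulates an allocator that does NOT zero-fill,
    // leaving garbage (here, 16 million) in the freshly allocated buffer.
    static int[] allocateUnzeroed(int count) {
        int[] buf = new int[count];
        Arrays.fill(buf, 16_000_000);
        return buf;
    }

    public static void main(String[] args) {
        int valueCount = 4;

        // Buggy allocation: slot 0 keeps whatever garbage was in memory,
        // so the first write would land at position 16 million.
        int[] buggyOffsets = allocateUnzeroed(valueCount + 1);
        System.out.println("first write position (buggy): " + buggyOffsets[0]);

        // Fixed allocation: explicitly zero slot 0 after allocating.
        int[] fixedOffsets = allocateUnzeroed(valueCount + 1);
        fixedOffsets[0] = 0;

        // Writing the first value (say, length 5) now behaves correctly:
        // it lands at position 0 and records its end offset in slot 1.
        int writePos = fixedOffsets[0];
        fixedOffsets[1] = fixedOffsets[0] + 5;
        System.out.println("first write position (fixed): " + writePos);  // 0
        System.out.println("second offset (fixed): " + fixedOffsets[1]); // 5
    }
}
```

Only slot 0 needs initialization here because each subsequent offset slot is written before it is ever read, provided writers fill the vector sequentially.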
[jira] [Issue Comment Deleted] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Comment: was deleted

(was: {{RepeatedListVector}} does have the offset corruption bug:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}

Notice no call to {{setZero()}} and no explicit code to set position 0 to 0.)
[jira] [Issue Comment Deleted] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Comment: was deleted

(was: The problem does not occur in the other forms of {{allocateNew}}, which is why the problem has not often been seen. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector(); // <-- Zeros whole vector
  return true;
}
{code}
)
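The deleted comment above points out that {{allocateNewSafe}} avoids the bug by zeroing the entire offset buffer via {{zeroVector()}}, whereas the fix for the buggy path only needs slot 0 to be valid before the first write. As a hedged illustration of that trade-off (plain int arrays standing in for the real direct-memory buffers; the method names below are hypothetical):

```java
import java.util.Arrays;

public class ZeroStrategies {
    // Zero the whole buffer, as VarCharVector.allocateNewSafe() does
    // via zeroVector(): safe regardless of write order, but touches
    // every slot.
    static void zeroWhole(int[] offsets) {
        Arrays.fill(offsets, 0);
    }

    // Zero only slot 0: sufficient when writers fill offsets
    // sequentially, since slot N+1 is always written before it is read.
    static void zeroFirst(int[] offsets) {
        offsets[0] = 0;
    }

    public static void main(String[] args) {
        int[] a = {7, 7, 7, 7}; // simulated garbage from an unzeroed allocation
        int[] b = {7, 7, 7, 7};
        zeroWhole(a);
        zeroFirst(b);
        System.out.println(Arrays.toString(a)); // [0, 0, 0, 0]
        System.out.println(Arrays.toString(b)); // [0, 7, 7, 7]
    }
}
```

Either strategy makes the first write land at position 0; the whole-vector zero simply costs more and also covers non-sequential access patterns.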
[jira] [Resolved] (DRILL-5163) External sort on Mac creates a separate child process per spill via HDFS FS
[ https://issues.apache.org/jira/browse/DRILL-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5163.
--------------------------------
    Resolution: Fixed
 Fix Version/s: 1.11.0

Fixed as part of DRILL-5325

> External sort on Mac creates a separate child process per spill via HDFS FS
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-5163
>                 URL: https://issues.apache.org/jira/browse/DRILL-5163
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>          Priority: Minor
>             Fix For: 1.11.0
>
>
> The external sort operator spills to disk. Spill files are created and
> written using the HDFS file system. For performance, HDFS uses native
> libraries to access the file system. These native libraries are not available
> on the Mac. As a result, some operations are implemented using a slower,
> Java-only path. One of these operations (need details) is implemented by
> forking a child process.
> When run in a debugger on the Mac, this behavior shows up as the furious
> creation and deletion of threads to manage the child processes: one per
> spill. Because of this behavior, performance of the external sort is slow. Of
> course, no production code uses Drill on a Mac, so this is more of a nuisance
> than a real bug, which is why it is marked as an improvement.