[jira] [Updated] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open
[ https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5599: Fix Version/s: 1.11.0 > Notify StatusHandlerListener that batch sending has failed even if channel is > still open > - > > Key: DRILL-5599 > URL: https://issues.apache.org/jira/browse/DRILL-5599 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: ready-to-commit > Fix For: 1.11.0 > > Attachments: sample.json > > > *Issue* > Queries stay in CANCELLATION_REQUESTED state after connection with client was > interrupted. Jstack shows that threads for such queries are blocked and > waiting to semaphore to be released. > {noformat} > "26b70318-ddde-9ead-eee2-0828da97b59f:frag:0:0" daemon prio=10 > tid=0x7f56dc3c9000 nid=0x25fd waiting on condition [0x7f56b31dc000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0006f4688ab0> (a > java.util.concurrent.Semaphore$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) > at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) > at > org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48) > - locked <0x0006f4688a78> (a > org.apache.drill.exec.ops.SendingAccountor) > at > org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:486) > at > org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:134) > at > 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.close(ScreenCreator.java:141) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:313) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: - <0x00073f800b68> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > {noformat} > *Reproduce* > Ran modified ConcurrencyTest.java referenced in DRILL-4338 and cancel after > 2-3 seconds. ConcurrencyTest.java should be modified as follows: > {{ExecutorService executor = Executors.newFixedThreadPool(10);}} and execute > 200 queries {{for (int i = 1; i <= 200; i++)}}. > Query: {{select * from dfs.`sample.json`}}, data set is attached. > *Problem description* > Looks like the problem occurs when the server has sent data to the client and > waiting from the client confirmation that data was received. In this case > [{{ChannelListenerWithCoordinationId}}|https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java#L118] > is used for tracking. {{ChannelListenerWithCoordinationId}} contains > {{StatusHandler}} which keeps track of sent batches. It updates > {{SendingAccountor}} with information about how many batches were sent and > how many batches have reached the client (successfully or not). > When sent operation is complete (successfully or not) > {{operationComplete(ChannelFuture future)}} is called. 
The given future contains > information on whether the send operation was successful, the failure cause, channel > status, etc. If the send operation was successful we do nothing, since in this case the > client sent us an acknowledgment and, when we received it, we notified > {{StatusHandlerListener}} that the batch was received. But if the send operation has > failed, we need to notify {{StatusHandler}} that the send was unsuccessful. > {{operationComplete(ChannelFuture future)}} code: > {code} > if (!future.isSuccess()) { > removeFromMap(coordinationId); > if (future.channel().isActive()) { > throw new RpcException("Future failed"); > } else { > setException(new ChannelClosedException()); > } > } > {code}
[jira] [Updated] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open
[ https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5599: Description: *Issue* Queries stay in CANCELLATION_REQUESTED state after connection with client was interrupted. Jstack shows that threads for such queries are blocked and waiting to semaphore to be released. {noformat} "26b70318-ddde-9ead-eee2-0828da97b59f:frag:0:0" daemon prio=10 tid=0x7f56dc3c9000 nid=0x25fd waiting on condition [0x7f56b31dc000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0006f4688ab0> (a java.util.concurrent.Semaphore$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48) - locked <0x0006f4688a78> (a org.apache.drill.exec.ops.SendingAccountor) at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:486) at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:134) at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.close(ScreenCreator.java:141) at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:313) at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155) at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers:- <0x00073f800b68> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} *Reproduce* Ran modified ConcurrencyTest.java referenced in DRILL-4338 and cancel after 2-3 seconds. ConcurrencyTest.java should be modified as follows: {{ExecutorService executor = Executors.newFixedThreadPool(10);}} and execute 200 queries {{for (int i = 1; i <= 200; i++)}}. Query: {{select * from dfs.`sample.json`}}, data set is attached. *Problem description* Looks like the problem occurs when the server has sent data to the client and waiting from the client confirmation that data was received. In this case [{{ChannelListenerWithCoordinationId}}|https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java#L118] is used for tracking. {{ChannelListenerWithCoordinationId}} contains {{StatusHandler}} which keeps track of sent batches. It updates {{SendingAccountor}} with information about how many batches were sent and how many batches have reached the client (successfully or not). When sent operation is complete (successfully or not) {{operationComplete(ChannelFuture future)}} is called. Given future contains information if sent operation was successful or not, failure cause, channel status etc. If sent operation was successful we do nothing since in this case client sent us acknowledgment and when we received it, we notified {{StatusHandlerListener}} that batch was received. But if sent operation has failed, we need to notify {{StatusHandler}} that sent was unsuccessful. 
{{operationComplete(ChannelFuture future)}} code: {code} if (!future.isSuccess()) { removeFromMap(coordinationId); if (future.channel().isActive()) { throw new RpcException("Future failed"); } else { setException(new ChannelClosedException()); } } } {code} The {{setException}} method notifies {{StatusHandler}} that the batch send has failed, but it is only called when the channel is closed. When the channel is still open we just throw an {{RpcException}}. This is where the problem occurs: {{operationComplete(ChannelFuture future)}} is called via Netty's {{DefaultPromise.notifyListener0}} method, which catches {{Throwable}} and just logs it. So even if we throw an exception, nobody is notified about it, in particular not {{StatusHandler}}. *Fix* Use {{setException}} even if the channel is still open, instead of throwing an exception. This problem was also raised in [PR-463|https://github.com/apache/drill/pull/463] but was decided to be fixed in the sco
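To make the described fix concrete, here is a minimal, dependency-free sketch of the listener pattern. The class and method names are illustrative stand-ins, not Drill's actual {{RequestIdMap}}/{{StatusHandler}} code: the point is that on failure the listener is now always notified via {{setException}}, regardless of channel state, because Netty's promise notifier swallows anything thrown from {{operationComplete}}.

```java
// Hypothetical, self-contained model of the fix; real code lives in Drill's
// RequestIdMap / StatusHandler classes and uses Netty's ChannelFuture.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ListenerFixSketch {
  interface RpcOutcomeListener { void setException(Exception e); }

  static final Map<Integer, RpcOutcomeListener> pending = new ConcurrentHashMap<>();

  // Stand-in for ChannelFutureListener.operationComplete: the two booleans
  // model future.isSuccess() and future.channel().isActive().
  static void operationComplete(int coordinationId, boolean success, boolean channelActive) {
    if (!success) {
      RpcOutcomeListener listener = pending.remove(coordinationId);
      if (listener != null) {
        // Fixed behavior: notify the listener whether or not the channel is
        // still open. Before the fix, the channelActive branch threw an
        // exception that Netty silently logged, leaving SendingAccountor stuck.
        listener.setException(channelActive
            ? new Exception("Future failed")
            : new Exception("Channel closed"));
      }
    }
  }

  public static void main(String[] args) {
    final boolean[] notified = {false};
    pending.put(1, e -> notified[0] = true);
    operationComplete(1, false, true);  // failure while the channel is still open
    System.out.println(notified[0]);    // true: the accountor can now be decremented
  }
}
```

With the old behavior, the `channelActive == true` path would leave `notified[0]` false, which is exactly the hang described in the jstack above.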
[jira] [Commented] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open
[ https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057326#comment-16057326 ] ASF GitHub Bot commented on DRILL-5599: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/857 Reworded error message. @paul-rogers and @ppadma thanks for code review!
[jira] [Updated] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5130: Fix Version/s: 1.11.0 > UNION ALL difference in results > --- > > Key: DRILL-5130 > URL: https://issues.apache.org/jira/browse/DRILL-5130 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow, Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Khurram Faraaz >Assignee: Arina Ielchiieva > Fix For: 1.11.0 > > > Drill 1.9.0 git commit ID: 51246693 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> values(1,2,3,4,5,6) union all > values(7,8,9,10,11,12); > +-+-+-+-+-+-+ > | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5 | > +-+-+-+-+-+-+ > | 7 | 8 | 9 | 10 | 11 | 12 | > | 7 | 8 | 9 | 10 | 11 | 12 | > +-+-+-+-+-+-+ > 2 rows selected (0.209 seconds) > {noformat} > Postgres 9.3 > {noformat} > postgres=# values(1,2,3,4,5,6) union all values(7,8,9,10,11,12); > column1 | column2 | column3 | column4 | column5 | column6 > -+-+-+-+-+- >1 | 2 | 3 | 4 | 5 | 6 >7 | 8 | 9 | 10 | 11 | 12 > (2 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050618#comment-16050618 ] Arina Ielchiieva edited comment on DRILL-5130 at 6/21/17 11:06 AM: --- The problem is with an incorrectly overridden explainTerms method. This method is responsible for describing the inputs and attributes of the relational expression. In DrillValuesRel this method was incorrectly overridden; in ValuesPrel it was not overridden at all. Thus two Values nodes with the same row type and row count were considered to be the same, even though their values were different. During planning Calcite discarded the duplicated DrillValuesRel and ValuesPrel (duplicates are found by comparing the string representations of two relational expressions; explainTerms is used to generate such a representation) and used the same node for both Values expressions. Query: {noformat} values('a') union all values('b') {noformat} Plan: {noformat} 00-00 Screen 00-01 Project(EXPR$0=[$0]) 00-02 UnionAll(all=[true]) 00-04 Values 00-03 Values {noformat} was (Author: arina): The problem is with an incorrectly overridden explainTerms method. This method is responsible for describing the inputs and attributes of the relational expression. In DrillValuesRel this method was incorrectly overridden; in ValuesPrel it was not overridden at all. Thus two Values nodes with the same row type and row count were considered to be the same, even though their values were different. During planning Calcite discarded the duplicated DrillValuesRel and ValuesPrel (duplicates are found by comparing the string representations of two relational expressions; explainTerms is used to generate such a representation) and used the same node for both Values expressions.
Query: {noformat} values('a') union all values('b') {noformat} Plan: {noformat} 00-00 Screen 00-01 Project(EXPR$0=[$0]) 00-02 UnionAll(all=[true]) 00-04 Values 00-03 Values {noformat}
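The dedup behaviour described in the comment above can be illustrated with a small, dependency-free sketch (method and type names here are illustrative; Calcite's real mechanism builds a per-node string digest from explainTerms): when the digest omits the literal tuples, the two Values nodes compare equal and the planner keeps only one; once the tuples are included, the nodes stay distinct.

```java
// Hypothetical model of digest-based RelNode dedup; not Calcite's actual API.
import java.util.List;

public class DigestSketch {
  // Buggy digest: row type only, so VALUES('a') and VALUES('b') look identical.
  static String buggyDigest(String rowType, List<String> tuples) {
    return "Values(type=" + rowType + ")";
  }

  // Fixed digest: include the tuples, as the corrected explainTerms does.
  static String fixedDigest(String rowType, List<String> tuples) {
    return "Values(type=" + rowType + ", tuples=" + tuples + ")";
  }

  public static void main(String[] args) {
    List<String> a = List.of("'a'");
    List<String> b = List.of("'b'");
    // true -> planner would wrongly treat the nodes as duplicates
    System.out.println(buggyDigest("CHAR(1)", a).equals(buggyDigest("CHAR(1)", b)));
    // false -> the two Values nodes survive planning independently
    System.out.println(fixedDigest("CHAR(1)", a).equals(fixedDigest("CHAR(1)", b)));
  }
}
```

This is why the plan above showed two separate Values operators (00-03 and 00-04) only after the fix; with identical digests, both UNION ALL inputs resolved to the same node.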
[jira] [Commented] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057331#comment-16057331 ] ASF GitHub Bot commented on DRILL-5130: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/853#discussion_r123218383 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillValuesRelBase.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.common; + + +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelTraitSet; +import org.apache.calcite.rel.AbstractRelNode; +import org.apache.calcite.rel.RelWriter; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.drill.common.JSONOptions; + +public abstract class DrillValuesRelBase extends AbstractRelNode { --- End diff -- Done. 
[jira] [Closed] (DRILL-2975) Extended Json : Time type reporting data which is dependent on the system on which it ran
[ https://issues.apache.org/jira/browse/DRILL-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka closed DRILL-2975. -- Resolution: Not A Bug Fix Version/s: (was: Future) According to the conversation in [DRILL-4116|https://issues.apache.org/jira/browse/DRILL-4116] it seems that drillbits on these two machines work with different timezones. Therefore Drill behaviour described in the jira is expected and jira can be closed. [~rkins] If something is missed, please reopen the jira. > Extended Json : Time type reporting data which is dependent on the system on > which it ran > - > > Key: DRILL-2975 > URL: https://issues.apache.org/jira/browse/DRILL-2975 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Vitalii Diravka >Priority: Critical > > git.commit.id.abbrev=3b19076 > Data : > {code} > { > "int_col" : {"$numberLong": 1}, > "date_col" : {"$dateDay": "2012-05-22"}, > "time_col" : {"$time": "19:20:30.45Z"} > } > {code} > System 1 : > {code} > 0: jdbc:drill:schema=dfs_eea> select time_col from `extended_json/data1.json` > d; > ++ > | time_col | > ++ > | 19:20:30.450 | > ++ > {code} > System 2 : > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexP> select time_col from > `temp.json`; > ++ > | time_col | > ++ > | 11:20:30.450 | > ++ > {code} > The above results are inconsistent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057749#comment-16057749 ] Laurent Goujon commented on DRILL-3640: --- There are two PRs linked to that JIRA; should both of them be applied? Also, according to the JDBC spec, this method applies to the {{execute*}} methods and might apply to the ResultSet methods, but this is not required. Also, for executeBatch it is up to the driver to decide whether the timeout is per query or total, but that should be in the javadoc I guess. > Drill JDBC driver support Statement.setQueryTimeout(int) > > > Key: DRILL-3640 > URL: https://issues.apache.org/jira/browse/DRILL-3640 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC >Affects Versions: 1.2.0 >Reporter: Chun Chang >Assignee: Kunal Khatua > Fix For: 1.11.0 > > > It would be nice if we have this implemented. Runaway queries can be > automatically canceled by setting the timeout. > java.sql.SQLFeatureNotSupportedException: Setting network timeout is not > supported. > at > org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057772#comment-16057772 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123294241 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java --- @@ -38,8 +44,12 @@ // methods for compatibility.) class DrillStatementImpl extends AvaticaStatement implements DrillStatement, DrillRemoteStatement { + //Not using the DrillbitContext's ExecutorService as this is threadPool is light-weight (threads wake up to cancel tasks) but needs a low response time + private static ExecutorService queryTimeoutTaskPool = Executors.newCachedThreadPool(new NamedThreadFactory("q-timeout-")); --- End diff -- I believe this is unnecessary: DrillClient provides an asynchronous API, which is used by the JDBC driver, so all the timeout logic could be done without the use of thread pool. You might want to look at DrillCursor which is where all the magic happens I believe.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057773#comment-16057773 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123292200 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/SqlTimeoutException.java --- @@ -23,12 +23,17 @@ * Indicates that an operation timed out. This is not an error; you can * retry the operation. */ -public class SqlTimeoutException -extends SQLException -{ +public class SqlTimeoutException extends SQLException { + + private static final long serialVersionUID = 2017_06_20L; --- End diff -- maybe use -1 or use your IDE serialVersionUID generator (I know there's an algorithm for it...), but I don't think Drill is making any effort to make this class serializable so it could be a `@SuppressedWarning` annotation as well.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057774#comment-16057774 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123294533 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java --- @@ -497,14 +594,64 @@ public boolean isPoolable() throws SQLException { @Override public void closeOnCompletion() throws SQLException { -throwIfClosed(); +throwIfTimedOutOrClosed(); super.closeOnCompletion(); } @Override public boolean isCloseOnCompletion() throws SQLException { -throwIfClosed(); +throwIfTimedOutOrClosed(); return super.isCloseOnCompletion(); } } + +/** + * Timeout Trigger required for canceling of running queries + */ +class TimeoutTrigger implements Callable { + private int timeoutInSeconds; + + /** + * Get Timeout period in seconds + */ + public int getTimeoutInSeconds() { +return timeoutInSeconds; + } + + private DrillStatementImpl statementHandle; + + //Default Constructor is Invalid + @SuppressWarnings("unused") + private TimeoutTrigger() {} + + /** + * Timeout Constructor + * @param stmtContext Statement Handle + * @param timeoutInSec Timeout defined in seconds + */ + TimeoutTrigger(DrillStatementImpl stmtContext, int timeoutInSec) { +timeoutInSeconds = timeoutInSec; +statementHandle = stmtContext; + } + + @Override + public Boolean call() throws Exception { +try { + Thread.sleep(timeoutInSeconds*1000L); --- End diff -- `TimeUnit.SECONDS.sleep(timeoutInSeconds)`
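A sketch of the direction this review feedback points at (illustrative names, not Drill's actual classes): rather than parking a pooled thread in {{Thread.sleep}}, a single {{ScheduledExecutorService}} can arm a one-shot cancel task using {{TimeUnit}} semantics directly, and the trigger can be cancelled if the query finishes before the deadline.

```java
// Hypothetical timeout-trigger sketch; DrillStatementImpl is modeled by a
// tiny Cancellable interface so the example is self-contained.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutSketch {
  interface Cancellable { void cancel(); }

  static final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // Arms a one-shot cancel; the caller cancels the returned trigger
  // if the statement completes before the timeout fires.
  static ScheduledFuture<?> armTimeout(Cancellable stmt, int timeoutSeconds) {
    return scheduler.schedule(stmt::cancel, timeoutSeconds, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch cancelled = new CountDownLatch(1);
    ScheduledFuture<?> trigger = armTimeout(cancelled::countDown, 1);
    // The "query" outlives the 1-second timeout, so the cancel fires.
    System.out.println(cancelled.await(5, TimeUnit.SECONDS)); // true
    trigger.cancel(false);
    scheduler.shutdown();
  }
}
```

One scheduler thread serves all statements, which addresses the reviewer's concern about a cached pool of threads that exist only to sleep.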
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057778#comment-16057778 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123295256 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -64,13 +65,17 @@ org.slf4j.LoggerFactory.getLogger(DrillResultSetImpl.class); private final DrillConnectionImpl connection; + private DrillStatementImpl drillStatement = null; private volatile boolean hasPendingCancelationNotification = false; DrillResultSetImpl(AvaticaStatement statement, Meta.Signature signature, ResultSetMetaData resultSetMetaData, TimeZone timeZone, Meta.Frame firstFrame) { super(statement, signature, resultSetMetaData, timeZone, firstFrame); connection = (DrillConnectionImpl) statement.getConnection(); +if (statement instanceof DrillStatementImpl) { --- End diff -- what about prepared statement?
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057771#comment-16057771 ] ASF GitHub Bot commented on DRILL-3640: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/858#discussion_r123293197 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java --- @@ -159,24 +230,25 @@ public void cleanUp() { public int getQueryTimeout() throws AlreadyClosedSqlException { throwIfClosed(); -return 0; // (No no timeout.) --- End diff -- might want to check what was Avatica's behavior before we overrode it...
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057775#comment-16057775 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123295947

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -125,7 +154,7 @@ protected void cancel() {  // (Not delegated.)
       @Override
       public boolean next() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    If we chose to support `queryTimeout` for `ResultSet`, shouldn't we interrupt `next()` too
    if the operation is taking too long? As per your code, it seems the exception would be
    thrown after the fact...
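[Editor's note: laurentgo's point above, that `next()` should be unblocked when the timeout elapses rather than a flag being checked after the fact, can be sketched with a bounded wait on the batch queue. This is an illustrative sketch only, not Drill's code; `TimedFetch`, `pollUntilDeadline`, and the queue of string "batches" are hypothetical stand-ins for DrillCursor's internal batch queue.]

```java
import java.sql.SQLTimeoutException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: a fetch helper that enforces the remaining query timeout on each
// blocking wait, so next() fails at the deadline instead of after the fact.
public class TimedFetch {

    // Waits for the next batch, but never past deadlineMillis (epoch millis).
    // Returns the batch, or throws SQLTimeoutException once the deadline passes.
    static <T> T pollUntilDeadline(BlockingQueue<T> batches, long deadlineMillis)
            throws InterruptedException, SQLTimeoutException {
        long remaining = deadlineMillis - System.currentTimeMillis();
        T batch = (remaining > 0)
            ? batches.poll(remaining, TimeUnit.MILLISECONDS)
            : null;
        if (batch == null) {
            throw new SQLTimeoutException("query exceeded timeout");
        }
        return batch;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.add("batch-1");
        // Batch already available: returned well before the deadline.
        System.out.println(pollUntilDeadline(q, System.currentTimeMillis() + 1000));
        // Empty queue with a 100 ms deadline: the wait is cut short.
        try {
            pollUntilDeadline(q, System.currentTimeMillis() + 100);
        } catch (SQLTimeoutException e) {
            System.out.println("timed out");  // prints "timed out"
        }
    }
}
```

The key difference from the flag-check approach under review is that the timeout bounds the wait itself, so the caller is released as soon as the deadline passes.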
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1605#comment-1605 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123296355

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -1384,7 +1402,7 @@ public void updateRowId( String columnLabel, RowId x ) throws SQLException {
       @Override
       public int getHoldability() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    Not sure it makes sense (and is correct per spec) to throw SqlTimeoutException here...
    (and in similar methods)
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057776#comment-16057776 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123293533

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -204,7 +276,7 @@ public boolean isClosed() {
       @Override
       public int getMaxFieldSize() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    This method cannot throw SqlTimeoutException... only `execute*` methods can, and
    possibly some ResultSet methods (but that part is optional per the JDBC standard).
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057824#comment-16057824 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123303854

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -422,6 +507,9 @@ public ResultSet getGeneratedKeys() throws SQLException {
       public int executeUpdate(String sql, int autoGeneratedKeys) throws SQLException {
         throwIfClosed();
         try {
    +      if (timeoutTrigger != null) {
    --- End diff --

    So the trigger is created, but if super.executeUpdate takes more than queryTimeout
    seconds, the method is not interrupted. I don't believe this conforms to the JDBC
    spec, or is in any way useful to the end user (if the query takes 2 min but the
    timeout is 60 s, the exception should be thrown after 60 s, not 120 s...).
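[Editor's note: one common way to get the behavior laurentgo asks for, releasing the caller at the deadline rather than when the blocking call finally returns, is to run the blocking call on a worker and bound the wait with `Future.get`. A minimal sketch follows; `BoundedExecute` and `boundedExecute` are hypothetical names, not Drill APIs, and a real driver would also cancel the server-side query, not just interrupt the worker thread.]

```java
import java.sql.SQLTimeoutException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound a blocking execute call with Future.get so the caller sees
// SQLTimeoutException after timeoutSeconds, even if the call keeps running.
public class BoundedExecute {

    static int boundedExecute(ExecutorService pool, Callable<Integer> runQuery,
                              int timeoutSeconds) throws Exception {
        Future<Integer> result = pool.submit(runQuery);
        try {
            return result.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);  // interrupt the worker; a real driver would
                                  // also cancel the query on the server
            throw new SQLTimeoutException("query exceeded " + timeoutSeconds + "s");
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Fast "query": completes within the bound.
        System.out.println(boundedExecute(pool, () -> 42, 5));  // prints 42
        // Slow "query": would run 10 s, but the caller is released after ~1 s.
        try {
            boundedExecute(pool, () -> { Thread.sleep(10_000); return 0; }, 1);
        } catch (SQLTimeoutException e) {
            System.out.println("timed out");  // prints "timed out"
        }
        pool.shutdownNow();
    }
}
```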
[jira] [Commented] (DRILL-5130) UNION ALL difference in results
[ https://issues.apache.org/jira/browse/DRILL-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057869#comment-16057869 ]

ASF GitHub Bot commented on DRILL-5130:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/853#discussion_r123312311

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillValuesRelBase.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.common;
    +
    +import org.apache.calcite.plan.RelOptCluster;
    +import org.apache.calcite.plan.RelTraitSet;
    +import org.apache.calcite.rel.AbstractRelNode;
    +import org.apache.calcite.rel.RelWriter;
    +import org.apache.calcite.rel.type.RelDataType;
    +import org.apache.drill.common.JSONOptions;
    +
    +/**
    + * Base class for logical and physical Values implemented in Drill.
    + */
    +public abstract class DrillValuesRelBase extends AbstractRelNode {
    --- End diff --

    Calcite has an abstract class Values. Neither DrillValueRel nor ValuePrel extends
    from that Calcite class. Would it help solve the problem if we extended from
    Calcite's class?

    https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Values.java

> UNION ALL difference in results
> -------------------------------
>
>                 Key: DRILL-5130
>                 URL: https://issues.apache.org/jira/browse/DRILL-5130
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow, Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Khurram Faraaz
>            Assignee: Arina Ielchiieva
>             Fix For: 1.11.0
>
> Drill 1.9.0 git commit ID: 51246693
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(1,2,3,4,5,6) union all values(7,8,9,10,11,12);
> +---------+---------+---------+---------+---------+---------+
> | EXPR$0  | EXPR$1  | EXPR$2  | EXPR$3  | EXPR$4  | EXPR$5  |
> +---------+---------+---------+---------+---------+---------+
> | 7       | 8       | 9       | 10      | 11      | 12      |
> | 7       | 8       | 9       | 10      | 11      | 12      |
> +---------+---------+---------+---------+---------+---------+
> 2 rows selected (0.209 seconds)
> {noformat}
> Postgres 9.3
> {noformat}
> postgres=# values(1,2,3,4,5,6) union all values(7,8,9,10,11,12);
>  column1 | column2 | column3 | column4 | column5 | column6
> ---------+---------+---------+---------+---------+---------
>        1 |       2 |       3 |       4 |       5 |       6
>        7 |       8 |       9 |      10 |      11 |      12
> (2 rows)
> {noformat}
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057943#comment-16057943 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123324366

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -64,13 +65,17 @@
           org.slf4j.LoggerFactory.getLogger(DrillResultSetImpl.class);

       private final DrillConnectionImpl connection;
    +  private DrillStatementImpl drillStatement = null;
       private volatile boolean hasPendingCancelationNotification = false;

       DrillResultSetImpl(AvaticaStatement statement, Meta.Signature signature,
                          ResultSetMetaData resultSetMetaData, TimeZone timeZone,
                          Meta.Frame firstFrame) {
         super(statement, signature, resultSetMetaData, timeZone, firstFrame);
         connection = (DrillConnectionImpl) statement.getConnection();
    +    if (statement instanceof DrillStatementImpl) {
    --- End diff --

    The original DrillStatement threw an exception for setting the query timeout, but I
    don't see the same for DrillPreparedStatement, which tries to call Avatica's
    implementation. My testing indicates that this is effectively a no-op. I can extend
    it to that as well, but I was hoping to hear back from you first.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057945#comment-16057945 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123325052

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/SqlTimeoutException.java ---
    @@ -23,12 +23,17 @@
      * Indicates that an operation timed out. This is not an error; you can
      * retry the operation.
      */
    -public class SqlTimeoutException
    -    extends SQLException
    -{
    +public class SqlTimeoutException extends SQLException {
    +
    +  private static final long serialVersionUID = 2017_06_20L;
    --- End diff --

    Fair enough... will set it to -1. I saw a similar assignment for
    _InvalidParameterSqlException_, which was set to a timestamp, so I just tried to
    follow the convention.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057951#comment-16057951 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123325595

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -38,8 +44,12 @@
     // methods for compatibility.)
     class DrillStatementImpl extends AvaticaStatement
         implements DrillStatement, DrillRemoteStatement {
    +  // Not using the DrillbitContext's ExecutorService as this thread pool is
    +  // light-weight (threads wake up to cancel tasks) but needs a low response time
    +  private static ExecutorService queryTimeoutTaskPool =
    +      Executors.newCachedThreadPool(new NamedThreadFactory("q-timeout-"));
    --- End diff --

    I'm not sure if there is a way for me to reference back to the Statement object...
    since the objective is simply to have a sleeping thread in this pool time out and
    **cancel** the query. Let me look around.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057963#comment-16057963 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123327453

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -38,8 +44,12 @@
     // methods for compatibility.)
     class DrillStatementImpl extends AvaticaStatement
         implements DrillStatement, DrillRemoteStatement {
    +  // Not using the DrillbitContext's ExecutorService as this thread pool is
    +  // light-weight (threads wake up to cancel tasks) but needs a low response time
    +  private static ExecutorService queryTimeoutTaskPool =
    +      Executors.newCachedThreadPool(new NamedThreadFactory("q-timeout-"));
    --- End diff --

    DrillCursor is handling all the logic of executing queries and waiting for results.
    It has access to the connection and the statement, so you would know the timeout
    (if set). In the cursor, we are using a lock for the first message and a blocking
    queue for the batches, but when waiting on those, there's no timeout set. Instead
    we could use the query timeout (or the time remaining since the beginning of the
    execution) and throw SqlTimeoutException when the locks themselves throw
    TimeoutException. In that scenario, no thread pool is involved (except the one for
    I/O, but that already existed).
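[Editor's note: the deadline bookkeeping laurentgo describes, recording when execution started and deriving the remaining wait for each lock or queue operation from the single query timeout, can be sketched as a small helper. `QueryDeadline` is a hypothetical name for illustration, not a Drill class.]

```java
// Sketch: track one deadline across multiple blocking waits. Each wait asks
// for the time remaining since the start of execution, so the total blocked
// time across all waits never exceeds the query timeout.
public class QueryDeadline {
    private final long startNanos;
    private final long timeoutNanos;  // 0 means "no timeout"

    QueryDeadline(long timeoutSeconds) {
        this.startNanos = System.nanoTime();
        this.timeoutNanos = timeoutSeconds * 1_000_000_000L;
    }

    // Remaining nanos to pass to poll()/await(); never negative.
    long remainingNanos() {
        if (timeoutNanos == 0) {
            return Long.MAX_VALUE;  // unbounded wait
        }
        long left = timeoutNanos - (System.nanoTime() - startNanos);
        return Math.max(left, 0);
    }

    boolean expired() {
        return timeoutNanos != 0 && remainingNanos() == 0;
    }

    public static void main(String[] args) {
        QueryDeadline d = new QueryDeadline(30);
        System.out.println(d.expired());  // prints false right after creation
        // A zero timeout disables the deadline entirely.
        System.out.println(new QueryDeadline(0).remainingNanos() == Long.MAX_VALUE);
    }
}
```

Because the remaining time shrinks across successive waits, a cursor that first waits for the query-start message and then polls a batch queue stays within one overall budget, which is the scenario described above.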
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057965#comment-16057965 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123327737

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -159,24 +230,25 @@ public void cleanUp() {
       public int getQueryTimeout() throws AlreadyClosedSqlException {
         throwIfClosed();
    -    return 0;  // (No no timeout.)
    --- End diff --

    Interestingly, AvaticaStatement returns the timeout value that was set... but does
    not honour it! :) Originally the setter would throw a NotSupported exception, and
    the explicit return was the default 0. Now that we're able to support the timeout,
    I can read Avatica's value directly.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057978#comment-16057978 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329050

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -204,7 +276,7 @@ public boolean isClosed() {
       @Override
       public int getMaxFieldSize() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    The throwIfTimedOutOrClosed() call is basically a wrapper around a sequential
    check: first the timed-out state, then the closed state. A timed-out query (i.e.
    statement/result set) is already in a closed state, but we need to throw the
    correct exception (in this case, the timeout), which is why it was done like that.
    My understanding was that any execute and data-fetch operations can throw timeout
    exceptions. Are you suggesting that such '_getter_' methods should only throw an
    AlreadyClosed exception and never time out?
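[Editor's note: the check ordering kkhatua describes, a timed-out statement being also closed, so the timeout flag must be consulted first or callers would only ever see "already closed", can be sketched in a few lines. `StatementState` and its methods are hypothetical names for illustration, not Drill's actual classes.]

```java
import java.sql.SQLException;
import java.sql.SQLTimeoutException;

// Sketch: why the timed-out check must precede the closed check. Timing out
// closes the statement as a side effect, so checking "closed" first would
// mask the more specific timeout exception.
public class StatementState {
    private boolean closed;
    private boolean timedOut;

    void timeOut() { timedOut = true; closed = true; }  // timeout also closes
    void close()   { closed = true; }

    void throwIfTimedOutOrClosed() throws SQLException {
        if (timedOut) {
            throw new SQLTimeoutException("statement timed out");
        }
        if (closed) {
            throw new SQLException("statement already closed");
        }
    }

    public static void main(String[] args) {
        StatementState s = new StatementState();
        s.timeOut();
        try {
            s.throwIfTimedOutOrClosed();
        } catch (SQLException e) {
            // The timeout surfaces even though the statement is closed too.
            System.out.println(e.getClass().getSimpleName());  // prints SQLTimeoutException
        }
    }
}
```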
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057979#comment-16057979 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329155

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -497,14 +594,64 @@ public boolean isPoolable() throws SQLException {
       @Override
       public void closeOnCompletion() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
         super.closeOnCompletion();
       }

       @Override
       public boolean isCloseOnCompletion() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
         return super.isCloseOnCompletion();
       }
     }
    +
    +/**
    + * Timeout trigger required for canceling running queries
    + */
    +class TimeoutTrigger implements Callable<Boolean> {
    +  private int timeoutInSeconds;
    +
    +  /**
    +   * Get timeout period in seconds
    +   */
    +  public int getTimeoutInSeconds() {
    +    return timeoutInSeconds;
    +  }
    +
    +  private DrillStatementImpl statementHandle;
    +
    +  // Default constructor is invalid
    +  @SuppressWarnings("unused")
    +  private TimeoutTrigger() {}
    +
    +  /**
    +   * Timeout constructor
    +   * @param stmtContext  statement handle
    +   * @param timeoutInSec timeout defined in seconds
    +   */
    +  TimeoutTrigger(DrillStatementImpl stmtContext, int timeoutInSec) {
    +    timeoutInSeconds = timeoutInSec;
    +    statementHandle = stmtContext;
    +  }
    +
    +  @Override
    +  public Boolean call() throws Exception {
    +    try {
    +      Thread.sleep(timeoutInSeconds * 1000L);
    --- End diff --

    +1 Will make the change.
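[Editor's note: an alternative to the sleeping `TimeoutTrigger` callable in the diff above is a `ScheduledExecutorService`, which fires the cancellation at the deadline without parking a thread for the whole timeout, and whose schedule can be dropped when the query finishes early. This is an illustrative sketch; `ScheduledTrigger` and `armTimeout` are hypothetical names, and the `Runnable` stands in for the statement's cancel logic.]

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: schedule the cancellation instead of sleeping until it. One daemon
// thread serves all statements, and completed queries disarm their trigger.
public class ScheduledTrigger {
    private static final ScheduledExecutorService SCHEDULER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "q-timeout");
            t.setDaemon(true);  // let the JVM exit even if a trigger is armed
            return t;
        });

    // Arms a timeout; the returned future lets the caller disarm it.
    static ScheduledFuture<?> armTimeout(Runnable cancelQuery, int timeoutSeconds) {
        return SCHEDULER.schedule(cancelQuery, timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        // Query finishes in time: the trigger is disarmed and never fires.
        ScheduledFuture<?> trigger =
            armTimeout(() -> System.out.println("cancelled"), 60);
        trigger.cancel(false);  // statement completed; drop the trigger
        System.out.println("completed before timeout");
    }
}
```

Compared to a cached pool of sleeping callables, this keeps at most one idle thread regardless of how many statements have timeouts armed.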
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057981#comment-16057981 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329682

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -204,7 +276,7 @@ public boolean isClosed() {
       @Override
       public int getMaxFieldSize() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    Yes, execute and (optionally) data-fetch operations. Other methods are not impacted.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057983#comment-16057983 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123329889

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -159,24 +230,25 @@ public void cleanUp() {
       public int getQueryTimeout() throws AlreadyClosedSqlException {
         throwIfClosed();
    -    return 0;  // (No no timeout.)
    --- End diff --

    It's an optional thing, but the spec doesn't say what happens if you set it but
    it's actually not used...
[jira] [Commented] (DRILL-5325) Implement sub-operator unit tests for managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057984#comment-16057984 ]

ASF GitHub Bot commented on DRILL-5325:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/808

> Implement sub-operator unit tests for managed external sort
> -----------------------------------------------------------
>
>                 Key: DRILL-5325
>                 URL: https://issues.apache.org/jira/browse/DRILL-5325
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Tools, Build & Test
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>
> Validate the proposed sub-operator test framework by creating low-level unit
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are
> quite superficial; the "managed sort" project found many bugs. The managed
> sort itself was tested with ad-hoc system-level tests created using the new
> "cluster fixture" framework. But, again, such tests could not reach deep
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify
> specific code issues.
> Using the sub-operator unit test framework, we can instead test each bit of
> functionality at the unit-test level.
> If doing so works, and is practical, it can serve as a model for other
> operator testing projects.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057985#comment-16057985 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123330075

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/SqlTimeoutException.java ---
    @@ -23,12 +23,17 @@
      * Indicates that an operation timed out. This is not an error; you can
      * retry the operation.
      */
    -public class SqlTimeoutException
    -    extends SQLException
    -{
    +public class SqlTimeoutException extends SQLException {
    +
    +  private static final long serialVersionUID = 2017_06_20L;
    --- End diff --

    Missed this one. Honestly not super important (feel free to ignore my initial
    comment).
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057991#comment-16057991 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123330684

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
    @@ -422,6 +507,9 @@ public ResultSet getGeneratedKeys() throws SQLException {
       public int executeUpdate(String sql, int autoGeneratedKeys) throws SQLException {
         throwIfClosed();
         try {
    +      if (timeoutTrigger != null) {
    --- End diff --

    I'm submitting the timeout trigger to the pool and counting on that trigger
    performing a query cancellation. I don't think Drill supports executeUpdate, but as
    long as a query cancellation for updates rolls back the transaction, this should
    suffice. This worked well for large queries where the execute###() call took longer
    than the timeout period and allowed the cancellation to do the interrupt.
[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)
[ https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057995#comment-16057995 ]

ASF GitHub Bot commented on DRILL-3640:
---------------------------------------

Github user kkhatua commented on a diff in the pull request:

    https://github.com/apache/drill/pull/858#discussion_r123331635

    --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
    @@ -125,7 +154,7 @@ protected void cancel() {  // (Not delegated.)
       @Override
       public boolean next() throws SQLException {
    -    throwIfClosed();
    +    throwIfTimedOutOrClosed();
    --- End diff --

    The query cancellation should take care of it. It will be hard to have a unit test
    specifically for this, but I'll try.
[jira] [Created] (DRILL-5601) Rollup of External Sort memory management fixes
Paul Rogers created DRILL-5601: -- Summary: Rollup of External Sort memory management fixes Key: DRILL-5601 URL: https://issues.apache.org/jira/browse/DRILL-5601 Project: Apache Drill Issue Type: Task Affects Versions: 1.11.0 Reporter: Paul Rogers Assignee: Paul Rogers Fix For: 1.11.0 Rollup of a set of specific JIRA entries that all relate to the very difficult problem of managing memory within Drill in order for the external sort to stay within a memory budget. In general, the fixes relate to better estimating memory used by the three ways that Drill allocates vector memory (see DRILL-5522) and to predicting the size of vectors that the sort will create, to avoid repeated realloc-copy cycles (see DRILL-5594). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058073#comment-16058073 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123316880 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -179,10 +182,18 @@ private Metadata(FileSystem fs, ParquetFormatConfig formatConfig) { for (final FileStatus file : fs.listStatus(p, new DrillPathFilter())) { if (file.isDirectory()) { +String subdirectoryName = file.getPath().getName(); ParquetTableMetadata_v3 subTableMetadata = (createMetaFilesRecursively(file.getPath().toString())).getLeft(); -metaDataList.addAll(subTableMetadata.files); -directoryList.addAll(subTableMetadata.directories); -directoryList.add(file.getPath().toString()); +for (ParquetFileMetadata_v3 pfm_v3 : subTableMetadata.files) { + // Construction of the relative file path by adding subdirectory name and inner relative file path + String relativePath = Joiner.on("/").join(subdirectoryName, pfm_v3.getPath()); --- End diff -- Regarding the `paths` I answered in the general comment. I decided against merging path names recursively. Instead, I've implemented a new `MetadataPathUtils.createMetadataWithRelativePaths()` method that converts absolute paths to relative ones and creates new metadata for the cache files. > Store relative paths in metadata file > - > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Vitalii Diravka > Fix For: Future > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. 
Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.4.14#64029)
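The relative-path construction in the diff above uses Guava's `Joiner.on("/")` to prepend the subdirectory name to a file path that is already relative to that subdirectory. A stdlib-only sketch of the same step (`String.join` behaves identically for this case; the sample path values are made up):

```java
// Sketch of the relative-path construction from the diff above.
public class RelativePathSketch {
  // Prepend the subdirectory name to a file path that is already relative
  // to that subdirectory, producing a path relative to the parent table.
  static String toRelativePath(String subdirectoryName, String innerRelativePath) {
    return String.join("/", subdirectoryName, innerRelativePath);
  }

  public static void main(String[] args) {
    String p = toRelativePath("2006", "1/0_0_0.parquet");
    if (!p.equals("2006/1/0_0_0.parquet")) throw new AssertionError(p);
  }
}
```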
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058074#comment-16058074 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123329475 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -264,15 +275,18 @@ private ParquetTableMetadata_v3 getParquetTableMetadata(List fileSta /** * Get a list of file metadata for a list of parquet files * - * @param fileStatuses - * @return + * @param parquetTableMetadata_v3 can store column schema info from all the files and row groups + * @param fileStatuses list of the parquet files statuses + * @param absolutePathInMetadata true if result metadata files should contain absolute paths, false for relative paths. + * Relative paths in the metadata are only necessary while creating meta cache files. + * @return list of the parquet file metadata (parquet metadata for every file) * @throws IOException */ - private List getParquetFileMetadata_v3( - ParquetTableMetadata_v3 parquetTableMetadata_v3, List fileStatuses) throws IOException { + private List getParquetFileMetadata_v3(ParquetTableMetadata_v3 parquetTableMetadata_v3, + List fileStatuses, boolean absolutePathInMetadata) throws IOException { --- End diff -- The boolean flag has been removed. For now we create and gather metadata only with absolute paths, but before writing, new metadata with relative paths is created based on the old metadata. Agreed — it makes sense to check every path while converting it. Done.
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058076#comment-16058076 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123342097 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java --- @@ -398,6 +398,23 @@ public void testDrill4877() throws Exception { } + @Test // DRILL-3867 + public void testMoveCache() throws Exception { +String tableName = "nation_move"; +String newTableName = "nation_moved"; +test("use dfs_test.tmp"); +test("create table `%s/t1` as select * from cp.`tpch/nation.parquet`", tableName); +test("create table `%s/t2` as select * from cp.`tpch/nation.parquet`", tableName); +test(String.format("refresh table metadata %s", tableName)); +checkForMetadataFile(tableName); +File srcFile = new File(getDfsTestTmpSchemaLocation(), tableName); +File dstFile = new File(getDfsTestTmpSchemaLocation(), newTableName); +FileUtils.moveDirectory(srcFile, dstFile); +Assert.assertFalse("Cache file was not moved successfully", srcFile.exists()); +int rowCount = testSql(String.format("select * from %s", newTableName)); +Assert.assertEquals(50, rowCount); + } + --- End diff -- There is no requirement for them to use absolute paths. After this fix they can be upgraded to use relative paths. (I'm going to open a separate jira for it.) Therefore a new test case for metadata cache files with absolute paths was added.
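The directory-move step in the test above (`FileUtils.moveDirectory` plus the `srcFile.exists()` assertion) can be reproduced with only the JDK, as a minimal stand-in. The cache-file name below is just a placeholder file created for the demonstration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal stand-in for the move-and-verify step in testMoveCache():
// move a directory, then verify the source is gone and the destination exists.
public class MoveCacheSketch {
  static void moveAndCheck(Path src, Path dst) throws IOException {
    Files.move(src, dst);  // same filesystem: implemented as a rename
    if (Files.exists(src)) throw new AssertionError("source still exists");
    if (!Files.exists(dst)) throw new AssertionError("destination missing");
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempDirectory("nation_move");
    Files.createFile(src.resolve(".drill.parquet_metadata"));  // placeholder cache file
    Path dst = src.resolveSibling(src.getFileName() + "_moved");
    moveAndCheck(src, dst);
  }
}
```

The point the test exercises is that, with relative paths in the cache file, the moved table remains queryable; the sketch only covers the filesystem half of that.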
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058075#comment-16058075 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123339909 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) { return directories; } +/** If directories list contains relative paths, update it to absolute ones + * @param baseDir base parent directory + */ +@JsonIgnore public void updateRelativePaths(Path baseDir) { + if (!directories.isEmpty()) { +// It is enough to check the first path to decide if updating needed +if (!new Path(directories.get(0)).isAbsolute()) { --- End diff -- It is possible to replace String with Path for the directory paths by implementing a custom `JsonSerializer` and `JsonDeserializer`. But then every `Path` in those lists would have to be converted back into a `String`, because String paths are used in a lot of places: `FileSelection`, `Metadata`, `ParquetGroupScan`, `ReadEntryWithPath`, `FileWork`, `FormatSelection`, `FormatPlugin`, `PartitionLocation` and so on. I totally agree with the requirement to replace `String` with `Path`, but it should be done not only for parquet, and in the context of a separate jira. I am going to create it.
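The `updateRelativePaths` logic in the diff above can be sketched with `java.nio.file.Path` standing in for Hadoop's `Path`: if the first stored directory path is relative (the cache file holds either all relative or all absolute paths, per the inline comment), resolve every entry against the metadata file's parent directory. This is an illustrative sketch, not Drill's `Metadata.java`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of updateRelativePaths(): convert stored relative directory paths
// to absolute ones using the cache file's base directory.
public class UpdateRelativePathsSketch {
  static List<String> updateRelativePaths(List<String> directories, Path baseDir) {
    // Checking the first entry is enough: the cache file stores either all
    // relative or all absolute paths, never a mix.
    if (directories.isEmpty() || Paths.get(directories.get(0)).isAbsolute()) {
      return directories;  // already absolute, or nothing to do
    }
    return directories.stream()
        .map(dir -> baseDir.resolve(dir).toString())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> updated = updateRelativePaths(
        Arrays.asList("2006/1", "2006/2"), Paths.get("/drill/lineitem"));
    if (!updated.get(0).equals("/drill/lineitem/2006/1")) {
      throw new AssertionError(updated.toString());
    }
  }
}
```

Resolving immediately after deserialization, as the comments in this thread argue, means the rest of the code path never has to distinguish the two path forms.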
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058071#comment-16058071 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123331277 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) { return directories; } +/** If directories list contains relative paths, update it to absolute ones --- End diff -- Yes, we do: internally we use absolute paths (for `FileSelection`, `FileStatus`, `ReadEntryWithPath`). It would also be possible to convert paths to absolute ones just before retrieving them, but converting immediately after deserializing has advantages: it avoids keeping the metadata together with the corresponding `baseDir`, avoids repeatedly checking the type of each path, and avoids converting the same paths more than once (when data is retrieved several times from one metadata object).
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058072#comment-16058072 ] ASF GitHub Bot commented on DRILL-3867: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r123340606 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -1413,6 +1452,31 @@ public ColumnTypeMetadata_v3 getColumnTypeInfo(String[] name) { return directories; } +/** If directories list and file metadata list contain relative paths, update it to absolute ones + * @param baseDir base parent directory + */ +@JsonIgnore public void updateRelativePaths(Path baseDir) { --- End diff -- I combined the common code of these two methods and created separate helper methods.
[jira] [Commented] (DRILL-5256) Exception in thread "main" java.lang.ExceptionInInitializerError
[ https://issues.apache.org/jira/browse/DRILL-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058393#comment-16058393 ] N Campbell commented on DRILL-5256: --- Is anyone looking at this issue? Drill 1.10, SQL Squirrel, IBM JRE. Caused by: java.lang.NullPointerException at oadd.io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.<init>(PooledByteBufAllocatorL.java:93) at oadd.io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:56) at oadd.org.apache.drill.exec.memory.AllocationManager.<clinit>(AllocationManager.java:60) ... 16 more > Exception in thread "main" java.lang.ExceptionInInitializerError > > > Key: DRILL-5256 > URL: https://issues.apache.org/jira/browse/DRILL-5256 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.7.0, 1.8.0 > Environment: Windows 7, IBM SDK 1.6 and IBM SDK 1.7 >Reporter: Vasu > Labels: jvm > > Below is the error seen while connecting to the Drill server: > Exception in thread "main" java.lang.ExceptionInInitializerError > at java.lang.J9VMInternals.initialize(J9VMInternals.java:257) > at > oadd.org.apache.drill.exec.memory.BaseAllocator.<clinit>(BaseAllocator.java:44) > at java.lang.J9VMInternals.initializeImpl(Native Method) > at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) > at java.lang.J9VMInternals.initialize(J9VMInternals.java:202) > at > oadd.org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:38) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:143) > at > org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:64) > at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69) > at > oadd.net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126) > at org.apache.drill.jdbc.Driver.connect(Driver.java:72) > at java.sql.DriverManager.getConnection(DriverManager.java:583) > at > java.sql.DriverManager.getConnection(DriverManager.java:245) > at com.trianz.drill.ApacheDrillDemo.main(ApacheDrillDemo.java:13) > Caused by: java.lang.NullPointerException > at > oadd.io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.<init>(PooledByteBufAllocatorL.java:93) > at > oadd.io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:56) > at > oadd.org.apache.drill.exec.memory.AllocationManager.<clinit>(AllocationManager.java:60) > at java.lang.J9VMInternals.initializeImpl(Native Method) > at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) > ... 13 more > When I debugged into the source code, the following is where we get the > NullPointerException: > > drill/exec/memory/base/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java > > Line 93: this.chunkSize = directArenas[0].chunkSize; > Below is the code snapshot. > public InnerAllocator() { > super(true); > try { > Field f = > PooledByteBufAllocator.class.getDeclaredField("directArenas"); > f.setAccessible(true); > this.directArenas = (PoolArena[]) f.get(this); > } catch (Exception e) { > throw new RuntimeException("Failure while initializing allocator. > Unable to retrieve direct arenas field.", e); > } > this.chunkSize = directArenas[0].chunkSize; > if (memoryLogger.isTraceEnabled()) { > Can anyone please help on this? Thanks in advance. > Thanks, > Vasu T -- This message was sent by Atlassian JIRA (v6.4.14#64029)
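The NPE in the snapshot above comes from dereferencing a reflectively-read private field (`directArenas`) without a null check: on the IBM JDK the field can legitimately be null, so `directArenas[0]` throws. A defensive version of that pattern looks like the following sketch — it uses a toy class, not Netty's actual `PooledByteBufAllocator`.

```java
import java.lang.reflect.Field;

// Reading a private field reflectively and guarding against null before
// dereferencing, instead of letting the raw NullPointerException escape.
public class ReflectionNullGuardSketch {
  static class Allocator {
    private int[] directArenas = null;  // may legitimately be null on some JVMs
  }

  static int readFirstChunkSize(Allocator alloc) throws Exception {
    Field f = Allocator.class.getDeclaredField("directArenas");
    f.setAccessible(true);
    int[] arenas = (int[]) f.get(alloc);
    if (arenas == null || arenas.length == 0) {
      // Fail with a diagnosable error rather than an NPE deep in a static initializer.
      throw new IllegalStateException(
          "direct arenas unavailable on this JVM; cannot read chunk size");
    }
    return arenas[0];
  }

  public static void main(String[] args) throws Exception {
    try {
      readFirstChunkSize(new Allocator());
      throw new AssertionError("expected IllegalStateException");
    } catch (IllegalStateException expected) { /* null was guarded, no NPE */ }
  }
}
```

Because the original failure happens inside a class's static initialization, the raw NPE surfaces as an opaque `ExceptionInInitializerError`; an explicit guard like this would at least name the real cause.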
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058445#comment-16058445 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123138037 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/SubOperatorTest.java --- @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + **/ --- End diff -- Checkstyle does not care. In fact, earlier headers used to use a line of stars across the top as well, but the first three characters, /**, caused the comment to look like Javadoc, so we've been deprecating that style. Still, fixed this one as well. 
> Roll-up of a number of test framework enhancements > -- > > Key: DRILL-5518 > URL: https://issues.apache.org/jira/browse/DRILL-5518 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.11.0 > > > Recent development work identified a number of minor enhancements to the > "sub-operator" unit tests: > * Create a {{SubOperatorTest}} base class to do routine setup and shutdown. > * Additional methods to simplify creating complex schemas with field widths. > * Define a test workspace with plugin-specific options (as for the CSV > storage plugin) > * When verifying row sets, add methods to verify and release just the > "actual" batch in addition to the existing method for verify and free both > the actual and expected batches. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058447#comment-16058447 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123140150 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java --- @@ -96,27 +137,57 @@ public SchemaBuilder withSVMode(SelectionVectorMode svMode) { public SchemaBuilder() { } + public SchemaBuilder(BatchSchema baseSchema) { +for (MaterializedField field : baseSchema) { + columns.add(field); +} + } + public SchemaBuilder add(String pathName, MajorType type) { -MaterializedField col = MaterializedField.create(pathName, type); +return add(MaterializedField.create(pathName, type)); + } + + public SchemaBuilder add(MaterializedField col) { columns.add(col); return this; } + public static MaterializedField columnSchema(String pathName, MinorType type, DataMode mode) { +return MaterializedField.create(pathName, +MajorType.newBuilder() --- End diff -- Just saving an unnecessary object creation. The schema builder is for cases where we set more than the "basic three" properties. This method handles the vast majority of the cases in which we use just the "basic three".
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058448#comment-16058448 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123138411 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/SubOperatorTest.java --- @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + **/ +package org.apache.drill.test; + +import org.junit.AfterClass; +import org.junit.BeforeClass; + +public class SubOperatorTest extends DrillTest { + + protected static OperatorFixture fixture; + + @BeforeClass + public static void setUpBeforeClass() throws Exception { +fixture = OperatorFixture.standardFixture(); + } + + @AfterClass + public static void tearDownAfterClass() throws Exception { --- End diff -- There is a reason... Junit also allows a \@Before and \@After tag that are per-test actions. Those are often called `setup()` and `teardown()`. The static per-class versions are often called with the names used here. Still, renamed them to be a bit clearer. 
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058449#comment-16058449 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123140560 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java --- @@ -309,13 +317,38 @@ private void testDoubleRW() { assertEquals(0, reader.column(0).getDouble(), 0.01); assertTrue(reader.next()); assertEquals(Double.MAX_VALUE, reader.column(0).getDouble(), 0.01); +assertEquals(Double.MAX_VALUE, (double) reader.column(0).getObject(), 0.01); --- End diff -- Casting is required because `assertEquals` does not know what to do with an `Object` as its second argument. The cast to `double` forces a cast to `Double`, then unboxing.
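The cast-then-unbox behavior the comment describes can be shown in isolation: `getObject()` returns `Object`, so without the `(double)` cast the call would not bind to the `assertEquals(double, double, delta)` overload. The stand-in method below is illustrative, not the actual row-set reader API.

```java
// Demonstrates why the (double) cast in the review comment is needed:
// casting an Object to double first casts to Double, then unboxes.
public class UnboxingCastSketch {
  static Object getObject() {        // stand-in for reader.column(0).getObject()
    return Double.MAX_VALUE;         // autoboxed to a Double
  }

  public static void main(String[] args) {
    double unboxed = (double) getObject();  // cast to Double, then auto-unbox
    if (Math.abs(Double.MAX_VALUE - unboxed) > 0.01) {
      throw new AssertionError("values differ");
    }
  }
}
```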
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058452#comment-16058452 ] ASF GitHub Bot commented on DRILL-5518: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/851#discussion_r123140827 --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/TupleReaderImpl.java --- @@ -101,8 +88,61 @@ public String getAsString(int colIndex) { return "\"" + colReader.getString() + "\""; case DECIMAL: return colReader.getDecimal().toPlainString(); +case ARRAY: + return getArrayAsString(colReader.array()); default: throw new IllegalArgumentException("Unsupported type " + colReader.valueType()); } } + + private String bytesToString(byte[] value) { +StringBuilder buf = new StringBuilder() +.append("["); +int len = Math.min(value.length, 20); +for (int i = 0; i < len; i++) { + if (i > 0) { +buf.append(", "); + } + buf.append((int) value[i]); +} +if (value.length > len) { + buf.append("..."); +} +buf.append("]"); +return buf.toString(); + } + + private String getArrayAsString(ArrayReader array) { +StringBuilder buf = new StringBuilder(); +buf.append("["); +for (int i = 0; i < array.size(); i++) { + if (i > 0) { +buf.append( ", " ); + } + switch (array.valueType()) { --- End diff -- Handled via the `default` clause? > Roll-up of a number of test framework enhancements > -- > > Key: DRILL-5518 > URL: https://issues.apache.org/jira/browse/DRILL-5518 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.11.0 > > > Recent development work identified a number of minor enhancements to the > "sub-operator" unit tests: > * Create a {{SubOperatorTest}} base class to do routine setup and shutdown. > * Additional methods to simplify creating complex schemas with field widths. 
> * Define a test workspace with plugin-specific options (as for the CSV > storage plugin) > * When verifying row sets, add methods to verify and release just the > "actual" batch in addition to the existing method for verify and free both > the actual and expected batches. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
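The `bytesToString()` helper in the diff above can be lifted into a standalone sketch to show the truncation behavior under review: long byte arrays are cut to the first 20 elements with a trailing "...". The class name `ByteFormat` is mine, not part of the patch.

```java
// Standalone sketch of the bytesToString() helper from the diff above.
// Prints at most 20 byte values, then "..." if the array is longer, so
// test output stays readable for large binary columns.
public class ByteFormat {
  static String bytesToString(byte[] value) {
    StringBuilder buf = new StringBuilder().append("[");
    int len = Math.min(value.length, 20);
    for (int i = 0; i < len; i++) {
      if (i > 0) {
        buf.append(", ");
      }
      buf.append((int) value[i]);  // print bytes as signed ints
    }
    if (value.length > len) {
      buf.append("...");  // mark truncation
    }
    buf.append("]");
    return buf.toString();
  }

  public static void main(String[] args) {
    System.out.println(bytesToString(new byte[] {1, 2, 3}));  // [1, 2, 3]
  }
}
```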
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058450#comment-16058450 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123139571

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -53,6 +53,47 @@ public class SchemaBuilder {

   /**
+   * Build a column schema (AKA "materialized field") based on name and a
+   * variety of schema options. Every column needs a name and (minor) type,
+   * some may need a mode other than required, may need a width, may
+   * need scale and precision, and so on.
+   */
+
+  // TODO: Add map methods
+
+  public static class ColumnBuilder {
+    private String name;
+    private MajorType.Builder typeBuilder;
--- End diff --

Fixed.
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058451#comment-16058451 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123140434

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java ---
@@ -289,13 +295,15 @@ private void testFloatRW() {
     assertEquals(0, reader.column(0).getDouble(), 0.01);
     assertTrue(reader.next());
     assertEquals(Float.MAX_VALUE, reader.column(0).getDouble(), 0.01);
+    assertEquals((double) Float.MAX_VALUE, (double) reader.column(0).getObject(), 0.01);
--- End diff --

Fixed.
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058446#comment-16058446 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123139892

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -96,27 +137,57 @@ public SchemaBuilder withSVMode(SelectionVectorMode svMode) {

   public SchemaBuilder() { }

+  public SchemaBuilder(BatchSchema baseSchema) {
+    for (MaterializedField field : baseSchema) {
+      columns.add(field);
--- End diff --

Fixed.
[jira] [Commented] (DRILL-5496) Must restart drillbits whenever a secure Hive metastore is restarted
[ https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058455#comment-16058455 ] ASF GitHub Bot commented on DRILL-5496: --- Github user paul-rogers closed the pull request at: https://github.com/apache/drill/pull/833 > Must restart drillbits whenever a secure Hive metastore is restarted > > > Key: DRILL-5496 > URL: https://issues.apache.org/jira/browse/DRILL-5496 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-4964: "Drill fails to connect to hive metastore after hive metastore is > restarted unless drillbits are restarted also" attempted to fix a bug in > Drill in which Drill hangs if Hive is restarted. Now, we see that all > subsequent "show schemas" queries fail. > Steps to repro: > 1. Build a secure cluster (we used MapR) > 2. Install Hive and Drill services > 3. Configure drill impersonation and authentication > 4. Restart hivemeta service > 5. Connect to drill and execute query involving hive storage, issue occurs > 6. Restart the drill-bits services and execute the query, issue is no longer > hit > The problem occurs in the same place as the earlier fix, but might represent > a slightly different use case: in this case the connection is secure. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5496) Must restart drillbits whenever a secure Hive metastore is restarted
[ https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058454#comment-16058454 ] ASF GitHub Bot commented on DRILL-5496: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/833 Closing PR as change was merged into master (but without the magic commit message that closes this automagically.)
[jira] [Commented] (DRILL-2478) Validating values assigned to SYSTEM/SESSION configuration parameters
[ https://issues.apache.org/jira/browse/DRILL-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058470#comment-16058470 ] ASF GitHub Bot commented on DRILL-2478: --- GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/859 DRILL-2478: Validating values assigned to SYSTEM/SESSION configuratio… …n parameters You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppadma/drill DRILL-2478 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/859.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #859 commit 2f10ea3edf14aefff767db5dc8990c3374dc8553 Author: Padma Penumarthy Date: 2017-06-21T23:14:26Z DRILL-2478: Validating values assigned to SYSTEM/SESSION configuration parameters > Validating values assigned to SYSTEM/SESSION configuration parameters > - > > Key: DRILL-2478 > URL: https://issues.apache.org/jira/browse/DRILL-2478 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.11.0 > Environment: {code} > 0: jdbc:drill:> select * from sys.version; > +++-+-++ > | commit_id | commit_message | commit_time | build_email | build_time | > +++-+-++ > | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert > ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown | > 09.03.2015 @ 04:50:05 EDT | > +++-+-++ > 1 row selected (0.046 seconds) > {code} >Reporter: Khurram Faraaz > Fix For: Future > > > Values that are assigned to configuration parameters of type SYSTEM and > SESSION must be validated. Currently any value can be assigned to some of the > SYSTEM/SESSION type parameters. > Here are two examples where assignment of invalid values to store.format does > not result in any error. 
> {code} > 0: jdbc:drill:> alter session set `store.format`='1'; > +++ > | ok | summary | > +++ > | true | store.format updated. | > +++ > 1 row selected (0.02 seconds) > {code} > {code} > 0: jdbc:drill:> alter session set `store.format`='foo'; > +++ > | ok | summary | > +++ > | true | store.format updated. | > +++ > 1 row selected (0.039 seconds) > {code} > In some cases values to some of the configuration parameters are validated, > like in this example, where trying to assign an invalid value to parameter > store.parquet.compression results in an error, which is correct. However, > this kind of validation is not performed for every configuration parameter of > SYSTEM/SESSION type. These values that are assigned to parameters must be > validated, and report errors if incorrect values are assigned by users. > {code} > 0: jdbc:drill:> alter session set `store.parquet.compression`='anything'; > Query failed: ExpressionParsingException: Option store.parquet.compression > must be one of: [snappy, gzip, none] > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
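The `store.parquet.compression` check quoted above shows the enumerated-value validation the report asks to apply everywhere. A minimal standalone sketch of that pattern follows; the `EnumOption` class and its methods are illustrative, not Drill's actual `OptionValidator` API.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of an enumerated-value option: reject any value not in the
// allowed list, mirroring the store.parquet.compression error quoted above.
public class EnumOption {
  private final String name;
  private final List<String> allowed;

  EnumOption(String name, String... allowed) {
    this.name = name;
    this.allowed = Arrays.asList(allowed);
  }

  // Throws if the value is not one of the allowed choices.
  void set(String value) {
    if (!allowed.contains(value)) {
      throw new IllegalArgumentException(
          "Option " + name + " must be one of: " + allowed);
    }
  }

  public static void main(String[] args) {
    EnumOption compression =
        new EnumOption("store.parquet.compression", "snappy", "gzip", "none");
    compression.set("gzip");         // accepted silently
    try {
      compression.set("anything");   // rejected, like the error in the report
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```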
[jira] [Commented] (DRILL-2478) Validating values assigned to SYSTEM/SESSION configuration parameters
[ https://issues.apache.org/jira/browse/DRILL-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058507#comment-16058507 ] ASF GitHub Bot commented on DRILL-2478: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/859#discussion_r123396761 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -166,7 +166,7 @@ String DEFAULT_TEMPORARY_WORKSPACE = "drill.exec.default_temporary_workspace"; String OUTPUT_FORMAT_OPTION = "store.format"; - OptionValidator OUTPUT_FORMAT_VALIDATOR = new StringValidator(OUTPUT_FORMAT_OPTION, "parquet"); + OptionValidator OUTPUT_FORMAT_VALIDATOR = new EnumeratedStringValidator(OUTPUT_FORMAT_OPTION, "parquet", "parquet", "json", "psv", "csv", "tsv", "csvh"); --- End diff -- Is this the right approach? This validator means that anyone who adds an output file format must modify this file and rebuild all of Drill in order to use that format by default. Not sure we want such tight binding. Seems a better idea would be to register all known output formats at runtime, validate the option against that list, warn if the format is unknown, and default to Parquet. 
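The reviewer's alternative (register known output formats at runtime, warn on unknown values, and default to Parquet) could look roughly like this. `FormatRegistry` and its methods are hypothetical names for the sketch, not Drill APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of runtime-registered format validation: plugins register their
// format names at startup, so adding a format needs no core rebuild.
public class FormatRegistry {
  private static final String DEFAULT_FORMAT = "parquet";
  private final Set<String> formats = new HashSet<>();

  public void register(String name) {
    formats.add(name);
  }

  // Returns the requested format if registered; otherwise warns and
  // falls back to the default, as the review comment suggests.
  public String validate(String requested) {
    if (formats.contains(requested)) {
      return requested;
    }
    System.err.println("Unknown output format '" + requested
        + "', defaulting to " + DEFAULT_FORMAT);
    return DEFAULT_FORMAT;
  }

  public static void main(String[] args) {
    FormatRegistry registry = new FormatRegistry();
    registry.register("parquet");
    registry.register("json");
    System.out.println(registry.validate("json"));  // json
    System.out.println(registry.validate("foo"));   // parquet (after warning)
  }
}
```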
[jira] [Commented] (DRILL-5432) Want a memory format for PCAP files
[ https://issues.apache.org/jira/browse/DRILL-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058525#comment-16058525 ] ASF GitHub Bot commented on DRILL-5432: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/831#discussion_r123397439

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapRecordReader.java ---
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.dto.ColumnDto;
+import org.apache.drill.exec.store.pcap.schema.PcapTypes;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.NullableBigIntVector;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableTimeStampVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.drill.exec.store.pcap.Utils.parseBytesToASCII;
+
+public class PcapRecordReader extends AbstractRecordReader {
+  private static final Logger logger = LoggerFactory.getLogger(PcapRecordReader.class);
+
+  private static final int BATCH_SIZE = 40_000;
+
+  private OutputMutator output;
+
+  private PacketDecoder decoder;
+  private ImmutableList<ProjectedColumnInfo> projectedCols;
+
+  private byte[] buffer;
+  private int offset = 0;
+  private InputStream in;
+  private int validBytes;
+
+  private String inputPath;
+  private List<SchemaPath> projectedColumns;
+
+  private static final Map<PcapTypes, MinorType> TYPES;
+
+  private static class ProjectedColumnInfo {
+    ValueVector vv;
+    ColumnDto pcapColumn;
+  }
+
+  static {
+    TYPES = ImmutableMap.<PcapTypes, MinorType>builder()
+        .put(PcapTypes.STRING, MinorType.VARCHAR)
+        .put(PcapTypes.INTEGER, MinorType.INT)
+        .put(PcapTypes.LONG, MinorType.BIGINT)
+        .put(PcapTypes.TIMESTAMP, MinorType.TIMESTAMP)
+        .build();
+  }
+
+  public PcapRecordReader(final String inputPath,
+                          final List<SchemaPath> projectedColumns) {
+    this.inputPath = inputPath;
+    this.projectedColumns = projectedColumns;
+  }
+
+  @Override
+  public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
+    try {
+
+      this.output = output;
+      this.buffer = new byte[10];
+      this.in = new FileInputStream(inputPath);
+      this.decoder = new PacketDecoder(in);
+      this.validBytes = in.re
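The reader setup above is cut off mid-statement, but its fields (`buffer`, `validBytes`, `in`) suggest a standard buffered-read pattern: fill a byte array from an InputStream and track how many bytes are actually valid. A hedged, self-contained sketch of that general pattern, not the actual PcapRecordReader logic:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: fill a buffer from a stream, returning the count of valid bytes.
// read() may return fewer bytes than requested, so we loop until the buffer
// is full or end-of-stream is reached.
public class BufferedRead {
  static int fill(InputStream in, byte[] buffer) throws IOException {
    int validBytes = 0;
    while (validBytes < buffer.length) {
      int n = in.read(buffer, validBytes, buffer.length - validBytes);
      if (n < 0) {
        break;  // end of stream; buffer is only partially valid
      }
      validBytes += n;
    }
    return validBytes;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {1, 2, 3, 4, 5};
    int valid = fill(new ByteArrayInputStream(data), new byte[10]);
    System.out.println(valid);  // 5
  }
}
```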
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058523#comment-16058523 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user sohami commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123397036

--- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/TupleReaderImpl.java ---
@@ -101,8 +88,61 @@ public String getAsString(int colIndex) {
       return "\"" + colReader.getString() + "\"";
     case DECIMAL:
       return colReader.getDecimal().toPlainString();
+    case ARRAY:
+      return getArrayAsString(colReader.array());
     default:
       throw new IllegalArgumentException("Unsupported type " + colReader.valueType());
     }
   }
+
+  private String bytesToString(byte[] value) {
+    StringBuilder buf = new StringBuilder()
+        .append("[");
+    int len = Math.min(value.length, 20);
+    for (int i = 0; i < len; i++) {
+      if (i > 0) {
+        buf.append(", ");
+      }
+      buf.append((int) value[i]);
+    }
+    if (value.length > len) {
+      buf.append("...");
+    }
+    buf.append("]");
+    return buf.toString();
+  }
+
+  private String getArrayAsString(ArrayReader array) {
+    StringBuilder buf = new StringBuilder();
+    buf.append("[");
+    for (int i = 0; i < array.size(); i++) {
+      if (i > 0) {
+        buf.append(", ");
+      }
+      switch (array.valueType()) {
--- End diff --

For the `getObject()` implementation it's throwing specifically `UnsupportedOperationException` for Map and Array type whereas for default type it throws `IllegalArgumentException`. I thought the idea was Map and Array are valid ArgumentType but not supported for now, hence the comment.
[jira] [Commented] (DRILL-5518) Roll-up of a number of test framework enhancements
[ https://issues.apache.org/jira/browse/DRILL-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058524#comment-16058524 ] ASF GitHub Bot commented on DRILL-5518: ---
Github user sohami commented on a diff in the pull request:
https://github.com/apache/drill/pull/851#discussion_r123397308

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/RowSetTest.java ---
@@ -309,13 +317,38 @@ private void testDoubleRW() {
     assertEquals(0, reader.column(0).getDouble(), 0.01);
     assertTrue(reader.next());
     assertEquals(Double.MAX_VALUE, reader.column(0).getDouble(), 0.01);
+    assertEquals(Double.MAX_VALUE, (double) reader.column(0).getObject(), 0.01);
--- End diff --

Yes, but my understanding is that `RowSetReader.column(0)` will return `ColumnReader`, and you implemented `getObject` in `AbstractColumnReader`, which is handling the type there. So `getObject` will return the correct type of value?
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058540#comment-16058540 ] ASF GitHub Bot commented on DRILL-3867: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/824#discussion_r123398393

--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
+   *
+   * There is no way to generate a metadata cache file with absolute paths that
+   * will be guaranteed to be available on an arbitrary test machine.
+   *
--- End diff --

Very small suggestion: Javadoc is HTML-formatted. Insert a <p> between paragraphs.

> Store relative paths in metadata file
> -------------------------------------
>
> Key: DRILL-3867
> URL: https://issues.apache.org/jira/browse/DRILL-3867
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.2.0
> Reporter: Rahul Challapalli
> Assignee: Vitalii Diravka
> Fix For: Future
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1.
Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.4.14#64029)
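The `replaceWorkingPathInString()` helper in the diff above relies on two easily missed details: `Pattern.quote` keeps `[WORKING_PATH]` from being parsed as a regex character class, and `Matcher.quoteReplacement` protects any `$` or `\` in the substituted path. A standalone usage sketch, where the working-path value is a stand-in rather than `TestTools.getWorkingPath()`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the marker-replacement helper: substitute the literal marker
// "[WORKING_PATH]" with a concrete path, with both sides regex-escaped.
public class PathReplace {
  static String replaceWorkingPath(String orig, String workingPath) {
    // Pattern.quote: treat "[WORKING_PATH]" literally, not as a char class.
    // Matcher.quoteReplacement: keep '$' and '\' in the path from being
    // interpreted as backreferences in the replacement.
    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"),
        Matcher.quoteReplacement(workingPath));
  }

  public static void main(String[] args) {
    String result = replaceWorkingPath("[WORKING_PATH]/src/test/resources",
        "/home/user/drill");
    System.out.println(result);  // /home/user/drill/src/test/resources
  }
}
```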
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058539#comment-16058539 ] ASF GitHub Bot commented on DRILL-3867: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/824#discussion_r123398337

--- Diff: exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java ---
@@ -593,4 +610,49 @@ private void convert(List batches) throws SchemaChangeException
       }
     }
   }
+
+  private static String replaceWorkingPathInString(String orig) {
+    return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath()));
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException {
+    copyDirectoryIntoTempSpace(resourcesDir, null);
+  }
+
+  protected static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException {
+    Path destination = destinationSubDir != null ? new Path(getDfsTestTmpSchemaLocation(), destinationSubDir)
+        : new Path(getDfsTestTmpSchemaLocation());
+    fs.copyFromLocalFile(
+        new Path(replaceWorkingPathInString(resourcesDir)),
+        destination);
+  }
+
+  /**
+   * Metadata cache files include full paths to the files that have been scanned.
--- End diff --

For older files with the marker, should we just replace the marker to be relative and take advantage of this improvement? Can that be done without having to edit the old files?
[jira] [Commented] (DRILL-3867) Store relative paths in metadata file
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058538#comment-16058538 ] ASF GitHub Bot commented on DRILL-3867: ---
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/824#discussion_r123397635

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -748,6 +771,22 @@ public ParquetTableMetadataDirs(List directories) {
     return directories;
   }
+
+  /** If directories list contains relative paths, update it to absolute ones
--- End diff --

Thanks for the explanation.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.4.14#64029)
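The direction of the fix is to store paths relative to the table root in the cache file and resolve them against the table's current location at read time, so the cache survives a directory move. A minimal sketch of that idea using plain `java.nio` (not actual Drill code; the class and method names are illustrative):

```java
import java.nio.file.Paths;

public class RelativeMetadataPaths {
  // On REFRESH TABLE METADATA: strip the table root so the cached
  // entry stays valid even if the table directory is later moved.
  static String relativize(String tableRoot, String absoluteFile) {
    return Paths.get(tableRoot).relativize(Paths.get(absoluteFile)).toString();
  }

  // On read: resolve the stored relative path against wherever the
  // table lives now, not where it lived when the cache was written.
  static String resolve(String currentTableRoot, String relativeFile) {
    return Paths.get(currentTableRoot).resolve(relativeFile).toString();
  }

  public static void main(String[] args) {
    String rel = relativize("/drill/testdata/metadata_caching/lineitem",
        "/drill/testdata/metadata_caching/lineitem/2006/1");
    System.out.println(rel);                              // 2006/1
    System.out.println(resolve("/drill/lineitem", rel));  // /drill/lineitem/2006/1
  }
}
```

With relative paths cached, the `hadoop fs -mv` step in the reproduction would no longer break the query, because resolution happens against `/drill/lineitem` rather than the stale absolute prefix.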
[jira] [Created] (DRILL-5602) Vector corruption when allocating a repeated map vector
Paul Rogers created DRILL-5602:
----------------------------------

             Summary: Vector corruption when allocating a repeated map vector
                 Key: DRILL-5602
                 URL: https://issues.apache.org/jira/browse/DRILL-5602
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.11.0


The query in DRILL-5513 highlighted a problem described in DRILL-5594: the external sort did not properly allocate its spill batch vectors, and instead allowed them to grow by doubling. While fixing that issue, a new issue became clear: the method that allocates a repeated map vector has a serious bug. As described in DRILL-5530, value vectors do not zero-fill the first allocation for a vector (though subsequent reallocs are zero-filled).

If the code worked correctly, here is the behavior when writing to the first element of the list:

* Access the offset vector at offset 0. It should be 0.
* Write the new value at that offset. Since the first offset is 0, the first value is written at position 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains the garbage value 16 million. Now:

* Access the offset vector at offset 0. The value is 16 million.
* Write the new value at that offset, i.e. at position 16 million. This requires growing the value vector from its present size to 16 MB.

The problem is here in {{RepeatedMapVector}}:

{code}
public void allocateOffsetsNew(int groupCount) {
  offsets.allocateNew(groupCount + 1);
}
{code}

Notice that there is no code to set the value at offset 0. Then, in {{UInt4Vector}}:

{code}
public void allocateNew(final int valueCount) {
  allocateBytes(valueCount * 4);
}

private void allocateBytes(final long size) {
  ...
  data = allocator.buffer(curSize);
  ...
}
{code}

The above eventually calls the Netty memory allocator, which explicitly states that, for performance reasons, it does not zero-fill its buffers. The code works in small tests only because the new buffer comes from freshly allocated Java direct memory, which *does* zero-fill the buffer.
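The failure mode can be shown with a minimal, self-contained simulation (plain `java.nio`, not Drill vector code; all names here are illustrative). An offset buffer handed back with stale contents sends the first write to a huge position unless slot 0 is explicitly zeroed:

```java
import java.nio.IntBuffer;

public class OffsetVectorDemo {
  // Simulated offset vector: slot i holds the start position of element i,
  // slot i+1 holds its end. The buffer may arrive with stale contents,
  // mimicking an allocator that does not zero-fill.
  static int firstWritePosition(IntBuffer offsets, boolean zeroFirstSlot) {
    if (zeroFirstSlot) {
      offsets.put(0, 0);          // the missing initialization: first offset must be 0
    }
    int start = offsets.get(0);   // where element 0's data would be written
    int valueLength = 4;          // pretend we append a 4-byte value
    offsets.put(1, start + valueLength);
    return start;
  }

  public static void main(String[] args) {
    IntBuffer recycled = IntBuffer.allocate(8);
    recycled.put(0, 16_000_000);  // garbage left over from a previous use

    // Without initialization the first write lands at position 16,000,000,
    // forcing the value vector to grow to ~16 MB for a single small value.
    System.out.println(firstWritePosition(recycled, false)); // 16000000

    recycled.put(0, 16_000_000);  // dirty again
    System.out.println(firstWritePosition(recycled, true));  // 0
  }
}
```

The same logic explains why small tests pass: a freshly allocated direct buffer happens to read as all zeros, so `offsets.get(0)` returns 0 by luck rather than by contract.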
[jira] [Updated] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Summary: Vector corruption when allocating a repeated, variable-width vector  (was: Vector corruption when allocating a repeated map vector)
[jira] [Commented] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058700#comment-16058700 ]

Paul Rogers commented on DRILL-5602:
------------------------------------

It appears that other vectors have the same issue:

* Repeated map vector (discussed above)
* Variable-width vector (see below)
* All repeated value vectors (see below)

The {{ListVector}} does not have the problem because it does not have the {{allocateNew(int valueCount)}} method. This is its own bug...

The following is code from the {{VarCharVector}}:

{code}
@Override
public void allocateNew(int totalBytes, int valueCount) {
  ...
  offsetVector.allocateNew(valueCount + 1);
  ...
  data.readerIndex(0);
  allocationSizeInBytes = totalBytes;
  offsetVector.zeroVector();
}
{code}

Notice that the above does not set the initial offset to zero.

Typical repeated vector code (from {{RepeatedIntVector}}):

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  ...
  offsets.allocateNew(valueCount + 1);
  values.allocateNew(innerValueCount);
  ...
  offsets.zeroVector();
  mutator.reset();
}
{code}

For {{RepeatedListVector}}:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}
[jira] [Commented] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058715#comment-16058715 ]

Paul Rogers commented on DRILL-5602:
------------------------------------

The problem also exists in the other forms of {{allocateNew}}. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector();
  return true;
}
{code}

Again, the offsets buffer is not initialized. Perhaps code that uses this form does the required initialization. It would be better to do it in the vector, rather than in each bit of code that allocates vectors...
[jira] [Commented] (DRILL-5513) Managed External Sort : OOM error during the merge phase
[ https://issues.apache.org/jira/browse/DRILL-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058729#comment-16058729 ] Paul Rogers commented on DRILL-5513: After the fixes described here, in DRILL-5594 and DRILL-5602, the query now runs correctly: {code} End of sort. Total write bytes: 291402473, Total read bytes: 291402473 Results: 1 records, 2 batches, ... {code} > Managed External Sort : OOM error during the merge phase > > > Key: DRILL-5513 > URL: https://issues.apache.org/jira/browse/DRILL-5513 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 26e5f7b8-71e8-afca-e72e-fad7be2b2416.sys.drill, > drillbit.log > > > git.commit.id.abbrev=1e0a14c > No of nodes in cluster : 1 > DRILL_MAX_DIRECT_MEMORY="32G" > DRILL_MAX_HEAP="4G" > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_query` = 100; > alter session set `planner.memory.max_query_memory_per_node` = 652428800; > select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from > (select d.type type, d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 > order by s1.rms.mapid); > {code} > Exception from the logs > {code} > 2017-05-15 12:58:46,646 [BitServer-4] DEBUG > o.a.drill.exec.work.foreman.Foreman - 26e5f7b8-71e8-afca-e72e-fad7be2b2416: > State change requested RUNNING --> FAILED > org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One > or more nodes ran out of memory while executing the query. > Unable to allocate buffer of size 2097152 due to memory limit. 
Current > allocation: 19791880 > Fragment 5:2 > [Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010] > (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate > buffer of size 2097152 due to memory limit. Current allocation: 19791880 > org.apache.drill.exec.memory.BaseAllocator.buffer():220 > org.apache.drill.exec.memory.BaseAllocator.buffer():195 > org.apache.drill.exec.vector.BigIntVector.reAlloc():212 > org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324 > org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367 > > org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328 > > org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360 > > org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220 > org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82 > > org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34 > org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76 > > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > > 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():415 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():227 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > at > org.apache.drill.exec.work.foreman.QueryManager$1.stat
[jira] [Comment Edited] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058700#comment-16058700 ]

Paul Rogers edited comment on DRILL-5602 at 6/22/17 4:50 AM:
-------------------------------------------------------------

It appears that other vectors have an expensive solution:

* Variable-width vector (see below)
* All repeated value vectors (see below)

The following is code from the {{VarCharVector}}:

{code}
@Override
public void allocateNew(int totalBytes, int valueCount) {
  ...
  offsetVector.allocateNew(valueCount + 1);
  ...
  data.readerIndex(0);
  allocationSizeInBytes = totalBytes;
  offsetVector.zeroVector();
}
{code}

Notice that the above calls {{zeroVector()}}, which (unnecessarily) zeros the entire offsets vector. This results in the first position being set to zero.

Typical repeated vector code (from {{RepeatedIntVector}}):

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  ...
  offsets.allocateNew(valueCount + 1);
  values.allocateNew(innerValueCount);
  ...
  offsets.zeroVector();
  mutator.reset();
}
{code}

The above also calls {{zeroVector()}}.


was (Author: paul-rogers):
It appears that other vectors have the same issue:

* Repeated map vector (discussed above)
* Variable-width vector (see below)
* All repeated value vectors (see below)

The {{ListVector}} does not have the problem because it does not have the {{allocateNew(int valueCount)}} method. This is its own bug...

The following is code from the {{VarCharVector}}:

{code}
@Override
public void allocateNew(int totalBytes, int valueCount) {
  ...
  offsetVector.allocateNew(valueCount + 1);
  ...
  data.readerIndex(0);
  allocationSizeInBytes = totalBytes;
  offsetVector.zeroVector();
}
{code}

Notice that the above does not set the initial offset to zero.

Typical repeated vector code (from {{RepeatedIntVector}}):

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  ...
  offsets.allocateNew(valueCount + 1);
  values.allocateNew(innerValueCount);
  ...
  offsets.zeroVector();
  mutator.reset();
}
{code}

For {{RepeatedListVector}}:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}
[jira] [Commented] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058743#comment-16058743 ]

Paul Rogers commented on DRILL-5602:
------------------------------------

{{RepeatedListVector}} does have the offset corruption bug:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}

Notice there is no call to {{zeroVector()}} and no explicit code to set position 0 to 0.
[jira] [Comment Edited] (DRILL-5602) Vector corruption when allocating a repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058715#comment-16058715 ]

Paul Rogers edited comment on DRILL-5602 at 6/22/17 4:52 AM:
-------------------------------------------------------------

The problem does not occur in the other forms of {{allocateNew}}, which is why the problem has not often been seen. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector(); // <-- Zeros whole vector
  return true;
}
{code}


was (Author: paul-rogers):
The problem also exists in the other forms of {{allocateNew}}. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector();
  return true;
}
{code}

Again, the offsets buffer is not initialized. Perhaps code that uses this form does the required initialization. It would be better to do it in the vector, rather than in each bit of code that allocates vectors...
[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Summary: Repeated List Vector fails to initialize the offset vector  (was: Vector corruption when allocating a repeated, variable-width vector)
[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Description:

The code that allocates a new {{RepeatedListVector}} does not initialize the first offset to zero as required:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}

Since Netty does not zero-fill vectors, the result is vector corruption.

If the code worked correctly, here is the behavior when writing to the first element of the list:

* Access the offset vector at offset 0. It should be 0.
* Write the new value at that offset. Since the first offset is 0, the first value is written at position 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains the value 16 million. Now:

* Access the offset vector at offset 0. The value is 16 million.
* Write the new value at that offset, that is, at position 16 million. This requires growing the value vector from its present size to 16 MB.

    was:

The query in DRILL-5513 highlighted a problem described in DRILL-5594: the external sort did not properly allocate its spill batch vectors, and instead allowed them to grow by doubling. While fixing that issue, a new issue became clear. The method to allocate a repeated map vector, however, has a serious bug, as described in DRILL-5530: value vectors do not zero-fill the first allocation for a vector (though subsequent reallocs are zero-filled).

If the code worked correctly, here is the behavior when writing to the first element of the list:

* Access the offset vector at offset 0. It should be 0.
* Write the new value at that offset. Since the first offset is 0, the first value is written at position 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains the value 16 million. Now:

* Access the offset vector at offset 0. The value is 16 million.
* Write the new value at that offset, that is, at position 16 million. This requires growing the value vector from its present size to 16 MB.

The problem is here in {{RepeatedMapVector}}:

{code}
public void allocateOffsetsNew(int groupCount) {
  offsets.allocateNew(groupCount + 1);
}
{code}

Notice that there is no code to set the value at offset 0. Then, in {{UInt4Vector}}:

{code}
public void allocateNew(final int valueCount) {
  allocateBytes(valueCount * 4);
}

private void allocateBytes(final long size) {
  ...
  data = allocator.buffer(curSize);
  ...
{code}

The above eventually calls the Netty memory allocator, which explicitly states that, for performance reasons, it does not zero-fill its buffers. The code works in small tests because the new buffer comes from Java direct memory, which *does* zero-fill the buffer.

> Repeated List Vector fails to initialize the offset vector
> ----------------------------------------------------------
>
>                 Key: DRILL-5602
>                 URL: https://issues.apache.org/jira/browse/DRILL-5602
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
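The remedy implied by the description above is to set the first offset slot to zero immediately after allocating the offset vector. The following minimal Java sketch is an illustration, not the actual Drill patch: the {{allocateUnzeroed}} helper is hypothetical and merely simulates an allocator that, like Netty's, leaves stale data in a new buffer.

```java
import java.util.Arrays;

public class OffsetVectorSketch {
    // Hypothetical helper: simulates an allocator that does NOT zero-fill,
    // leaving garbage (here, 16 million) in the freshly allocated buffer.
    static int[] allocateUnzeroed(int count) {
        int[] buf = new int[count];
        Arrays.fill(buf, 16_000_000);
        return buf;
    }

    public static void main(String[] args) {
        int valueCount = 4;

        // Buggy allocation: slot 0 keeps whatever garbage was in memory,
        // so the first write would land at position 16 million.
        int[] buggyOffsets = allocateUnzeroed(valueCount + 1);
        System.out.println("first write position (buggy): " + buggyOffsets[0]);

        // Fixed allocation: explicitly zero slot 0 after allocating.
        int[] fixedOffsets = allocateUnzeroed(valueCount + 1);
        fixedOffsets[0] = 0;

        // Writing the first value (say, length 5) now behaves correctly:
        // it lands at position 0 and records its end offset in slot 1.
        int writePos = fixedOffsets[0];
        fixedOffsets[1] = fixedOffsets[0] + 5;
        System.out.println("first write position (fixed): " + writePos);  // 0
        System.out.println("second offset (fixed): " + fixedOffsets[1]); // 5
    }
}
```

Only slot 0 needs initialization here because each subsequent offset slot is written before it is ever read, provided writers fill the vector sequentially.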
[jira] [Issue Comment Deleted] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Comment: was deleted

(was: {{RepeatedListVector}} does have the offset corruption bug:

{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
  clear();
  getOffsetVector().allocateNew(valueCount + 1);
  getMutator().reset();
}
{code}

Notice no call to {{setZero()}} and no explicit code to set position 0 to 0.)
[jira] [Issue Comment Deleted] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
-------------------------------
    Comment: was deleted

(was: The problem does not occur in the other forms of {{allocateNew}}, which is why the problem has not often been seen. From {{VarCharVector}}:

{code}
@Override
public boolean allocateNewSafe() {
  ...
  data = allocator.buffer(requestedSize);
  allocationSizeInBytes = requestedSize;
  offsetVector.allocateNew();
  ...
  data.readerIndex(0);
  offsetVector.zeroVector(); // <-- Zeros whole vector
  return true;
}
{code}
)
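The deleted comment above points out that {{allocateNewSafe}} avoids the bug by zeroing the entire offset buffer via {{zeroVector()}}, whereas the fix for the buggy path only needs slot 0 to be valid before the first write. As a hedged illustration of that trade-off (plain int arrays standing in for the real direct-memory buffers; the method names below are hypothetical):

```java
import java.util.Arrays;

public class ZeroStrategies {
    // Zero the whole buffer, as VarCharVector.allocateNewSafe() does
    // via zeroVector(): safe regardless of write order, but touches
    // every slot.
    static void zeroWhole(int[] offsets) {
        Arrays.fill(offsets, 0);
    }

    // Zero only slot 0: sufficient when writers fill offsets
    // sequentially, since slot N+1 is always written before it is read.
    static void zeroFirst(int[] offsets) {
        offsets[0] = 0;
    }

    public static void main(String[] args) {
        int[] a = {7, 7, 7, 7}; // simulated garbage from an unzeroed allocation
        int[] b = {7, 7, 7, 7};
        zeroWhole(a);
        zeroFirst(b);
        System.out.println(Arrays.toString(a)); // [0, 0, 0, 0]
        System.out.println(Arrays.toString(b)); // [0, 7, 7, 7]
    }
}
```

Either strategy makes the first write land at position 0; the whole-vector zero simply costs more and also covers non-sequential access patterns.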
[jira] [Resolved] (DRILL-5163) External sort on Mac creates a separate child process per spill via HDFS FS
[ https://issues.apache.org/jira/browse/DRILL-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5163.
--------------------------------
    Resolution: Fixed
 Fix Version/s: 1.11.0

Fixed as part of DRILL-5325

> External sort on Mac creates a separate child process per spill via HDFS FS
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-5163
>                 URL: https://issues.apache.org/jira/browse/DRILL-5163
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>          Priority: Minor
>             Fix For: 1.11.0
>
>
> The external sort operator spills to disk. Spill files are created and
> written using the HDFS file system. For performance, HDFS uses native
> libraries to access the file system. These native libraries are not available
> on the Mac. As a result, some operations are implemented using a slower,
> Java-only path. One of these operations (need details) is implemented by
> forking a child process.
> When run in a debugger on the Mac, this behavior shows up as the furious
> creation and deletion of threads to manage the child processes: one per
> spill. Because of this behavior, performance of the external sort is slow. Of
> course, no production code uses Drill on a Mac, so this is more of a nuisance
> than a real bug, which is why it is marked as an improvement.