[jira] [Commented] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2016-01-25 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114900#comment-15114900
 ] 

Khurram Faraaz commented on DRILL-4301:
---

From the test run log, the Exception occurs immediately after the failed test.
Tests were run using a Docker setup, and the drillbit.log was erased after the 
test run, so I am not sure that was the test which caused the Exception. We will 
have to re-run tests and retain logs to know which test led to that exception.

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>
> The query below, from the Functional tests, fails due to OOM 
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version:        1.5.0-SNAPSHOT
> commit_id:      2f0e3f27e630d5ac15cdaef808564e01708c3c55
> commit_message: DRILL-4190 Don't hold on to batches from left side of merge join.
> commit_time:    20.01.2016 @ 22:30:26 UTC
> build_email:    Unknown
> build_time:     20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandle

[jira] [Updated] (DRILL-4291) Ensure the jdbc-all driver jar includes classes required to return VarChar[]

2016-01-25 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4291:
--
Assignee: Jason Altekruse  (was: Jacques Nadeau)

> Ensure the jdbc-all driver jar includes classes required to return VarChar[]
> 
>
> Key: DRILL-4291
> URL: https://issues.apache.org/jira/browse/DRILL-4291
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 0.5.0
> Environment: Linux / 1.5-SNAPSHOT
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
>
> Hi,
> We are using the 1.5-SNAPSHOT version of the JDBC driver (all) and we seem 
> to be getting this old thing:
> https://issues.apache.org/jira/browse/DRILL-2482
> We are either doing something wrong or this is a regression. Has 
> anyone else experienced not being able to get nested structures via the 
> latest JDBC driver?
> (I'm going to pull the latest from master to be sure this has not been 
> solved)
> The error we get when accessing a field containing a sub-structure is:
> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text
> at 
> oadd.org.apache.drill.exec.util.JsonStringArrayList.<init>(JsonStringArrayList.java:35)
> at 
> oadd.org.apache.drill.exec.vector.RepeatedVarCharVector$Accessor.getObject(RepeatedVarCharVector.java:293)
> at 
> oadd.org.apache.drill.exec.vector.RepeatedVarCharVector$Accessor.getObject(RepeatedVarCharVector.java:290)
> at 
> oadd.org.apache.drill.exec.vector.accessor.GenericAccessor.getObject(GenericAccessor.java:44)
> at 
> oadd.org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
> at 
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
> at 
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
> at 
> oadd.net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
> Regards,
>  -Stefan



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4308) Aggregate operations on dir columns can be more efficient for certain use cases

2016-01-25 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-4308:
-

 Summary: Aggregate operations on dir columns can be more 
efficient for certain use cases
 Key: DRILL-4308
 URL: https://issues.apache.org/jira/browse/DRILL-4308
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.4.0
Reporter: Aman Sinha


For queries that perform plain aggregates or DISTINCT operations on the 
directory partition columns (dir0, dir1, etc.), where no other columns are 
referenced in the query, performance could be substantially improved by not 
having to scan the entire dataset.

Consider the following types of queries:
{noformat}
select  min(dir0) from largetable;
select  distinct dir0 from largetable;
{noformat}

The number of distinct values of dir columns is typically quite small, and 
there's no reason to scan the large table.  This has also come up as feedback 
from some Drill users.  Of course, if any other column is referenced in the 
query (WHERE, ORDER BY, etc.) then we cannot apply this optimization.  





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4246) New allocator causing a flatten regression test to fail with IllegalStateException

2016-01-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4246:
-
Reviewer: Victoria Markman

> New allocator causing a flatten regression test to fail with 
> IllegalStateException
> --
>
> Key: DRILL-4246
> URL: https://issues.apache.org/jira/browse/DRILL-4246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Jacques Nadeau
>Priority: Blocker
> Fix For: 1.5.0
>
>
> We are seeing the following error in the test cluster:
> {noformat}
> /framework/resources/Functional/flatten_operators/10rows/filter3.q
> Query: 
> select uid, flatten(events) from `data.json` where uid > 1
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Unaccounted for 
> outstanding allocation (851968)
> Allocator(op:0:0:0:Screen) 100/851968/1941504/100 
> (res/actual/peak/limit)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4278) Memory leak when using LIMIT

2016-01-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4278:
-
Reviewer: Victoria Markman

> Memory leak when using LIMIT
> 
>
> Key: DRILL-4278
> URL: https://issues.apache.org/jira/browse/DRILL-4278
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.4.0, 1.5.0
> Environment: OS X
> 0: jdbc:drill:zk=local> select * from sys.version;
> version:        1.4.0
> commit_id:      32b871b24c7b69f59a1d2e70f444eed6e599e825
> commit_message: [maven-release-plugin] prepare release drill-1.4.0
> commit_time:    08.12.2015 @ 00:24:59 PST
> build_email:    venki.koruka...@gmail.com
> build_time:     08.12.2015 @ 01:14:39 PST
> 0: jdbc:drill:zk=local> select * from sys.options where status <> 'DEFAULT';
> | name                        | kind  | type    | status   | num_val  | string_val  | bool_val  | float_val  |
> | planner.slice_target        | LONG  | SYSTEM  | CHANGED  | 10       | null        | null      | null       |
> | planner.width.max_per_node  | LONG  | SYSTEM  | CHANGED  | 5        | null        | null      | null       |
> 2 rows selected (0.16 seconds)
>Reporter: jean-claude
>Assignee: Jacques Nadeau
> Fix For: 1.5.0
>
>
> copy the parquet files in the samples directory so that you have 12 or so
> $ ls -lha /apache-drill-1.4.0/sample-data/nationsMF/
> nationsMF1.parquet
> nationsMF2.parquet
> nationsMF3.parquet
> create a file with a few thousand lines like these
> select * from dfs.`/Users/jccote/apache-drill-1.4.0/sample-data/nationsMF` 
> limit 500;
> start drill
> $ /apache-drill-1.4.0/bin/drill-embedded
> reduce the slice target size to force drill to use multiple fragments/threads
> jdbc:drill:zk=local> alter system set `planner.slice_target` = 10;
> now run the list of queries from the file you created above
> jdbc:drill:zk=local> !run /Users/jccote/test-memory-leak-using-limit.sql
> the Java heap space keeps going up until the old space is at 100% and 
> eventually you get an OutOfMemoryException in drill
> $ jstat -gccause 86850 5s
>   S0 S1 E  O  M CCSYGC YGCTFGCFGCT 
> GCTLGCC GCC 
>   0.00   0.00 100.00 100.00  98.56  96.71   2279   26.682   240  458.139  
> 484.821 GCLocker Initiated GC Ergonomics  
>   0.00   0.00 100.00  99.99  98.56  96.71   2279   26.682   242  461.347  
> 488.028 Allocation Failure   Ergonomics  
>   0.00   0.00 100.00  99.99  98.56  96.71   2279   26.682   245  466.630  
> 493.311 Allocation Failure   Ergonomics  
>   0.00   0.00 100.00  99.99  98.56  96.71   2279   26.682   247  470.020  
> 496.702 Allocation Failure   Ergonomics  
> If you do the same test but do not use the LIMIT then the memory usage does 
> not go up.
> If you add a where clause so that no results are returned, then the memory 
> usage does not go up.
> Something with the RPC layer?
> Also, it seems sensitive to the number of fragments/threads. If you limit it 
> to one fragment/thread, the memory usage goes up much more slowly.
> I have used parquet files and CSV files. In either case the behaviour is the 
> same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4256) Performance regression in hive planning

2016-01-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4256:
-
Reviewer: Rahul Challapalli

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
> Attachments: jstack.tgz
>
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading hive tables backed by hbase caused a performance 
> regression. The data set used in the below test has ~3700 partitions and the 
> filter in the query would ensure only 1 partition get selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you 
> need additional information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4291) Ensure the jdbc-all driver jar includes classes required to return VarChar[]

2016-01-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115800#comment-15115800
 ] 

ASF GitHub Bot commented on DRILL-4291:
---

Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/336#issuecomment-174627297
  
+1


> Ensure the jdbc-all driver jar includes classes required to return VarChar[]
> 
>
> Key: DRILL-4291
> URL: https://issues.apache.org/jira/browse/DRILL-4291
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 0.5.0
> Environment: Linux / 1.5-SNAPSHOT
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
>
> Hi,
> We are using the 1.5-SNAPSHOT version of the JDBC driver (all) and we seem 
> to be getting this old thing:
> https://issues.apache.org/jira/browse/DRILL-2482
> We are either doing something wrong or this is a regression. Has 
> anyone else experienced not being able to get nested structures via the 
> latest JDBC driver?
> (I'm going to pull the latest from master to be sure this has not been 
> solved)
> The error we get when accessing a field containing a sub-structure is:
> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text
> at 
> oadd.org.apache.drill.exec.util.JsonStringArrayList.<init>(JsonStringArrayList.java:35)
> at 
> oadd.org.apache.drill.exec.vector.RepeatedVarCharVector$Accessor.getObject(RepeatedVarCharVector.java:293)
> at 
> oadd.org.apache.drill.exec.vector.RepeatedVarCharVector$Accessor.getObject(RepeatedVarCharVector.java:290)
> at 
> oadd.org.apache.drill.exec.vector.accessor.GenericAccessor.getObject(GenericAccessor.java:44)
> at 
> oadd.org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
> at 
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
> at 
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
> at 
> oadd.net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
> Regards,
>  -Stefan



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4291) Ensure the jdbc-all driver jar includes classes required to return VarChar[]

2016-01-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115798#comment-15115798
 ] 

ASF GitHub Bot commented on DRILL-4291:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/336#discussion_r50741405
  
--- Diff: exec/vector/src/main/java/org/apache/drill/exec/util/Text.java ---
@@ -0,0 +1,612 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.util;
+
+import java.io.DataInput;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+import java.nio.charset.CharacterCodingException;
+import java.nio.charset.Charset;
+import java.nio.charset.CharsetDecoder;
+import java.nio.charset.CharsetEncoder;
+import java.nio.charset.CodingErrorAction;
+import java.nio.charset.MalformedInputException;
+import java.text.CharacterIterator;
+import java.text.StringCharacterIterator;
+import java.util.Arrays;
+
+import com.fasterxml.jackson.core.JsonGenerationException;
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.databind.SerializerProvider;
+import com.fasterxml.jackson.databind.annotation.JsonSerialize;
+import com.fasterxml.jackson.databind.ser.std.StdSerializer;
+
+/**
+ * A simplified byte wrapper similar to Hadoop's Text class without all 
the dependencies. Lifted from Hadoop 2.7.1
+ */
+@JsonSerialize(using = Text.TextSerializer.class)
+public class Text {
--- End diff --

Because a few real changes were made to this class (it isn't just a copy of 
the source to avoid including the jar that contains it), it would be good to 
have an explicit pointer to the version of this class that this was based on. 
Is this pulled out of the Hadoop source at the commit tagged for the 2.7.1 
release?

It might have been cleaner to do this in two commits: one just checking in 
the class exactly as it appears in Hadoop, with just the package changed, and a 
follow-up commit with the refactoring to remove the dependencies imported in 
the Hadoop version of the class. At least a mention of the commit ID this was 
pulled out of would give a pointer to the exact version this was based on if we 
ever had to apply a patch from Hadoop mainline that didn't merge cleanly.


> Ensure the jdbc-all driver jar includes classes required to return VarChar[]
> 
>
> Key: DRILL-4291
> URL: https://issues.apache.org/jira/browse/DRILL-4291
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 0.5.0
> Environment: Linux / 1.5-SNAPSHOT
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
>
> Hi,
> We are using the 1.5-SNAPSHOT version of the JDBC driver (all) and we seem 
> to be getting this old thing:
> https://issues.apache.org/jira/browse/DRILL-2482
> We are either doing something wrong or this is a regression. Has 
> anyone else experienced not being able to get nested structures via the 
> latest JDBC driver?
> (I'm going to pull the latest from master to be sure this has not been 
> solved)
> The error we get when accessing a field containing a sub-structure is:
> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text
> at 
> oadd.org.apache.drill.exec.util.JsonStringArrayList.<init>(JsonStringArrayList.java:35)
> at 
> oadd.org.apache.drill.exec.vector.RepeatedVarCharVector$Accessor.getObject(RepeatedVarCharVector.java:293)
> at 
> oadd.org.apache.drill.exec.vector.RepeatedVarCharVector$Accessor.getObject(RepeatedVarCharVector.java:290)
> at 
> oadd.org.apache.drill.exec.vector.accessor.GenericAccessor.getObject(GenericAccessor.java:44)
> at 
> oadd.org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
> at 
> org.apache.drill.jdbc.

[jira] [Created] (DRILL-4309) Make this option store.hive.optimize_scan_with_native_readers=true default

2016-01-25 Thread Sean Hsuan-Yi Chu (JIRA)
Sean Hsuan-Yi Chu created DRILL-4309:


 Summary: Make this option 
store.hive.optimize_scan_with_native_readers=true default
 Key: DRILL-4309
 URL: https://issues.apache.org/jira/browse/DRILL-4309
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu
 Fix For: 1.5.0


This new feature has been around and has been used/tested in many scenarios. 

We should enable this feature by default.
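Until the default is flipped, the option can be set explicitly; the standard Drill syntax for that is:

{code}
ALTER SYSTEM SET `store.hive.optimize_scan_with_native_readers` = true;
{code}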



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-01-25 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116281#comment-15116281
 ] 

Jason Altekruse commented on DRILL-4203:


[~zfong] That is correct. The only extra complexity is that I have added an 
option that allows users to optionally turn off auto-correction for any files 
that are not certain to have been created by Drill.

The default behavior will be to check the file-level created-by metadata; if we 
know it is a version of Drill after the fix, no correction will happen 
regardless of the setting of the option. Similarly, for a file with a Drill 
version string that indicates the data was written before this fix, we will 
always correct the data, regardless of the setting of this flag.

The only complicated case is where there is not enough metadata to determine if 
it is a Drill file or not. In this case we will check the values in the file: 
either in the file-level min/max statistics when the reader is initialized, or, 
when the file lacks min/max value statistics (it's a pre-1.0 Drill file), we 
will have to defer detection until actually reading individual data pages. 
Checks at both of these levels can be disabled by the option.

The nature of the bug caused a really significant shift of the dates, putting 
them thousands of years into the future. Thus auto-correction as the default 
isn't high risk, as it is extremely unlikely users will have created a database 
full of dates in this range. That being said, the option is included to cover 
any such cases.
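(For scale, using the corrupted value shown in the report below: an epoch_date of 
4881176 days past 1970-01-01 works out to 4881176 / 365.25 ≈ 13,364 years, i.e. a 
date near the year 15334, so a default that treats values in that range as corrupt 
carries little risk of touching legitimate data.)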

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Jason Altekruse
>Priority: Critical
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark; all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, I found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4261) add support for RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

2016-01-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116301#comment-15116301
 ] 

ASF GitHub Bot commented on DRILL-4261:
---

GitHub user adeneche opened a pull request:

https://github.com/apache/drill/pull/337

Drill-4262: add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

The first 4 commits are part of 
[DRILL-4261](https://issues.apache.org/jira/browse/DRILL-4261); only the last 3 
commits are part of this pull request. I will update this PR once 
[DRILL-4261](https://issues.apache.org/jira/browse/DRILL-4261) has been merged.
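
For reference, here is a sketch of queries exercising the two frame clauses 
involved (the table dfs.tmp.`emp` and its columns are made up for illustration):

{code}
select
  emp_id,
  -- DRILL-4262: running aggregate up to the current row
  sum(salary) over (partition by dept order by hire_date
    rows between unbounded preceding and current row) as running_total,
  -- DRILL-4261: aggregate over the whole partition
  avg(salary) over (partition by dept order by hire_date
    range between unbounded preceding and unbounded following) as dept_avg
from dfs.tmp.`emp`;
{code}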

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adeneche/incubator-drill DRILL-4262

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #337


commit c5c2eda7997c5dfb9335b02a25272c0e6ddc48b1
Author: adeneche 
Date:   2016-01-19T21:33:22Z

passing WindowPOP to window functions and framers

commit 82395aefce54eb89eb98fd7420f4fc6a4b369be1
Author: adeneche 
Date:   2016-01-19T21:34:59Z

WindowPOP.Bound to describe FRAME bounds

commit 9e1568f54d51894ae5d8740b7c0e8aa9067c
Author: adeneche 
Date:   2016-01-20T19:41:26Z

DRILL-4261: Add support for RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING

commit 5a1b0112faef7d6992791c44305d10facc57f390
Author: adeneche 
Date:   2016-01-22T01:35:13Z

fixed unit test

commit aeacbd3099b7499b58ddec09b982ef63900db598
Author: adeneche 
Date:   2016-01-25T19:53:00Z

FrameSupportTemplate doesn't need to use Partition class

commit 2a6e540333f3b16cdbe2112320962306e8616a27
Author: adeneche 
Date:   2016-01-25T20:18:06Z

pass "isRows" to WindowPOP

commit 4120761326728057d43f1ee03c01e390c128d67c
Author: adeneche 
Date:   2016-01-25T20:19:22Z

DRILL-4262: add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW




> add support for RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> -
>
> Key: DRILL-4261
> URL: https://issues.apache.org/jira/browse/DRILL-4261
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Reporter: Deneche A. Hakim
>Assignee: Aman Sinha
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4246) New allocator causing a flatten regression test to fail with IllegalStateException

2016-01-25 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116356#comment-15116356
 ] 

Victoria Markman commented on DRILL-4246:
-

We have not seen this failure in the last couple of runs. I think it is fixed.

> New allocator causing a flatten regression test to fail with 
> IllegalStateException
> --
>
> Key: DRILL-4246
> URL: https://issues.apache.org/jira/browse/DRILL-4246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Jacques Nadeau
>Priority: Blocker
> Fix For: 1.5.0
>
>
> We are seeing the following error in the test cluster:
> {noformat}
> /framework/resources/Functional/flatten_operators/10rows/filter3.q
> Query: 
> select uid, flatten(events) from `data.json` where uid > 1
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Unaccounted for 
> outstanding allocation (851968)
> Allocator(op:0:0:0:Screen) 100/851968/1941504/100 
> (res/actual/peak/limit)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4246) New allocator causing a flatten regression test to fail with IllegalStateException

2016-01-25 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-4246.
---

> New allocator causing a flatten regression test to fail with 
> IllegalStateException
> --
>
> Key: DRILL-4246
> URL: https://issues.apache.org/jira/browse/DRILL-4246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Jacques Nadeau
>Priority: Blocker
> Fix For: 1.5.0
>
>
> We are seeing the following error in the test cluster:
> {noformat}
> /framework/resources/Functional/flatten_operators/10rows/filter3.q
> Query: 
> select uid, flatten(events) from `data.json` where uid > 1
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Unaccounted for 
> outstanding allocation (851968)
> Allocator(op:0:0:0:Screen) 100/851968/1941504/100 
> (res/actual/peak/limit)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4203) Parquet File : Date is stored wrongly

2016-01-25 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116281#comment-15116281
 ] 

Jason Altekruse edited comment on DRILL-4203 at 1/26/16 12:28 AM:
--

[~zfong] That is correct. The only extra complexity is that I have added an 
option that allows users to optionally turn off auto-correction for any files 
that are not certain to have been created by Drill.

The default behavior will be to check the file-level created-by metadata; if we 
know it is a version of Drill after the fix, no correction will happen 
regardless of the setting of the option. Similarly, for a file with a Drill 
version string that indicates the data was written before this fix, we will 
always correct the data, regardless of the setting of this flag.

The only complicated case is where there is not enough metadata to determine if 
it is a Drill file or not. In this case we will check the values in the file: 
either in the file-level min/max statistics when the reader is initialized, or, 
when the file lacks min/max value statistics (it's a pre-1.0 Drill file), we 
will have to defer detection until actually reading individual data pages. 
Checks at both of these levels can be disabled by the option.

The nature of the bug caused a really significant shift of the dates, putting 
them thousands of years into the future. Thus auto-correction as the default 
isn't high risk, as it is extremely unlikely users will have created a database 
full of dates in this range. That being said, the option is included to cover 
any such cases.


was (Author: jaltekruse):
[~zfong] That is correct. The only extra complexity is that I have added an 
option that allows users to optionally turn-off auto-correction for any files 
that are not certain to have been created by Drill.

The default behavior will be to check the file level created-by metadata, if we 
know it is a version of Drill after the fix, not correction will happen 
regardless of the setting of the option. Similarly for a file with a drill 
version string, that indicates the data was written before this fix, we will 
always correct the data, regardless of the setting of this flag.

The only complicated case is where there is not enough metadata to determine if 
it is a Drill file or not. In this case we will check the values in the file, 
either in the file level min/max statistics when the reader is initialized or 
when the file lacks min/max value statistics (it's a pre-1.0 drill file) we 
will have to defer detection until actually reading individual data pages. 
Checks at both of these levels can be disabled by the option.

The nature of the bug caused a really significant shift of the dates, putting 
them thousands of years into the future. Thus auto-correction as the default 
isn't high risk as it extremely unlikely users will have created a database 
full of dates in this range. That being said, the option is included to cover 
any such cases.

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Jason Altekruse
>Priority: Critical
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark; all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, I found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> -

[jira] [Created] (DRILL-4310) Memory leak in hash partition sender when query is cancelled

2016-01-25 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-4310:
---

 Summary: Memory leak in hash partition sender when query is 
cancelled
 Key: DRILL-4310
 URL: https://issues.apache.org/jira/browse/DRILL-4310
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 0.5.0
Reporter: Victoria Markman


Query got cancelled (still investigating what caused cancellation).

Here is an excerpt from drillbit.log
{code}
2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers 
allocated (4).
Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 4
ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 4096, references: 1, life: 23697371310917183..0, allocatorManager: 
[7140397, life: 23697371310913697..0] holds 1 buffers.
DrillBuf[13122380], udle: [7140398 0..4096]
ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 1024, references: 1, life: 23697371311045504..0, allocatorManager: 
[7140398, life: 23697371311041789..0] holds 1 buffers.
DrillBuf[13122381], udle: [7140399 0..1024]
ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 4096, references: 1, life: 23697371310795164..0, allocatorManager: 
[7140396, life: 23697371310789988..0] holds 1 buffers.
DrillBuf[13122379], udle: [7140397 0..4096]
ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 1024, references: 1, life: 23697371288488073..0, allocatorManager: 
[7140275, life: 23697371288484282..0] holds 1 buffers.
DrillBuf[13122245], udle: [7140276 0..1024]
  reservations: 0

Fragment 2:2

[Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with 
outstanding buffers allocated (4).
Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 4
ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 4096, references: 1, life: 23697371310917183..0, allocatorManager: 
[7140397, life: 23697371310913697..0] holds 1 buffers.
DrillBuf[13122380], udle: [7140398 0..4096]
ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 1024, references: 1, life: 23697371311045504..0, allocatorManager: 
[7140398, life: 23697371311041789..0] holds 1 buffers.
DrillBuf[13122381], udle: [7140399 0..1024]
ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 4096, references: 1, life: 23697371310795164..0, allocatorManager: 
[7140396, life: 23697371310789988..0] holds 1 buffers.
DrillBuf[13122379], udle: [7140397 0..4096]
ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: true, 
size: 1024, references: 1, life: 23697371288488073..0, allocatorManager: 
[7140275, life: 23697371288484282..0] holds 1 buffers.
DrillBuf[13122245], udle: [7140276 0..1024]
  reservations: 0
{code}

Reproduced twice by running: ./run.sh -s Advanced/tpcds/tpcds_sf100/original -g 
smoke -t 600 -n 10 -i 100 -m

Cluster configuration: vanilla, 48GB of memory, 4GB heap.

Attaching query profile and logs. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4310) Memory leak in hash partition sender when query is cancelled

2016-01-25 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4310:

Attachment: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill
drillbit.log.136
drillbit.log.135
drillbit.log.134
drillbit.log.133

> Memory leak in hash partition sender when query is cancelled
> 
>
> Key: DRILL-4310
> URL: https://issues.apache.org/jira/browse/DRILL-4310
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.5.0
>Reporter: Victoria Markman
> Attachments: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill, 
> drillbit.log.133, drillbit.log.134, drillbit.log.135, drillbit.log.136
>
>
> Query got cancelled (still investigating what caused cancellation).
> Here is an excerpt from drillbit.log
> {code}
> 2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers 
> allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> Fragment 2:2
> [Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with 
> outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> {code}
> Reproduced twice by running: ./run.sh -s Advanced/tpcds/tpcds_sf100/original 
> -g smoke -t 600 -n 10 -i 100 -m
> Cluster configuration: vanilla, 48GB of memory, 4GB heap.
> Attaching query profile and logs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4262) add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

2016-01-25 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4262:

Assignee: Aman Sinha  (was: Deneche A. Hakim)

> add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> 
>
> Key: DRILL-4262
> URL: https://issues.apache.org/jira/browse/DRILL-4262
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Reporter: Deneche A. Hakim
>Assignee: Aman Sinha
> Fix For: Future
>
>
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is already supported as the 
> default frame when an ORDER clause is present in the window definition.
> We need to add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4262) add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

2016-01-25 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116602#comment-15116602
 ] 

Deneche A. Hakim commented on DRILL-4262:
-

Opened pull request [#337|https://github.com/apache/drill/pull/337]

> add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> 
>
> Key: DRILL-4262
> URL: https://issues.apache.org/jira/browse/DRILL-4262
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Reporter: Deneche A. Hakim
>Assignee: Aman Sinha
> Fix For: Future
>
>
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is already supported as the 
> default frame when an ORDER clause is present in the window definition.
> We need to add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4262) add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

2016-01-25 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4262:

Assignee: Aman Sinha  (was: Deneche A. Hakim)

> add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> 
>
> Key: DRILL-4262
> URL: https://issues.apache.org/jira/browse/DRILL-4262
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Reporter: Deneche A. Hakim
>Assignee: Aman Sinha
> Fix For: Future
>
>
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is already supported as the 
> default frame when an ORDER clause is present in the window definition.
> We need to add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4262) add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

2016-01-25 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-4262:
---

Assignee: Deneche A. Hakim  (was: Aman Sinha)

> add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> 
>
> Key: DRILL-4262
> URL: https://issues.apache.org/jira/browse/DRILL-4262
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: Future
>
>
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is already supported as the 
> default frame when an ORDER clause is present in the window definition.
> We need to add support for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4308) Aggregate operations on dir columns can be more efficient for certain use cases

2016-01-25 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116604#comment-15116604
 ] 

Jason Altekruse commented on DRILL-4308:


[~amansinha100] One of these queries is possible today; the other should be 
simple to implement by exposing the same information given in the "show files" 
command in a way that can have a filter applied to it (today it isn't really a 
query, it's a special case). As show files includes an isDirectory field, this 
should be as simple as applying a filter to this data.

The first can be written as: select dir0 from largetable where dir0 = maxdir() 
limit 1
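
Spelled out with the two-argument mindir()/maxdir() form that appears later in 
this thread (the workspace and table names here are placeholders):

{code}
select dir0 from dfs.tmp.`largetable`
where dir0 = maxdir('dfs.tmp', 'largetable')
limit 1;
{code}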

> Aggregate operations on dir columns can be more efficient for certain use 
> cases
> --
>
> Key: DRILL-4308
> URL: https://issues.apache.org/jira/browse/DRILL-4308
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>
> For queries that perform plain aggregates or DISTINCT operations on the 
> directory partition columns (dir0, dir1, etc.), where no other columns are 
> referenced in the query, performance could be substantially improved by not 
> having to scan the entire dataset.   
> Consider the following types of queries:
> {noformat}
> select  min(dir0) from largetable;
> select  distinct dir0 from largetable;
> {noformat}
> The number of distinct values of dir columns is typically quite small, and 
> there's no reason to scan the large table.  This has also come up as feedback 
> from some Drill users.  Of course, if any other column is referenced in the 
> query (WHERE, ORDER BY, etc.) then we cannot apply this optimization.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4310) Memory leak in hash partition sender when query is cancelled

2016-01-25 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116609#comment-15116609
 ] 

Deneche A. Hakim commented on DRILL-4310:
-

Looking at the Foreman's log (133), it seems that the query failed because the 
RPC connection between the foreman node and the client timed out; this is what 
caused the remaining fragments to be cancelled:
{noformat}
2016-01-26 00:45:16,276 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59875 (user client) 
timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,278 [UserServer-1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: 
State change requested FAILED --> FAILED
2016-01-26 00:45:16,279 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59882 (user client) 
timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,280 [UserServer-1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: 
State change requested FAILED --> FAILED
2016-01-26 00:45:16,338 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59885 (user client) 
timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,340 [UserServer-1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: 
State change requested FAILED --> FAILED
{noformat}

> Memory leak in hash partition sender when query is cancelled
> 
>
> Key: DRILL-4310
> URL: https://issues.apache.org/jira/browse/DRILL-4310
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.5.0
>Reporter: Victoria Markman
> Attachments: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill, 
> drillbit.log.133, drillbit.log.134, drillbit.log.135, drillbit.log.136
>
>
> Query got cancelled (still investigating what caused cancellation).
> Here is an excerpt from drillbit.log
> {code}
> 2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers 
> allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> Fragment 2:2
> [Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with 
> outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> 

[jira] [Issue Comment Deleted] (DRILL-4310) Memory leak in hash partition sender when query is cancelled

2016-01-25 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4310:

Comment: was deleted

(was: Looking at the Foreman's log (133) it seems that the query failed because 
the RPC connection between the foreman node and the client timed out, this is 
what caused the remaining fragments to be cancelled:
{noformat}
2016-01-26 00:45:16,276 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59875 (user client) 
timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,278 [UserServer-1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: 
State change requested FAILED --> FAILED
2016-01-26 00:45:16,279 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59882 (user client) 
timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,280 [UserServer-1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: 
State change requested FAILED --> FAILED
2016-01-26 00:45:16,338 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59885 (user client) 
timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,340 [UserServer-1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: 
State change requested FAILED --> FAILED
{noformat})

> Memory leak in hash partition sender when query is cancelled
> 
>
> Key: DRILL-4310
> URL: https://issues.apache.org/jira/browse/DRILL-4310
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.5.0
>Reporter: Victoria Markman
> Attachments: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill, 
> drillbit.log.133, drillbit.log.134, drillbit.log.135, drillbit.log.136
>
>
> Query got cancelled (still investigating what caused cancellation).
> Here is an excerpt from drillbit.log
> {code}
> 2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers 
> allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle: [7140397 0..4096]
> ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371288488073..0, 
> allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
> DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> Fragment 2:2
> [Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with 
> outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 100/10240/2140160/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
> ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310917183..0, 
> allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
> DrillBuf[13122380], udle: [7140398 0..4096]
> ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 1024, references: 1, life: 23697371311045504..0, 
> allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
> DrillBuf[13122381], udle: [7140399 0..1024]
> ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: 
> true, size: 4096, references: 1, life: 23697371310795164..0, 
> allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
> DrillBuf[13122379], udle:

[jira] [Commented] (DRILL-4308) Aggregate operations on dir columns can be more efficient for certain use cases

2016-01-25 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116679#comment-15116679
 ] 

Aman Sinha commented on DRILL-4308:
---

[~jaltekruse] Right, I should have clarified that these types of queries may be 
generated by an external tool (e.g. Tableau), so we would need to do a rule-based 
rewrite to use mindir()/maxdir().  Actually, the second case (with DISTINCT) is 
the main reason I created the JIRA.  Using the show files output could be a 
reasonable approach... I haven't looked much into it. 

Incidentally, I am getting a wrong result for the second query below.  I would 
think it should produce the same result as the first query (my directory 
structure is year/quarter).  Instead the second query produces 'Q1' for dir0, 
which is incorrect.  Any thoughts? If you think this is an issue, I can file a 
separate JIRA.  
{noformat}
0: jdbc:drill:zk=local> select dir0 from dfs.tmp.testdata order by dir0 limit 1;
+---+
| dir0  |
+---+
| 1994  |
+---+
1 row selected (0.842 seconds)
0: jdbc:drill:zk=local> select dir0 from dfs.tmp.testdata where 
dir0=mindir('dfs.tmp', 'testdata') limit 1;
+---+
| dir0  |
+---+
| Q1|
+---+
1 row selected (0.311 seconds)
{noformat}

> Aggregate operations on dir columns can be more efficient for certain use 
> cases
> --
>
> Key: DRILL-4308
> URL: https://issues.apache.org/jira/browse/DRILL-4308
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>
> For queries that perform plain aggregates or DISTINCT operations on the 
> directory partition columns (dir0, dir1, etc.), where no other columns are 
> referenced in the query, performance could be substantially improved by 
> not having to scan the entire dataset.   
> Consider the following types of queries:
> {noformat}
> select  min(dir0) from largetable;
> select  distinct dir0 from largetable;
> {noformat}
> The number of distinct values of dir columns is typically quite small, and 
> there's no reason to scan the large table.  This has also come up as 
> feedback from some Drill users.  Of course, if any other column is 
> referenced in the query (WHERE, ORDER BY, etc.) then we cannot apply this 
> optimization.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)