[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249285#comment-15249285
 ] 

ASF GitHub Bot commented on DRILL-4237:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/485


> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on the smaller sample of the data set but on another sample of 
> the same data set, it still produces skewed values - see below the hash 
> values which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], is the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such as 
> rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we consider an alternative hash function? 
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of the hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28). 
> As a test, I reverted to this function and got a good hash distribution for 
> the test data. 
> I could not find any performance comparisons from our perf tests (TPC-H or DS) 
> between the original and the newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function? 
> As an aside, I would like to remove the hash64 versions of the functions 
> since these are not used anywhere. 
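
On the signed-vs-unsigned point: for the operations XXHash actually performs 
(multiply, add, XOR, rotate), Java's two's-complement long arithmetic yields the 
same 64-bit patterns as unsigned int64, so a wrapper type should not be strictly 
required; only unsigned comparison or division would behave differently. A 
minimal sketch of one XXH64 accumulator round written with plain signed longs 
(illustrative only -- not Drill's implementation; the class name is made up):

{code}
public class XxHashRoundSketch {
  // XXH64 prime constants from the reference C implementation.
  private static final long PRIME64_1 = 0x9E3779B185EBCA87L;
  private static final long PRIME64_2 = 0xC2B2AE3D27D4EB4FL;

  // One accumulator round: acc = rotl31(acc + lane * P2) * P1.
  static long round(long acc, long lane) {
    acc += lane * PRIME64_2;          // wraps exactly like uint64 arithmetic
    acc = Long.rotateLeft(acc, 31);   // rotation is sign-agnostic
    return acc * PRIME64_1;
  }

  public static void main(String[] args) {
    long acc = round(0L, 0xDEADBEEFCAFEBABEL);
    // Print as unsigned to compare with a C reference value.
    System.out.println(Long.toUnsignedString(acc));
  }
}
{code}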



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4459) SchemaChangeException while querying hive json table

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249284#comment-15249284
 ] 

ASF GitHub Bot commented on DRILL-4459:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/431


> SchemaChangeException while querying hive json table
> 
>
> Key: DRILL-4459
> URL: https://issues.apache.org/jira/browse/DRILL-4459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 1.4.0
> Environment: MapR-Drill 1.4.0
> Hive-1.2.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> Getting a SchemaChangeException while querying JSON documents stored in a 
> Hive table.
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> {noformat}
> minimum reproduce
> {noformat}
> created sample json documents using the attached script(randomdata.sh)
> hive>create table simplejson(json string);
> hive>load data local inpath '/tmp/simple.json' into table simplejson;
> now query it through Drill.
> Drill Version
> select * from sys.version;
> +---++-+-++
> | commit_id | commit_message | commit_time | build_email | build_time |
> +---++-+-++
> | eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d | DRILL-3901:  Don't do early 
> expansion of directory in the non-metadata-cache case because it already 
> happens during ParquetGroupScan's metadata gathering operation. | 07.10.2015 
> @ 17:12:57 UTC | Unknown | 07.10.2015 @ 17:36:16 UTC |
> +---++-+-++
> 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
> GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 1:1
> [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
> (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4113) memory leak reported while handling query or shutting down

2016-04-19 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4113.
---
Resolution: Cannot Reproduce

> memory leak reported while handling query or shutting down
> --
>
> Key: DRILL-4113
> URL: https://issues.apache.org/jira/browse/DRILL-4113
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chun Chang
>Priority: Critical
>
> With impersonation enabled, I've seen two memory leaks. One reported at query 
> time, one at shutdown.
> At query time:
> {noformat}
> 2015-11-17 19:11:03,595 [29b413b7-958e-c1f3-9d37-c34f96e7bf6a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29b413b7-958e-c1f3-9d37-c34f96e7bf6a: use `dfs.window_functions`
> 2015-11-17 19:11:03,666 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2015-11-17 19:11:03,666 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State to report: RUNNING
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested RUNNING --> 
> FAILED
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested FAILED --> 
> FAILED
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested FAILED --> 
> FINISHED
> 2015-11-17 19:11:03,674 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Failure while closing accountor.  Expected private and shared pools to be set 
> to initial values.  However, one or more were not.  Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> Fragment 0:0
> [Error Id: 6df67be9-69d4-4a3b-9eae-43ab2404c6d3 on drillats1.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Failure while closing accountor.  Expected private and 
> shared pools to be set to initial values.  However, one or more were not.  
> Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> Fragment 0:0
> [Error Id: 6df67be9-69d4-4a3b-9eae-43ab2404c6d3 on drillats1.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalStateException: Failure while closing accountor.  
> Expected private and shared pools to be set to initial values.  However, one 
> or more were not.  Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:199) 
> ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.AccountorImpl.close(AccountorImpl.java:365) 
> ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:326)
>  ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248551#comment-15248551
 ] 

ASF GitHub Bot commented on DRILL-4237:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/485#issuecomment-212110488
  
+1  (I had already reviewed the previous #430)


> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on the smaller sample of the data set but on another sample of 
> the same data set, it still produces skewed values - see below the hash 
> values which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], is the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such as 
> rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we consider an alternative hash function? 
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of the hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28). 
> As a test, I reverted to this function and got a good hash distribution for 
> the test data. 
> I could not find any performance comparisons from our perf tests (TPC-H or DS) 
> between the original and the newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function? 
> As an aside, I would like to remove the hash64 versions of the functions 
> since these are not used anywhere. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248548#comment-15248548
 ] 

ASF GitHub Bot commented on DRILL-4237:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/485#discussion_r60304024
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MurmurHash3.java
 ---
@@ -0,0 +1,280 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import io.netty.util.internal.PlatformDependent;
+
+
+/**
+ *
+ * MurmurHash3 was written by Austin Appleby, and is placed in the public
+ * domain.
+ * See http://smhasher.googlecode.com/svn/trunk/MurmurHash3.cpp
+ * MurmurHash3_x64_128
+ * MurmurHash3_x86_32
+ */
+public final class MurmurHash3 extends DrillHash{
+
+   public static final long fmix64(long k) {
+k ^= k >>> 33;
+k *= 0xff51afd7ed558ccdL;
+k ^= k >>> 33;
+k *= 0xc4ceb9fe1a85ec53L;
+k ^= k >>> 33;
+return k;
+  }
+
+  /*
+  Take 64 bit of murmur3_128's output
+   */
+  public static long murmur3_64(long bStart, long bEnd, DrillBuf buffer, 
int seed) {
+
+long h1 = seed & 0xL;
+long h2 = seed & 0xL;
+
+final long c1 = 0x87c37b91114253d5L;
+final long c2 = 0x4cf5ad432745937fL;
+long start = buffer.memoryAddress() + bStart;
+long end = buffer.memoryAddress() + bEnd;
+long length = bEnd - bStart;
+long roundedEnd = start + ( length & 0xFFF0);  // round down to 16 
byte block
+for (long i=start; i

[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248547#comment-15248547
 ] 

ASF GitHub Bot commented on DRILL-4237:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/485#discussion_r60303921
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java 
---
@@ -60,4 +60,45 @@ public static int hash(ByteBuffer buf, int seed) {
 return h;
   }
 
+  public static int hash32(int val, long seed) {
+double converted = val;
+return hash32(converted, seed);
+  }
+  public static int hash32(long val, long seed) {
+double converted = val;
+return hash32(converted, seed);
+  }
+  public static int hash32(float val, long seed){
+double converted = val;
+return hash32(converted, seed);
+  }
+
+
+  public static long hash64(float val, long seed){
+double converted = val;
+return hash64(converted, seed);
+  }
+  public static long hash64(long val, long seed){
+double converted = val;
+return hash64(converted, seed);
+  }
+
+  public static long hash64(double val, long seed){
+return MurmurHash3.hash64(val, (int)seed);
+  }
+
+  public static long hash64(long start, long end, DrillBuf buffer, long 
seed){
+return MurmurHash3.hash64(start, end, buffer, (int)seed);
+  }
+
+  public static int hash32(double val, long seed) {
+//return 
com.google.common.hash.Hashing.murmur3_128().hashLong(Double.doubleToLongBits(val)).asInt();
--- End diff --

Remove this line
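
A note on the pattern in this diff (as I read it): routing every numeric 
overload through the double version means equal values of different numeric 
types hash identically, which matters when the same distribution key can arrive 
as, say, INT on one side of a join and BIGINT on the other. A standalone sketch 
of the idea (illustrative only -- the stand-in hash below is not Murmur, and the 
class name is made up):

{code}
public class NumericHashSketch {
  // Stand-in for a real 32-bit hash of a double (e.g. MurmurHash3) -- not Murmur.
  static int hash32(double val, long seed) {
    long bits = Double.doubleToLongBits(val) ^ seed;
    return (int) (bits ^ (bits >>> 32));
  }
  // Every numeric type widens to double first, as in the diff above.
  static int hash32(int val, long seed)   { return hash32((double) val, seed); }
  static int hash32(long val, long seed)  { return hash32((double) val, seed); }
  static int hash32(float val, long seed) { return hash32((double) val, seed); }

  public static void main(String[] args) {
    // Equal values of different numeric types collide by design.
    System.out.println(hash32(5, 0) == hash32(5L, 0));       // true
    System.out.println(hash32(5.0f, 0) == hash32(5.0d, 0));  // true
  }
}
{code}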


> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on the smaller sample of the data set but on another sample of 
> the same data set, it still produces skewed values - see below the hash 
> values which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], is the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such as 
> rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we consider an alternative hash function? 
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of the hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28). 
> As a test, I reverted to this function and got a good hash distribution for 
> the test data. 
> I could not find any performance comparisons from our perf tests (TPC-H or DS) 
> between the original and the newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function? 
> As an aside, I would like to remove the hash64 versions of the functions 
> since these are not used anywhere. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4326) JDBC Storage Plugin for PostgreSQL does not work

2016-04-19 Thread Akon Dey (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akon Dey resolved DRILL-4326.
-
   Resolution: Not A Bug
Fix Version/s: 1.5.0

Please see the comments to determine the correct way to use PostgreSQL with Drill.
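
For anyone hitting the same error: two things stand out in the configuration and 
environment quoted below (my reading only, since the referenced comments are not 
included in this digest). The plugin is stored with "enabled": false, and the 
listed driver jars (postgresql-9.2/9.1 JDBC4) are older than the PostgreSQL 9.4.4 
server. The commonly suggested setup is a current PostgreSQL JDBC driver jar on 
Drill's classpath (e.g. under jars/3rdparty) and an enabled plugin, roughly:

{code}
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://127.0.0.1/test",
  "username": "akon",
  "password": null,
  "enabled": true
}
{code}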

> JDBC Storage Plugin for PostgreSQL does not work
> 
>
> Key: DRILL-4326
> URL: https://issues.apache.org/jira/browse/DRILL-4326
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
> Environment: Mac OS X JDK 1.8 PostgreSQL 9.4.4 PostgreSQL JDBC jars 
> (postgresql-9.2-1004-jdbc4.jar, postgresql-9.1-901-1.jdbc4.jar, )
>Reporter: Akon Dey
> Fix For: 1.5.0
>
>
> Queries with the JDBC Storage Plugin for PostgreSQL fail with DATA_READ ERROR.
> The JDBC Storage Plugin settings in use are:
> {code}
> {
>   "type": "jdbc",
>   "driver": "org.postgresql.Driver",
>   "url": "jdbc:postgresql://127.0.0.1/test",
>   "username": "akon",
>   "password": null,
>   "enabled": false
> }
> {code}
> Please refer to the following stack for further details:
> {noformat}
> Akons-MacBook-Pro:drill akon$ 
> ./distribution/target/apache-drill-1.5.0-SNAPSHOT/apache-drill-1.5.0-SNAPSHOT/bin/drill-embedded
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; 
> support was removed in 8.0
> Jan 29, 2016 9:17:18 AM org.glassfish.jersey.server.ApplicationHandler 
> initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 
> 01:25:26...
> apache drill 1.5.0-SNAPSHOT
> "a little sql for your nosql"
> 0: jdbc:drill:zk=local> !verbose
> verbose: on
> 0: jdbc:drill:zk=local> use pgdb;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | Default schema changed to [pgdb]  |
> +---+---+
> 1 row selected (0.753 seconds)
> 0: jdbc:drill:zk=local> select * from ips;
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query.
> sql SELECT *
> FROM "test"."ips"
> plugin pgdb
> Fragment 0:0
> [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] 
> (state=,code=0)
> java.sql.SQLException: DATA_READ ERROR: The JDBC storage plugin failed while 
> trying setup the SQL query.
> sql SELECT *
> FROM "test"."ips"
> plugin pgdb
> Fragment 0:0
> [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73)
>   at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338)
>   at 
> net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
>   at sqlline.Commands.execute(Commands.java:841)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:746)
>   at sqlline.SqlLine.begin(SqlLine.java:621)
>   at sqlline.SqlLine.start(SqlLine.java:375)
>   at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: DATA_READ 
> ERROR: The JDBC storage plugin failed while trying setup the SQL query.
> sql SELECT *
> FROM "test"."ips"
> plugin pgdb
> Fragment 0:0
> [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 

[jira] [Commented] (DRILL-4113) memory leak reported while handling query or shutting down

2016-04-19 Thread Chun Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248409#comment-15248409
 ] 

Chun Chang commented on DRILL-4113:
---

[~jacq...@dremio.com] I am not seeing it anymore.

> memory leak reported while handling query or shutting down
> --
>
> Key: DRILL-4113
> URL: https://issues.apache.org/jira/browse/DRILL-4113
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chun Chang
>Priority: Critical
>
> With impersonation enabled, I've seen two memory leaks. One reported at query 
> time, one at shutdown.
> At query time:
> {noformat}
> 2015-11-17 19:11:03,595 [29b413b7-958e-c1f3-9d37-c34f96e7bf6a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29b413b7-958e-c1f3-9d37-c34f96e7bf6a: use `dfs.window_functions`
> 2015-11-17 19:11:03,666 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2015-11-17 19:11:03,666 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State to report: RUNNING
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested RUNNING --> 
> FAILED
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested FAILED --> 
> FAILED
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested FAILED --> 
> FINISHED
> 2015-11-17 19:11:03,674 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Failure while closing accountor.  Expected private and shared pools to be set 
> to initial values.  However, one or more were not.  Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> Fragment 0:0
> [Error Id: 6df67be9-69d4-4a3b-9eae-43ab2404c6d3 on drillats1.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Failure while closing accountor.  Expected private and 
> shared pools to be set to initial values.  However, one or more were not.  
> Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> Fragment 0:0
> [Error Id: 6df67be9-69d4-4a3b-9eae-43ab2404c6d3 on drillats1.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalStateException: Failure while closing accountor.  
> Expected private and shared pools to be set to initial values.  However, one 
> or more were not.  Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:199) 
> ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.AccountorImpl.close(AccountorImpl.java:365) 
> ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:326)
>  ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-4113) memory leak reported while handling query or shutting down

2016-04-19 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248269#comment-15248269
 ] 

Jacques Nadeau commented on DRILL-4113:
---

[~cch...@maprtech.com], can you confirm if this is still happening?

> memory leak reported while handling query or shutting down
> --
>
> Key: DRILL-4113
> URL: https://issues.apache.org/jira/browse/DRILL-4113
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chun Chang
>Priority: Critical
>
> With impersonation enabled, I've seen two memory leaks. One reported at query 
> time, one at shutdown.
> At query time:
> {noformat}
> 2015-11-17 19:11:03,595 [29b413b7-958e-c1f3-9d37-c34f96e7bf6a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29b413b7-958e-c1f3-9d37-c34f96e7bf6a: use `dfs.window_functions`
> 2015-11-17 19:11:03,666 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2015-11-17 19:11:03,666 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State to report: RUNNING
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested RUNNING --> 
> FAILED
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested FAILED --> 
> FAILED
> 2015-11-17 19:11:03,669 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29b413b7-edbc-9722-120d-66ab3611f250:0:0: State change requested FAILED --> 
> FINISHED
> 2015-11-17 19:11:03,674 [29b413b7-edbc-9722-120d-66ab3611f250:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Failure while closing accountor.  Expected private and shared pools to be set 
> to initial values.  However, one or more were not.  Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> Fragment 0:0
> [Error Id: 6df67be9-69d4-4a3b-9eae-43ab2404c6d3 on drillats1.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Failure while closing accountor.  Expected private and 
> shared pools to be set to initial values.  However, one or more were not.  
> Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> Fragment 0:0
> [Error Id: 6df67be9-69d4-4a3b-9eae-43ab2404c6d3 on drillats1.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalStateException: Failure while closing accountor.  
> Expected private and shared pools to be set to initial values.  However, one 
> or more were not.  Stats are
> zone    init    allocated   delta
> private 100 738112  261888
> shared  00  261888  -261888.
> at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:199) 
> ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.AccountorImpl.close(AccountorImpl.java:365) 
> ~[drill-memory-impl-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:326)
>  

[jira] [Commented] (DRILL-4615) Support directory names in schema

2016-04-19 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248208#comment-15248208
 ] 

Jesse Yates commented on DRILL-4615:


You got it exactly [~sphillips]. I'd definitely be up for an attempt. I think I 
see where we do the column/dir filtering in ParquetScanBatchCreator#getBatch, 
but FileSystemPartitionDescriptor seems a bit more vague - is it in 
#createPartitionSublists or in #populatePartitionVectors? It seems like 
PartitionLocation should be the point of abstraction. Right now, the 
DFSPartitionLocation just reads the dir[index] and the ParquetPartitionLocation 
throws an exception, so I'm not sure how it's all wired together.

Any hints would be appreciated!

> Support directory names in schema
> -
>
> Key: DRILL-4615
> URL: https://issues.apache.org/jira/browse/DRILL-4615
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jesse Yates
>
> In Spark, partitioned parquet output is written with directories like:
> {code}
> /column1=1
>   /column2=hello
>  /data.parquet
>   /column2=world
>  /moredata.parquet
> /column1=2
> {code}
> However, when querying these files with Drill we end up interpreting the 
> directories as strings when what they really are is column names + values. In 
> the data files we only have the remaining columns. Querying this with drill 
> means that you can really only have a couple of data types (far short of what 
> spark/parquet supports) in the column and still have correct operations.
> Given the size of the data, I don't want to have to CTAS all the parquet 
> files (especially as they are being periodically updated). 
> I think this ends up being a nice addition for general file directory reads 
> as well since many people already encode meaning into their directory 
> structure, but having self describing directories is even better.
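
A minimal sketch of the interpretation being requested (illustrative only -- a 
hypothetical class, not Drill code): each "name=value" path segment becomes a 
column name and a raw string value, which the reader would still have to coerce 
to the column's actual type:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionDirParser {
  // Parse "/column1=1/column2=hello/data.parquet" into {column1=1, column2=hello}.
  static Map<String, String> parse(String path) {
    Map<String, String> cols = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      int eq = segment.indexOf('=');
      if (eq > 0) {                       // only "name=value" directory segments
        cols.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return cols;
  }

  public static void main(String[] args) {
    System.out.println(parse("/column1=1/column2=hello/data.parquet"));
  }
}
{code}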



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248195#comment-15248195
 ] 

ASF GitHub Bot commented on DRILL-4237:
---

Github user chunhui-shi commented on the pull request:

https://github.com/apache/drill/pull/485#issuecomment-212021177
  
https://github.com/apache/drill/pull/430


> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on the smaller sample of the data set but on another sample of 
> the same data set, it still produces skewed values - see below the hash 
> values which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], is the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such as 
> rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we consider an alternative hash function? 
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of the hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28). 
> As a test, I reverted to this function and got a good hash distribution for 
> the test data. 
> I could not find any performance comparisons from our perf tests (TPC-H or DS) 
> between the original and the newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function? 
> As an aside, I would like to remove the hash64 versions of the functions 
> since these are not used anywhere. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248184#comment-15248184
 ] 

ASF GitHub Bot commented on DRILL-4237:
---

Github user chunhui-shi commented on the pull request:

https://github.com/apache/drill/pull/485#issuecomment-212017670
  
Previous pull request and comments: https://github.com/apache/drill/pull/408


> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on the smaller sample of the data set but on another sample of 
> the same data set, it still produces skewed values - see below the hash 
> values which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], is the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such as 
> rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we consider an alternative hash function? 
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of the hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28). 
> As a test, I reverted to this function and got a good hash distribution for 
> the test data. 
> I could not find any performance comparisons from our perf tests (TPC-H or DS) 
> between the original and the newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function? 
> As an aside, I would like to remove the hash64 versions of the functions 
> since these are not used anywhere. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4615) Support directory names in schema

2016-04-19 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248163#comment-15248163
 ] 

Steven Phillips commented on DRILL-4615:


It seems what you are describing is an alternative way of interpreting 
directory attributes. Drill's current approach is to create the columns dir0, 
dir1, etc, which contain the string value of the directory names. These column 
names and values are currently used in two different places in Drill. The first 
is partition pruning during the planning stage; the second is when the columns 
are materialized during the actual execution of the scan. You can see examples 
of these uses in the classes FileSystemPartitionDescriptor and 
ParquetScanBatchCreator.

We should probably refactor the code that materializes the partition column 
names and values into some sort of abstract Attribute Provider; then we could 
implement an alternate version that interprets the directories the way Spark 
and Hive do.

If this is something you are interested in working on, I can help out.
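
A rough sketch of what that abstraction might look like (hypothetical names and 
signatures -- nothing like this exists in Drill today):

{code}
import java.util.List;

// Hypothetical "attribute provider" abstraction per the suggestion above.
interface PartitionAttributeProvider {

  /** Column names contributed by the directory structure under the selection root. */
  List<String> partitionColumns(String fileRelativePath);

  /** Raw string value of one partition column for this file; type coercion happens later. */
  String partitionValue(String fileRelativePath, String column);
}

// One implementation would keep today's behaviour (dir0, dir1, ... mapped to the raw
// directory names, as FileSystemPartitionDescriptor and ParquetScanBatchCreator do now);
// an alternate implementation would split each "name=value" segment the way Spark and
// Hive lay out partitioned output.
{code}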

> Support directory names in schema
> -
>
> Key: DRILL-4615
> URL: https://issues.apache.org/jira/browse/DRILL-4615
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jesse Yates
>
> In Spark, partitioned parquet output is written with directories like:
> {code}
> /column1=1
>   /column2=hello
>  /data.parquet
>   /column2=world
>  /moredata.parquet
> /column1=2
> {code}
> However, when querying these files with Drill we end up interpreting the 
> directories as strings when what they really are is column names + values. In 
> the data files we only have the remaining columns. Querying this with drill 
> means that you can really only have a couple of data types (far short of what 
> spark/parquet supports) in the column and still have correct operations.
> Given the size of the data, I don't want to have to CTAS all the parquet 
> files (especially as they are being periodically updated). 
> I think this ends up being a nice addition for general file directory reads 
> as well since many people already encode meaning into their directory 
> structure, but having self describing directories is even better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4619) Provide bootstrap-cluster-options.json functionality similar to bootstrap-storage-plugins

2016-04-19 Thread John Omernik (JIRA)
John Omernik created DRILL-4619:
---

 Summary: Provide bootstrap-cluster-options.json functionality 
similar to bootstrap-storage-plugins
 Key: DRILL-4619
 URL: https://issues.apache.org/jira/browse/DRILL-4619
 Project: Apache Drill
  Issue Type: Wish
  Components:  Server
Affects Versions: 1.6.0
Reporter: John Omernik


Per https://drill.apache.org/docs/plugin-configuration-basics/
 
bootstrap-storage-plugins.json is a file that allows administrators to provide 
a base storage plugin configuration when a new cluster is instantiated. This 
file is only read on the first initialization of a Drill cluster; per the 
documentation, it is ignored afterward. (This allows you to create an initial 
set of storage plugins and then alter them without having them clobbered by the 
bootstrap file.)

This JIRA is about adding a "bootstrap-cluster-options.json" file that provides 
a similar capability, but instead of creating storage plugins, it sets any 
cluster-wide options, once.  Basically, if this file exists on the classpath, 
then on first registration in ZooKeeper, any settings specified in this file 
will be applied to the cluster-wide settings.

Like the storage-plugin feature, after the cluster is initialized this file 
has no further use and is ignored on all future drillbit startups (so that 
settings changed manually don't get clobbered).  There are many uses for this:

* Enabling and configuring Multi-tenancy on a new Drill Cluster
* Configuring Default Resource Manager options for your Cluster
* Changing defaults on storage plugin options
  * json read options etc
  * Parquet options
  * etc
* Configuring admin users or groups for your drill cluster
* Updating cluster defaults for compression, storage format etc
* Many more

The implementation should be similar to the storage-plugin bootstrap, and I am 
guessing it should use HOCON, similar to drill-override, for consistency in 
naming.
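
To make this concrete, a sketch of what such a file could contain (the file name 
and mechanism are the proposal, not an existing feature; the option names below 
are examples of existing Drill system options an administrator might want to 
pre-set, and the wrapper key is made up):

{code}
{
  "cluster-options": {
    "security.admin.users": "admin1,admin2",
    "exec.queue.enable": true,
    "exec.queue.large": 5,
    "planner.memory.max_query_memory_per_node": 4294967296,
    "store.parquet.compression": "snappy",
    "store.json.all_text_mode": true
  }
}
{code}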



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)