[jira] [Commented] (DRILL-3201) Drill UI Authentication

2016-01-11 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092749#comment-15092749
 ] 

Jacques Nadeau commented on DRILL-3201:
---

LGTM

> Drill UI Authentication
> ---
>
> Key: DRILL-3201
> URL: https://issues.apache.org/jira/browse/DRILL-3201
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.0.0
> Environment: Drill 1.0.0
>Reporter: Rajkumar Singh
>Assignee: Venki Korukanti
>  Labels: features
> Fix For: 1.5.0
>
>
> The Drill UI doesn't have an authentication feature, so any user can cancel a 
> running query or change the storage plugin configuration.





[jira] [Comment Edited] (DRILL-3201) Drill UI Authentication

2016-01-11 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092749#comment-15092749
 ] 

Jacques Nadeau edited comment on DRILL-3201 at 1/11/16 9:49 PM:


LGTM +1


was (Author: jnadeau):
LGTM

> Drill UI Authentication
> ---
>
> Key: DRILL-3201
> URL: https://issues.apache.org/jira/browse/DRILL-3201
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.0.0
> Environment: Drill 1.0.0
>Reporter: Rajkumar Singh
>Assignee: Venki Korukanti
>  Labels: features
> Fix For: 1.5.0
>
>
> The Drill UI doesn't have an authentication feature, so any user can cancel a 
> running query or change the storage plugin configuration.





[jira] [Commented] (DRILL-4255) SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION

2016-01-11 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091746#comment-15091746
 ] 

Khurram Faraaz commented on DRILL-4255:
---

In another similar scenario, the query below FAILS with an UNSUPPORTED_OPERATION 
error when there is an empty JSON file in the directory.

When that empty JSON file is removed and the same query is submitted, it 
executes fine. So it seems to have something to do with the way we handle an 
empty JSON file when there are many non-empty JSON files in that directory.

{noformat}

[root@centos-01 ~]# hadoop fs -put empty.json /tmp/MD_332/

0: jdbc:drill:schema=dfs.tmp> select DISTINCT charKey FROM `MD_332`;
Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
changes

Fragment 3:0

[Error Id: 78dd06c7-1914-4ec1-88f0-7cb42234b357 on centos-01.qa.lab:31010] 
(state=,code=0)

# Remove the empty.json file
[root@centos-01 ~]# hadoop fs -rmr /tmp/MD_332/empty.json

# The same query runs fine once the empty.json file is removed.
0: jdbc:drill:schema=dfs.tmp> select DISTINCT charKey FROM `MD_332`;
+--+
| charKey  |
+--+
| MA   |
| WA   |
| WI   |
| AL   |
| AZ   |
| MD   |
| NV   |
| IN   |
| GA   |
| MO   |
| VT   |
| CA   |
| KY   |
| OH   |
| ND   |
| OK   |
| OR   |
| FL   |
| NM   |
| MS   |
| UT   |
| CT   |
| DE   |
| TN   |
| SC   |
| NH   |
| RI   |
| NJ   |
| ME   |
| MI   |
| LA   |
| CO   |
| ID   |
| VA   |
| PA   |
| KS   |
| NC   |
| MN   |
| WY   |
| IA   |
| NY   |
| NE   |
| MT   |
| WV   |
| IL   |
| TX   |
| SD   |
| HI   |
| AK   |
+--+
49 rows selected (3.595 seconds)

{noformat}

> SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION
> --
>
> Key: DRILL-4255
> URL: https://issues.apache.org/jira/browse/DRILL-4255
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.4.0
> Environment: CentOS
>Reporter: Khurram Faraaz
>
> SELECT DISTINCT over MapR-FS generated audit logs (JSON files) results in an 
> unsupported operation error. An identical query over another set of JSON data 
> returns correct results.
> MapR Drill 1.4.0, commit ID: 9627a80f
> MapRBuildVersion: 5.1.0.36488.GA
> OS: CentOS x86_64 GNU/Linux
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select distinct t.operation from `auditlogs` t;
> Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
> changes
> Fragment 3:3
> [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 on example.com:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-01-08 11:35:35,093 [297060f9-1c7a-b32c-09e8-24b5ad863e73:frag:3:3] INFO  
> o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred
> org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION 
> ERROR: Hash aggregate does not support schema changes
> [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> 

[jira] [Commented] (DRILL-4047) Select with options

2016-01-11 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093487#comment-15093487
 ] 

Khurram Faraaz commented on DRILL-4047:
---

[~julienledem] Can you please provide a specification document that can be 
used to come up with a test plan to verify and test this feature?

> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 1.4.0
>
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a 
> Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E





[jira] [Commented] (DRILL-4006) As json reader reads a field with empty lists, IOOB could happen

2016-01-11 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093298#comment-15093298
 ] 

Khurram Faraaz commented on DRILL-4006:
---

Tests added to Functional/json/json_storage/drill_4006_b.q and 
Functional/json/json_storage/drill_4006.q.

> As json reader reads a field with empty lists, IOOB could happen
> 
>
> Key: DRILL-4006
> URL: https://issues.apache.org/jira/browse/DRILL-4006
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.3.0
>
> Attachments: a.json, b.json, c.json
>
>
> If a field in a JSON file has many empty lists before a non-empty list, there 
> could be an IOOB (IndexOutOfBounds) exception.
> Running the following query on the folder containing the attached files 
> reproduces the issue:
> {code}
> select a from `folder`
> {code}
> Exception:
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: 
> index: 4448, length: 4 
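
For illustration, the data shape being described is a field whose lists are 
empty for many records before the first non-empty one. A hypothetical sample 
(the actual attachments a.json, b.json and c.json are on the JIRA):
{code}
{"a": []}
{"a": []}
{"a": []}
{"a": [1, 2, 3]}
{code}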





[jira] [Created] (DRILL-4261) add support for RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

2016-01-11 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4261:
---

 Summary: add support for RANGE BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING
 Key: DRILL-4261
 URL: https://issues.apache.org/jira/browse/DRILL-4261
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim








[jira] [Created] (DRILL-4262) add support for [RANGE | ROWS] BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

2016-01-11 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4262:
---

 Summary: add support for [RANGE | ROWS] BETWEEN UNBOUNDED 
PRECEDING AND CURRENT ROW
 Key: DRILL-4262
 URL: https://issues.apache.org/jira/browse/DRILL-4262
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim
 Fix For: Future








[jira] [Updated] (DRILL-4260) Adding support for custom window frames

2016-01-11 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4260:

Fix Version/s: Future

> Adding support for custom window frames
> ---
>
> Key: DRILL-4260
> URL: https://issues.apache.org/jira/browse/DRILL-4260
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: Future
>
>
> The current implementation of window functions (pre-1.6) only supports the 
> default frame. We want to add support for the FRAME clause.
> This is an umbrella task to track progress while adding all remaining 
> frames.
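
For reference, the kind of statement these sub-tasks would enable is a window 
function with an explicit frame clause. An illustrative example (table and 
column names are from TPC-H, not from this JIRA):
{code}
SELECT l_orderkey,
       SUM(l_quantity) OVER (
         PARTITION BY l_orderkey
         ORDER BY l_shipdate
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_qty
FROM lineitem;
{code}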





[jira] [Created] (DRILL-4263) add support for RANGE BETWEEN CURRENT ROW AND CURRENT ROW

2016-01-11 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4263:
---

 Summary: add support for RANGE BETWEEN CURRENT ROW AND CURRENT ROW
 Key: DRILL-4263
 URL: https://issues.apache.org/jira/browse/DRILL-4263
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim








[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-01-11 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092318#comment-15092318
 ] 

Jacques Nadeau commented on DRILL-4237:
---

I think even the Google ones do Java object creation. I believe all the 
hashing functions should work without any object allocation (and definitely no 
memcpy calls). At first glance, the Hadoop one looks to be operating 4 bytes at 
a time (at least in the Java code I found, which could be outdated). I think we 
want something that operates 8 bytes at a time.
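
A minimal sketch of the kind of loop being suggested: read 8 bytes at a time 
from a direct ByteBuffer, with no intermediate byte[] and no object allocation. 
This is illustrative only; the constants are borrowed from XXHash but the 
mixing is simplified, and it is not Drill's implementation:
{code}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WordAtATimeHash {
  private static final long PRIME64_1 = 0x9E3779B185EBCA87L;
  private static final long PRIME64_2 = 0xC2B2AE3D4F4B0B27L;

  // Hashes bytes [start, end) of the buffer, one 8-byte word per main-loop
  // iteration, without allocating any intermediate byte[] or objects.
  public static long hash64(ByteBuffer buf, int start, int end, long seed) {
    long h = seed + PRIME64_1;
    int i = start;
    for (; i + 8 <= end; i += 8) {        // main loop: 8 bytes at a time
      long k = buf.getLong(i);
      h = Long.rotateLeft(h ^ (k * PRIME64_2), 31) * PRIME64_1;
    }
    for (; i < end; i++) {                // tail: remaining 0..7 bytes
      h = Long.rotateLeft(h ^ ((buf.get(i) & 0xFFL) * PRIME64_2), 11) * PRIME64_1;
    }
    h ^= (end - start);                   // fold in the length
    h ^= h >>> 33;                        // final avalanche
    h *= PRIME64_2;
    h ^= h >>> 29;
    return h;
  }

  public static void main(String[] args) {
    ByteBuffer direct = ByteBuffer.allocateDirect(20).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < 20; i++) {
      direct.put(i, (byte) i);
    }
    System.out.println(Long.toHexString(hash64(direct, 0, 20, 0)));
  }
}
{code}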

> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on a smaller sample of the data set, but on another sample of 
> the same data set it still produces skewed values - see the hash values 
> below, which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], was the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such as 
> rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we consider an alternative hash function?
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier, and the Mahout version of the hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28).
> As a test, I reverted to this function and got good hash distribution for the 
> test data. 
> I could not find any performance comparisons of our perf tests (TPC-H or DS) 
> with the original and newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function?
> As an aside, I would like to remove the hash64 versions of these functions 
> since they are not used anywhere. 





[jira] [Commented] (DRILL-4190) Don't hold on to batches from left side of merge join

2016-01-11 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092303#comment-15092303
 ] 

Jason Altekruse commented on DRILL-4190:


Hakim merged his fix for DRILL-4243. [~aah], can you rebase your patch and 
update the PR?

> Don't hold on to batches from left side of merge join
> -
>
> Key: DRILL-4190
> URL: https://issues.apache.org/jira/browse/DRILL-4190
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: Victoria Markman
>Assignee: amit hadke
>Priority: Blocker
> Attachments: 2990f5f8-ec64-1223-c1d8-97dd7e601cee.sys.drill, 
> exception.log, query3.sql
>
>
> Running TPCDS queries with the latest 1.4.0 release with hash join disabled:
> 22 queries fail with out-of-memory errors, and
> 2 return wrong results (I have not validated the nature of the wrong results 
> yet).
> Only query97.sql is a legitimate failure: we don't support full outer join 
> with the merge join.
> It is important to understand what changed between 1.2.0 and 1.4.0 that made 
> these tests not runnable with the same configuration. 
> The same tests with the same Drill configuration pass in the 1.2.0 release.
> (I hope I did not make a mistake somewhere in my cluster setup :))
> {code}
> 0: jdbc:drill:schema=dfs> select * from sys.version;
> +-+---+-++--++
> | version | commit_id |   
> commit_message|commit_time
>  | build_email  | build_time |
> +-+---+-++--++
> | 1.4.0-SNAPSHOT  | b9068117177c3b47025f52c00f67938e0c3e4732  | DRILL-4165 
> Add a precondition for size of merge join record batch.  | 08.12.2015 @ 
> 01:25:34 UTC  | Unknown  | 08.12.2015 @ 03:36:25 UTC  |
> +-+---+-++--++
> 1 row selected (2.211 seconds)
> Execution Failures:
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query50.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query33.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query74.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query68.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query34.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query21.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query46.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query91.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query59.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query3.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query84.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query97.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query19.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query96.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query43.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query2.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query60.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query79.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query73.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query45.sql
> Verification Failures
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query52.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query40.sql
> Timeout Failures
> 

[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-01-11 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092299#comment-15092299
 ] 

Aman Sinha commented on DRILL-4237:
---

I assume we are mainly talking about the murmur hash for varchar and 
varbinary.  For the numeric types, it should be OK to use 
{{com.google.common.hash.Hashing.murmur3_128().hashInt()}} etc.  For varchar 
and varbinary, there is 
{{com.google.common.hash.Hashing.murmur3_128().hashBytes()}}; however, it 
takes a byte[] array as its argument.  Creating a byte[] from the DrillBuf 
requires doing a new allocation, which we want to avoid (is that correct?).  
Note that ByteBuffer.array() returns a byte[] array, but it is not supported 
unless there is a backing array.  I agree that ideally we want to point 
directly at the memory address in the ByteBuffer and use start and end 
offsets, similar to XXHash for strings.  [~jnadeau], exactly which murmur hash 
function should we be modifying?  For 32-bit hashing, can we modify the 
relatively simple {{hadoop.util.hash.MurmurHash}}?  Or do we want the more 
complex 128-bit version?
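
A concrete sketch of the trade-off described above (illustrative names, not 
Drill code): the numeric path needs no allocation, while the varchar/varbinary 
path forces a byte[] copy out of the buffer:
{code}
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

import java.nio.ByteBuffer;

public class MurmurSketch {
  private static final HashFunction MURMUR_128 = Hashing.murmur3_128();

  // Numeric types: hashInt()/hashLong() take primitives, so no allocation.
  static int hashNumeric(int value) {
    return MURMUR_128.hashInt(value).asInt();
  }

  // Varchar/varbinary: hashBytes() requires a byte[], so hashing a region of
  // a (direct) buffer forces a copy, i.e. the per-value allocation at issue.
  static int hashVarBytes(ByteBuffer buf, int start, int end) {
    byte[] copy = new byte[end - start];  // allocation + copy we want to avoid
    for (int i = start; i < end; i++) {
      copy[i - start] = buf.get(i);
    }
    return MURMUR_128.hashBytes(copy).asInt();
  }
}
{code}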

> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on a smaller sample of the data set, but on another sample of 
> the same data set it still produces skewed values - see the hash values 
> below, which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], was the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such as 
> rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we consider an alternative hash function?
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier, and the Mahout version of the hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28).
> As a test, I reverted to this function and got good hash distribution for the 
> test data. 
> I could not find any performance comparisons of our perf tests (TPC-H or DS) 
> with the original and newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function?
> As an aside, I would like to remove the hash64 versions of these functions 
> since they are not used anywhere. 





[jira] [Commented] (DRILL-4246) New allocator causing a flatten regression test to fail with IllegalStateException

2016-01-11 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092326#comment-15092326
 ] 

Deneche A. Hakim commented on DRILL-4246:
-

This may be related: {{TestFlatten.testFlatten_Drill2162_simple}} is failing 
with a similar error:
{noformat}
testFlatten_Drill2162_simple(org.apache.drill.exec.physical.impl.flatten.TestFlatten)
  Time elapsed: 0.244 sec  <<< ERROR!
org.apache.drill.exec.rpc.RpcException: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
IllegalStateException: allocator[op:0:0:0:Screen]: buffer space (0) + prealloc 
space (0) + child space (0) != allocated (1048576)

Fragment 0:0

[Error Id: c8059885-a6c4-4e6c-81f5-324e83390c94 on 172.30.1.113:31013]

  (java.lang.IllegalStateException) allocator[op:0:0:0:Screen]: buffer space 
(0) + prealloc space (0) + child space (0) != allocated (1048576)
org.apache.drill.exec.memory.BaseAllocator.verifyAllocator():653
org.apache.drill.exec.memory.BaseAllocator.verifyAllocator():528
org.apache.drill.exec.memory.BaseAllocator.close():419
org.apache.drill.exec.ops.OperatorContextImpl.close():124
org.apache.drill.exec.ops.FragmentContext.suppressingClose():416
org.apache.drill.exec.ops.FragmentContext.close():405
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():346
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():179
org.apache.drill.exec.work.fragment.FragmentExecutor.run():290
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745

at 
org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60)
at 
org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:424)
at 
org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:321)
at 
org.apache.drill.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:318)
at 
org.apache.drill.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:316)
at org.apache.drill.DrillTestWrapper.run(DrillTestWrapper.java:126)
at org.apache.drill.TestBuilder.go(TestBuilder.java:129)
at 
org.apache.drill.exec.physical.impl.flatten.TestFlatten.testFlatten_Drill2162_simple(TestFlatten.java:189)
{noformat}
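
For context, the failing check is an accounting invariant verified when an 
allocator closes. A toy sketch of the idea (hypothetical; Drill's actual 
BaseAllocator tracks considerably more state):
{code}
public class AllocatorCheck {
  long allocated;      // total bytes this allocator has handed out
  long bufferSpace;    // bytes held by live buffers
  long preallocSpace;  // bytes reserved up front but not yet handed out
  long childSpace;     // bytes accounted to child allocators

  // On close, every allocated byte must be attributable to one of the three
  // categories; a mismatch indicates a leak or double-accounting somewhere.
  void verifyOnClose() {
    if (bufferSpace + preallocSpace + childSpace != allocated) {
      throw new IllegalStateException(String.format(
          "buffer space (%d) + prealloc space (%d) + child space (%d) != allocated (%d)",
          bufferSpace, preallocSpace, childSpace, allocated));
    }
  }
}
{code}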

> New allocator causing a flatten regression test to fail with 
> IllegalStateException
> --
>
> Key: DRILL-4246
> URL: https://issues.apache.org/jira/browse/DRILL-4246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> We are seeing the following error in the test cluster:
> {noformat}
> /framework/resources/Functional/flatten_operators/10rows/filter3.q
> Query: 
> select uid, flatten(events) from `data.json` where uid > 1
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Unaccounted for 
> outstanding allocation (851968)
> Allocator(op:0:0:0:Screen) 100/851968/1941504/100 
> (res/actual/peak/limit)
> {noformat}





[jira] [Commented] (DRILL-4243) CTAS with partition by, results in Out Of Memory

2016-01-11 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092360#comment-15092360
 ] 

Jason Altekruse commented on DRILL-4243:


I opened a new JIRA for enhancing the tests to make sure they will reliably 
fail in other environments, and have linked this issue to it.

> CTAS with partition by, results in Out Of Memory
> 
>
> Key: DRILL-4243
> URL: https://issues.apache.org/jira/browse/DRILL-4243
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>
> CTAS with PARTITION BY results in Out Of Memory. It seems to be coming from 
> ExternalSortBatch.
> Details of the Drill build:
> {noformat}
> version: 1.5.0-SNAPSHOT
> commit_id: e4372f224a4b474494388356355a53808092a67a
> commit_message: DRILL-4242: Updates to storage-mongo
> commit_time: 03.01.2016 @ 15:31:13 PST
> build_email: Unknown
> build_time: 04.01.2016 @ 01:02:29 PST
>  create table `tpch_single_partition/lineitem` partition by (l_moddate) as 
> select l.*, l_shipdate - extract(day from l_shipdate) + 1 l_moddate from 
> cp.`tpch/lineitem.parquet` l;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while 
> executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010] 
> (state=,code=0)
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73)
>   at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338)
>   at 
> net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
>   at sqlline.Commands.execute(Commands.java:841)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:746)
>   at sqlline.SqlLine.runCommands(SqlLine.java:1651)
>   at sqlline.Commands.run(Commands.java:1304)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
>   at sqlline.SqlLine.dispatch(SqlLine.java:742)
>   at sqlline.SqlLine.initArgs(SqlLine.java:553)
>   at sqlline.SqlLine.begin(SqlLine.java:596)
>   at sqlline.SqlLine.start(SqlLine.java:375)
>   at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: RESOURCE 
> ERROR: One or more nodes ran out of memory while executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69)
>   at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400)
>   at 
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
>   at 
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264)
>   at 
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 

[jira] [Created] (DRILL-4259) Add new functional tests to ensure that failures can be detected independent of the testing environment

2016-01-11 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4259:
--

 Summary: Add new functional tests to ensure that failures can be 
detected independent of the testing environment
 Key: DRILL-4259
 URL: https://issues.apache.org/jira/browse/DRILL-4259
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse


In DRILL-4243 an out of memory issue was fixed after a change to the memory 
allocator made memory limits more strict. While the regression tests had been 
run by the team at Dremio prior to merging the patch, running the tests on a 
cluster with more cores changed the memory limits on the queries and caused 
several tests to fail.

While changes of this magnitude are not going to be common, we should have a 
test suite that reliably fails independent of the environment it is run in 
(assuming that there are sufficient resources for the tests to run).

It would be good to at least try to reproduce this failure on a few different 
setups (cores, nodes in cluster) by adjusting available configuration options 
and adding tests with those different configurations so that the tests will 
fail in different environments.





[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-11 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092422#comment-15092422
 ] 

Venki Korukanti commented on DRILL-4256:


Commit {{76f41e18}} should only have an effect when the native reader is 
enabled; not sure why it is causing a regression in this case. Is the 
environment the same for both tests? Also, can you run jstack while the query 
is running and check the call stack?

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading Hive tables backed by HBase caused a performance 
> regression. The data set used in the test below has ~3700 partitions, and the 
> filter in the query ensures only 1 partition gets selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you 
> need additional information.


