Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

2015-05-21 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/
---

(Updated May 21, 2015, 7:45 a.m.)


Review request for hive, Alan Gates and Sergio Pena.


Changes
---

Summary:
1. fix code style issues
2. remove code irrelevant to the insert statement
3. fix an issue with setParquetSchema from the previous patch


Bugs: HIVE-10749
https://issues.apache.org/jira/browse/HIVE-10749


Repository: hive-git


Description
---

Implement the insert statement for parquet format.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
c6fb26c 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 f513572 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34473/diff/


Testing
---

Newly added qtests and unit tests passed locally


Thanks,

cheng xu



Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

2015-05-21 Thread cheng xu


 On May 20, 2015, 8:45 p.m., Alexander Pivovarov wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java,
   line 207
  https://reviews.apache.org/r/34473/diff/1/?file=965270#file965270line207
 
  you can use
  final ArrayList<Object> list = new ArrayList<Object>(Collections.nCopies(fields.size(), null));
  instead

I don't think so, because in the insert statement we can't know how to inspect 
the row object until the Parquet writer is created. This is why I created the 
new constructor in ParquetStructObjectInspector. Thank you!
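For reference, the reviewer's suggestion builds a fixed-size, null-filled backing list in one call. A minimal self-contained sketch (the field count is hard-coded here for illustration; in the patch it would come from the struct's field list):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NullFilledList {
    // Build a mutable ArrayList pre-filled with nulls, one slot per struct field.
    // Collections.nCopies itself returns an immutable view; copying it into an
    // ArrayList makes the slots settable.
    static List<Object> preFilled(int fieldCount) {
        return new ArrayList<Object>(Collections.<Object>nCopies(fieldCount, null));
    }

    public static void main(String[] args) {
        List<Object> list = preFilled(3);
        System.out.println(list); // prints [null, null, null]
        list.set(1, "value");     // settable, unlike the nCopies view itself
        System.out.println(list.get(1)); // prints value
    }
}
```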


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84574
---


On May 20, 2015, 2:54 p.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34473/
 ---
 
 (Updated May 20, 2015, 2:54 p.m.)
 
 
 Review request for hive, Alan Gates and Sergio Pena.
 
 
 Bugs: HIVE-10749
 https://issues.apache.org/jira/browse/HIVE-10749
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Implement the insert statement for parquet format.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
  000eb38 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
  8380117 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java
  4e1820c 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRawRecordMerger.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java
  43c772f 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  0a5edbb 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java
  0d32e49 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
  5f7f597 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34473/diff/
 
 
 Testing
 ---
 
 Newly added qtest and UT passed locally
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 33956: HIVE-9614: Encrypt mapjoin tables

2015-05-21 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33956/#review84671
---


Thank you for this patch. I have some questions, and will do another round of 
review once they are answered. Thank you!


common/src/java/org/apache/hive/common/util/HdfsEncryptionUtilities.java
https://reviews.apache.org/r/33956/#comment136026

Why not use the isPathEncrypted from HdfsEncryptionShim directly?



common/src/java/org/apache/hive/common/util/HdfsEncryptionUtilities.java
https://reviews.apache.org/r/33956/#comment136027

The same as above.



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/33956/#comment136025

Is it possible to get the FsPermission from 
org.apache.hadoop.fs.FileContext?



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
https://reviews.apache.org/r/33956/#comment136022

I am a little confused here. How can a local path be converted to an HDFS 
path? The original code creates a tar file from a local path and uploads it to 
HDFS with the replication information. The new code path loses the replication 
information, and the previous code path is only exercised with the local file 
or pfile schemes in tests.



ql/src/test/queries/clientpositive/encryption_map_join_select.q
https://reviews.apache.org/r/33956/#comment136021

drop table encryptedTable PURGE;


- cheng xu


On May 7, 2015, 9:23 p.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33956/
 ---
 
 (Updated May 7, 2015, 9:23 p.m.)
 
 
 Review request for hive, Brock Noland and cheng xu.
 
 
 Bugs: HIVE-9614
 https://issues.apache.org/jira/browse/HIVE-9614
 
 
 Repository: hive-git
 
 
 Description
 ---
 
The security issue here is that encrypted tables used in MAP-JOIN queries, 
and stored in the distributed cache, are first copied to the client's local 
filesystem in unencrypted form in order to be compressed there.
 
 This patch avoids the local copy if the table is encrypted on HDFS. It keeps 
 the hash table on HDFS, compresses the table in HDFS, and then adds it to the 
 distributed cache.
 
 Files that are copied to the datanodes by the distributed cache are still 
 unencrypted. This is a limitation we have from HDFS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/common/CompressionUtils.java 
 0e0d538c2faf1c52c4d8378df013294ae4efa41c 
   common/src/java/org/apache/hive/common/util/HdfsEncryptionUtilities.java 
 PRE-CREATION 
   itests/src/test/resources/testconfiguration.properties 
 3eff7d010923a4e07d5024904f1531ca52473aa2 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
 ad5c8f8302de2a15b1703161799f71cd81a94475 
   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 
 d7a08ecf1c183fe56b5ca41c2c69d413874418bb 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 
 4d84f0f76ce17711077ceadf23e6b9ed12e6a414 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java 
 c0a72b69df3871bbcc870af286774aee5269668b 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
 cbc5466261f749fe7b84d7533dc0ff3274b6777f 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 
 82143a64db163da766dcc138231b4d4174603470 
   ql/src/test/queries/clientpositive/encryption_map_join_select.q 
 PRE-CREATION 
   
 ql/src/test/results/clientpositive/encrypted/encryption_map_join_select.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/33956/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergio Pena
 




Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/
---

(Updated May 21, 2015, 6:44 a.m.)


Review request for hive.


Bugs: HIVE-10427
https://issues.apache.org/jira/browse/HIVE-10427


Repository: hive-git


Description (updated)
---

Currently for collect_list() and collect_set(), only primitive types are 
supported. This patch adds support for struct, list and map types as well.

It turned out that all I need to do is loosen the type checking.
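For context, the two aggregations differ only in de-duplication: collect_list keeps every row value in arrival order, collect_set keeps distinct values. A hypothetical Java sketch of the semantics being extended to non-primitive values (not the GenericUDAF code itself):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class CollectDemo {
    // collect_list: every row value, duplicates kept, arrival order.
    static <T> List<T> collectList(List<T> rows) {
        return new ArrayList<>(rows);
    }

    // collect_set: distinct row values, first-seen order.
    static <T> List<T> collectSet(List<T> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Works equally for struct-like values (e.g. lists) once type checks allow them.
        List<String> rows = Arrays.asList("a", "b", "a");
        System.out.println(collectList(rows)); // prints [a, b, a]
        System.out.println(collectSet(rows));  // prints [a, b]
    }
}
```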


Diffs
-

  data/files/customers.txt PRE-CREATION 
  data/files/nested_orders.txt PRE-CREATION 
  data/files/orders.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
536c4a7 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
6dc424a 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
 efcc8f5 
  ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
  ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34393/diff/


Testing (updated)
---

All but one test (which seems unrelated) are passing.
I also added a test: udaf_collect_list_set_2.q


Thanks,

Chao Sun



Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Lenni Kuff

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/#review84669
---

Ship it!


lgtm - I assume this works with decimal (with scale/precision) and 
char/varchar? Maybe add one test case for those?

- Lenni Kuff


On May 21, 2015, 6:44 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34393/
 ---
 
 (Updated May 21, 2015, 6:44 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-10427
 https://issues.apache.org/jira/browse/HIVE-10427
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently for collect_list() and collect_set(), only primitive types are 
 supported. This patch adds support for struct, list and map types as well.
 
 It turned out I that all I need is loosen the type checking.
 
 
 Diffs
 -
 
   data/files/customers.txt PRE-CREATION 
   data/files/nested_orders.txt PRE-CREATION 
   data/files/orders.txt PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
 536c4a7 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
 6dc424a 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
  efcc8f5 
   ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
   ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34393/diff/
 
 
 Testing
 ---
 
 All but one test (which seems unrelated) are passing.
 I also added a test: udaf_collect_list_set_2.q
 
 
 Thanks,
 
 Chao Sun
 




[jira] [Created] (HIVE-10783) Support MINUS set operation

2015-05-21 Thread sanjiv singh (JIRA)
sanjiv singh created HIVE-10783:
---

 Summary: Support MINUS set operation 
 Key: HIVE-10783
 URL: https://issues.apache.org/jira/browse/HIVE-10783
 Project: Hive
  Issue Type: Improvement
  Components: SQL
Reporter: sanjiv singh


Support a MINUS operation, as in qb1 MINUS qb2.


A common requirement is to combine two queries when the desired result set 
contains only the unique rows returned by the first query but not by the 
second.


The following sample statement combines results with the MINUS operator, which 
returns only unique rows returned by the first query but not by the second:

SELECT * FROM tableA
MINUS
SELECT * FROM tableB;

current exception:
FAILED: ParseException line *:** missing EOF at 'SELECT' near 'MINUS'
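The requested MINUS semantics (distinct rows of the first query that are absent from the second) correspond to set difference, which can be sketched with Java collections; a hypothetical illustration of the semantics, not Hive code:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class MinusDemo {
    // MINUS: distinct elements of a that do not appear in b.
    static <T> Set<T> minus(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a); // copy so the input is untouched
        result.removeAll(b);                    // set difference
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> tableA = new LinkedHashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> tableB = new LinkedHashSet<>(Arrays.asList(3, 4, 5));
        System.out.println(minus(tableA, tableB)); // prints [1, 2]
    }
}
```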



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10780) Support INTERSECTION set operation

2015-05-21 Thread sanjiv singh (JIRA)
sanjiv singh created HIVE-10780:
---

 Summary: Support INTERSECTION set operation 
 Key: HIVE-10780
 URL: https://issues.apache.org/jira/browse/HIVE-10780
 Project: Hive
  Issue Type: Improvement
  Components: SQL
Reporter: sanjiv singh


Support an INTERSECTION operation, as in qb1 INTERSECTION qb2.


A common requirement is to combine two queries when the desired result set 
contains only those rows returned by both queries.


The following sample statement combines the results with the INTERSECT 
operator, which returns only those rows returned by both queries:

SELECT * FROM tableA
INTERSECT
SELECT * FROM tableB;

current exception:
FAILED: ParseException line *:** missing EOF at 'SELECT' near 'INTERSECT'
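The INTERSECT semantics (only rows returned by both queries) correspond to set intersection; a hypothetical Java sketch of the semantics, not Hive code:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class IntersectDemo {
    // INTERSECT: distinct elements present in both inputs.
    static <T> Set<T> intersect(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a); // copy so the input is untouched
        result.retainAll(b);                    // set intersection
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> tableA = new LinkedHashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> tableB = new LinkedHashSet<>(Arrays.asList(3, 4, 5));
        System.out.println(intersect(tableA, tableB)); // prints [3, 4]
    }
}
```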



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10781) HadoopJobExecHelper Leaks RunningJobs

2015-05-21 Thread Nemon Lou (JIRA)
Nemon Lou created HIVE-10781:


 Summary: HadoopJobExecHelper Leaks RunningJobs
 Key: HIVE-10781
 URL: https://issues.apache.org/jira/browse/HIVE-10781
 Project: Hive
  Issue Type: Bug
  Components: Hive, HiveServer2
Affects Versions: 1.2.0, 0.13.1
Reporter: Nemon Lou


On one of our busy Hadoop clusters, HiveServer2 holds more than 4000 
org.apache.hadoop.mapred.JobClient$NetworkedJob instances, while it has fewer 
than 3 background handler threads at the same time.
All these instances are held in one LinkedList, the runningJobs property of 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper, which is static.
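The leak pattern described is a static collection that only grows: entries are added per submitted job but never removed. A minimal hypothetical sketch of the pattern and the obvious fix (dropping the reference on completion); this is an illustration, not the actual HadoopJobExecHelper code:

```java
import java.util.LinkedList;
import java.util.List;

public class RunningJobsTracker {
    // Static list mirroring the reported runningJobs field: each submitted
    // job adds an entry, which leaks unless removed on completion.
    private static final List<String> runningJobs = new LinkedList<>();

    static synchronized void jobSubmitted(String jobId) {
        runningJobs.add(jobId);
    }

    // The fix: drop the reference once the job finishes.
    static synchronized void jobCompleted(String jobId) {
        runningJobs.remove(jobId);
    }

    static synchronized int trackedJobs() {
        return runningJobs.size();
    }

    public static void main(String[] args) {
        jobSubmitted("job_1");
        jobSubmitted("job_2");
        jobCompleted("job_1");
        System.out.println(trackedJobs()); // prints 1
    }
}
```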



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Chao Sun


 On May 19, 2015, 5:36 a.m., Lenni Kuff wrote:
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java,
   line 50
  https://reviews.apache.org/r/34393/diff/1/?file=963345#file963345line50
 
  should we also support arrays and unions?

Added support for array. Union seems a bit tricky - let's make that a 
follow-up task.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/#review84260
---


On May 19, 2015, 4:47 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34393/
 ---
 
 (Updated May 19, 2015, 4:47 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-10427
 https://issues.apache.org/jira/browse/HIVE-10427
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently for collect_list() and collect_set(), only primitive types are 
 supported. This patch adds support for struct and map types as well.
 
 It turned out I that all I need is loosen the type checking.
 
 
 Diffs
 -
 
   data/files/customers.txt PRE-CREATION 
   data/files/nested_orders.txt PRE-CREATION 
   data/files/orders.txt PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
 536c4a7 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
 6dc424a 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
  efcc8f5 
   ql/src/test/queries/clientpositive/udaf_collect_list_set_nested.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/udaf_collect_list_set_nested.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34393/diff/
 
 
 Testing
 ---
 
 All but one test (which seems unrelated) are passing.
 I also added a test: udaf_collect_list_set_nested.q
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/
---

(Updated May 21, 2015, 6:44 a.m.)


Review request for hive.


Changes
---

Addressing RB comments.


Bugs: HIVE-10427
https://issues.apache.org/jira/browse/HIVE-10427


Repository: hive-git


Description
---

Currently for collect_list() and collect_set(), only primitive types are 
supported. This patch adds support for struct and map types as well.

It turned out that all I need to do is loosen the type checking.


Diffs (updated)
-

  data/files/customers.txt PRE-CREATION 
  data/files/nested_orders.txt PRE-CREATION 
  data/files/orders.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
536c4a7 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
6dc424a 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
 efcc8f5 
  ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
  ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34393/diff/


Testing
---

All but one test (which seems unrelated) are passing.
I also added a test: udaf_collect_list_set_nested.q


Thanks,

Chao Sun



[jira] [Created] (HIVE-10779) LLAP: Daemons should shutdown in case of fatal errors

2015-05-21 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10779:
-

 Summary: LLAP: Daemons should shutdown in case of fatal errors
 Key: HIVE-10779
 URL: https://issues.apache.org/jira/browse/HIVE-10779
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth


For example, when the scheduler loop exits. Currently the daemons end up 
stuck while still accepting new work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10782) Support EXCEPT set operation

2015-05-21 Thread sanjiv singh (JIRA)
sanjiv singh created HIVE-10782:
---

 Summary: Support EXCEPT set operation
 Key: HIVE-10782
 URL: https://issues.apache.org/jira/browse/HIVE-10782
 Project: Hive
  Issue Type: Improvement
  Components: SQL
Reporter: sanjiv singh


Support an EXCEPT operation, as in qb1 EXCEPT qb2.


A common requirement is to combine two queries when the desired result set 
contains the distinct rows from the left input query that aren't output by 
the right input query.


The following sample statement combines the results with the EXCEPT 
operator, which returns distinct rows from the left input query that aren't 
output by the right input query:


SELECT * FROM tableA
EXCEPT
SELECT * FROM tableB;

current exception:
FAILED: ParseException line *:** missing EOF at 'SELECT' near 'EXCEPT'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/#review84747
---



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java
https://reviews.apache.org/r/34393/#comment136093

Can you replace this if block with
checkArgsSize(arguments, min, max) ?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java
https://reviews.apache.org/r/34393/#comment136095

can you remove unused imports?
import 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;


- Alexander Pivovarov


On May 21, 2015, 5:30 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34393/
 ---
 
 (Updated May 21, 2015, 5:30 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-10427
 https://issues.apache.org/jira/browse/HIVE-10427
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently for collect_list() and collect_set(), only primitive types are 
 supported. This patch adds support for struct, list and map types as well.
 
 It turned out I that all I need is loosen the type checking.
 
 
 Diffs
 -
 
   data/files/customers.txt PRE-CREATION 
   data/files/nested_orders.txt PRE-CREATION 
   data/files/orders.txt PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
 536c4a7 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
 6dc424a 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
  efcc8f5 
   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 
 2d6d58c 
   ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
 PRE-CREATION 
   ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 
   ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
   ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
 PRE-CREATION 
   ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd 
   ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34393/diff/
 
 
 Testing
 ---
 
 All but one test (which seems unrelated) are passing.
 I also added a test: udaf_collect_list_set_2.q
 
 
 Thanks,
 
 Chao Sun
 




[jira] [Created] (HIVE-10788) Change sort_array to support non-primitive types

2015-05-21 Thread Chao Sun (JIRA)
Chao Sun created HIVE-10788:
---

 Summary: Change sort_array to support non-primitive types
 Key: HIVE-10788
 URL: https://issues.apache.org/jira/browse/HIVE-10788
 Project: Hive
  Issue Type: Bug
Reporter: Chao Sun
Assignee: Chao Sun


Currently {{sort_array}} only supports primitive types. As we already support 
comparison between non-primitive types, it makes sense to remove this 
restriction.
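Sorting non-primitive elements just needs an ordering over them; for lists, the natural choice is lexicographic comparison. A hypothetical Java sketch of the idea (Hive's actual comparison goes through ObjectInspectors; this is only an illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NestedSort {
    // Lexicographic comparison of integer lists: compare element-wise,
    // and when one list is a prefix of the other, the shorter sorts first.
    static final Comparator<List<Integer>> LEX = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    public static void main(String[] args) {
        List<List<Integer>> arrays = new ArrayList<>(Arrays.asList(
            Arrays.asList(2, 1), Arrays.asList(1, 9), Arrays.asList(1)));
        arrays.sort(LEX);
        System.out.println(arrays); // prints [[1], [1, 9], [2, 1]]
    }
}
```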



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Chao Sun


 On May 21, 2015, 6:22 p.m., Alexander Pivovarov wrote:
  ql/src/test/queries/clientpositive/udaf_collect_set_2.q, line 1
  https://reviews.apache.org/r/34393/diff/3/?file=966777#file966777line1
 
  Is it necessary?

Yes, date is a reserved keyword.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/#review84749
---


On May 21, 2015, 5:30 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34393/
 ---
 
 (Updated May 21, 2015, 5:30 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-10427
 https://issues.apache.org/jira/browse/HIVE-10427
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently for collect_list() and collect_set(), only primitive types are 
 supported. This patch adds support for struct, list and map types as well.
 
 It turned out I that all I need is loosen the type checking.
 
 
 Diffs
 -
 
   data/files/customers.txt PRE-CREATION 
   data/files/nested_orders.txt PRE-CREATION 
   data/files/orders.txt PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
 536c4a7 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
 6dc424a 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
  efcc8f5 
   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 
 2d6d58c 
   ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
 PRE-CREATION 
   ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 
   ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
   ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
 PRE-CREATION 
   ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd 
   ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34393/diff/
 
 
 Testing
 ---
 
 All but one test (which seems unrelated) are passing.
 I also added a test: udaf_collect_list_set_2.q
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/#review84749
---



ql/src/test/queries/clientpositive/udaf_collect_set_2.q
https://reviews.apache.org/r/34393/#comment136097

Is it necessary?


- Alexander Pivovarov


On May 21, 2015, 5:30 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34393/
 ---
 
 (Updated May 21, 2015, 5:30 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-10427
 https://issues.apache.org/jira/browse/HIVE-10427
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently for collect_list() and collect_set(), only primitive types are 
 supported. This patch adds support for struct, list and map types as well.
 
 It turned out I that all I need is loosen the type checking.
 
 
 Diffs
 -
 
   data/files/customers.txt PRE-CREATION 
   data/files/nested_orders.txt PRE-CREATION 
   data/files/orders.txt PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
 536c4a7 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
 6dc424a 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
  efcc8f5 
   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 
 2d6d58c 
   ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
 PRE-CREATION 
   ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 
   ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
   ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
 PRE-CREATION 
   ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd 
   ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34393/diff/
 
 
 Testing
 ---
 
 All but one test (which seems unrelated) are passing.
 I also added a test: udaf_collect_list_set_2.q
 
 
 Thanks,
 
 Chao Sun
 




[jira] [Created] (HIVE-10784) Beeline requires new line (EOL) at the end of a Hive SQL script (NullPointerException)

2015-05-21 Thread Andrey Dmitriev (JIRA)
Andrey Dmitriev created HIVE-10784:
--

 Summary: Beeline requires new line (EOL) at the end of a Hive SQL 
script (NullPointerException)
 Key: HIVE-10784
 URL: https://issues.apache.org/jira/browse/HIVE-10784
 Project: Hive
  Issue Type: Bug
  Components: Beeline, CLI
Affects Versions: 0.13.1
 Environment: Linux 2.6.32 (Red Hat 4.4.7)
Reporter: Andrey Dmitriev
Priority: Minor


The Beeline tool requires a newline at the end of a Hive/Impala SQL script; 
otherwise the last statement will not be executed, or a NullPointerException 
will be thrown.

# If a statement ends without an end-of-line AND the semicolon is on the same 
line, the statement will be ignored; i.e.
{code}select * from TABLE;EOF{code} will *not* be executed
# If a statement ends without an end-of-line BUT the semicolon is on the next 
line, the statement will be executed, but 
{color:red}java.lang.NullPointerException{color} will be thrown; i.e.
{code}select * from TABLE
;EOF{code} will be executed, but prints 
{color:red}java.lang.NullPointerException{color}
# If a statement ends with an end-of-line, regardless of where the semicolon 
is, the statement will be executed; i.e.
{code}select * from TABLE;
EOLEOF{code}
or
{code}select * from TABLE
;EOLEOF{code}
will be executed
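The failure mode described is typical of line-oriented readers that only flush a pending statement when a line terminator arrives. A hypothetical sketch of a splitter that avoids the problem by treating end-of-input as an implicit terminator (this is an illustration, not Beeline's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptSplitter {
    // Split a script on semicolons. Flushing the remaining buffer at
    // end-of-input means the final statement is kept even when the
    // script lacks a trailing newline.
    static List<String> statements(String script) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : script.toCharArray()) {
            if (c == ';') {
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) result.add(stmt);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty()) result.add(last); // flush at EOF, newline or not
        return result;
    }

    public static void main(String[] args) {
        // No trailing newline after the final statement:
        System.out.println(statements("select * from t1;\nselect * from t2;"));
        // prints [select * from t1, select * from t2]
    }
}
```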




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang

2015-05-21 Thread Chaoyu Tang
Thanks Vaibhav.

On Wed, May 20, 2015 at 6:44 PM, Vaibhav Gumashta vgumas...@hortonworks.com
 wrote:

 Congratulations!

 - Vaibhav

 On 5/20/15, 3:40 PM, Jimmy Xiang jxi...@cloudera.com wrote:

 Congrats!!
 
 On Wed, May 20, 2015 at 3:29 PM, Carl Steinbach c...@apache.org wrote:
 
  The Apache Hive PMC has voted to make Chaoyu Tang a committer on the
 Apache
  Hive Project.
 
  Please join me in congratulating Chaoyu!
 
  Thanks.
 
  - Carl
 




Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang

2015-05-21 Thread Sergio Pena
Congratulations Chaoyu !!!

On Wed, May 20, 2015 at 5:29 PM, Carl Steinbach c...@apache.org wrote:

 The Apache Hive PMC has voted to make Chaoyu Tang a committer on the Apache
 Hive Project.

 Please join me in congratulating Chaoyu!

 Thanks.

 - Carl



Review Request 34576: Bucketized Table feature fails in some cases

2015-05-21 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34576/
---

Review request for hive and John Pullokkaran.


Repository: hive-git


Description
---

Bucketized Table feature fails in some cases: if the source and destination 
are bucketed on the same key, and the actual data in the source is not 
bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the 
data won't be bucketed while writing to the destination.
Example
--
CREATE TABLE P1(key STRING, val STRING)
CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1;
-- perform an insert to make sure there are 2 files
INSERT OVERWRITE TABLE P1 select key, val from P1;
--
This is not a regression; it has never worked, and was only discovered due to 
Hadoop 2 changes. In Hadoop 1's local mode the number of reducers is always 1, 
regardless of what the application requests; Hadoop 2 now honors the requested 
number of reducers in local mode (by spawning threads).
The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
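The reducer-count dependence comes from how rows are routed to buckets: roughly, bucket = hash(key) mod numBuckets, with one output file per bucket only when the writer actually partitions rows by that id. A simplified sketch of the routing (Hive's real implementation lives in its ObjectInspector utilities; this is only an illustration):

```java
public class BucketRouter {
    // Route a key to one of numBuckets buckets. Masking with
    // Integer.MAX_VALUE keeps the hash non-negative before the modulo.
    static int bucketFor(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // With 2 buckets, every key deterministically lands in bucket 0 or 1;
        // LOAD DATA skips this routing entirely, which is the reported problem.
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> bucket " + bucketFor(key, 2));
        }
    }
}
```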


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java e53933e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1a9b42b 
  ql/src/test/results/clientnegative/alter_partition_invalidspec.q.out 404115f 
  ql/src/test/results/clientnegative/alter_partition_nodrop.q.out 1c78cff 
  ql/src/test/results/clientnegative/alter_partition_nodrop_table.q.out 3c425da 
  ql/src/test/results/clientnegative/alter_partition_offline.q.out c70fcb4 
  ql/src/test/results/clientnegative/archive_corrupt.q.out 56e8ec4 
  ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 623c2e8 
  
ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_2.q.out 
9aa9b5d 
  
ql/src/test/results/clientnegative/columnstats_partlvl_invalid_values.q.java1.7.out
 4ea70e3 
  
ql/src/test/results/clientnegative/columnstats_partlvl_multiple_part_clause.q.out
 ce79830 
  ql/src/test/results/clientnegative/dynamic_partitions_with_whitelist.q.out 
f069ae8 
  ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out 
3c05600 
  ql/src/test/results/clientnegative/exim_15_part_nonpart.q.out dfbf025 
  ql/src/test/results/clientnegative/exim_16_part_noncompat_schema.q.out 
4cb6ca7 
  ql/src/test/results/clientnegative/exim_17_part_spec_underspec.q.out 23caa4a 
  ql/src/test/results/clientnegative/exim_18_part_spec_missing.q.out 23caa4a 
  ql/src/test/results/clientnegative/exim_21_part_managed_external.q.out 
fd27f29 
  ql/src/test/results/clientnegative/exim_24_import_part_authfail.q.out 1a9a34d 
  ql/src/test/results/clientnegative/insertover_dynapart_ifnotexists.q.out 
a40ffab 
  ql/src/test/results/clientnegative/load_exist_part_authfail.q.out 491cfd0 
  ql/src/test/results/clientnegative/load_part_authfail.q.out 4ea8be9 
  ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92 
  ql/src/test/results/clientnegative/nopart_load.q.out 8815146 
  ql/src/test/results/clientnegative/protectmode_part2.q.out 16d58c7 
  ql/src/test/results/clientpositive/alter_concatenate_indexed_table.q.out 
ffcbcf9 
  ql/src/test/results/clientpositive/alter_merge.q.out 17d86b8 
  ql/src/test/results/clientpositive/alter_merge_2.q.out e118c39 
  ql/src/test/results/clientpositive/alter_merge_stats.q.out fdd2ddc 
  ql/src/test/results/clientpositive/alter_partition_protect_mode.q.out 80990d9 
  ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a2 
  ql/src/test/results/clientpositive/alter_table_cascade.q.out 0139466 
  ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8 
  ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd 
  ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out e6e7ef3 
  ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out e9fb705 
  ql/src/test/results/clientpositive/auto_sortmerge_join_16.q.out d4ecb19 
  ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419 
  ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa 
  ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04 
  ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 
  ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f 
  ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out 870ecdd 
  ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out 33f5c46 
  ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out 067d1ff 
  ql/src/test/results/clientpositive/bucketcontext_1.q.out 77bfcf9 
  ql/src/test/results/clientpositive/bucketcontext_2.q.out a9db13d 
  

[jira] [Created] (HIVE-10789) union distinct query with NULL constant on both the sides throws Unsuported vector output type: void error

2015-05-21 Thread Matt McCline (JIRA)
Matt McCline created HIVE-10789:
---

 Summary: union distinct query with NULL constant on both the sides 
throws Unsuported vector output type: void error
 Key: HIVE-10789
 URL: https://issues.apache.org/jira/browse/HIVE-10789
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.2.1


A NULL expression in the SELECT projection list causes an exception to be thrown 
instead of simply not vectorizing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


CVE-2015-1772

2015-05-21 Thread Chao Sun
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

CVE-2015-1772: Apache Hive Authentication vulnerability in HiveServer2

Severity: Important

Vendor: The Apache Software Foundation

Versions Affected: All versions of Apache Hive from 0.11.0 to 1.0.0, and
1.1.0.

Users affected: Users who use LDAP authentication mode in HiveServer2 and
also have LDAP configured to allow simple unauthenticated or anonymous bind.

Description:
LDAP services are sometimes configured to allow simple unauthenticated
binds.
When HiveServer2 is configured to use LDAP authentication mode
(hive.server2.authentication configuration parameter is set to LDAP),
with such LDAP configurations, it can allow users without proper credentials
to get authenticated.

This is more easily reproducible when Kerberos authentication is also enabled
in the Apache Hadoop cluster.

Mitigation:
There are two options
1. Configure LDAP service to disallow unauthenticated binds. If the service
 allows anonymous binds, not having hive authorization checks enabled can
 also expose this vulnerability.

2. Update Hive installation to use an Authenticator with the fix. There are
 two options here:
   a. Users should upgrade to newer versions of Apache Hive with the
  fix, which includes 1.0.1, 1.1.1 and 1.2.0.
   b. Users can download the ldap-fix.tar.gz being made available for
  download from the Apache Hive downloads page and follow instructions
  in the README.txt to use an LDAP authenticator that contains the fix
  with your existing Hive release.
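
For reference, the HiveServer2 setting involved looks like the fragment below. This is an illustrative hive-site.xml sketch, not part of the advisory; the hostname is hypothetical. The vulnerable combination is this mode plus an LDAP service that permits anonymous or simple unauthenticated binds.

```xml
<!-- Illustrative hive-site.xml fragment: LDAP authentication mode for
     HiveServer2. The vulnerability applies only when the backing LDAP
     service also allows unauthenticated/anonymous binds. -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <!-- hypothetical LDAP server URL -->
  <value>ldap://ldap.example.com</value>
</property>
```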

Credit:
Thanks to Thomas Rega of CareerBuilder for reporting this issue.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJVXmECAAoJEN3RdT/2ztzmBDsP/inSE3VaTc7gLJf03MbjtoBX
bxrnWGpJir7IVe1nrlj2WiD8i4m/TqG5OoHZB2ZCnVOKbjngh6Mq4ldXM4lzGemN
6aDYW6gIdplwhiiKoVeNrTISl38whPlNO9Kp8Y9nabSGFxBcngRIuWOq6KyOADra
PP9QMys7xB325JgrgEjS9Fxrtx8cGQK+cRDm/Fi5RCjQ0Q3VRmSKVzcbg2jDmyR/
38P67SlZm4w37Z8hrBKakTQ2ql2dkmCSjnlIQCB1dln4iLp6VR2S7sizeYSvk4aQ
86BqORYYwXAmWeUfhUBlbBbLmeicu4VTvhKB2wYkD2G0TBIqXk90GVf5mdwDLir0
gk0R+gfv6YF89pmFVFjwerkLozjKs43Vx5NjQz1IxCeXnoUOw5n6gVC1kFgvnL2o
SYIRqa0+nn1ARf9ssodzffnCsm3QGPMtgy3L+iBiWY6vfI+zgWBhOeFcnlNWieqV
epxn5Q5ojjlwAwKQ7irco3uULiBu+f/CIYq2ey4I8a8qNLHQRs9n850E/3MYaV5o
PmHdu2Gmuvj216fyS+5OuROAjFeuPPDq+qzRVOcISXnCfxzFjXL2PWvPc/RyMN1d
g82gMzwczv8EFhag5MdD5FMyqAxz8BKdeOaKk/QGPQG1XvlGqjuDKJYDCfsHI4F/
5mUttG40ky0zn3ONQAPC
=7NKg
-----END PGP SIGNATURE-----


[jira] [Created] (HIVE-10790) ORC file SQL execution fails

2015-05-21 Thread xiaowei wang (JIRA)
xiaowei wang created HIVE-10790:
---

 Summary: ORC file SQL execution fails 
 Key: HIVE-10790
 URL: https://issues.apache.org/jira/browse/HIVE-10790
 Project: Hive
  Issue Type: Bug
  Components: API
Affects Versions: 0.14.0, 0.13.0
 Environment: Hadoop 2.5.0-cdh5.3.2 
hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang


Inserting from a text table into an ORC table, for example:
insert overwrite table custom.rank_less_orc_none 
partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where 
logdate='2015051500';

throws an error: Error: java.lang.RuntimeException: Hive Runtime Error 
while closing operators
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
getDefaultReplication on empty path is invalid
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
... 8 more






Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]

2015-05-21 Thread chengxiang li


 On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java, line 41
  https://reviews.apache.org/r/34455/diff/1/?file=964754#file964754line41
 
  Currently the storage level is memory+disk. Any reason to change it to 
  memory_only?

Caching data to disk means the data needs serialization and deserialization, which is 
costly and can sometimes overwhelm the gain from caching; it is also hard to measure 
programmatically, since reading from the source file only does deserialization while 
caching on disk needs an additional serialization.
Instead of adding an optimizer which may or may not improve performance for the user, 
I think it may be better to narrow the optimizer's scope a little, to make 
sure this optimizer does improve performance.


 On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java, line 63
  https://reviews.apache.org/r/34455/diff/1/?file=964756#file964756line63
 
  Can we keep the old code around. I understand it's not currently used.

Of course we can; it just makes the code a little messy, you know, for others who 
want to read the cache-related code.


 On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java, line 25
  https://reviews.apache.org/r/34455/diff/1/?file=964757#file964757line25
 
  I cannot construct a case where a MapTran would need caching. Do you 
  have an example?

For any query whose SparkWork looks like this:
MapWork -- ReduceWork
        \- ReduceWork
For example: from person_orc insert overwrite table p1 select city, count(*) as 
s group by city order by s insert overwrite table p2 select city, avg(age) as 
g group by city order by g;


 On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote:
  spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java, 
  line 419
  https://reviews.apache.org/r/34455/diff/1/?file=964774#file964774line419
 
  Do you think it makes sense for us to release the cache as soon as the 
  job is completed, as it's done here?

Theoretically we do not need to; it would not lead to any extra memory leak. The 
only benefit of unpersisting the cache manually that I can imagine is that it 
reduces GC effort, since Hive does it programmatically instead of letting GC collect it.
The reason I removed it is that it adds extra complexity to the code, and is not 
extensible for sharing a cached RDD across Spark jobs.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34455/#review84572
---


On May 20, 2015, 2:37 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34455/
 ---
 
  (Updated May 20, 2015, 2:37 a.m.)
 
 
 Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang.
 
 
 Bugs: HIVE-10550
 https://issues.apache.org/jira/browse/HIVE-10550
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 see jira description
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 
 19d3fee 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 26cfebd 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 
 8b15099 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 3f240f5 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 e6c845c 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/LocalSparkJobStatus.java
  5d62596 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
  8e56263 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java
  5990d17 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java
  fb20080 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79 
   spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 
 af6332e 
   spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 
 

[jira] [Created] (HIVE-10791) Beeline-CLI: Implement in-place update UI for CLI compatibility

2015-05-21 Thread Gopal V (JIRA)
Gopal V created HIVE-10791:
--

 Summary: Beeline-CLI: Implement in-place update UI for CLI 
compatibility
 Key: HIVE-10791
 URL: https://issues.apache.org/jira/browse/HIVE-10791
 Project: Hive
  Issue Type: Sub-task
Affects Versions: beeline-cli-branch
Reporter: Gopal V
Priority: Critical


The current CLI implementation has an in-place updating UI which offers a clear 
picture of execution runtime and failures.

This is designed for large DAGs which have more than 10 vertices, where the 
old UI would scroll sideways.

The new CLI implementation needs to keep up the usability standards set by the 
old one.





[jira] [Created] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-21 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-10793:
--

 Summary: Hybrid Hybrid Grace Hash Join : Don't allocate all hash 
table memory upfront
 Key: HIVE-10793
 URL: https://issues.apache.org/jira/browse/HIVE-10793
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1


HybridHashTableContainer will allocate memory based on an estimate, which means that 
if the actual size is less than the estimate, the allocated memory won't be used.

Number of partitions is calculated based on estimated data size
{code}
numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, 
minNumParts, minWbSize,
  nwayConf);
{code}

Then based on number of partitions writeBufferSize is set

{code}
writeBufferSize = (int)(estimatedTableSize / numPartitions);
{code}

Each hash partition will allocate 1 WriteBuffer, with no further allocation if 
the estimated data size is correct.

The suggested solution is to reduce writeBufferSize by a factor such that only X% 
of the memory is preallocated.
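
The scaling idea can be sketched as follows. This is an illustrative sketch, not HybridHashTableContainer's actual code; the method name `calcWriteBufferSize` and the 50% factor (the "X%" above) are assumptions for the example.

```java
// Illustrative sketch of the suggested fix: preallocate only a fraction of
// the per-partition estimate instead of the full amount upfront.
public class WriteBufferSizing {
    // Hypothetical preallocation factor (the "X%" from the description).
    static final double PREALLOC_FACTOR = 0.5;

    static int calcWriteBufferSize(long estimatedTableSize, int numPartitions) {
        // Current behavior preallocates the full per-partition estimate.
        long perPartition = estimatedTableSize / numPartitions;
        // Suggested: preallocate only PREALLOC_FACTOR of it; the rest grows on demand.
        return (int) (perPartition * PREALLOC_FACTOR);
    }

    public static void main(String[] args) {
        // 1 GB estimate over 16 partitions: 64 MB per partition today,
        // 32 MB preallocated under this sketch.
        System.out.println(calcWriteBufferSize(1L << 30, 16)); // 33554432
    }
}
```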







[jira] [Created] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils

2015-05-21 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10794:


 Summary: Remove the dependence from ErrorMsg to HiveUtils
 Key: HIVE-10794
 URL: https://issues.apache.org/jira/browse/HIVE-10794
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley


HiveUtils has a large set of dependencies and ErrorMsg only needs the new line 
constant. Breaking the dependence will reduce the dependency set from ErrorMsg 
significantly.





[jira] [Created] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-05-21 Thread Dayue Gao (JIRA)
Dayue Gao created HIVE-10792:


 Summary: PPD leads to wrong answer when mapper scans the same 
table with multiple aliases
 Key: HIVE-10792
 URL: https://issues.apache.org/jira/browse/HIVE-10792
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Affects Versions: 1.2.0, 1.0.0, 0.13.1, 0.14.0, 0.13.0, 1.1.0
Reporter: Dayue Gao
Assignee: Dayue Gao
Priority: Critical


Here's the steps to reproduce the bug.
First of all, prepare a simple ORC table with one row
{code}
create table test_orc (c0 int, c1 int) stored as ORC;
{code}
Table: test_orc
||c0||c1||
|0|1|

The following SQL gets empty result which is not expected
{code}
select * from test_orc t1
union all
select * from test_orc t2
where t2.c0 = 1
{code}

Self join is also broken
{code}
set hive.auto.convert.join=false; -- force common join

select * from test_orc t1
left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
{code}
It gets empty result while the expected answer is
||t1.c0||t1.c1||t2.c0||t2.c1||
|0|1|NULL|NULL|

In these cases, we push down predicates into OrcInputFormat. As a result, the 
TableScanOperator for t1 can't receive its rows.





Review Request 34586: HIVE-10704

2015-05-21 Thread Mostafa Mokhtar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34586/
---

Review request for hive.


Repository: hive-git


Description
---

fix biggest small table selection when table sizes are 0
fall back to dividing memory equally if any tables have an invalid size


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 
536b92c5dd03abe9ff57bf64d87be0f3ef34aa7a 

Diff: https://reviews.apache.org/r/34586/diff/


Testing
---


Thanks,

Mostafa Mokhtar



Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

2015-05-21 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84758
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136104

Could you separate words with _? Like ENABLE_ACID_SCHEMA_INFO. It helps to 
read the constant more easily.

Do we have to enable transactions exclusively for Parquet? Isn't there 
another variable that enables transactions on Hive that we can use?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136111

Could you separate the words? Like ENABLE_ACID_SCHEMA_INFO. It makes the 
code more readable.

Also, isn't there another variable that we can use to detect if 
transactions are enabled? I am not sure if we should add more variables to Hive.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136107

You can use this one line to return the column list:

return (List<String>) 
StringUtils.getStringCollection(tableProperties.getProperty(IOConstants.COLUMNS));

It will return an empty list if COLUMNS is empty.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136112

You can save code by using this line:

return (List<String>) 
StringUtils.getStringCollection(tableProperties.getProperty(IOConstants.COLUMNS));

It will return an empty list if the parameter is empty.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136108

You can call TypeInfoUtils.getTypeInfosFromTypeString() with an empty 
string here. It will return an empty list. Let's save code by using:

ArrayList<TypeInfo> columnTypes = 
TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136113

You can save code by using this line:

ArrayList<TypeInfo> columnTypes = 
TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);

It will return an empty list if the parameter is empty.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136109

Same here, you can save code with this:

ArrayList<String> columnNames = (ArrayList<String>) 
StringUtils.getStringCollection(columnNameProperty);



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
https://reviews.apache.org/r/34473/#comment136114

Same thing here:

ArrayList<String> columnNames = (ArrayList<String>) 
StringUtils.getStringCollection(columnNameProperty);



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
https://reviews.apache.org/r/34473/#comment136117

Why do you need a Writable? HIVE-9658 tries to avoid wrapping Java types 
into Writables when they are used by Hive, to save memory.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java
https://reviews.apache.org/r/34473/#comment136116

I am waiting to commit the patch from HIVE-10749 that uses a similar class 
named ObjectArrayWritableObjectInspector. 

Also, I think this is already part of the parquet branch.


- Sergio Pena


On May 21, 2015, 7:45 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34473/
 ---
 
 (Updated May 21, 2015, 7:45 a.m.)
 
 
 Review request for hive, Alan Gates and Sergio Pena.
 
 
 Bugs: HIVE-10749
 https://issues.apache.org/jira/browse/HIVE-10749
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Implement the insert statement for parquet format.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
  c6fb26c 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  f513572 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34473/diff/
 
 
 Testing
 ---
 
 Newly added qtest and UT passed locally
 
 
 

Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang

2015-05-21 Thread kulkarni.swar...@gmail.com
Congrats Chaoyu!

On Thu, May 21, 2015 at 9:17 AM, Sergio Pena sergio.p...@cloudera.com
wrote:

 Congratulations Chaoyu !!!

 On Wed, May 20, 2015 at 5:29 PM, Carl Steinbach c...@apache.org wrote:

  The Apache Hive PMC has voted to make Chaoyu Tang a committer on the
 Apache
  Hive Project.
 
  Please join me in congratulating Chaoyu!
 
  Thanks.
 
  - Carl
 




-- 
Swarnim


[jira] [Created] (HIVE-10785) Support Aggregate push down through joins

2015-05-21 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-10785:
--

 Summary: Support Aggregate push down through joins
 Key: HIVE-10785
 URL: https://issues.apache.org/jira/browse/HIVE-10785
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Enable {{AggregateJoinTransposeRule}} in CBO that pushes Aggregate through Join 
operators (if possible).





[jira] [Created] (HIVE-10786) Propagate range for column stats

2015-05-21 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-10786:
--

 Summary: Propagate range for column stats
 Key: HIVE-10786
 URL: https://issues.apache.org/jira/browse/HIVE-10786
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


For column stats, Calcite doesn't propagate ranges. The range of a column will
help us decide filter cardinality for inequalities.

The range of values of a column, together with the NDV, will help us build 
histograms of uniform height.

This needs special handling for each operator:
- Inner Join where the column is part of the join key: range is the narrowest range of lhs, rhs
- Outer Join: range of the outer side if the column is from the outer side
- Filter inequality on a literal (x < 10): range is restricted on the upper side by the
literal value
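
The operator rules above can be sketched in a few lines. This is a toy model, not Calcite's or Hive's stats API; the `long[]{lo, hi}` range representation and the method names are assumptions for the example.

```java
// Toy model of range propagation across operators; not Calcite/Hive code.
public class RangeProp {
    // Ranges are closed intervals represented as long[]{lo, hi}.

    static long[] innerJoinKeyRange(long[] lhs, long[] rhs) {
        // Inner join on a key: only values present on both sides can survive,
        // so the propagated range is the intersection (narrowest range).
        return new long[] { Math.max(lhs[0], rhs[0]), Math.min(lhs[1], rhs[1]) };
    }

    static long[] filterLessThan(long[] r, long literal) {
        // A filter like "x < literal" restricts the range on the upper side.
        return new long[] { r[0], Math.min(r[1], literal - 1) };
    }

    public static void main(String[] args) {
        long[] joined = innerJoinKeyRange(new long[]{0, 100}, new long[]{50, 200});
        long[] filtered = filterLessThan(joined, 80);
        System.out.println(filtered[0] + ".." + filtered[1]); // 50..79
    }
}
```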





Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

2015-05-21 Thread Alan Gates

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84729
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java
https://reviews.apache.org/r/34473/#comment136066

Do you intend to use this in conjunction with hive.hcatalog.streaming?  If 
so, closing the file on a flush is not what you'll want.


- Alan Gates


On May 21, 2015, 7:45 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34473/
 ---
 
 (Updated May 21, 2015, 7:45 a.m.)
 
 
 Review request for hive, Alan Gates and Sergio Pena.
 
 
 Bugs: HIVE-10749
 https://issues.apache.org/jira/browse/HIVE-10749
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Implement the insert statement for parquet format.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
  c6fb26c 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  f513572 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34473/diff/
 
 
 Testing
 ---
 
 Newly added qtest and UT passed locally
 
 
 Thanks,
 
 cheng xu
 




[jira] [Created] (HIVE-10787) MatchPath misses the last matched row from the final result set

2015-05-21 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-10787:


 Summary: MatchPath misses the last matched row from the final 
result set
 Key: HIVE-10787
 URL: https://issues.apache.org/jira/browse/HIVE-10787
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 1.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


For example, if you have a STAR (*) pattern at the end, the current code misses 
the last row from the final result. For example, with a pattern like 
(LATE.EARLY*), the matched rows are:
1. LATE
2. EARLY
In the current implementation, the final 'tpath' misses the last EARLY and 
returns only LATE. Ideally it should return LATE and EARLY.

The following code snippets shows the bug.
{noformat}
0. SymbolFunctionResult rowResult = symbolFn.match(row, pItr);
1. while (rowResult.matches && pItr.hasNext())
2. {
3.   row = pItr.next();
4.   rowResult = symbolFn.match(row, pItr);
5. }
6.
7. result.nextRow = pItr.getIndex() - 1;
{noformat}

Line 7 of the code always moves the row index back by one. If, in some cases, the loop 
(line 1) is never executed (due to pItr.hasNext() being 'false'), the code 
still moves the row pointer back by one, even though line 0 found the first 
match and the iterator has reached the end.

I'm uploading a patch which I already tested.
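
The shape of the fix can be sketched with a simplified model of the loop. This is not MatchPath's actual code: `SymbolFn` stands in for SymbolFunction, and a plain list replaces the partition iterator. The key change is to step the index back only when the last row pulled did not match.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for MatchPath's matching loop, illustrating the
// off-by-one: the buggy code subtracted one from the index unconditionally.
public class MatchPathSketch {
    interface SymbolFn { boolean matches(String row); }

    // Returns the index of the next row to process after consuming a run of matches.
    static int consumeMatches(List<String> rows, int start, SymbolFn fn) {
        int idx = start;
        boolean lastMatched = fn.matches(rows.get(idx));
        idx++;
        while (lastMatched && idx < rows.size()) {
            lastMatched = fn.matches(rows.get(idx));
            idx++;
        }
        // Fix: step back only when the last pulled row did NOT match.
        // The buggy code did "idx - 1" unconditionally, dropping the final match.
        return lastMatched ? idx : idx - 1;
    }

    public static void main(String[] args) {
        // Both rows match the pattern run, so nextRow should be 2, not 1.
        List<String> rows = Arrays.asList("LATE", "EARLY");
        System.out.println(consumeMatches(rows, 0, r -> true)); // 2
    }
}
```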
  





Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Chao Sun


 On May 21, 2015, 7:18 a.m., Lenni Kuff wrote:
  lgtm - I assume this works with decimal (with scale/precision) and 
  char/varchar? Maybe add one test case for those?

OK, I added a few tests for decimal and varchar.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/#review84669
---


On May 21, 2015, 6:44 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34393/
 ---
 
 (Updated May 21, 2015, 6:44 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-10427
 https://issues.apache.org/jira/browse/HIVE-10427
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently for collect_list() and collect_set(), only primitive types are 
 supported. This patch adds support for struct, list and map types as well.
 
  It turned out that all I needed was to loosen the type checking.
 
 
 Diffs
 -
 
   data/files/customers.txt PRE-CREATION 
   data/files/nested_orders.txt PRE-CREATION 
   data/files/orders.txt PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
 536c4a7 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
 6dc424a 
   
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
  efcc8f5 
   ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
   ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34393/diff/
 
 
 Testing
 ---
 
 All but one test (which seems unrelated) are passing.
 I also added a test: udaf_collect_list_set_2.q
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34393/
---

(Updated May 21, 2015, 5:30 p.m.)


Review request for hive.


Changes
---

Added a few tests for decimal and varchar. Also changed the behavior of `sort_array` 
to resolve the ordering issue in the test results. Currently `sort_array` 
only accepts lists of primitives, but since we already support comparison between 
nested types (map, struct, union, etc.), I think it makes sense to remove this 
limitation.


Bugs: HIVE-10427
https://issues.apache.org/jira/browse/HIVE-10427


Repository: hive-git


Description
---

Currently for collect_list() and collect_set(), only primitive types are 
supported. This patch adds support for struct, list and map types as well.

It turned out that all I needed was to loosen the type checking.


Diffs (updated)
-

  data/files/customers.txt PRE-CREATION 
  data/files/nested_orders.txt PRE-CREATION 
  data/files/orders.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 
536c4a7 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 
6dc424a 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
 efcc8f5 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 
2d6d58c 
  ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 
  ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION 
  ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd 
  ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34393/diff/


Testing
---

All but one test (which seems unrelated) are passing.
I also added a test: udaf_collect_list_set_2.q


Thanks,

Chao Sun