Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34473/ --- (Updated May 21, 2015, 7:45 a.m.) Review request for hive, Alan Gates and Sergio Pena. Changes --- Summary: 1. fix code style issues 2. remove code irrelevant to the insert statement 3. fix one issue about SetParquetSchema from the previous patch Bugs: HIVE-10749 https://issues.apache.org/jira/browse/HIVE-10749 Repository: hive-git Description --- Implement the insert statement for parquet format. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34473/diff/ Testing --- Newly added qtest and UT passed locally Thanks, cheng xu
Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet
On May 20, 2015, 8:45 p.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java, line 207 https://reviews.apache.org/r/34473/diff/1/?file=965270#file965270line207 you can use final ArrayList<Object> list = new ArrayList<Object>(Collections.nCopies(fields.size(), null)); instead I don't think so, because in the insert statement we don't know how to inspect the row object until the Parquet writer is created. This is why I created the new constructor in ParquetStructObjectInspector. Thank you! - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34473/#review84574 --- On May 20, 2015, 2:54 p.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34473/ --- (Updated May 20, 2015, 2:54 p.m.) Review request for hive, Alan Gates and Sergio Pena. Bugs: HIVE-10749 https://issues.apache.org/jira/browse/HIVE-10749 Repository: hive-git Description --- Implement the insert statement for parquet format.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 000eb38 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 8380117 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java 4e1820c ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRawRecordMerger.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 43c772f ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 0a5edbb ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java 0d32e49 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 5f7f597 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34473/diff/ Testing --- Newly added qtest and UT passed locally Thanks, cheng xu
Re: Review Request 33956: HIVE-9614: Encrypt mapjoin tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33956/#review84671 --- Thank you for this patch. I have some questions and will have another round of review after understanding these questions. Thank you! common/src/java/org/apache/hive/common/util/HdfsEncryptionUtilities.java https://reviews.apache.org/r/33956/#comment136026 Why not use the isPathEncrypted from HdfsEncryptionShim directly? common/src/java/org/apache/hive/common/util/HdfsEncryptionUtilities.java https://reviews.apache.org/r/33956/#comment136027 The same as above. ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/33956/#comment136025 Is it possible to get the FsPermission from org.apache.hadoop.fs.FileContext? ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java https://reviews.apache.org/r/33956/#comment136022 I am a little confused here. How can a local path be converted to a hdfs path? The original code is trying to create a tar file from a local path and uploading it to the hdfs with replication information. The new code path will lose the replication information. And the previous code path will only be executed in a local file or pfile schema in test. ql/src/test/queries/clientpositive/encryption_map_join_select.q https://reviews.apache.org/r/33956/#comment136021 drop table encryptedTable PURGE; - cheng xu On May 7, 2015, 9:23 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33956/ --- (Updated May 7, 2015, 9:23 p.m.) Review request for hive, Brock Noland and cheng xu. Bugs: HIVE-9614 https://issues.apache.org/jira/browse/HIVE-9614 Repository: hive-git Description --- The security issue here is that encrypted tables used on MAP-JOIN queries, and stored on the distribute cache, are first copied to the client local filesystem in an unencrypted form in order to compress it there. 
This patch avoids the local copy if the table is encrypted on HDFS. It keeps the hash table on HDFS, compresses the table in HDFS, and then adds it to the distributed cache. Files that are copied to the datanodes by the distributed cache are still unencrypted. This is a limitation we have from HDFS. Diffs - common/src/java/org/apache/hadoop/hive/common/CompressionUtils.java 0e0d538c2faf1c52c4d8378df013294ae4efa41c common/src/java/org/apache/hive/common/util/HdfsEncryptionUtilities.java PRE-CREATION itests/src/test/resources/testconfiguration.properties 3eff7d010923a4e07d5024904f1531ca52473aa2 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ad5c8f8302de2a15b1703161799f71cd81a94475 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java d7a08ecf1c183fe56b5ca41c2c69d413874418bb ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f76ce17711077ceadf23e6b9ed12e6a414 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java c0a72b69df3871bbcc870af286774aee5269668b ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cbc5466261f749fe7b84d7533dc0ff3274b6777f ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 82143a64db163da766dcc138231b4d4174603470 ql/src/test/queries/clientpositive/encryption_map_join_select.q PRE-CREATION ql/src/test/results/clientpositive/encrypted/encryption_map_join_select.q.out PRE-CREATION Diff: https://reviews.apache.org/r/33956/diff/ Testing --- Thanks, Sergio Pena
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 6:44 a.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description (updated) --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out that all I need to do is loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing (updated) --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q Thanks, Chao Sun
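With the loosened type check, collect_list() can aggregate complex values directly. A minimal sketch of the usage this enables; the table and column names below are illustrative only (the patch's own tests use the customers/orders data files listed in the diff):

```sql
-- Hypothetical schema for illustration.
CREATE TABLE orders (customer_id INT, order_id INT, amount DOUBLE);

-- Collect each customer's orders as an array of structs. Before this patch,
-- the non-primitive struct argument was rejected at semantic analysis time.
SELECT customer_id,
       collect_list(struct(order_id, amount)) AS order_history
FROM orders
GROUP BY customer_id;
```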
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/#review84669 --- Ship it! lgtm - I assume this works with decimal (with scale/precision) and char/varchar? Maybe add one test case for those? - Lenni Kuff On May 21, 2015, 6:44 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 6:44 a.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out I that all I need is loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q Thanks, Chao Sun
[jira] [Created] (HIVE-10783) Support MINUS set operation
sanjiv singh created HIVE-10783: --- Summary: Support MINUS set operation Key: HIVE-10783 URL: https://issues.apache.org/jira/browse/HIVE-10783 Project: Hive Issue Type: Improvement Components: SQL Reporter: sanjiv singh Support a MINUS set operation, as in qb1 MINUS qb2. It is a common requirement when combining two queries to want a result set containing only the unique rows returned by the first query but not by the second. The following sample statement combines results with the MINUS operator, which returns only unique rows returned by the first query but not by the second: SELECT * FROM tableA MINUS SELECT * FROM tableB; current exception: FAILED: ParseException line *:** missing EOF at 'SELECT' near 'MINUS' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
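Until the grammar supports MINUS, the same result can be obtained with a join rewrite. A sketch assuming tableA and tableB have compatible schemas and a single identifying column named key (the column name is illustrative, not from the ticket):

```sql
-- MINUS emulation: distinct rows of tableA with no match in tableB.
SELECT DISTINCT a.*
FROM tableA a
LEFT OUTER JOIN tableB b
  ON a.key = b.key
WHERE b.key IS NULL;
```

For multi-column rows, every column must appear in the join condition, using the NULL-safe equality operator <=> for columns that may contain NULLs; otherwise NULL-bearing rows will never match and the result diverges from true MINUS semantics.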
[jira] [Created] (HIVE-10780) Support INTERSECTION set operation
sanjiv singh created HIVE-10780: --- Summary: Support INTERSECTION set operation Key: HIVE-10780 URL: https://issues.apache.org/jira/browse/HIVE-10780 Project: Hive Issue Type: Improvement Components: SQL Reporter: sanjiv singh Support an INTERSECTION set operation, as in qb1 INTERSECTION qb2. It is a common requirement when combining two queries to want a result set containing only those rows returned by both queries. The following sample statement combines the results with the INTERSECT operator, which returns only those rows returned by both queries: SELECT * FROM tableA INTERSECT SELECT * FROM tableB; current exception: FAILED: ParseException line *:** missing EOF at 'SELECT' near 'INTERSECT' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
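Until INTERSECT parses, Hive's existing LEFT SEMI JOIN gives an equivalent rewrite. A sketch under the same assumptions as above (single identifying column named key, illustrative only):

```sql
-- INTERSECT emulation: distinct rows of tableA that also appear in tableB.
-- LEFT SEMI JOIN returns each left row at most once, regardless of how many
-- right rows match, so combined with DISTINCT it matches set semantics.
SELECT DISTINCT a.*
FROM tableA a
LEFT SEMI JOIN tableB b
  ON a.key = b.key;
```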
[jira] [Created] (HIVE-10781) HadoopJobExecHelper Leaks RunningJobs
Nemon Lou created HIVE-10781: Summary: HadoopJobExecHelper Leaks RunningJobs Key: HIVE-10781 URL: https://issues.apache.org/jira/browse/HIVE-10781 Project: Hive Issue Type: Bug Components: Hive, HiveServer2 Affects Versions: 1.2.0, 0.13.1 Reporter: Nemon Lou On one of our busy Hadoop clusters, HiveServer2 holds more than 4000 org.apache.hadoop.mapred.JobClient$NetworkedJob instances, while it has fewer than 3 background handler threads at the same time. All these instances are held in one LinkedList, the runningJobs property of org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper, which is static. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
On May 19, 2015, 5:36 a.m., Lenni Kuff wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java, line 50 https://reviews.apache.org/r/34393/diff/1/?file=963345#file963345line50 should we also support arrays and unions? Added support for array. union seems a bit tricky - let's make that as a follow up task. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/#review84260 --- On May 19, 2015, 4:47 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 19, 2015, 4:47 a.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct and map types as well. It turned out I that all I need is loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/test/queries/clientpositive/udaf_collect_list_set_nested.q PRE-CREATION ql/src/test/results/clientpositive/udaf_collect_list_set_nested.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_nested.q Thanks, Chao Sun
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 6:44 a.m.) Review request for hive. Changes --- Addressing RB comments. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct and map types as well. It turned out that all I need to do is loosen the type checking. Diffs (updated) - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_nested.q Thanks, Chao Sun
[jira] [Created] (HIVE-10779) LLAP: Daemons should shutdown in case of fatal errors
Siddharth Seth created HIVE-10779: - Summary: LLAP: Daemons should shutdown in case of fatal errors Key: HIVE-10779 URL: https://issues.apache.org/jira/browse/HIVE-10779 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth For example, the scheduler loop exiting. Currently they end up getting stuck - while still accepting new work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10782) Support EXCEPT set operation
sanjiv singh created HIVE-10782: --- Summary: Support EXCEPT set operation Key: HIVE-10782 URL: https://issues.apache.org/jira/browse/HIVE-10782 Project: Hive Issue Type: Improvement Components: SQL Reporter: sanjiv singh Support an EXCEPT set operation, as in qb1 EXCEPT qb2. It is a common requirement when combining two queries to want a result set containing the distinct rows from the left input query that aren't output by the right input query. The following sample statement combines the results with the EXCEPT operator, which returns distinct rows from the left input query that aren't output by the right input query: SELECT * FROM tableA EXCEPT SELECT * FROM tableB; current exception: FAILED: ParseException line *:** missing EOF at 'SELECT' near 'EXCEPT' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/#review84747 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java https://reviews.apache.org/r/34393/#comment136093 Can you replace this if block with checkArgsSize(arguments, min, max) ? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java https://reviews.apache.org/r/34393/#comment136095 can you remove unused imports? import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category; - Alexander Pivovarov On May 21, 2015, 5:30 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 5:30 p.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out I that all I need is loosen the type checking. 
Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 2d6d58c ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q Thanks, Chao Sun
[jira] [Created] (HIVE-10788) Change sort_array to support non-primitive types
Chao Sun created HIVE-10788: --- Summary: Change sort_array to support non-primitive types Key: HIVE-10788 URL: https://issues.apache.org/jira/browse/HIVE-10788 Project: Hive Issue Type: Bug Reporter: Chao Sun Assignee: Chao Sun Currently {{sort_array}} only support primitive types. As we already support comparison between non-primitive types, it makes sense to remove this restriction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
On May 21, 2015, 6:22 p.m., Alexander Pivovarov wrote: ql/src/test/queries/clientpositive/udaf_collect_set_2.q, line 1 https://reviews.apache.org/r/34393/diff/3/?file=966777#file966777line1 Is it necessary? Yes, date is a reserved keyword. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/#review84749 --- On May 21, 2015, 5:30 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 5:30 p.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out I that all I need is loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 2d6d58c ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q Thanks, Chao Sun
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/#review84749 --- ql/src/test/queries/clientpositive/udaf_collect_set_2.q https://reviews.apache.org/r/34393/#comment136097 Is it necessary? - Alexander Pivovarov On May 21, 2015, 5:30 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 5:30 p.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out I that all I need is loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 2d6d58c ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q Thanks, Chao Sun
[jira] [Created] (HIVE-10784) Beeline requires new line (EOL) at the end of an Hive SQL script (NullPointerException)
Andrey Dmitriev created HIVE-10784: -- Summary: Beeline requires new line (EOL) at the end of an Hive SQL script (NullPointerException) Key: HIVE-10784 URL: https://issues.apache.org/jira/browse/HIVE-10784 Project: Hive Issue Type: Bug Components: Beeline, CLI Affects Versions: 0.13.1 Environment: Linux 2.6.32 (Red Hat 4.4.7) Reporter: Andrey Dmitriev Priority: Minor The Beeline tool requires a new line at the end of a Hive/Impala SQL script; otherwise the last statement will not be executed, or a NullPointerException will be thrown. # If a statement ends without an end of line AND the semicolon is on the same line, the statement will be ignored; i.e. {code}select * from TABLE;EOF{code} will *not* be executed # If a statement ends without an end of line BUT the semicolon is on the next line, the statement will be executed, but {color:red}java.lang.NullPointerException{color} will be thrown; i.e. {code}select * from TABLE ;EOF{code} will be executed, but print {color:red}java.lang.NullPointerException{color} # If a statement ends with an end of line, regardless of where the semicolon is, the statement will be executed; i.e. {code}select * from TABLE; EOLEOF{code} or {code}select * from TABLE ;EOLEOF{code} will be executed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang
Thanks Vaibhav. On Wed, May 20, 2015 at 6:44 PM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congratulations! - Vaibhav On 5/20/15, 3:40 PM, Jimmy Xiang jxi...@cloudera.com wrote: Congrats!! On Wed, May 20, 2015 at 3:29 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Chaoyu Tang a committer on the Apache Hive Project. Please join me in congratulating Chaoyu! Thanks. - Carl
Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang
Congratulations Chaoyu !!! On Wed, May 20, 2015 at 5:29 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Chaoyu Tang a committer on the Apache Hive Project. Please join me in congratulating Chaoyu! Thanks. - Carl
Review Request 34576: Bucketized Table feature fails in some cases
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34576/ --- Review request for hive and John Pullokkaran. Repository: hive-git Description --- The Bucketized Table feature fails in some cases: if the src destination is bucketed on the same key, and the actual data in the src is not bucketed (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; -- perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. It was only discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers will always be 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java e53933e ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1a9b42b ql/src/test/results/clientnegative/alter_partition_invalidspec.q.out 404115f ql/src/test/results/clientnegative/alter_partition_nodrop.q.out 1c78cff ql/src/test/results/clientnegative/alter_partition_nodrop_table.q.out 3c425da ql/src/test/results/clientnegative/alter_partition_offline.q.out c70fcb4 ql/src/test/results/clientnegative/archive_corrupt.q.out 56e8ec4 ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 623c2e8 ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_2.q.out 9aa9b5d ql/src/test/results/clientnegative/columnstats_partlvl_invalid_values.q.java1.7.out 4ea70e3 ql/src/test/results/clientnegative/columnstats_partlvl_multiple_part_clause.q.out ce79830 ql/src/test/results/clientnegative/dynamic_partitions_with_whitelist.q.out f069ae8 ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out 3c05600 ql/src/test/results/clientnegative/exim_15_part_nonpart.q.out dfbf025 ql/src/test/results/clientnegative/exim_16_part_noncompat_schema.q.out 4cb6ca7 ql/src/test/results/clientnegative/exim_17_part_spec_underspec.q.out 23caa4a ql/src/test/results/clientnegative/exim_18_part_spec_missing.q.out 23caa4a ql/src/test/results/clientnegative/exim_21_part_managed_external.q.out fd27f29 ql/src/test/results/clientnegative/exim_24_import_part_authfail.q.out 1a9a34d ql/src/test/results/clientnegative/insertover_dynapart_ifnotexists.q.out a40ffab ql/src/test/results/clientnegative/load_exist_part_authfail.q.out 491cfd0 ql/src/test/results/clientnegative/load_part_authfail.q.out 4ea8be9 ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92 ql/src/test/results/clientnegative/nopart_load.q.out 8815146 ql/src/test/results/clientnegative/protectmode_part2.q.out 16d58c7 ql/src/test/results/clientpositive/alter_concatenate_indexed_table.q.out ffcbcf9 
ql/src/test/results/clientpositive/alter_merge.q.out 17d86b8 ql/src/test/results/clientpositive/alter_merge_2.q.out e118c39 ql/src/test/results/clientpositive/alter_merge_stats.q.out fdd2ddc ql/src/test/results/clientpositive/alter_partition_protect_mode.q.out 80990d9 ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a2 ql/src/test/results/clientpositive/alter_table_cascade.q.out 0139466 ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8 ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out e6e7ef3 ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out e9fb705 ql/src/test/results/clientpositive/auto_sortmerge_join_16.q.out d4ecb19 ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419 ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04 ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out 870ecdd ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out 33f5c46 ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out 067d1ff ql/src/test/results/clientpositive/bucketcontext_1.q.out 77bfcf9 ql/src/test/results/clientpositive/bucketcontext_2.q.out a9db13d
[jira] [Created] (HIVE-10789) union distinct query with NULL constant on both the sides throws Unsuported vector output type: void error
Matt McCline created HIVE-10789: --- Summary: union distinct query with NULL constant on both the sides throws Unsuported vector output type: void error Key: HIVE-10789 URL: https://issues.apache.org/jira/browse/HIVE-10789 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.1 A NULL expression in the SELECT projection list causes exception to be thrown instead of not vectorizing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
CVE-2015-1772
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 CVE-2015-1772: Apache Hive Authentication vulnerability in HiveServer2 Severity: Important Vendor: The Apache Software Foundation Versions Affected: All versions of Apache Hive from 0.11.0 to 1.0.0, and 1.1.0 . Users affected: Users who use LDAP authentication mode in HiveServer2 and also have LDAP configured to allow simple unauthenticated or anonymous bind. Description: LDAP services are sometimes configured to allow simple unauthenticated binds. When HiveServer2 is configured to use LDAP authentication mode (hive.server2.authentication configuration parameter is set to LDAP), with such LDAP configurations, it can allow users without proper credentials to get authenticated. This is more easily reproducible when Kerberos authentication is also enabled in the Apache Hadoop cluster. Mitigation: There are two options 1. Configure LDAP service to disallow unauthenticated binds. If the service allows anonymous binds, not having hive authorization checks enabled can also expose this vulnerability. 2. Update Hive installation to use an Authenticator with the fix. There are two options here - a. Users should upgrade to newer versions of Apache Hive with the fix, which includes 1.0.1, 1.1.1 and 1.2.0 . b. Users can download the ldap-fix.tar.gz being made available for download from the Apache Hive downloads page and follow instructions in the README.txt to use an LDAP authenticator that contains the fix with your existing Hive release. Credit: Thanks to Thomas Rega of CareerBuilder for reporting this issue. 
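As a concrete sketch of mitigation option 1, for deployments whose directory service is OpenLDAP, anonymous and unauthenticated simple binds can be refused server-side in slapd.conf. The directives below are OpenLDAP-specific; verify against your own directory server's documentation before applying:

```
# slapd.conf -- refuse anonymous binds and unauthenticated
# (empty-password) simple binds, so only fully authenticated
# binds reach HiveServer2's LDAP check.
disallow bind_anon
require  authc
```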
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJVXmECAAoJEN3RdT/2ztzmBDsP/inSE3VaTc7gLJf03MbjtoBX bxrnWGpJir7IVe1nrlj2WiD8i4m/TqG5OoHZB2ZCnVOKbjngh6Mq4ldXM4lzGemN 6aDYW6gIdplwhiiKoVeNrTISl38whPlNO9Kp8Y9nabSGFxBcngRIuWOq6KyOADra PP9QMys7xB325JgrgEjS9Fxrtx8cGQK+cRDm/Fi5RCjQ0Q3VRmSKVzcbg2jDmyR/ 38P67SlZm4w37Z8hrBKakTQ2ql2dkmCSjnlIQCB1dln4iLp6VR2S7sizeYSvk4aQ 86BqORYYwXAmWeUfhUBlbBbLmeicu4VTvhKB2wYkD2G0TBIqXk90GVf5mdwDLir0 gk0R+gfv6YF89pmFVFjwerkLozjKs43Vx5NjQz1IxCeXnoUOw5n6gVC1kFgvnL2o SYIRqa0+nn1ARf9ssodzffnCsm3QGPMtgy3L+iBiWY6vfI+zgWBhOeFcnlNWieqV epxn5Q5ojjlwAwKQ7irco3uULiBu+f/CIYq2ey4I8a8qNLHQRs9n850E/3MYaV5o PmHdu2Gmuvj216fyS+5OuROAjFeuPPDq+qzRVOcISXnCfxzFjXL2PWvPc/RyMN1d g82gMzwczv8EFhag5MdD5FMyqAxz8BKdeOaKk/QGPQG1XvlGqjuDKJYDCfsHI4F/ 5mUttG40ky0zn3ONQAPC =7NKg -----END PGP SIGNATURE-----
[jira] [Created] (HIVE-10790) orc file sql excute fail
xiaowei wang created HIVE-10790: --- Summary: orc file sql excute fail Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.14.0, 0.13.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Inserting from a text table into an ORC table, such as insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
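The trace points at ViewFileSystem.getDefaultReplication: on a federated (viewfs) filesystem, the parameterless getDefaultReplication() has no path to resolve against the mount table, so it throws NotInMountpointException, while the Path-taking overload can resolve the mount point. The class below is a self-contained, hypothetical mimic of that contract written only to illustrate the failure mode, not Hadoop's actual implementation:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in for ViewFileSystem's replication lookup: the
// no-argument form has no path to resolve against the mount table, so it
// fails (mirroring "getDefaultReplication on empty path is invalid");
// the path-qualified form resolves the mount point and succeeds.
public class ViewFsReplicationSketch {
    // mount prefix -> default replication of the underlying filesystem
    private final TreeMap<String, Short> mountTable = new TreeMap<>();

    public void addMount(String prefix, short replication) {
        mountTable.put(prefix, replication);
    }

    public short getDefaultReplication() {
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    public short getDefaultReplication(String path) {
        Map.Entry<String, Short> e = mountTable.floorEntry(path);
        if (e == null || !path.startsWith(e.getKey())) {
            throw new IllegalArgumentException("not in mount point: " + path);
        }
        return e.getValue();
    }
}
```

The fix on the Hive side is presumably to pass the output file's Path to getDefaultReplication in WriterImpl instead of the deprecated no-argument overload.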
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java, line 41 https://reviews.apache.org/r/34455/diff/1/?file=964754#file964754line41 Currently the storage level is memory+disk. Any reason to change it to memory_only? Caching data to disk means the data needs serialization and deserialization, which is costly and can sometimes overwhelm the gain from caching, and that is hard to measure programmatically: reading from the source file only requires deserialization, while caching on disk needs an additional serialization. Instead of adding an optimizer which may or may not improve performance for the user, I think it may be better to narrow the optimizer's scope a little bit, to make sure it does improve performance. On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java, line 63 https://reviews.apache.org/r/34455/diff/1/?file=964756#file964756line63 Can we keep the old code around. I understand it's not currently used. Of course we can; it just makes the cache-related code a little messier for others who want to read it. On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java, line 25 https://reviews.apache.org/r/34455/diff/1/?file=964757#file964757line25 I cannot construct a case where a MapTran would need caching. Do you have an example? Any query whose SparkWork looks like this: MapWork -- ReduceWork \ -- ReduceWork For example: from person_orc insert overwrite table p1 select city, count(*) as s group by city order by s insert overwrite table p2 select city, avg(age) as g group by city order by g; On May 20, 2015, 9:12 p.m., Xuefu Zhang wrote: spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java, line 419 https://reviews.apache.org/r/34455/diff/1/?file=964774#file964774line419 Do you think it makes sense for us to release the cache as soon as the job is completed, as it's done here? 
Theoretically we do not need to; it would not lead to any extra memory leak. The only benefit of unpersisting the cache manually that I can imagine is that it reduces GC effort, since Hive releases it programmatically instead of letting GC collect it. The reason I removed it is that it adds extra complexity to the code, and it is not extensible toward sharing cached RDDs across Spark jobs. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/#review84572 --- On May 20, 2015, 2:37 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/ --- (Updated May 20, 2015, 2:37 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- see jira description Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 19d3fee ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 26cfebd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 8b15099 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/LocalSparkJobStatus.java 5d62596 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java 5990d17 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java fb20080 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79 spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java af6332e spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java
[jira] [Created] (HIVE-10791) Beeline-CLI: Implement in-place update UI for CLI compatibility
Gopal V created HIVE-10791: -- Summary: Beeline-CLI: Implement in-place update UI for CLI compatibility Key: HIVE-10791 URL: https://issues.apache.org/jira/browse/HIVE-10791 Project: Hive Issue Type: Sub-task Affects Versions: beeline-cli-branch Reporter: Gopal V Priority: Critical The current CLI implementation has an in-place updating UI which offers a clear picture of execution runtime and failures. This is designed for large DAGs which have more than 10 vertices, where the old UI would scroll sideways. The new CLI implementation needs to keep up the usability standards set by the old one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
Mostafa Mokhtar created HIVE-10793: -- Summary: Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront Key: HIVE-10793 URL: https://issues.apache.org/jira/browse/HIVE-10793 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 1.2.1 HybridHashTableContainer will allocate memory based on an estimate, which means that if the actual size is less than the estimate, the allocated memory won't be used. The number of partitions is calculated based on the estimated data size {code} numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, minNumParts, minWbSize, nwayConf); {code} Then, based on the number of partitions, writeBufferSize is set {code} writeBufferSize = (int)(estimatedTableSize / numPartitions); {code} Each hash partition will allocate 1 WriteBuffer, with no further allocation if the estimated data size is correct. The suggested solution is to reduce writeBufferSize by a factor such that only X% of the memory is preallocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
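The sizing logic above can be sketched as follows. Only the estimatedTableSize / numPartitions formula comes from the quoted Hive code; PREALLOC_FRACTION and the method names are hypothetical illustrations of the suggested fix (preallocate only a fraction of the estimate):

```java
// Sketch of write-buffer sizing for the hybrid grace hash join.
public class WriteBufferSizing {
    // Hypothetical "X%" from the issue: commit only half the estimate up front.
    static final double PREALLOC_FRACTION = 0.5;

    // Current behavior: one write buffer per partition, sized so that the
    // full estimated table size is preallocated before any rows arrive.
    static int currentWriteBufferSize(long estimatedTableSize, int numPartitions) {
        return (int) (estimatedTableSize / numPartitions);
    }

    // Suggested behavior: scale the buffer down so only a fraction of the
    // estimated memory is committed upfront; buffers can grow later if the
    // estimate turns out to be accurate.
    static int reducedWriteBufferSize(long estimatedTableSize, int numPartitions) {
        return (int) (estimatedTableSize * PREALLOC_FRACTION / numPartitions);
    }
}
```

With a 1 GB estimate and 16 partitions, the current scheme pins 64 MB per partition immediately; the reduced scheme pins half that, so an overestimate wastes proportionally less memory.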
[jira] [Created] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils
Owen O'Malley created HIVE-10794: Summary: Remove the dependence from ErrorMsg to HiveUtils Key: HIVE-10794 URL: https://issues.apache.org/jira/browse/HIVE-10794 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley HiveUtils has a large set of dependencies and ErrorMsg only needs the new line constant. Breaking the dependence will reduce the dependency set from ErrorMsg significantly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
Dayue Gao created HIVE-10792: Summary: PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 1.2.0, 1.0.0, 0.13.1, 0.14.0, 0.13.0, 1.1.0 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Here are the steps to reproduce the bug. First of all, prepare a simple ORC table with one row {code} create table test_orc (c0 int, c1 int) stored as ORC; {code} Table: test_orc ||c0||c1|| |0|1| The following SQL gets an empty result, which is not expected {code} select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1 {code} A self join is also broken {code} set hive.auto.convert.join=false; -- force common join select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); {code} It gets an empty result while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 34586: HIVE-10704
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34586/ --- Review request for hive. Repository: hive-git Description --- Fix biggest small table selection when table sizes are 0. Fall back to dividing memory equally if any table has an invalid size. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 536b92c5dd03abe9ff57bf64d87be0f3ef34aa7a Diff: https://reviews.apache.org/r/34586/diff/ Testing --- Thanks, Mostafa Mokhtar
Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34473/#review84758 --- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136104 Could you separate words with _? Like ENABLE_ACID_SCHEMA_INFO. It helps to read the constant more easily. Do we have to enable transactions exclusively for parquet? Isn't there another variable that enables transactions on Hive that we can use? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136111 Could you separate the words? Like ENABLE_ACID_SCHEMA_INFO. It makes the code more readable. Also, isn't there another variable that we can use to detect if transactions are enabled? I am not sure if we should add more variables to Hive. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136107 You can use this one line to return the column list: return (List<String>) StringUtils.getStringCollection(tableProperties.getProperty(IOConstants.COLUMNS)); It will return an empty list if COLUMNS is empty. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136112 You can save code by using this line: return (List<String>) StringUtils.getStringCollection(tableProperties.getProperty(IOConstants.COLUMNS)); It will return an empty list if the parameter is empty. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136108 You can call TypeInfoUtils.getTypeInfosFromTypeString() with an empty string here. It will return an empty list. 
Let's save code by using: ArrayList<TypeInfo> columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136113 You can save code by using this line: ArrayList<TypeInfo> columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); It will return an empty list if the parameter is empty. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136109 Same here, you can save code with this: ArrayList<String> columnNames = (ArrayList<String>) StringUtils.getStringCollection(columnNameProperty); ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java https://reviews.apache.org/r/34473/#comment136114 Same thing here: ArrayList<String> columnNames = (ArrayList<String>) StringUtils.getStringCollection(columnNameProperty); ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java https://reviews.apache.org/r/34473/#comment136117 Why do you need a Writable? HIVE-9658 tries to avoid wrapping java types into writables when they are being used by Hive, to save memory. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java https://reviews.apache.org/r/34473/#comment136116 I am waiting to commit the patch from HIVE-10749 that uses a similar class named ObjectArrayWritableObjectInspector. Also, I think this is already part of the parquet branch. - Sergio Pena On May 21, 2015, 7:45 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34473/ --- (Updated May 21, 2015, 7:45 a.m.) Review request for hive, Alan Gates and Sergio Pena. Bugs: HIVE-10749 https://issues.apache.org/jira/browse/HIVE-10749 Repository: hive-git Description --- Implement the insert statement for parquet format. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34473/diff/ Testing --- Newly added qtest and UT passed locally
Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang
Congrats Chaoyu! On Thu, May 21, 2015 at 9:17 AM, Sergio Pena sergio.p...@cloudera.com wrote: Congratulations Chaoyu !!! On Wed, May 20, 2015 at 5:29 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Chaoyu Tang a committer on the Apache Hive Project. Please join me in congratulating Chaoyu! Thanks. - Carl -- Swarnim
[jira] [Created] (HIVE-10785) Support Aggregate push down through joins
Jesus Camacho Rodriguez created HIVE-10785: -- Summary: Support Aggregate push down through joins Key: HIVE-10785 URL: https://issues.apache.org/jira/browse/HIVE-10785 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Enable {{AggregateJoinTransposeRule}} in CBO that pushes Aggregate through Join operators (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10786) Propagate range for column stats
Jesus Camacho Rodriguez created HIVE-10786: -- Summary: Propagate range for column stats Key: HIVE-10786 URL: https://issues.apache.org/jira/browse/HIVE-10786 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez For column stats, Calcite doesn't propagate range. The range of a column will help us in deciding filter cardinality for inequalities. The range of values of a column and its NDV together will help us build histograms of uniform height. This needs special handling for each operator: - Inner Join where col is part of the join key: range is the narrowest range of lhs, rhs - Outer Join: range of the outer side if col is from the outer side - Filter inequality on a literal (x<10): range is restricted on the upper side by the literal value -- This message was sent by Atlassian JIRA (v6.3.4#6332)
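The per-operator rules above can be sketched with a simple interval type. The class and method names below are hypothetical illustrations of the arithmetic, not Calcite's or Hive's actual API:

```java
// Hypothetical sketch of the range-propagation rules listed in the issue.
public class ColumnRange {
    final double lo, hi;

    ColumnRange(double lo, double hi) { this.lo = lo; this.hi = hi; }

    // Inner join on a key column: only values present on both sides can
    // survive the join, so the result range is the intersection (the
    // narrowest range covering both inputs' common values).
    static ColumnRange innerJoinKey(ColumnRange lhs, ColumnRange rhs) {
        return new ColumnRange(Math.max(lhs.lo, rhs.lo), Math.min(lhs.hi, rhs.hi));
    }

    // Outer join: a column from the preserved (outer) side keeps its range
    // unchanged; NULLs introduced for non-matches don't affect min/max.
    static ColumnRange outerJoinOuterSide(ColumnRange outer) {
        return outer;
    }

    // Filter "x < literal": the upper bound is clipped by the literal.
    static ColumnRange lessThan(ColumnRange in, double literal) {
        return new ColumnRange(in.lo, Math.min(in.hi, literal));
    }
}
```

Combined with NDV, such ranges give the planner a density estimate (NDV over range width) from which equal-height histogram buckets can be approximated.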
Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34473/#review84729 --- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java https://reviews.apache.org/r/34473/#comment136066 Do you intend to use this in conjunction with hive.hcatalog.streaming? If so, closing the file on a flush is not what you'll want. - Alan Gates On May 21, 2015, 7:45 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34473/ --- (Updated May 21, 2015, 7:45 a.m.) Review request for hive, Alan Gates and Sergio Pena. Bugs: HIVE-10749 https://issues.apache.org/jira/browse/HIVE-10749 Repository: hive-git Description --- Implement the insert statement for parquet format. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34473/diff/ Testing --- Newly added qtest and UT passed locally Thanks, cheng xu
[jira] [Created] (HIVE-10787) MatchPath misses the last matched row from the final result set
Mohammad Kamrul Islam created HIVE-10787: Summary: MatchPath misses the last matched row from the final result set Key: HIVE-10787 URL: https://issues.apache.org/jira/browse/HIVE-10787 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 1.2.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam If you have a STAR(*) pattern at the end, the current code misses the last row of the final result. For example, with a pattern like (LATE.EARLY*), the matched rows are: 1. LATE 2. EARLY In the current implementation, the final 'tpath' misses the last EARLY and returns only LATE. Ideally it should return both LATE and EARLY. The following code snippet shows the bug. {noformat} 0. SymbolFunctionResult rowResult = symbolFn.match(row, pItr); 1. while (rowResult.matches && pItr.hasNext()) 2. { 3. row = pItr.next(); 4. rowResult = symbolFn.match(row, pItr); 5. } 6. 7. result.nextRow = pItr.getIndex() - 1; {noformat} Line 7 always moves the row index back by one. If, in some cases, the loop (line 1) is never executed (because pItr.hasNext() is 'false'), the code still moves the row pointer back by one, even though line 0 found a match and the iterator has reached the end. I'm uploading a patch which I have already tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
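The off-by-one can be reproduced with a self-contained model of that loop. Everything below is a hypothetical stand-in for Hive's iterator and SymbolFunction (a boolean list plays the role of per-row match results), written only to show the failure mode and one possible guarded fix, not the actual patch:

```java
import java.util.List;

// Model of the MatchPath loop from HIVE-10787: "true" entries are rows
// that match the symbol; the return value is the exclusive end index of
// the matched path, mirroring result.nextRow.
public class MatchPathSketch {
    // Buggy version: always rewinds by one (line 7 of the snippet), even
    // when the loop body never ran because the iterator was exhausted
    // right after the first match.
    static int nextRowBuggy(List<Boolean> rows) {
        int idx = 0;
        boolean matches = rows.get(idx++);      // line 0: first match
        while (matches && idx < rows.size()) {  // line 1
            matches = rows.get(idx++);          // lines 3-4
        }
        return idx - 1;                         // line 7: unconditional rewind
    }

    // Guarded version: rewind only when the last consumed row failed to
    // match; if the loop stopped because the input ran out, every consumed
    // row matched and should stay in the path.
    static int nextRowFixed(List<Boolean> rows) {
        int idx = 0;
        boolean matches = rows.get(idx++);
        while (matches && idx < rows.size()) {
            matches = rows.get(idx++);
        }
        return matches ? idx : idx - 1;
    }
}
```

For the (LATE.EARLY*) example with two matching rows, the buggy version returns end index 1 (LATE only) while the guarded version returns 2 (LATE and EARLY), which is the behavior the report asks for.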
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
On May 21, 2015, 7:18 a.m., Lenni Kuff wrote: lgtm - I assume this works with decimal (with scale/precision) and char/varchar? Maybe add one test case for those? OK, I added a few tests for decimal and varchar. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/#review84669 --- On May 21, 2015, 6:44 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 6:44 a.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out that all I needed to do was loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q Thanks, Chao Sun
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 21, 2015, 5:30 p.m.) Review request for hive. Changes --- Added a few tests for decimal and varchar. Also changed the behavior of `sort_array` to resolve the ordering issue in the test results. Currently `sort_array` only accepts lists of primitives, but since we already support comparison between nested types (map, struct, union, etc.), I think it makes sense to remove this limitation. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct, list and map types as well. It turned out that all I needed to do was loosen the type checking. Diffs (updated) - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java 2d6d58c ql/src/test/queries/clientnegative/udaf_collect_set_unsupported.q PRE-CREATION ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q 034de06 ql/src/test/queries/clientpositive/udaf_collect_set_2.q PRE-CREATION ql/src/test/results/clientnegative/udaf_collect_set_unsupported.q.out PRE-CREATION ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out c068ecd ql/src/test/results/clientpositive/udaf_collect_set_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_2.q Thanks, Chao Sun