Review Request 36939: HIVE-11376: CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36939/ ---

Review request for hive.

Bugs: HIVE-11376
    https://issues.apache.org/jira/browse/HIVE-11376

Repository: hive-git

Description
---
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379

This is the exact code snippet:

{noformat}
// Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
// we use a configuration variable for the same
if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
  // The following code should be removed, once
  // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
  // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
  // so don't use CombineFileInputFormat for non-splittable files
  // ie, dont't combine if inputformat is a TextInputFormat and has compression turned on
{noformat}

Diffs
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 33b67dd7b0fde41f81f8d86ea8c83d29c631e3d7
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e4073f5eea4c7be8423a7ebe6a89cb51d9f1
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 693d8c7e9f956999b2da33593d780e37ddf2b3b8
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 3217df27bb5731a1dcd5db1ae17c5bdff2e3fbfc
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 76926e79729d9ea4823de0ffc9b1e5bac6364842

Diff: https://reviews.apache.org/r/36939/diff/

Testing
---

Thanks,
Rajat Khandelwal
Re: Review Request 36939: HIVE-11376: CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36939/ ---

(Updated July 30, 2015, 6:49 p.m.)

Review request for hive.

Bugs: HIVE-11376
    https://issues.apache.org/jira/browse/HIVE-11376

Repository: hive-git

Description
---
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379

This is the exact code snippet:

{noformat}
// Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
// we use a configuration variable for the same
if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
  // The following code should be removed, once
  // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
  // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
  // so don't use CombineFileInputFormat for non-splittable files
  // ie, dont't combine if inputformat is a TextInputFormat and has compression turned on
{noformat}

Diffs (updated)
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 33b67dd7b0fde41f81f8d86ea8c83d29c631e3d7
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e4073f5eea4c7be8423a7ebe6a89cb51d9f1
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 693d8c7e9f956999b2da33593d780e37ddf2b3b8
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 3217df27bb5731a1dcd5db1ae17c5bdff2e3fbfc
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 76926e79729d9ea4823de0ffc9b1e5bac6364842

Diff: https://reviews.apache.org/r/36939/diff/

Testing
---

Thanks,
Rajat Khandelwal
[jira] [Created] (HIVE-11411) Transaction lock
shiqian.huang created HIVE-11411: Summary: Transaction lock Key: HIVE-11411 URL: https://issues.apache.org/jira/browse/HIVE-11411 Project: Hive Issue Type: Wish Reporter: shiqian.huang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- Review request for hive and Aihua Xu. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
[jira] [Created] (HIVE-11414) Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced
Zheng Shao created HIVE-11414: - Summary: Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced Key: HIVE-11414 URL: https://issues.apache.org/jira/browse/HIVE-11414 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 1.2.0, 0.13.1, 0.14.0, 0.12.0, 0.11.0 Reporter: Zheng Shao Priority: Minor

MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: both CDH 4.7 using MR1 and CDH 5.4.1 using YARN
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of CombineHiveInputFormat).
* Each input path (equivalent to a partition here) constructs its own SerDe.
* Each SerDe does its own caching of the deserialized object (and tries to reuse it), but never releases it. In this case, serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a lot of space - pretty much the last N rows of a file, where N is the number of rows in a columnar block.
* This problem may exist in other SerDes as well, but columnar file formats are affected the most because they need a bigger cache for the last N rows instead of 1 row.

Proposed solution:
* Make cachedLazyStruct a weakly referenced object. Do similar changes to other columnar SerDes if any (e.g. maybe ORCFile's SerDe as well).

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file. The problem with that is that the input splits may contain multiple blocks/files that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place where the input file changes. But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the cached object, but that change is not backward compatible with the many SerDes that people have written.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
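The proposed weak-reference approach can be sketched in isolation. The class and field below are illustrative stand-ins, not Hive's actual ColumnarSerDeBase code: the SerDe keeps only a WeakReference to its cached row object, so the collector may reclaim the cache under memory pressure and the SerDe simply rebuilds it on the next access.

```java
import java.lang.ref.WeakReference;

// Illustrative sketch (not Hive code): hold the expensive cached row object
// weakly so the GC may reclaim it when many SerDes are alive at once.
class WeakCacheSketch {
    private WeakReference<Object> cachedLazyStruct;

    // Returns the cached object, recreating it if the GC has cleared it.
    Object getCached() {
        Object struct = (cachedLazyStruct == null) ? null : cachedLazyStruct.get();
        if (struct == null) {
            struct = new Object();               // stand-in for a LazyStruct row
            cachedLazyStruct = new WeakReference<>(struct);
        }
        return struct;
    }

    public static void main(String[] args) {
        WeakCacheSketch s = new WeakCacheSketch();
        Object first = s.getCached();
        // While a strong reference ("first") exists, the cache is stable.
        System.out.println(first == s.getCached());  // true
    }
}
```

The trade-off this illustrates: unlike the cleanup() alternative, nothing in the SerDe API changes, at the cost of occasionally re-deserializing a row whose cache was collected.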
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93651 --- This looks good to me. - Reuben Kuhnert On July 30, 2015, 9:22 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
[jira] [Created] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries
Prasanth Jayachandran created HIVE-11415: Summary: Add early termination for recursion in vectorization for deep filter queries Key: HIVE-11415 URL: https://issues.apache.org/jira/browse/HIVE-11415 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran

Queries with deep filters (left deep) throw a StackOverflowError in vectorization:

{code}
Exception in thread "main" java.lang.StackOverflowError
    at java.lang.Class.getAnnotation(Class.java:3415)
    at org.apache.hive.common.util.AnnotationUtils.getAnnotation(AnnotationUtils.java:29)
    at org.apache.hadoop.hive.ql.exec.vector.VectorExpressionDescriptor.getVectorExpressionClass(VectorExpressionDescriptor.java:332)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:988)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:439)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.createVectorExpression(VectorizationContext.java:1014)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:996)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
{code}

Sample query:

{code}
explain select count(*) from over1k
where ((t=1 and si=2)
    or (t=2 and si=3)
    or (t=3 and si=4)
    or (t=4 and si=5)
    or (t=5 and si=6)
    or (t=6 and si=7)
    or (t=7 and si=8)
    ...
{code}

Repeat the filter a few thousand times to reproduce the issue.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
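The trace shows one Java stack frame per OR level of the left-deep filter. One way to sidestep that, sketched below with hypothetical names (this is not the Hive patch), is to flatten the left-deep tree with an explicit stack before any recursive processing, so depth no longer consumes Java stack:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: flatten a left-deep binary OR tree into a flat list
// of leaves using an explicit stack, so tree depth costs heap, not Java stack.
class FlattenSketch {
    static class Node {
        final String op;             // "OR" for interior nodes, "LEAF" otherwise
        final Node left, right;
        Node(String op, Node l, Node r) { this.op = op; left = l; right = r; }
    }

    static List<Node> flattenOr(Node root) {
        List<Node> leaves = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if ("OR".equals(n.op)) { stack.push(n.left); stack.push(n.right); }
            else leaves.add(n);
        }
        return leaves;
    }

    public static void main(String[] args) {
        // Build a left-deep OR over 5000 leaves, like the reported query shape.
        Node root = new Node("LEAF", null, null);
        for (int i = 1; i < 5000; i++) {
            root = new Node("OR", root, new Node("LEAF", null, null));
        }
        System.out.println(flattenOr(root).size());  // 5000
    }
}
```

With the children flattened, the vectorizer could iterate over them instead of recursing, or bail out early once a depth/size threshold is hit, which is what the "early termination" in the ticket title suggests.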
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Changes --- Thanks Reuben for your feedback. This new patch includes fixes for your comments and the failured tests that appear on Jira. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
[jira] [Created] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY
Pengcheng Xiong created HIVE-11416: -- Summary: CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY Key: HIVE-11416 URL: https://issues.apache.org/jira/browse/HIVE-11416 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails
Sergio Peña created HIVE-11418: -- Summary: Dropping a database in an encryption zone with CASCADE and trash enabled fails Key: HIVE-11418 URL: https://issues.apache.org/jira/browse/HIVE-11418 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña

Here's the query that fails:

{noformat}
hive> CREATE DATABASE db;
hive> USE db;
hive> CREATE TABLE a(id int);
hive> SET fs.trash.interval=1;
hive> DROP DATABASE db CASCADE;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop db.a because it is in an encryption zone and trash is enabled. Use PURGE option to skip trash.)
{noformat}

DROP DATABASE does not support PURGE, so we have to remove the tables one by one, and then drop the database.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
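Until DROP DATABASE supports a PURGE option, the workaround the report implies can be sketched against the example objects above (db and a); this is an illustrative manual sequence, not a Hive-provided command:

```sql
-- Hypothetical workaround: PURGE each table explicitly so nothing is moved
-- to trash, then drop the now-empty database without CASCADE.
USE db;
DROP TABLE a PURGE;
DROP DATABASE db;
```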
[jira] [Created] (HIVE-11420) add support for set autocommit
Eugene Koifman created HIVE-11420: - Summary: add support for set autocommit Key: HIVE-11420 URL: https://issues.apache.org/jira/browse/HIVE-11420 Project: Hive Issue Type: Sub-task Components: CLI, Transactions Affects Versions: 1.3.0 Reporter: Eugene Koifman Assignee: Eugene Koifman HIVE-11077 added support for "set autocommit true/false"; we should also add support for "set autocommit" returning the current value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11421) Support Schema evolution for ACID tables
Eugene Koifman created HIVE-11421: - Summary: Support Schema evolution for ACID tables Key: HIVE-11421 URL: https://issues.apache.org/jira/browse/HIVE-11421 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman

Currently schema evolution is not supported for ACID tables. Whatever limitations ORC-based tables have in general w.r.t. schema evolution apply to ACID tables as well. Generally, it's possible to have an ORC-based table in Hive where different partitions have different schemas, as long as all data files in each partition have the same schema (and match the metastore partition information). With ACID tables, the "as long as ..." part above can easily be violated:

{noformat}
CREATE TABLE acid_partitioned2(a INT, b STRING)
  PARTITIONED BY (bkt INT)
  CLUSTERED BY (a) INTO 2 BUCKETS
  STORED AS ORC;

insert into table acid_partitioned2 partition(bkt=1)
  values(1, 'part one'),(2, 'part one'),(3, 'part two'),(4, 'part three');

alter table acid_partitioned2 add columns(c int, d string);

insert into table acid_partitioned2 partition(bkt=2)
  values(1, 'part one', 10, 'str10'),(2, 'part one', 20, 'str20'),(3, 'part two', 30, 'str30'),(4, 'part three', 40, 'str40');

insert into table acid_partitioned2 partition(bkt=1)
  values(5, 'part one', 1, 'blah'),(6, 'part one', 2, 'doh!');
{noformat}

Now partition bkt=1 has delta files with different schemas which have to be merged on read, which leads to:

{noformat}
Error: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 9
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:247)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.<init>(RecordReaderImpl.java:1864)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.createTreeReader(RecordReaderImpl.java:2263)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.access$000(RecordReaderImpl.java:77)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.<init>(RecordReaderImpl.java:1865)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.createTreeReader(RecordReaderImpl.java:2263)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:283)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:492)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:181)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:460)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1109)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1007)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:245)
    ... 8 more
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 36962: CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36962/ --- Review request for hive and Jesús Camacho Rodríguez. Repository: hive-git Description --- The solution is to add a SEL operator in between. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 0f02737 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java af54286 Diff: https://reviews.apache.org/r/36962/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93649 --- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java (line 142) https://reviews.apache.org/r/36942/#comment148060 nit: typo - should be schema. ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q (line 7) https://reviews.apache.org/r/36942/#comment148066 Can you add a new partition p2 with data so that we can show that only the data from p1 is returned? I don't completely follow the logic, but I'm worried that p='p1' gets removed. - Aihua Xu On July 30, 2015, 9:22 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
[jira] [Created] (HIVE-11419) hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as provided
Steve Loughran created HIVE-11419: - Summary: hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as provided Key: HIVE-11419 URL: https://issues.apache.org/jira/browse/HIVE-11419 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 1.2.1 Reporter: Steve Loughran Priority: Minor hive-shims-0.23 doesn't declare its {{hadoop-yarn-server-resourcemanager}} dependency as provided, so you get Hadoop 2.6.0 on your classpath unless you explicitly exclude it. See: http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
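The fix amounts to marking the dependency with {{provided}} scope in the shims POM, so it is on the compile classpath but not propagated transitively to downstream users. A minimal sketch (the version property name here is an assumption, not necessarily what Hive's POM uses):

```xml
<!-- Sketch: declare the resourcemanager dependency as provided so it is
     compiled against but not dragged onto downstream classpaths. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>provided</scope>
</dependency>
```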
[jira] [Created] (HIVE-11417) Create ObjectInspectors for VectorizedRowBatch
Owen O'Malley created HIVE-11417: Summary: Create ObjectInspectors for VectorizedRowBatch Key: HIVE-11417 URL: https://issues.apache.org/jira/browse/HIVE-11417 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley I'd like to make the default path for reading and writing ORC files to be vectorized. To ensure that Hive can still read row by row, I'll make ObjectInspectors that are backed by the VectorizedRowBatch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11423) Ship hive-storage-api along with hive-exec jar to all Tasks
Gopal V created HIVE-11423: -- Summary: Ship hive-storage-api along with hive-exec jar to all Tasks Key: HIVE-11423 URL: https://issues.apache.org/jira/browse/HIVE-11423 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 2.0.0 Reporter: Gopal V Priority: Blocker

After moving critical classes into hive-storage-api, those classes are needed for queries to execute successfully. Currently, all queries fail with ClassNotFound exceptions on a large cluster.

{code}
Caused by: java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
    at java.lang.Class.getDeclaredFields(Class.java:1916)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:109)
    ... 57 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 62 more
{code}

Temporary workaround added to hiverc: {{add jar ./dist/hive/lib/hive-storage-api-2.0.0-SNAPSHOT.jar;}}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11422) Join a ACID table with non-ACID table fail with MR
Daniel Dai created HIVE-11422: - Summary: Join a ACID table with non-ACID table fail with MR Key: HIVE-11422 URL: https://issues.apache.org/jira/browse/HIVE-11422 Project: Hive Issue Type: Bug Affects Versions: 1.3.0 Reporter: Daniel Dai Fix For: 1.3.0, 2.0.0

The following script fails in MR mode:

{code}
CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING)
  CLUSTERED BY (k1) INTO 2 BUCKETS
  STORED AS ORC TBLPROPERTIES('transactional'='true');
INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I');

CREATE TABLE orc_table (k1 INT, f1 STRING)
  CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS
  STORED AS ORC;
INSERT OVERWRITE TABLE orc_table VALUES (1, 'x');

SET hive.execution.engine=mr;
SET hive.auto.convert.join=false;
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT t1.*, t2.* FROM orc_table t1 JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1;
{code}

Stack:

{code}
Error: java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:701)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas(AcidUtils.java:368)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1211)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1129)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
    ... 9 more
{code}

The script passes in the 1.2.0 release, however.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
On July 31, 2015, 1:35 p.m., cheng xu wrote: Looks good to me. Just one minor question. Also I am wondering why this search argument works fine for ORC. - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93697 --- On July 31, 2015, 5:22 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 31, 2015, 5:22 a.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93697 --- Looks good to me. Just one minor question. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (lines 103 - 104) https://reviews.apache.org/r/36942/#comment148112 Why do we need to create the leaf when columns is null? - cheng xu On July 31, 2015, 5:22 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 31, 2015, 5:22 a.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93587 --- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (line 54) https://reviews.apache.org/r/36942/#comment147977 If the goal here is to get just the top-level fields, can we do something like: ``` for (Type field : schema.getFields()) { columns.add(field.getName()); } ``` This might be a little bit clearer. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (line 64) https://reviews.apache.org/r/36942/#comment147969 Minor nit: Since we have the opportunity to fix it, can we change 'leafs' to 'leaves'. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (line 102) https://reviews.apache.org/r/36942/#comment147978 List<T> has O(N) lookup time. Can we store this in a Set<T> (O(1)) instead? - Reuben Kuhnert On July 30, 2015, 3:43 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 3:43 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
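[Editor's note] Reuben's List-vs-Set comment above can be illustrated with a minimal, hypothetical sketch. This is not the actual Hive code; the class and method names are invented for illustration. The idea is the same as the patch description: collect the top-level field names of the file schema into a `HashSet`, so that checking whether a predicate column exists in the schema (and dropping partition columns that do not) is an O(1) membership test instead of an O(N) `List.contains` scan.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ColumnPruneSketch {
    // Hypothetical stand-in for iterating schema.getFields():
    // gather just the top-level field names into a Set.
    static Set<String> topLevelColumns(List<String> fieldNames) {
        // HashSet gives O(1) contains(), vs O(N) for List.contains()
        return new HashSet<>(fieldNames);
    }

    public static void main(String[] args) {
        Set<String> columns = topLevelColumns(Arrays.asList("id", "name", "value"));
        // A predicate leaf on a partition column such as "dt" would be
        // pruned, because it is not part of the file schema.
        System.out.println(columns.contains("name")); // true
        System.out.println(columns.contains("dt"));   // false
    }
}
```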
Hive-0.14 - Build # 1028 - Still Failing
Changes for Builds #1007 through #1028: none recorded. No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #1028) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/1028/ to view the results.
[jira] [Created] (HIVE-11412) StackOverFlow in SemanticAnalyzer for huge filters (~5000)
Prasanth Jayachandran created HIVE-11412: Summary: StackOverFlow in SemanticAnalyzer for huge filters (~5000) Key: HIVE-11412 URL: https://issues.apache.org/jira/browse/HIVE-11412 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran Queries with ~5000 filter conditions fail in SemanticAnalysis. Stack trace:

{code}
Exception in thread "main" java.lang.StackOverflowError
  at java.util.HashMap.hash(HashMap.java:366)
  at java.util.HashMap.getEntry(HashMap.java:466)
  at java.util.HashMap.containsKey(HashMap.java:453)
  at org.apache.commons.collections.map.AbstractMapDecorator.containsKey(AbstractMapDecorator.java:83)
  at org.apache.hadoop.conf.Configuration.isDeprecated(Configuration.java:558)
  at org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:605)
  at org.apache.hadoop.conf.Configuration.get(Configuration.java:885)
  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:907)
  at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1308)
  at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2641)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11132)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
{code}

Query:

{code}
explain select count(*) from over1k where (
  (t=1 and si=2)
  or (t=2 and si=3)
  or (t=3 and si=4)
  or (t=4 and si=5)
  or (t=5 and si=6)
  or (t=6 and si=7)
  or (t=7 and si=8)
  or (t=7 and si=8)
  or (t=7 and si=8)
  ...
{code}

Repeat the filter around 5000 times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
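[Editor's note] The repeated `processPositionAlias` frames show the failure mode: each nested `or` adds one level of recursion, so a ~5000-condition filter exhausts the default JVM thread stack. A common remedy for this class of bug is to replace the recursive tree walk with an explicit work stack on the heap. The sketch below is hypothetical (a toy `Node` class, not Hive's `ASTNode`), showing an iterative traversal that handles a left-deep chain far taller than recursion could.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DeepTreeSketch {
    // Hypothetical AST node; the real tree in Hive is an ASTNode.
    static class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Build a left-deep chain of the given height, mimicking a deeply
    // nested chain of OR expressions.
    static Node buildChain(int totalNodes) {
        Node root = new Node();
        Node cur = root;
        for (int i = 1; i < totalNodes; i++) {
            Node child = new Node();
            cur.children.add(child);
            cur = child;
        }
        return root;
    }

    // Iterative traversal: depth lives on the heap in an ArrayDeque,
    // so no StackOverflowError regardless of tree height.
    static int countNodes(Node root) {
        int count = 0;
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            count++;
            for (Node c : n.children) {
                work.push(c);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 100,000 levels deep: recursive descent would overflow here.
        Node root = buildChain(100_000);
        System.out.println(countNodes(root)); // 100000
    }
}
```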
[jira] [Created] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks
Raajay Viswanathan created HIVE-11413: Summary: Error in detecting availability of HiveSemanticAnalyzerHooks Key: HIVE-11413 URL: https://issues.apache.org/jira/browse/HIVE-11413 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 2.0.0 Reporter: Raajay Viswanathan Priority: Trivial In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of available {{HiveSemanticAnalyzerHook}}s (_saHooks_) is obtained using the {{getHooks}} method. This method always returns a {{List}} of hooks. However, while checking for availability of hooks, the current version of the code compares _saHooks_ with NULL. This is incorrect, as the segment of code designed to call the pre- and post-analyze functions gets executed even when the list is empty. The comparison should be changed to {{saHooks.size() > 0}}.
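[Editor's note] The check described above can be sketched in isolation. This is a hypothetical stand-in, not the actual `Driver.java` code: since the getter always returns a (possibly empty) list, a null comparison never skips the hook-invocation path; an emptiness check does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HookCheckSketch {
    // Hypothetical hook interface standing in for HiveSemanticAnalyzerHook.
    interface Hook {
        void preAnalyze();
    }

    // Guard for the pre/post-analyze section: run hooks only when the
    // list is both non-null and non-empty (equivalent to size() > 0).
    static boolean shouldRunHooks(List<Hook> saHooks) {
        return saHooks != null && !saHooks.isEmpty();
    }

    public static void main(String[] args) {
        // An empty list from the getter must NOT trigger the hook path.
        System.out.println(shouldRunHooks(Collections.<Hook>emptyList())); // false
        System.out.println(shouldRunHooks(null));                          // false
        System.out.println(shouldRunHooks(
                Arrays.asList((Hook) () -> { })));                         // true
    }
}
```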
[jira] [Created] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results
Nicholas Brenwald created HIVE-11410: Summary: Join with subquery containing a group by incorrectly returns no results Key: HIVE-11410 URL: https://issues.apache.org/jira/browse/HIVE-11410 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.1.0 Reporter: Nicholas Brenwald Priority: Minor Start by creating a table *t* with columns *c1* and *c2*, and populate it with 1 row of data. For example, create table *t* from an existing table that contains at least 1 row of data by running:

{code}
create table t as select 'abc' as c1, 0 as c2 from Y limit 1;
{code}

Table *t* looks like the following:

||c1||c2||
|abc|0|

Running the following query then returns zero results.

{code}
SELECT t1.c1
FROM t t1
JOIN (
  SELECT t2.c1, MAX(t2.c2) AS c2
  FROM t t2
  GROUP BY t2.c1
) t3 ON t1.c2=t3.c2
{code}

However, we expected to see the following:

||c1||
|abc|

The problem seems to relate to the fact that in the subquery, we group by column *c1*, but this is not subsequently used in the join condition.