Review Request 36939: HIVE-11376: CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36939/
---

Review request for hive.


Bugs: HIVE-11376
https://issues.apache.org/jira/browse/HIVE-11376


Repository: hive-git


Description
---

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379

This is the exact code snippet:

{noformat}

// Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
// we use a configuration variable for the same
if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
  // The following code should be removed, once
  // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
  // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
  // so don't use CombineFileInputFormat for non-splittable files

  // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on
{noformat}
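The guard quoted above can be reduced to a small pure-Java predicate. The following is an illustrative sketch with invented names (the real code reads the flag from the job configuration and inspects the input files' codecs): combining is safe when either Hadoop itself handles non-splittable files (MAPREDUCE-1597) or none of the inputs uses a non-splittable codec.

```java
// Hypothetical helper, not the actual Hive code: sketches the decision
// made by the snippet above. Both boolean inputs are stand-ins for the
// real checks against MapWork and the compression codec factory.
public class CombineDecision {
    public static boolean shouldCombine(boolean hadoopSupportsSplittable,
                                        boolean anyNonSplittableInput) {
        // Combine unless Hadoop can't handle non-splittable files AND
        // at least one input file is compressed with a non-splittable codec.
        return hadoopSupportsSplittable || !anyNonSplittableInput;
    }
}
```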


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
33b67dd7b0fde41f81f8d86ea8c83d29c631e3d7 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
1de7e4073f5eea4c7be8423a7ebe6a89cb51d9f1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
693d8c7e9f956999b2da33593d780e37ddf2b3b8 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 
3217df27bb5731a1dcd5db1ae17c5bdff2e3fbfc 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 
76926e79729d9ea4823de0ffc9b1e5bac6364842 

Diff: https://reviews.apache.org/r/36939/diff/


Testing
---


Thanks,

Rajat Khandelwal



Re: Review Request 36939: HIVE-11376: CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36939/
---

(Updated July 30, 2015, 6:49 p.m.)


Review request for hive.


Bugs: HIVE-11376
https://issues.apache.org/jira/browse/HIVE-11376


Repository: hive-git


Description
---

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379

This is the exact code snippet:

{noformat}

// Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
// we use a configuration variable for the same
if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
  // The following code should be removed, once
  // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
  // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
  // so don't use CombineFileInputFormat for non-splittable files

  // i.e., don't combine if inputformat is a TextInputFormat and has compression turned on

{noformat}


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
33b67dd7b0fde41f81f8d86ea8c83d29c631e3d7 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
1de7e4073f5eea4c7be8423a7ebe6a89cb51d9f1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
693d8c7e9f956999b2da33593d780e37ddf2b3b8 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 
3217df27bb5731a1dcd5db1ae17c5bdff2e3fbfc 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 
76926e79729d9ea4823de0ffc9b1e5bac6364842 

Diff: https://reviews.apache.org/r/36939/diff/


Testing
---


Thanks,

Rajat Khandelwal



[jira] [Created] (HIVE-11411) Transaction lock

2015-07-30 Thread shiqian.huang (JIRA)
shiqian.huang created HIVE-11411:


 Summary: Transaction lock
 Key: HIVE-11411
 URL: https://issues.apache.org/jira/browse/HIVE-11411
 Project: Hive
  Issue Type: Wish
Reporter: shiqian.huang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/
---

Review request for hive and Aihua Xu.


Bugs: HIVE-11401
https://issues.apache.org/jira/browse/HIVE-11401


Repository: hive-git


Description
---

The following patch reviews the predicate created by Hive, and removes any 
column that does not belong to the Parquet schema, such as partitioned columns. 
This way Parquet can filter the columns correctly.
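A minimal sketch of that pruning idea, with invented names (the actual patch walks a SearchArgument tree inside ParquetFilterPredicateConverter rather than a flat list): keep only predicate columns that are physically present in the Parquet file schema, so partition columns never reach the Parquet filter.

```java
// Illustrative only: prune predicate columns down to those present in the
// file schema. Partition columns (e.g. "p") are not stored in the Parquet
// files, so any leaf referencing them must be dropped before filtering.
public class PredicatePruning {
    public static java.util.List<String> pruneToSchema(
            java.util.List<String> predicateColumns,
            java.util.Set<String> schemaColumns) {
        java.util.List<String> kept = new java.util.ArrayList<>();
        for (String col : predicateColumns) {
            if (schemaColumns.contains(col)) {
                kept.add(col);   // column is physically stored in the file
            }
            // otherwise drop it (e.g. a partition column)
        }
        return kept;
    }
}
```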


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 49e52da2e26fd7213df1db88716eaee94cb536b8 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
 87dd344534f09c7fc565fdc467ac82a51f37ebba 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 
85e952fb6855a2a03902ed971f54191837b32dac 
  ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/36942/diff/


Testing
---

Unit tests: TestParquetFilterPredicate.java
Integration tests: parquet_predicate_pushdown.q


Thanks,

Sergio Pena



[jira] [Created] (HIVE-11414) Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced

2015-07-30 Thread Zheng Shao (JIRA)
Zheng Shao created HIVE-11414:
-

 Summary: Fix OOM in MapTask with many input partitions by making 
ColumnarSerDeBase's cachedLazyStruct weakly referenced
 Key: HIVE-11414
 URL: https://issues.apache.org/jira/browse/HIVE-11414
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 1.2.0, 0.13.1, 0.14.0, 0.12.0, 0.11.0
Reporter: Zheng Shao
Priority: Minor


MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
a single row.

Proposed solution:
* Make cachedLazyStruct a weakly referenced object.  Do similar changes to 
other columnar serde if any (e.g. maybe ORCFile's serde as well).

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place where the input file 
changes.  But that requires a much bigger change to the code.
* We can also add a cleanup() method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes that 
people have written.
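The proposed change could be sketched roughly as below. Only the field name cachedLazyStruct comes from the report; the class and method names are invented for illustration. The SerDe holds its reusable deserialized row only through a WeakReference, so the GC may reclaim each per-partition cache under memory pressure instead of the MapTask exhausting its heap.

```java
import java.lang.ref.WeakReference;

// Illustrative sketch, not the actual ColumnarSerDeBase code: the cached
// struct is recreated lazily whenever the weak reference has been cleared,
// and reused (same object) while it is still strongly reachable.
public class WeaklyCachedSerDe {
    private WeakReference<Object[]> cachedLazyStruct;

    public Object[] getCachedStruct() {
        Object[] struct = (cachedLazyStruct == null) ? null : cachedLazyStruct.get();
        if (struct == null) {
            struct = new Object[16];                     // stand-in for a real LazyStruct
            cachedLazyStruct = new WeakReference<>(struct);
        }
        return struct;
    }
}
```

While a caller holds the returned struct, repeated calls return the same instance, preserving the reuse the cache was built for.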




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Reuben Kuhnert

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/#review93651
---


This looks good to me.

- Reuben Kuhnert


On July 30, 2015, 9:22 p.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36942/
 ---
 
 (Updated July 30, 2015, 9:22 p.m.)
 
 
 Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho.
 
 
 Bugs: HIVE-11401
 https://issues.apache.org/jira/browse/HIVE-11401
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The following patch reviews the predicate created by Hive, and removes any 
 column that does not belong to the Parquet schema, such as partitioned 
 columns. This way Parquet can filter the columns correctly.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  49e52da2e26fd7213df1db88716eaee94cb536b8 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
  87dd344534f09c7fc565fdc467ac82a51f37ebba 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 
 85e952fb6855a2a03902ed971f54191837b32dac 
   ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36942/diff/
 
 
 Testing
 ---
 
 Unit tests: TestParquetFilterPredicate.java
 Integration tests: parquet_predicate_pushdown.q
 
 
 Thanks,
 
 Sergio Pena
 




[jira] [Created] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries

2015-07-30 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11415:


 Summary: Add early termination for recursion in vectorization for 
deep filter queries
 Key: HIVE-11415
 URL: https://issues.apache.org/jira/browse/HIVE-11415
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth Jayachandran


Queries with deep (left-deep) filters throw a StackOverflowError during 
vectorization:
{code}
Exception in thread "main" java.lang.StackOverflowError
	at java.lang.Class.getAnnotation(Class.java:3415)
	at org.apache.hive.common.util.AnnotationUtils.getAnnotation(AnnotationUtils.java:29)
	at org.apache.hadoop.hive.ql.exec.vector.VectorExpressionDescriptor.getVectorExpressionClass(VectorExpressionDescriptor.java:332)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:988)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:439)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.createVectorExpression(VectorizationContext.java:1014)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:996)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
{code}

Sample query:
{code}
explain select count(*) from over1k where (
(t=1 and si=2)
or (t=2 and si=3)
or (t=3 and si=4) 
or (t=4 and si=5) 
or (t=5 and si=6) 
or (t=6 and si=7) 
or (t=7 and si=8)
...
..
{code}
Repeat the filter a few thousand times to reproduce the issue.
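Such an early termination could look roughly like the following. The class name, depth limit, and tree shape are invented for this sketch; the actual limit and fallback mechanics would be decided in the patch. The idea is simply to stop the recursive descent once a depth threshold is hit and fall back to non-vectorized execution instead of overflowing the stack.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "early termination for recursion": a depth
// counter is threaded through the recursion, and exceeding MAX_DEPTH means
// "do not vectorize this expression" rather than a StackOverflowError.
public class VectorizationDepthGuard {
    static final int MAX_DEPTH = 100;   // invented threshold

    public static class ExprNode {
        public final List<ExprNode> children = new ArrayList<>();
    }

    /** Returns false ("fall back to row mode") when the tree is too deep. */
    public static boolean tryVectorize(ExprNode expr, int depth) {
        if (depth > MAX_DEPTH) {
            return false;               // bail out before the stack overflows
        }
        for (ExprNode child : expr.children) {
            if (!tryVectorize(child, depth + 1)) {
                return false;
            }
        }
        return true;
    }
}
```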



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/
---

(Updated July 30, 2015, 9:22 p.m.)


Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho.


Changes
---

Thanks Reuben for your feedback. 
This new patch includes fixes for your comments and the failured tests that 
appear on Jira.


Bugs: HIVE-11401
https://issues.apache.org/jira/browse/HIVE-11401


Repository: hive-git


Description
---

The following patch reviews the predicate created by Hive, and removes any 
column that does not belong to the Parquet schema, such as partitioned columns. 
This way Parquet can filter the columns correctly.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 49e52da2e26fd7213df1db88716eaee94cb536b8 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
 87dd344534f09c7fc565fdc467ac82a51f37ebba 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 
85e952fb6855a2a03902ed971f54191837b32dac 
  ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/36942/diff/


Testing
---

Unit tests: TestParquetFilterPredicate.java
Integration tests: parquet_predicate_pushdown.q


Thanks,

Sergio Pena



[jira] [Created] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-07-30 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-11416:
--

 Summary: CBO: Calcite Operator To Hive Operator (Calcite Return 
Path): Groupby Optimizer assumes the schema can match after removing RS and GBY
 Key: HIVE-11416
 URL: https://issues.apache.org/jira/browse/HIVE-11416
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2015-07-30 Thread JIRA
Sergio Peña created HIVE-11418:
--

 Summary: Dropping a database in an encryption zone with CASCADE 
and trash enabled fails
 Key: HIVE-11418
 URL: https://issues.apache.org/jira/browse/HIVE-11418
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Sergio Peña


Here's the query that fails:

{noformat}
hive> CREATE DATABASE db;
hive> USE db;
hive> CREATE TABLE a(id int);
hive> SET fs.trash.interval=1;
hive> DROP DATABASE db CASCADE;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop db.a because it is in an encryption zone and trash is enabled.  Use PURGE option to skip trash.)
{noformat}

DROP DATABASE does not support PURGE, so we have to remove the tables one by 
one, and then drop the database.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11420) add support for set autocommit

2015-07-30 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11420:
-

 Summary: add support for set autocommit
 Key: HIVE-11420
 URL: https://issues.apache.org/jira/browse/HIVE-11420
 Project: Hive
  Issue Type: Sub-task
  Components: CLI, Transactions
Affects Versions: 1.3.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


HIVE-11077 added support for {{set autocommit true/false}}.
We should also add support for {{set autocommit}} returning the current value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11421) Support Schema evolution for ACID tables

2015-07-30 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11421:
-

 Summary: Support Schema evolution for ACID tables
 Key: HIVE-11421
 URL: https://issues.apache.org/jira/browse/HIVE-11421
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Currently schema evolution is not supported for ACID tables.
Whatever limitations ORC-based tables have in general w.r.t. schema evolution 
apply to ACID tables as well.  Generally, it's possible to have an ORC-based table in 
Hive where different partitions have different schemas, as long as all data 
files in each partition have the same schema (and match the metastore partition 
information).

With ACID tables, the "as long as ..." part above can easily be violated.
{noformat}
CREATE TABLE acid_partitioned2(a INT, b STRING) PARTITIONED BY(bkt INT) 
CLUSTERED BY(a) INTO 2 BUCKETS STORED AS ORC;
insert into table acid_partitioned2 partition(bkt=1) values(1, 'part one'),(2, 
'part one'), (3, 'part two'),(4, 'part three');
alter table acid_partitioned2 add columns(c int, d string);
insert into table acid_partitioned2 partition(bkt=2) values(1, 'part one', 10, 
'str10'),(2, 'part one', 20, 'str20'), (3, 'part two', 30, 'str30'),(4, 'part 
three', 40, 'str40');
insert into table acid_partitioned2 partition(bkt=1) values(5, 'part one', 1, 
'blah'),(6, 'part one', 2, 'doh!');
{noformat}


Now partition bkt=1 will have delta files with different schemas which have to 
be merged on read, which leads to 

{noformat}
Error: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 9
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:247)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.<init>(RecordReaderImpl.java:1864)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.createTreeReader(RecordReaderImpl.java:2263)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.access$000(RecordReaderImpl.java:77)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.<init>(RecordReaderImpl.java:1865)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.createTreeReader(RecordReaderImpl.java:2263)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:283)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:492)
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:181)
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:460)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1109)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1007)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:245)
	... 8 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 36962: CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-07-30 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36962/
---

Review request for hive and Jesús Camacho Rodríguez.


Repository: hive-git


Description
---

The solution is to add a SEL operator in between.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 0f02737 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java af54286 

Diff: https://reviews.apache.org/r/36962/diff/


Testing
---


Thanks,

pengcheng xiong



Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/#review93649
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 (line 142)
https://reviews.apache.org/r/36942/#comment148060

nit: typo - should be schema.



ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q (line 7)
https://reviews.apache.org/r/36942/#comment148066

Can you add a new partition p2 with data so that we can show that only the data 
from p1 is returned? 

I don't completely follow the logic, but I'm worried that p='p1' gets removed.


- Aihua Xu


On July 30, 2015, 9:22 p.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36942/
 ---
 
 (Updated July 30, 2015, 9:22 p.m.)
 
 
 Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho.
 
 
 Bugs: HIVE-11401
 https://issues.apache.org/jira/browse/HIVE-11401
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The following patch reviews the predicate created by Hive, and removes any 
 column that does not belong to the Parquet schema, such as partitioned 
 columns. This way Parquet can filter the columns correctly.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  49e52da2e26fd7213df1db88716eaee94cb536b8 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
  87dd344534f09c7fc565fdc467ac82a51f37ebba 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 
 85e952fb6855a2a03902ed971f54191837b32dac 
   ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36942/diff/
 
 
 Testing
 ---
 
 Unit tests: TestParquetFilterPredicate.java
 Integration tests: parquet_predicate_pushdown.q
 
 
 Thanks,
 
 Sergio Pena
 




[jira] [Created] (HIVE-11419) hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as provided

2015-07-30 Thread Steve Loughran (JIRA)
Steve Loughran created HIVE-11419:
-

 Summary: hive-shims-0.23 doesn't declare 
yarn-server-resourcemanager dependency as provided
 Key: HIVE-11419
 URL: https://issues.apache.org/jira/browse/HIVE-11419
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 1.2.1
Reporter: Steve Loughran
Priority: Minor


hive-shims-0.23 doesn't declare its {{hadoop-yarn-server-resourcemanager}} 
dependency as optional, so you get hadoop 2.6.0 on your classpath unless you 
explicitly exclude it.

see: 
[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11417) Create ObjectInspectors for VectorizedRowBatch

2015-07-30 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11417:


 Summary: Create ObjectInspectors for VectorizedRowBatch
 Key: HIVE-11417
 URL: https://issues.apache.org/jira/browse/HIVE-11417
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


I'd like to make the default path for reading and writing ORC files to be 
vectorized. To ensure that Hive can still read row by row, I'll make 
ObjectInspectors that are backed by the VectorizedRowBatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11423) Ship hive-storage-api along with hive-exec jar to all Tasks

2015-07-30 Thread Gopal V (JIRA)
Gopal V created HIVE-11423:
--

 Summary: Ship hive-storage-api along with hive-exec jar to all 
Tasks
 Key: HIVE-11423
 URL: https://issues.apache.org/jira/browse/HIVE-11423
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 2.0.0
Reporter: Gopal V
Priority: Blocker


After moving critical classes into hive-storage-api, those classes are needed 
for queries to execute successfully.

Currently, all queries fail with ClassNotFound exceptions on a large cluster.

{code}
Caused by: java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
	at java.lang.Class.getDeclaredFields0(Native Method)
	at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
	at java.lang.Class.getDeclaredFields(Class.java:1916)
	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150)
	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:109)
	... 57 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 62 more
{code}

Temporary workaround added to hiverc: {{add jar 
./dist/hive/lib/hive-storage-api-2.0.0-SNAPSHOT.jar;}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11422) Joining an ACID table with a non-ACID table fails with MR

2015-07-30 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-11422:
-

 Summary: Joining an ACID table with a non-ACID table fails with MR
 Key: HIVE-11422
 URL: https://issues.apache.org/jira/browse/HIVE-11422
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Daniel Dai
 Fix For: 1.3.0, 2.0.0


The following script fails in MR mode:
{code}
CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING) 
CLUSTERED BY (k1) INTO 2 BUCKETS 
STORED AS ORC TBLPROPERTIES('transactional'='true'); 
INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I');
CREATE TABLE orc_table (k1 INT, f1 STRING) 
CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS 
STORED AS ORC; 
INSERT OVERWRITE TABLE orc_table VALUES (1, 'x');
SET hive.execution.engine=mr; 
SET hive.auto.convert.join=false; 
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT t1.*, t2.* FROM orc_table t1 
JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1;
{code}
Stack:
{code}
Error: java.io.IOException: java.lang.NullPointerException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:701)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas(AcidUtils.java:368)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1211)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1129)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
	... 9 more
{code}

The script passes in the 1.2.0 release, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread cheng xu


 On July 31, 2015, 1:35 p.m., cheng xu wrote:
  Looks good to me. Just one minor question.

Also I am wondering why this search argument works fine for ORC.


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/#review93697
---


On July 31, 2015, 5:22 a.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36942/
 ---
 
 (Updated July 31, 2015, 5:22 a.m.)
 
 
 Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho.
 
 
 Bugs: HIVE-11401
 https://issues.apache.org/jira/browse/HIVE-11401
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The following patch reviews the predicate created by Hive, and removes any 
 column that does not belong to the Parquet schema, such as partitioned 
 columns. This way Parquet can filter the columns correctly.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  49e52da2e26fd7213df1db88716eaee94cb536b8 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
  87dd344534f09c7fc565fdc467ac82a51f37ebba 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 
 85e952fb6855a2a03902ed971f54191837b32dac 
   ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36942/diff/
 
 
 Testing
 ---
 
 Unit tests: TestParquetFilterPredicate.java
 Integration tests: parquet_predicate_pushdown.q
 
 
 Thanks,
 
 Sergio Pena
 




Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/#review93697
---


Looks good to me. Just one minor question.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
 (lines 103 - 104)
https://reviews.apache.org/r/36942/#comment148112

Why we need to create the leaf when columns is null?


- cheng xu


On July 31, 2015, 5:22 a.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36942/
 ---
 
 (Updated July 31, 2015, 5:22 a.m.)
 
 
 Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho.
 
 
 Bugs: HIVE-11401
 https://issues.apache.org/jira/browse/HIVE-11401
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The following patch reviews the predicate created by Hive, and removes any 
 column that does not belong to the Parquet schema, such as partitioned 
 columns. This way Parquet can filter the columns correctly.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  49e52da2e26fd7213df1db88716eaee94cb536b8 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
  87dd344534f09c7fc565fdc467ac82a51f37ebba 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 
 85e952fb6855a2a03902ed971f54191837b32dac 
   ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36942/diff/
 
 
 Testing
 ---
 
 Unit tests: TestParquetFilterPredicate.java
 Integration tests: parquet_predicate_pushdown.q
 
 
 Thanks,
 
 Sergio Pena
 




Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Reuben Kuhnert

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/#review93587
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
 (line 54)
https://reviews.apache.org/r/36942/#comment147977

If the goal here is to get just the top-level fields, can we do something 
like:

```
for (Type field : schema.getFields()) {  
  columns.add(field.getName());
}
``` 

This might be a little bit clearer.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
 (line 64)
https://reviews.apache.org/r/36942/#comment147969

Minor nit: Since we have the opportunity to fix it, can we change 'leafs' 
to 'leaves'?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
 (line 102)
https://reviews.apache.org/r/36942/#comment147978

List<T> has O(N) lookup time. Can we store this in a Set<T> (O(1)) instead?
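
For illustration, the suggested change amounts to copying the column names into a `HashSet` once and using that for membership tests. A small self-contained sketch (hypothetical names, not the actual patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MembershipCheck {
    public static void main(String[] args) {
        List<String> columnList = new ArrayList<>(Arrays.asList("c1", "c2", "c3"));

        // One-line change: copy the names into a HashSet so each contains()
        // call hashes once (expected O(1)) instead of scanning the whole
        // list (O(N)).
        Set<String> columnSet = new HashSet<>(columnList);

        // Both answer membership questions identically; only the cost differs.
        System.out.println(columnSet.contains("c2"));   // true
        System.out.println(columnSet.contains("part")); // false
    }
}
```

The difference only matters because the membership test runs once per predicate leaf, so the O(N) scan multiplies across the whole predicate.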


- Reuben Kuhnert


On July 30, 2015, 3:43 p.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36942/
 ---
 
 (Updated July 30, 2015, 3:43 p.m.)
 
 
 Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho.
 
 
 Bugs: HIVE-11401
 https://issues.apache.org/jira/browse/HIVE-11401
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The following patch reviews the predicate created by Hive, and removes any 
 column that does not belong to the Parquet schema, such as partitioned 
 columns. This way Parquet can filter the columns correctly.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  49e52da2e26fd7213df1db88716eaee94cb536b8 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
  87dd344534f09c7fc565fdc467ac82a51f37ebba 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 
 85e952fb6855a2a03902ed971f54191837b32dac 
   ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36942/diff/
 
 
 Testing
 ---
 
 Unit tests: TestParquetFilterPredicate.java
 Integration tests: parquet_predicate_pushdown.q
 
 
 Thanks,
 
 Sergio Pena
 




Hive-0.14 - Build # 1028 - Still Failing

2015-07-30 Thread Apache Jenkins Server
Changes for Build #1007

Changes for Build #1008

Changes for Build #1009

Changes for Build #1010

Changes for Build #1011

Changes for Build #1012

Changes for Build #1013

Changes for Build #1014

Changes for Build #1015

Changes for Build #1016

Changes for Build #1017

Changes for Build #1018

Changes for Build #1019

Changes for Build #1020

Changes for Build #1021

Changes for Build #1022

Changes for Build #1023

Changes for Build #1024

Changes for Build #1025

Changes for Build #1026

Changes for Build #1027

Changes for Build #1028



No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #1028)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-0.14/1028/ to view 
the results.

[jira] [Created] (HIVE-11412) StackOverFlow in SemanticAnalyzer for huge filters (~5000)

2015-07-30 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11412:


 Summary: StackOverFlow in SemanticAnalyzer for huge filters (~5000)
 Key: HIVE-11412
 URL: https://issues.apache.org/jira/browse/HIVE-11412
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth Jayachandran


Queries with ~5000 filter conditions fail in SemanticAnalysis

Stack trace:
{code}
Exception in thread "main" java.lang.StackOverflowError
at java.util.HashMap.hash(HashMap.java:366)
at java.util.HashMap.getEntry(HashMap.java:466)
at java.util.HashMap.containsKey(HashMap.java:453)
at 
org.apache.commons.collections.map.AbstractMapDecorator.containsKey(AbstractMapDecorator.java:83)
at 
org.apache.hadoop.conf.Configuration.isDeprecated(Configuration.java:558)
at 
org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:605)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:885)
at 
org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:907)
at 
org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1308)
at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2641)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11132)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)

{code}

Query:
{code}
explain select count(*) from over1k where (
(t=1 and si=2)
or (t=2 and si=3)
or (t=3 and si=4) 
or (t=4 and si=5) 
or (t=5 and si=6) 
or (t=6 and si=7) 
or (t=7 and si=8)
or (t=7 and si=8)
or (t=7 and si=8)
...
{code}
Repeat the filter around 5000 times. 
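
The overflow happens because {{processPositionAlias}} recurses once per nesting level of the AST, so ~5000 OR'ed conditions exhaust the thread stack. A generic, self-contained sketch (not Hive's actual code) of the usual remedy, replacing recursion with an explicit work stack:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterativeTraversal {
    // Minimal AST node; the real code walks Hive's ASTNode tree.
    static class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Recursive descent overflows the thread stack once nesting reaches a
    // few thousand levels (as with ~5000 OR'ed filter conditions).
    static int countRecursive(Node n) {
        int total = 1;
        for (Node c : n.children) total += countRecursive(c);
        return total;
    }

    // The same walk with an explicit stack uses constant thread-stack space.
    static int countIterative(Node root) {
        int total = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            total++;
            for (Node c : n.children) stack.push(c);
        }
        return total;
    }

    public static void main(String[] args) {
        // Build a degenerate chain 100,000 nodes deep, like a huge OR filter.
        Node root = new Node();
        Node cur = root;
        for (int i = 0; i < 100_000; i++) {
            Node next = new Node();
            cur.children.add(next);
            cur = next;
        }
        // countRecursive(root) would throw StackOverflowError on this input.
        System.out.println(countIterative(root)); // 100001
    }
}
```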



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks

2015-07-30 Thread Raajay Viswanathan (JIRA)
Raajay Viswanathan created HIVE-11413:
-

 Summary: Error in detecting availability of 
HiveSemanticAnalyzerHooks
 Key: HIVE-11413
 URL: https://issues.apache.org/jira/browse/HIVE-11413
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.0.0
Reporter: Raajay Viswanathan
Priority: Trivial


In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of 
available {{HiveSemanticAnalyzerHook}}s (_saHooks_) is obtained using the 
{{getHooks}} method. This method always returns a {{List}} of hooks. 

However, while checking for availability of hooks, the current version of the 
code uses a comparison of _saHooks_ with NULL. This is incorrect, as the 
segment of code designed to call pre and post Analyze functions gets executed 
even when the list is empty. The comparison should be changed to 
{{saHooks.size() > 0}}.
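
A minimal sketch of the distinction being reported, with hypothetical names standing in for the hooks list in {{Driver.compile}}:

```java
import java.util.Collections;
import java.util.List;

public class HookCheck {
    // Per the report, getHooks() always returns a list (possibly empty),
    // never null. Stubbed here with no hooks configured.
    static List<String> getHooks() {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<String> saHooks = getHooks();

        // Buggy check from the report: an empty list is not null, so the
        // pre/post-analyze branch runs even with no hooks configured.
        boolean buggyRuns = (saHooks != null);

        // Fixed check: only enter the branch when hooks actually exist.
        boolean fixedRuns = (saHooks != null && saHooks.size() > 0);

        System.out.println(buggyRuns + " " + fixedRuns); // true false
    }
}
```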



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Nicholas Brenwald (JIRA)
Nicholas Brenwald created HIVE-11410:


 Summary: Join with subquery containing a group by incorrectly 
returns no results
 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Priority: Minor


Start by creating a table *t* with columns *c1* and *c2*, populated with one 
row of data. For example, create *t* from an existing table *Y* that contains 
at least one row by running:
{code}
create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
{code}

Table *t* looks like the following:
||c1||c2||
|abc|0|

Running the following query then returns zero results.
{code}
SELECT 
  t1.c1
FROM 
  t t1
JOIN
(SELECT 
   t2.c1,
   MAX(t2.c2) AS c2
 FROM 
   t t2 
 GROUP BY 
   t2.c1
) t3
ON t1.c2=t3.c2
{code}

However, we expected to see the following:
||c1||
|abc|

The problem seems to be that the subquery groups by column *c1*, which is not 
subsequently used in the join condition.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)