Review Request 41984: HIVE-12762: Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true

2016-01-06 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41984/
---

Review request for hive, Sergio Pena and Xuefu Zhang.


Repository: hive-git


Description
---

HIVE-12762: Common join on parquet tables returns incorrect result when 
hive.optimize.index.filter set to true


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
9a7d990baaabfde8e564f00bb1fcfe30cd16dc90 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 
017676bec2163f04fb95f43224a4f8743fa49f55 
  ql/src/test/queries/clientpositive/parquet_join2.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_join2.q.out PRE-CREATION 
  storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java 
577d95d1a15a54c2804349b3e5e68d83b72df664 
  
storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java 
eeff131cbc14d7ef554517109612ae7d891f8003 

Diff: https://reviews.apache.org/r/41984/diff/


Testing
---

We have two issues: 1. We are filtering the parquet columns based on the last 
filter condition in the query. So if the query contains multiple instances of 
the same table, e.g., join on the same table with different filter conditions, 
then we could get incorrect result; 2. rewriteLeaves implementation in 
SearchArgumentImpl is not accurate since the different leaves could be sharing 
the same object. The current implementation could change the leave index 
multiple times to an incorrect value.

The patch will merge all the filter conditions (create OR expression on all the 
filters) so that the columns which will be used during operator won't be 
filtered during earlier splitting stage. rewriteLeaves is reimplemented to get 
all the unique leaves first and replace in place.


Thanks,

Aihua Xu



Re: Review Request 41984: HIVE-12762: Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true

2016-01-07 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41984/#review113327
---

Ship it!


Ship It!

- Sergio Pena


On Jan. 6, 2016, 4:52 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41984/
> ---
> 
> (Updated Jan. 6, 2016, 4:52 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-12762: Common join on parquet tables returns incorrect result when 
> hive.optimize.index.filter set to true
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> 9a7d990baaabfde8e564f00bb1fcfe30cd16dc90 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 
> 017676bec2163f04fb95f43224a4f8743fa49f55 
>   ql/src/test/queries/clientpositive/parquet_join2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_join2.q.out PRE-CREATION 
>   storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java 
> 577d95d1a15a54c2804349b3e5e68d83b72df664 
>   
> storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
>  eeff131cbc14d7ef554517109612ae7d891f8003 
> 
> Diff: https://reviews.apache.org/r/41984/diff/
> 
> 
> Testing
> ---
> 
> We have two issues: 1. We are filtering the parquet columns based on the last 
> filter condition in the query. So if the query contains multiple instances of 
> the same table, e.g., join on the same table with different filter 
> conditions, then we could get incorrect result; 2. rewriteLeaves 
> implementation in SearchArgumentImpl is not accurate since the different 
> leaves could be sharing the same object. The current implementation could 
> change the leave index multiple times to an incorrect value.
> 
> The patch will merge all the filter conditions (create OR expression on all 
> the filters) so that the columns which will be used during operator won't be 
> filtered during earlier splitting stage. rewriteLeaves is reimplemented to 
> get all the unique leaves first and replace in place.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>