----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41984/#review113327 -----------------------------------------------------------
Ship it! Ship It! - Sergio Pena On Jan. 6, 2016, 4:52 p.m., Aihua Xu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/41984/ > ----------------------------------------------------------- > > (Updated Jan. 6, 2016, 4:52 p.m.) > > > Review request for hive, Sergio Pena and Xuefu Zhang. > > > Repository: hive-git > > > Description > ------- > > HIVE-12762: Common join on parquet tables returns incorrect result when > hive.optimize.index.filter set to true > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java > 9a7d990baaabfde8e564f00bb1fcfe30cd16dc90 > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java > 017676bec2163f04fb95f43224a4f8743fa49f55 > ql/src/test/queries/clientpositive/parquet_join2.q PRE-CREATION > ql/src/test/results/clientpositive/parquet_join2.q.out PRE-CREATION > storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java > 577d95d1a15a54c2804349b3e5e68d83b72df664 > > storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java > eeff131cbc14d7ef554517109612ae7d891f8003 > > Diff: https://reviews.apache.org/r/41984/diff/ > > > Testing > ------- > > We have two issues: 1. We are filtering the parquet columns based on the last > filter condition in the query. So if the query contains multiple instances of > the same table, e.g., join on the same table with different filter > conditions, then we could get incorrect result; 2. rewriteLeaves > implementation in SearchArgumentImpl is not accurate since the different > leaves could be sharing the same object. The current implementation could > change the leave index multiple times to an incorrect value. > > The patch will merge all the filter conditions (create OR expression on all > the filters) so that the columns which will be used during operator won't be > filtered during earlier splitting stage. rewriteLeaves is reimplemented to > get all the unique leaves first and replace in place. > > > Thanks, > > Aihua Xu > >