Yibing Shi created HIVE-16869:
---------------------------------

             Summary: Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
                 Key: HIVE-16869
                 URL: https://issues.apache.org/jira/browse/HIVE-16869
             Project: Hive
          Issue Type: Bug
            Reporter: Yibing Shi
            Assignee: Yibing Shi
            Priority: Critical


When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are both turned
on, and a select query has a condition on a column that doesn't exist in the
Parquet file (such as a partition column), Hive often returns wrong results.

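Please see the example below for details. The last two queries, run after
enabling {{hive.optimize.index.filter}}, should each return the row (1, 2, 1),
but they return no rows: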
{noformat}
hive> create table test_parq (a int, b int) partitioned by (p int) stored as parquet;
OK
Time taken: 0.292 seconds
hive> insert overwrite table test_parq partition (p=1) values (1, 2);
OK
Time taken: 5.08 seconds
hive> select * from test_parq where a=1 and p=1;
OK
1       2       1
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
OK
1       2       1
Time taken: 0.197 seconds, Fetched: 1 row(s)
hive> set hive.optimize.index.filter=true;
hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
OK
Time taken: 0.167 seconds
hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1);
OK
Time taken: 0.563 seconds
{noformat}
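
A possible workaround, inferred only from the session above (where the same query returned the correct row before {{hive.optimize.index.filter}} was enabled), is to turn that setting off for the session and re-run the query. This is a sketch under that assumption and has not been verified beyond what the session shows:
{noformat}
hive> set hive.optimize.index.filter=false;
hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
{noformat}
With the setting off, the query would be expected to return the row (1, 2, 1), as in the earlier run. The cost is that predicates are no longer pushed down to the Parquet reader.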



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
