Tony Hill created HIVE-16661:
--------------------------------

             Summary: Parquet storage does not handle 'or' statement properly
                 Key: HIVE-16661
                 URL: https://issues.apache.org/jira/browse/HIVE-16661
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.1.0
            Reporter: Tony Hill

A query on a Parquet-backed table returns different results depending on the value of hive.optimize.ppd.storage.

Steps to reproduce:

CREATE TABLE `test_table`(
  `some_value` int)
PARTITIONED BY (
  `date` string,
  `id` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

set hive.exec.dynamic.partition.mode=nonstrict;

insert into test_table PARTITION (date, id) VALUES
  (12, '2017-04-09', 16),
  (13, '2017-04-09', 32),
  (NULL, '2017-04-09', 51),
  (23, '2017-04-09', 51),
  (66, '2017-04-09', 16),
  (17, '2017-04-09', 32),
  (NULL, '2017-04-09', 32);

SELECT distinct id from test_table
WHERE id IN (16, 32, 51)
  AND date = '2017-04-09'
  AND (id!=32 OR some_value IS NULL);

+-----+--+
| id  |
+-----+--+
| 32  |
| 51  |
(incorrect)

Can be fixed with:

set hive.optimize.ppd.storage=false;

+-----+--+
| id  |
+-----+--+
| 16  |
| 32  |
| 51  |
+-----+--+
(correct)

Can also be fixed with ..... (id!=32 OR some_value IS NULL)=true;
Replacing the OR with AND also avoids the problem.
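
For anyone triaging this, a per-row check may make the expected result easier to confirm. The following is only a sketch: it assumes test_table and the seven rows from the steps above are still in place, and the alias keep_row is made up for illustration. It evaluates the OR predicate in the select list instead of the WHERE clause, so the outcome for each row can be read off directly. Whether this form is itself affected by hive.optimize.ppd.storage has not been verified here.

```sql
-- Sketch only: assumes test_table and its seven rows from the steps above.
-- keep_row is a hypothetical alias, not part of the original report.
SELECT id,
       some_value,
       (id != 32 OR some_value IS NULL) AS keep_row
FROM test_table
WHERE `date` = '2017-04-09';
```

By ordinary SQL semantics, every row with id 16 or 51 satisfies the predicate, and among the id 32 rows only the one with a NULL some_value does. That is why all three ids (16, 32, 51) are expected in the DISTINCT result.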