Tony Hill created HIVE-16661:
--------------------------------

             Summary: Parquet storage does not handle 'or' statement properly
                 Key: HIVE-16661
                 URL: https://issues.apache.org/jira/browse/HIVE-16661
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.1.0
            Reporter: Tony Hill


A query on a Parquet-backed table returns different results depending on the value of
hive.optimize.ppd.storage.

Steps to reproduce:

CREATE TABLE `test_table`(
`some_value` int)
PARTITIONED BY (
`date` string,
`id` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';


set hive.exec.dynamic.partition.mode=nonstrict;

insert into test_table PARTITION (date, id) VALUES
(12, '2017-04-09', 16),
(13, '2017-04-09', 32),
(NULL, '2017-04-09', 51),
(23, '2017-04-09', 51),
(66, '2017-04-09', 16),
(17, '2017-04-09', 32),
(NULL, '2017-04-09', 32);


SELECT distinct id from test_table WHERE id IN (16, 32, 51)
AND date = '2017-04-09' AND (id!=32 OR some_value IS NULL);
+-----+--+
| id |
+-----+--+
| 32 |
| 51 |
+-----+--+
(incorrect: id 16 is missing)

Disabling storage-level predicate pushdown works around the problem:
set hive.optimize.ppd.storage=false;

+-----+--+
| id |
+-----+--+
| 16 |
| 32 |
| 51 |
+-----+--+
(correct)
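As a cross-check on which output is correct, here is a small Python sketch (not part of the original report) that evaluates the WHERE clause by hand against the seven inserted rows. It assumes standard SQL semantics: `id` and `date` are never NULL in this data, and `some_value IS NULL` always yields TRUE or FALSE, so plain booleans reproduce the SQL result. Both id=16 rows (some_value 12 and 66) satisfy `id!=32`, so 16 must appear in the result.

```python
# Rows from the report's INSERT statement: (some_value, date, id).
ROWS = [
    (12, "2017-04-09", 16),
    (13, "2017-04-09", 32),
    (None, "2017-04-09", 51),
    (23, "2017-04-09", 51),
    (66, "2017-04-09", 16),
    (17, "2017-04-09", 32),
    (None, "2017-04-09", 32),
]


def matches(some_value, date, id_):
    # WHERE id IN (16, 32, 51) AND date = '2017-04-09'
    #   AND (id != 32 OR some_value IS NULL)
    # id and date are never NULL here, and IS NULL is always TRUE or
    # FALSE, so Python booleans give the same answer as SQL.
    return (
        id_ in (16, 32, 51)
        and date == "2017-04-09"
        and (id_ != 32 or some_value is None)
    )


def expected_distinct_ids(rows):
    # Equivalent of SELECT DISTINCT id ... WHERE <predicate>, sorted for display.
    return sorted({id_ for some_value, date, id_ in rows if matches(some_value, date, id_)})


print(expected_distinct_ids(ROWS))  # [16, 32, 51]
```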

Rewriting the predicate as ..... (id!=32 OR some_value IS NULL)=true; also returns the
correct result, and replacing OR with AND also avoids the problem.
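One hypothesis that reproduces both outputs exactly is given here as an assumption, not a confirmed root cause. `id` is a partition column, so it lives in the directory path rather than inside the Parquet files. Suppose the storage-level filter drops the leaf `id!=32`, which cannot be evaluated against the file, and lets the OR collapse to `some_value IS NULL`. Then every file with no NULLs in some_value would be skipped. The id=16 partition holds only 12 and 66, so it would vanish, leaving exactly 32 and 51. Dropping a leaf from an AND only weakens the filter, which would be consistent with AND behaving correctly. The sketch models one file per partition with a null-count check. `may_skip_file`, `run` and the mode strings are hypothetical names and do not correspond to any Hive or Parquet API.

```python
from collections import defaultdict

# Rows from the report's INSERT: (some_value, date, id).
ROWS = [
    (12, "2017-04-09", 16),
    (13, "2017-04-09", 32),
    (None, "2017-04-09", 51),
    (23, "2017-04-09", 51),
    (66, "2017-04-09", 16),
    (17, "2017-04-09", 32),
    (None, "2017-04-09", 32),
]


def files_by_partition(rows):
    """Assumed layout: one Parquet file per (date, id) partition.

    Partition columns are not stored in the file, so each "file" holds
    only the some_value column.
    """
    files = defaultdict(list)
    for some_value, date, id_ in rows:
        files[(date, id_)].append(some_value)
    return files


def may_skip_file(values, op, mode):
    """Hypothetical storage filter for (id != 32 <op> some_value IS NULL).

    mode "off":           nothing pushed (hive.optimize.ppd.storage=false).
    mode "drop_leaf":     the leaf on the absent column `id` is dropped and
                          `some_value IS NULL` is pushed for AND and OR alike
                          (the assumed buggy behaviour).
    mode "drop_whole_or": an OR containing an unevaluable leaf is not pushed.
    """
    if mode == "off":
        return False
    if op == "OR" and mode == "drop_whole_or":
        return False
    # `some_value IS NULL` is false for every row only when the file has
    # no NULLs (null_count == 0), so such a file is skipped.
    return all(v is not None for v in values)


def run(op, mode):
    result = set()
    for (date, id_), values in files_by_partition(ROWS).items():
        if may_skip_file(values, op, mode):
            continue
        for some_value in values:
            # In this model Hive re-evaluates the full WHERE clause on rows it reads.
            left = id_ != 32
            right = some_value is None
            keep = (left or right) if op == "OR" else (left and right)
            if id_ in (16, 32, 51) and date == "2017-04-09" and keep:
                result.add(id_)
    return sorted(result)


print(run("OR", "off"))            # [16, 32, 51]
print(run("OR", "drop_leaf"))      # [32, 51], same as the incorrect output
print(run("OR", "drop_whole_or"))  # [16, 32, 51]
print(run("AND", "off"))           # [51]
print(run("AND", "drop_leaf"))     # [51], weaker filter but same answer
```

Under this model, the safe rule for a leaf that cannot be evaluated is to drop the whole enclosing OR, while dropping just the leaf is safe only inside an AND.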





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
