[ https://issues.apache.org/jira/browse/HIVE-12678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chaoyu Tang resolved HIVE-12678.
--------------------------------
    Resolution: Information Provided

[~nbrenwald] I am resolving this JIRA since it has been fixed in Hive 2.1.0 and 1.3.0 by HIVE-13039.

> BETWEEN relational operator sometimes returns incorrect results against PARQUET tables
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-12678
>                 URL: https://issues.apache.org/jira/browse/HIVE-12678
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 1.2.1
>            Reporter: Nicholas Brenwald
>            Assignee: Chaoyu Tang
>
> When querying a parquet table, the BETWEEN relational operator returns
> incorrect results when hive.optimize.index.filter and
> hive.optimize.ppd.storage are both enabled.
>
> Create a parquet table:
> {code}
> create table t(c string) stored as parquet;
> {code}
>
> Insert some strings representing dates:
> {code}
> insert into t select '2015-12-09' from default.dual limit 1;
> insert into t select '2015-12-10' from default.dual limit 1;
> insert into t select '2015-12-11' from default.dual limit 1;
> {code}
>
> h3. Example 1
>
> This query correctly returns 3:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c >= '2015-12-09' and c <= '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 3    |
> +------+--+
> {code}
>
> This query incorrectly returns 1:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c between '2015-12-09' and '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}
>
> Disabling hive.optimize.index.filter resolves the problem. This query now
> correctly returns 3:
> {code}
> set hive.optimize.index.filter=false;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c between '2015-12-09' and '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 3    |
> +------+--+
> {code}
>
> Disabling hive.optimize.ppd.storage resolves the problem. This query now
> correctly returns 3:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=false;
> select count(*) from t where c between '2015-12-09' and '2015-12-11';
> +------+--+
> | _c0  |
> +------+--+
> | 3    |
> +------+--+
> {code}
>
> h3. Example 2
>
> This query correctly returns 1:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}
>
> This query incorrectly returns 0:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c between '2015-12-10' and '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 0    |
> +------+--+
> {code}
>
> Disabling hive.optimize.index.filter resolves the problem. This query now
> correctly returns 1:
> {code}
> set hive.optimize.index.filter=false;
> set hive.optimize.ppd.storage=true;
> select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}
>
> Disabling hive.optimize.ppd.storage resolves the problem. This query now
> correctly returns 1:
> {code}
> set hive.optimize.index.filter=true;
> set hive.optimize.ppd.storage=false;
> select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
> +------+--+
> | _c0  |
> +------+--+
> | 1    |
> +------+--+
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)