Gabriel C Balan created HIVE-13377:
--------------------------------------
Summary: Lost rows when using compact index on parquet table
Key: HIVE-13377
URL: https://issues.apache.org/jira/browse/HIVE-13377
Project: Hive
Issue Type: Bug
Components: Indexing
Affects Versions: 1.1.0
Environment: linux, cdh 5.5.0
Reporter: Gabriel C Balan
Priority: Minor
A query with a WHERE clause on a Parquet table loses rows when a compact
index is used. The same query returns the correct results without the index.
{code}
create table small_parq(i int) stored as parquet;
insert into table small_parq values (1), (2), (3), (4), (5), (6), (7), (8),
(9), (10), (11);
set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=50;
create index comp_idx on table small_parq (i) as 'compact' WITH DEFERRED
REBUILD;
alter index comp_idx on small_parq rebuild;
select * from small_parq where i=3;
--this correctly produces 1 row (value 3).
select * from small_parq where i=11;
--this incorrectly produces 0 rows.
--I see correct results when looking for a row in [1,6];
--I see bad results when looking for a row in [7,11].
--All is well once I disable the compact index
set hive.optimize.index.filter.compact.minsize=50000000;
select * from small_parq where i=11;
--now it correctly produces 1 row (value 11).
{code}
I could not reproduce this issue when the base table was stored as ORC,
SEQUENCEFILE, AVRO, or TEXTFILE.
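To help narrow this down, here is a minimal diagnostic sketch that is not part of the original repro. It assumes the table lives in the default database, in which case the index table should be named default__small_parq_comp_idx__ (an assumption; verify with "show tables"), and that the compact index table exposes the usual `_bucketname` and `_offsets` columns.
{code}
-- check whether the query plan actually goes through the index
explain select * from small_parq where i=11;
-- inspect what the compact index recorded for each indexed value
-- (index table name assumes the default database)
select i, `_bucketname`, `_offsets` from default__small_parq_comp_idx__;
-- turn off automatic index use entirely, for comparison
set hive.optimize.index.filter=false;
select * from small_parq where i=11;
{code}
Comparing the recorded offsets for values in [1,6] against those for [7,11] may show whether the index entries themselves are wrong or whether the rows are dropped later, when the Parquet file is read using those offsets.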