Simone created HIVE-11266:
-----------------------------
Summary: count(*) wrong result based on table statistics
Key: HIVE-11266
URL: https://issues.apache.org/jira/browse/HIVE-11266
Project: Hive
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Simone
Priority: Critical
Hive returns a wrong count(*) result on an external table that has table
statistics if I change the table's data files after the statistics are computed.
This is the scenario in detail:
1) create external table my_table (...) location 'my_location';
2) analyze table my_table compute statistics;
3) change/add/delete one or more files in 'my_location' directory;
4) select count(*) from my_table;
In this case the count query doesn't generate an MR job and returns a result
based on table statistics. This result is wrong because it is based on the
statistics stored in the Hive metastore and doesn't take into account the
modifications made to the data files.
Setting "hive.compute.query.using.stats" to FALSE avoids the problem, but the
default value of this property is TRUE.
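A minimal sketch of the two ways around this, assuming the my_table name from the steps above: either turn off stats-based answers for the session so the count is computed from the files, or recompute the statistics after the files change. Whether recomputing is practical depends on how often the files in 'my_location' are modified.

```sql
-- Option 1: do not answer queries from metastore statistics in this session
set hive.compute.query.using.stats=false;
select count(*) from my_table;

-- Option 2: refresh the statistics after the files in 'my_location' change
analyze table my_table compute statistics;
select count(*) from my_table;
```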
I think this post on Stack Overflow, which shows another kind of wrong count
after multiple inserts, is related to the issue I reported:
http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)