Hari Sekhon created HIVE-12359:
----------------------------------
Summary: Hive ORC table reports different counts between select *
and count(*)
Key: HIVE-12359
URL: https://issues.apache.org/jira/browse/HIVE-12359
Project: Hive
Issue Type: Bug
Components: CBO, HiveServer2, ORC, Statistics
Affects Versions: 1.2.1
Environment: HDP 2.3 + Kerberos
Reporter: Hari Sekhon
Assignee: Vaibhav Gumashta
I have an ORC table which is giving different figures between select count( * )
and select *:
{code}> select count(*) from myTable;
+--------+--+
|  _c0   |
+--------+--+
| 56471  |
+--------+--+
{code}
{code}> select * from myTable;
...
109,295 rows selected (62.993 seconds)
{code}
At first I thought the fix was obvious: run "analyze table ... compute statistics"
and the count would correct itself. However, I've tried that, with and without
"for columns", and the results remain the same. The select count( * ) is very
fast, so it must be using the pre-computed stats.
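For reference, the statistics refresh described above would look roughly like the following. The exact statements that were run are not recorded in this report, so this is an assumed reconstruction rather than a transcript:
{code}
-- Recompute basic table statistics (row count, file count, sizes)
analyze table myTable compute statistics;
-- Additionally compute column-level statistics
analyze table myTable compute statistics for columns;
-- Re-check the count after the statistics have been refreshed
select count(*) from myTable;
{code}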
When I copy the table to a text table or to another ORC table, count( * ) on
the new table returns the correct number.
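A sketch of how such a copy might be made and checked. The target table names myTable_text and myTable_orc2 are hypothetical; the names actually used are not given in this report:
{code}
-- Copy the data into a text-format table and into a fresh ORC table
create table myTable_text stored as textfile as select * from myTable;
create table myTable_orc2 stored as orc as select * from myTable;
-- Count the rows in each copy for comparison with the original table
select count(*) from myTable_text;
select count(*) from myTable_orc2;
{code}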
I've even tried disabling stats, CBO and related settings, then restarting, with
the same result: select count( * ) still returns very quickly each time. That
suggests it is using either pre-computed stats stored in the Metastore or the ORC
stats in the file format. However, I'm not sure how ORC could store the wrong
count, especially as a CTAS to another ORC table returns the correct count when
I select count( * ) from that new ORC table.
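The report does not list which settings were disabled. As an assumption, session-level settings along these lines are the ones most likely to affect whether count( * ) is answered from statistics, and "describe formatted" shows the row count held in the Metastore:
{code}
-- Assumed settings; the report does not say exactly which ones were changed
-- Do not answer simple aggregates such as count(*) from stored statistics
set hive.compute.query.using.stats=false;
-- Do not gather statistics automatically on insert
set hive.stats.autogather=false;
-- Disable the cost-based optimizer
set hive.cbo.enable=false;
select count(*) from myTable;
-- Inspect the numRows value under Table Parameters
describe formatted myTable;
{code}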