Carter Shanklin created HIVE-16938:
--------------------------------------
Summary: INFORMATION_SCHEMA usability: difficult to access # of
table records
Key: HIVE-16938
URL: https://issues.apache.org/jira/browse/HIVE-16938
Project: Hive
Issue Type: Bug
Reporter: Carter Shanklin
HIVE-1010 adds an information schema to Hive, also taking the opportunity to
expose some non-standard but valuable things like statistics in a SYS table.
One common thing users want to know is the number of rows in tables, system
wide.
This information is in the table_params table but the structure of this table
makes it quite inconvenient to access since it is essentially a table of
key-value pairs. More table stats are likely to be added over time, especially
because of ACID. It would be a lot better if this were a first class table.
For what it's worth I deal with the current table by pivoting it into something
easier to deal with as follows:
{code}
create view table_stats as
select
tbl_id,
max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as
COLUMN_STATS_ACCURATE,
max(case param_key when 'numFiles' then param_value end) as numFiles,
max(case param_key when 'numRows' then param_value end) as numRows,
max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
max(case param_key when 'totalSize' then param_value end) as totalSize,
max(case param_key when 'transient_lastDdlTime' then param_value end) as
transient_lastDdlTime
from table_params group by tbl_id;
{code}
It would be better to not have users provide workarounds and make table stats
first-class like column stats currently are.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)