Congling Xia created KYLIN-4315:
-----------------------------------
Summary: use metadata numRows in beeline client for quick row counting
Key: KYLIN-4315
URL: https://issues.apache.org/jira/browse/KYLIN-4315
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Reporter: Congling Xia
Assignee: Congling Xia
Hi, I found that the `getHiveTableRows` method in `BeelineHiveClient` runs
"select count(*) from <tb_name>" to count table rows. The method is invoked in
the flat intermediate table redistribution step of cube building.
The row count can instead be read from the metastore, which costs much less
time than scanning all rows in the Hive table. Since intermediate tables are
created and populated by Kylin, the statistics are automatically calculated and
stored in the metastore when
`[hive.stats.autogather|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.autogather]`
is enabled (which is the default setting for Hive).
See the Hive wiki for more details about the `numRows` statistic:
[https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE]
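
A minimal sketch of the proposed lookup, assuming the table parameters have already been fetched from the metastore into a `Map<String, String>`. The class `NumRowsSketch` and the methods `rowCountFromStats` and `getTableRows` are hypothetical names for illustration, not existing Kylin code. The `countQuery` supplier stands in for the current "select count(*)" path, which is kept as a fallback when the statistic is missing or unusable.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

public class NumRowsSketch {
    // Table parameter key under which Hive records the row-count statistic.
    public static final String NUM_ROWS_KEY = "numRows";

    /**
     * Returns the row count recorded in the table parameters, or empty when
     * the statistic is absent, not a number, or negative.
     */
    public static OptionalLong rowCountFromStats(Map<String, String> tableParams) {
        if (tableParams == null) {
            return OptionalLong.empty();
        }
        String raw = tableParams.get(NUM_ROWS_KEY);
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            long n = Long.parseLong(raw.trim());
            return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Prefers the metastore statistic; runs the (expensive) count query,
     * supplied by the caller, only when no usable statistic exists.
     */
    public static long getTableRows(Map<String, String> tableParams, LongSupplier countQuery) {
        return rowCountFromStats(tableParams).orElseGet(countQuery);
    }
}
```

This sketch trusts any non-negative `numRows` value. A stricter variant could additionally check that the statistics are marked as accurate before skipping the count query; that check is left out here.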