Gunther Hagleitner created HIVE-6157:
----------------------------------------
Summary: Fetching column stats slower than the 101 during rush hour
Key: HIVE-6157
URL: https://issues.apache.org/jira/browse/HIVE-6157
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
"hive.stats.fetch.column.stats" controls whether the column stats for a table
are fetched during explain (in Tez: during query planning). On my setup (1
table, 4000 partitions, 24 columns) the time spent in semantic analysis goes
from ~1 second to ~66 seconds when the flag is turned on. That is ~65 seconds
spent fetching column stats.
The reason is probably that the APIs force you to make a separate metastore
call for each column in each partition. That's probably the first thing that
has to change. The question is whether, in addition to that, we need to cache
the stats in the client or store them as a single blob in the database to cut
the time down further. As it stands right now, column stats seem unusable.
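
To put a number on the per-column, per-partition call pattern described above, here is a rough sketch in Python. It is not Hive code: FakeMetastore and both of its methods are hypothetical stand-ins that only count round trips, and the partition and column names are made up. Only the sizes (4000 partitions, 24 columns) and the ~65 seconds come from the measurement above.

```python
from itertools import product


class FakeMetastore:
    """Hypothetical in-memory stand-in for a metastore; it only counts round trips."""

    def __init__(self, partitions, columns):
        # One dummy stats record per (partition, column) pair.
        self.stats = {(p, c): {"num_nulls": 0} for p, c in product(partitions, columns)}
        self.calls = 0

    def get_column_stats(self, partition, column):
        # Models the current pattern: one round trip per column per partition.
        self.calls += 1
        return self.stats[(partition, column)]

    def get_column_stats_bulk(self, partitions, columns):
        # Models a possible batched call: one round trip for everything.
        self.calls += 1
        return {(p, c): self.stats[(p, c)] for p, c in product(partitions, columns)}


# Sizes taken from the setup in this report; the names are invented.
partitions = [f"ds={i}" for i in range(4000)]
columns = [f"col{i}" for i in range(24)]

per_call = FakeMetastore(partitions, columns)
for p in partitions:
    for c in columns:
        per_call.get_column_stats(p, c)

bulk = FakeMetastore(partitions, columns)
bulk.get_column_stats_bulk(partitions, columns)

print(per_call.calls)  # 96000 round trips
print(bulk.calls)      # 1 round trip

# Average cost per round trip if all ~65 seconds went to these calls.
ms_per_call = 65 / per_call.calls * 1000
print(round(ms_per_call, 2))  # 0.68 (milliseconds)
```

If the ~65 seconds are spread evenly over 96000 calls, each call costs well under a millisecond, which suggests that the number of round trips matters more here than the cost of any single one. Whether one batched call is enough, or whether client-side caching or a blob layout is also needed, is the open question above.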