[
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470686#comment-13470686
]
Shreepadma Venugopalan commented on HIVE-1362:
----------------------------------------------
I assume that by row-level statistics you mean table statistics. Today, table
statistics are stored as part of table_params. The table_params table gets
mapped to the TTable object in memory, and it looks like the existing APIs
sufficed for that. We want a dedicated Thrift API for column stats for the
following reasons:
1. Column statistics are a property of the column, not the table, and hence
don't belong in table_params. Furthermore, we have seen customers with
tables that are 100s-1000s of columns wide. Storing this information as
table_params entries is going to bloat the table's parameter list, and it will
also make the output of DESCRIBE EXTENDED unreadable.
2. We want column statistics to be first-class metadata. To do so, we have to
provide dedicated Thrift APIs to query and update them. We want the Thrift API
to be self-documenting, i.e. if someone tells you that the metastore supports
column stats, you should be able to look at the Thrift IDL and figure out
which method you need to use to store/retrieve column stats. Right now a lot
of the API doesn't satisfy that goal, since many methods are overloaded and
other features are implemented by adding new key/value properties to different
catalog objects, which aren't easy to document via the Thrift API.
3. Additionally, storing column statistics as key/value pairs in the
table_params table is not space efficient: we need to repeat the keys for each
column in the table for which statistics are gathered. Furthermore, by storing
column stats in the table_params table we would de-normalize the schema
completely and incur a performance penalty performing self-joins, though not
necessarily in the metastore db, to retrieve the statistics associated with
a column.
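
To make the space argument in point 3 concrete, here is a rough sketch in
Python. It is not taken from the patch: the statistic names, the
"column_stats.<col>.<stat>" key scheme, and the ColumnStats record are all
made up for illustration, and the real Thrift structs may look quite
different. It only shows how a flat key/value encoding repeats the key text
for every column, while a dedicated per-column record does not.

```python
from dataclasses import dataclass

# Hypothetical statistic names; the actual set of stats is not listed here.
STAT_NAMES = ("num_nulls", "num_distinct", "max_len")


def flatten_to_table_params(column_stats):
    """Encode per-column stats as flat key/value pairs, the way a
    table_params-style store would need them. The key scheme is invented."""
    params = {}
    for column, stats in column_stats.items():
        for name in STAT_NAMES:
            params[f"column_stats.{column}.{name}"] = str(stats[name])
    return params


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical dedicated record: one typed entry per column."""
    column: str
    num_nulls: int
    num_distinct: int
    max_len: int


def to_dedicated_records(column_stats):
    """Build one record per column instead of one pair per statistic."""
    return [ColumnStats(column=c, **s) for c, s in column_stats.items()]


def key_bytes(params):
    """Total characters spent on keys alone in the flat encoding."""
    return sum(len(k) for k in params)


if __name__ == "__main__":
    # A 1000-column table with three statistics per column.
    stats = {
        f"col_{i}": {"num_nulls": 0, "num_distinct": i, "max_len": 8}
        for i in range(1000)
    }
    params = flatten_to_table_params(stats)
    records = to_dedicated_records(stats)
    print(len(params), len(records))  # 3000 key/value pairs versus 1000 records
    print(key_bytes(params))          # characters spent only on repeated key text
```

In the flat form, looking up everything for one column also means scanning or
prefix-matching keys, and DESCRIBE EXTENDED would have to print every one of
those pairs; with a record per column it is a single lookup.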
> column level statistics
> -----------------------
>
> Key: HIVE-1362
> URL: https://issues.apache.org/jira/browse/HIVE-1362
> Project: Hive
> Issue Type: Sub-task
> Components: Statistics
> Reporter: Ning Zhang
> Assignee: Shreepadma Venugopalan
> Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt,
> HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt,
> HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt,
> HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt
>
>