[ https://issues.apache.org/jira/browse/HIVE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-5369:
-----------------------------
Attachment: HIVE-5369.WIP.txt
Uploading a WIP patch. There are many rough edges that still need to be
fixed/addressed.
> Annotate hive operator tree with statistics from metastore
> ----------------------------------------------------------
>
> Key: HIVE-5369
> URL: https://issues.apache.org/jira/browse/HIVE-5369
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor, Statistics
> Affects Versions: 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Labels: statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5369.WIP.txt
>
>
> Currently, the statistics gathered at the table/partition level and the
> column level are not used during the query planning stage, although they can
> be used to optimize query plans. Basic statistics such as uncompressed data
> size can be used for better reducer estimation. Other statistics, such as the
> number of rows, the number of distinct values per column, and the average
> column length, can be used by the Cost Based Optimizer (CBO) to select better
> query plans. As a first step in improving query planning, the statistics that
> are available in the metastore should be attached to the Hive operator tree:
> the operator tree should be walked and annotated with statistics information.
> The attached statistics will vary for each operator depending on the
> operation it performs. For example, a select operator changes the average row
> size but does not affect the number of rows. Similarly, a filter operator
> changes the number of rows but does not change the average row size. Similar
> rules can be applied to other operators as well.
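
The two per-operator rules described above (select changes the average row
size, filter changes the row count) can be illustrated with a minimal Java
sketch. This is not taken from the attached patch: the class name
StatsAnnotationSketch, the Stats holder, the select/filter helpers, and the
idea of passing a selectivity fraction to the filter are all hypothetical
names and assumptions chosen for illustration only.

```java
// Hypothetical sketch; none of these names come from Hive or from the patch.
final class StatsAnnotationSketch {

    /** Statistics attached to one operator: row count and average row size in bytes. */
    static final class Stats {
        final long numRows;
        final double avgRowSize;

        Stats(long numRows, double avgRowSize) {
            this.numRows = numRows;
            this.avgRowSize = avgRowSize;
        }

        /** Estimated data size in bytes flowing out of the operator. */
        double dataSize() {
            return numRows * avgRowSize;
        }
    }

    /** Select rule: the average row size changes, the number of rows does not. */
    static Stats select(Stats in, double projectedAvgRowSize) {
        return new Stats(in.numRows, projectedAvgRowSize);
    }

    /**
     * Filter rule: the number of rows changes, the average row size does not.
     * The selectivity (fraction of rows kept, in [0, 1]) is an assumed input.
     */
    static Stats filter(Stats in, double selectivity) {
        if (selectivity < 0.0 || selectivity > 1.0) {
            throw new IllegalArgumentException("selectivity must be in [0, 1]");
        }
        return new Stats(Math.round(in.numRows * selectivity), in.avgRowSize);
    }

    public static void main(String[] args) {
        // Walk a tiny linear "tree": table scan -> filter -> select.
        Stats scan = new Stats(1_000_000L, 120.0);
        Stats filtered = filter(scan, 0.1);
        Stats projected = select(filtered, 24.0);
        System.out.printf("rows=%d avgRowSize=%.1f dataSize=%.0f%n",
                projected.numRows, projected.avgRowSize, projected.dataSize());
    }
}
```

Keeping the statistics object immutable means each operator gets its own
annotation derived from its parent's, which matches the idea of walking the
tree and attaching statistics operator by operator.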