[ 
https://issues.apache.org/jira/browse/HIVE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-5369:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thanks, Prasanth. Nice work!

> Annotate hive operator tree with statistics from metastore
> ----------------------------------------------------------
>
>                 Key: HIVE-5369
>                 URL: https://issues.apache.org/jira/browse/HIVE-5369
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor, Statistics
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: statistics
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5369.1.txt, HIVE-5369.10.patch, 
> HIVE-5369.2.WIP.txt, HIVE-5369.2.patch.txt, HIVE-5369.3.patch.txt, 
> HIVE-5369.4.patch.txt, HIVE-5369.5.patch.txt, HIVE-5369.6.patch.txt, 
> HIVE-5369.7.patch.txt, HIVE-5369.8.patch.txt, HIVE-5369.9.patch, 
> HIVE-5369.9.patch.txt, HIVE-5369.WIP.txt, HIVE-5369.refactor.WIP.txt
>
>
> Currently, the statistics gathered at the table/partition level and the 
> column level are not used during the query planning stage. These statistics 
> can be used to optimize query plans. Basic statistics such as uncompressed 
> data size can be used for better reducer estimation. Other statistics, such 
> as the number of rows, the number of distinct values in a column, and the 
> average column length, can be used by the Cost Based Optimizer (CBO) to 
> select better query plans. As a first step in improving query planning, the 
> statistics that are available in the metastore should be attached to the Hive 
> operator tree. The operator tree should be walked and annotated with 
> statistics information. The attached statistics will vary for each operator 
> depending on the operation it performs. For example, a select operator 
> changes the average row size but does not affect the number of rows. 
> Similarly, a filter operator changes the number of rows but does not change 
> the average row size. Similar rules can be applied to other operators as well. 
> Rules for different operators are added as comments in the code. For more 
> detailed information, the reference book that I am using is "Database 
> Systems: The Complete Book" by Garcia-Molina et al.
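
The select and filter rules described above can be illustrated with a minimal Java sketch. This is not the code from the attached patches: the class `StatsAnnotationSketch`, the type `OpStats`, and the methods `annotateSelect`, `annotateFilter`, and `equalitySelectivity` are hypothetical names introduced only for illustration. The selectivity of 1 / (number of distinct values) for an equality predicate is assumed here as a common textbook heuristic, and the input numbers are made up.

```java
// Hypothetical sketch; none of these names come from the Hive code base.
public class StatsAnnotationSketch {

    /** Statistics attached to one operator: row count and average row size in bytes. */
    static final class OpStats {
        final long numRows;
        final double avgRowSize;

        OpStats(long numRows, double avgRowSize) {
            this.numRows = numRows;
            this.avgRowSize = avgRowSize;
        }

        /** Estimated size in bytes of the data leaving the operator. */
        double dataSize() {
            return numRows * avgRowSize;
        }
    }

    /** Select rule: row count is preserved; row size becomes the sum of the projected column sizes. */
    static OpStats annotateSelect(OpStats input, double[] projectedColAvgSizes) {
        double rowSize = 0.0;
        for (double size : projectedColAvgSizes) {
            rowSize += size;
        }
        return new OpStats(input.numRows, rowSize);
    }

    /** Filter rule: row count is scaled by the selectivity; average row size is preserved. */
    static OpStats annotateFilter(OpStats input, double selectivity) {
        long rows = Math.round(input.numRows * selectivity);
        return new OpStats(rows, input.avgRowSize);
    }

    /** Assumed heuristic for "col = constant": 1 / (number of distinct values in col). */
    static double equalitySelectivity(long distinctValues) {
        return distinctValues <= 0 ? 1.0 : 1.0 / distinctValues;
    }

    public static void main(String[] args) {
        // Made-up table scan statistics: one million rows of 100 bytes each.
        OpStats scan = new OpStats(1_000_000L, 100.0);
        // Filter on a column assumed to have 50 distinct values.
        OpStats filtered = annotateFilter(scan, equalitySelectivity(50));
        // Project two columns assumed to average 8 and 12 bytes.
        OpStats selected = annotateSelect(filtered, new double[] {8.0, 12.0});

        System.out.println("filter: " + filtered.numRows + " rows, " + filtered.avgRowSize + " bytes/row");
        System.out.println("select: " + selected.numRows + " rows, " + selected.avgRowSize + " bytes/row");
    }
}
```

Walking the tree from the table scan upward and applying one such rule per operator gives every operator an estimated row count and data size, which is the kind of input that reducer estimation or a cost based optimizer could then consume.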



--
This message was sent by Atlassian JIRA
(v6.1#6144)
