[ https://issues.apache.org/jira/browse/HIVE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gunther Hagleitner updated HIVE-5369:
-------------------------------------
    Status: Patch Available  (was: Open)

> Annotate hive operator tree with statistics from metastore
> ----------------------------------------------------------
>
>                 Key: HIVE-5369
>                 URL: https://issues.apache.org/jira/browse/HIVE-5369
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor, Statistics
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: statistics
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5369.1.txt, HIVE-5369.2.WIP.txt,
> HIVE-5369.2.patch.txt, HIVE-5369.3.patch.txt, HIVE-5369.4.patch.txt,
> HIVE-5369.5.patch.txt, HIVE-5369.6.patch.txt, HIVE-5369.7.patch.txt,
> HIVE-5369.8.patch.txt, HIVE-5369.9.patch, HIVE-5369.9.patch.txt,
> HIVE-5369.WIP.txt, HIVE-5369.refactor.WIP.txt
>
>
> Currently, the statistics gathered at the table/partition level and column
> level are not used during the query planning stage. Statistics at the
> table/partition and column level can be used for optimizing query plans.
> Basic statistics like uncompressed data size can be used for better reducer
> estimation. Other statistics like number of rows, distinct values of columns,
> average length of columns, etc. can be used by the Cost Based Optimizer (CBO)
> to make better query plan selections. As a first step in improving query
> planning, the statistics that are available in the metastore should be
> attached to the Hive operator tree. The operator tree should be walked and
> annotated with statistics information. The attached statistics will vary for
> each operator depending on the operation it performs. For example, a select
> operator will change the average row size but doesn't affect the number of
> rows. Similarly, a filter operator will change the number of rows but doesn't
> change the average row size. Similar rules can be applied to other operators
> as well.
> Rules for different operators are added as comments in the code.
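
The select and filter rules described above can be illustrated with a small
sketch. This is not code from the attached patches: the class
StatsAnnotationSketch, its Stats holder, and the select/filter helpers are
hypothetical names, and the fixed selectivity passed to filter is an assumed
input rather than something derived from metastore column statistics.

```java
// Hypothetical sketch: these names are illustrative and are not Hive's classes.
final class StatsAnnotationSketch {

    /** Per-operator estimate: number of rows and average row size in bytes. */
    static final class Stats {
        final long numRows;
        final double avgRowSize;

        Stats(long numRows, double avgRowSize) {
            this.numRows = numRows;
            this.avgRowSize = avgRowSize;
        }

        /** Estimated data size in bytes (rows times average row size). */
        double dataSize() {
            return numRows * avgRowSize;
        }
    }

    /**
     * Select rule: the row count is unchanged; the average row size becomes
     * the sum of the average widths of the projected columns.
     */
    static Stats select(Stats input, double... projectedColumnWidths) {
        double width = 0.0;
        for (double w : projectedColumnWidths) {
            width += w;
        }
        return new Stats(input.numRows, width);
    }

    /**
     * Filter rule: the average row size is unchanged; the row count is scaled
     * by an assumed selectivity in [0, 1].
     */
    static Stats filter(Stats input, double selectivity) {
        if (selectivity < 0.0 || selectivity > 1.0) {
            throw new IllegalArgumentException("selectivity must be in [0, 1]");
        }
        return new Stats(Math.round(input.numRows * selectivity), input.avgRowSize);
    }

    public static void main(String[] args) {
        // Assumed table scan statistics: 1,000,000 rows of 120 bytes each.
        Stats scan = new Stats(1_000_000L, 120.0);
        // A filter assumed to keep 10% of the rows.
        Stats filtered = filter(scan, 0.1);
        // A projection of two columns with assumed average widths of 8 and 24 bytes.
        Stats projected = select(filtered, 8.0, 24.0);
        System.out.println(projected.numRows + " rows, "
                + projected.avgRowSize + " bytes/row, "
                + projected.dataSize() + " bytes total");
    }
}
```

Walking an operator tree would apply one such rule per operator, feeding each
operator the statistics produced by its parent.
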
> For more detailed information, the reference book that I am using is
> "Database Systems: The Complete Book" by Garcia-Molina et al.



--
This message was sent by Atlassian JIRA
(v6.1#6144)