[ https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-20382 started by Jesus Camacho Rodriguez.
------------------------------------------------------
> Materialized views: Introduce heuristic to favour incremental rebuild
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20382
>                 URL: https://issues.apache.org/jira/browse/HIVE-20382
>             Project: Hive
>          Issue Type: Improvement
>          Components: Materialized views
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-20382.patch
>
>
> Currently, we do not expose stats over ROW__ID.writeId to the optimizer (this
> should be fixed by HIVE-20313). Even if we did, we always assume a uniform
> distribution of the column values, which can easily lead to overestimating
> the number of rows read when we filter on ROW__ID.writeId for materialized
> views (think about a large transaction for MV creation and then small ones
> for incremental maintenance). This overestimation can lead to incremental
> view maintenance not being triggered, because the cost of the incremental
> plan is overestimated (we think we will read more rows than we actually do).
> This could be fixed by introducing histograms that better reflect the
> distribution of the column values.
> Until both fixes are implemented, we will use a config variable that will
> multiply the estimated cost of the rebuild plan and hence will be able to
> favour incremental rebuild over full rebuild.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
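
The overestimation described in the issue can be illustrated with a small worked example. The sketch below is not taken from HIVE-20382.patch. The class and method names, the row counts, the 1.5 overhead multiplier, and the 0.1 factor are all hypothetical, and it assumes the config variable is a factor below 1 applied to the estimated cost of the incremental plan. It models one large transaction (write ID 1) that creates the materialized view, followed by ten small maintenance transactions.

```java
// Illustrative sketch only; every name and number here is an assumption.
class RebuildHeuristicSketch {

    // Rows the optimizer would expect to read for "writeId > lastWriteId"
    // if it assumes each write ID covers the same number of rows.
    static double uniformEstimate(long totalRows, long minWriteId,
                                  long maxWriteId, long lastWriteId) {
        double range = maxWriteId - minWriteId + 1;   // distinct write IDs
        double selected = maxWriteId - lastWriteId;   // write IDs past the filter
        return totalRows * selected / range;
    }

    // Hypothetical decision rule: scale the incremental plan's estimated
    // cost by a configurable factor before comparing with the full rebuild.
    static boolean preferIncremental(double incrementalCost, double fullCost,
                                     double factor) {
        return incrementalCost * factor < fullCost;
    }

    public static void main(String[] args) {
        long creationRows = 1_000_000L;      // write ID 1: MV creation
        long maintenanceRows = 10 * 100L;    // write IDs 2..11: 100 rows each
        long totalRows = creationRows + maintenanceRows;

        double estimated = uniformEstimate(totalRows, 1, 11, 1);
        System.out.println("rows estimated: " + estimated);
        System.out.println("rows actually read: " + maintenanceRows);

        // Treat cost as rows read; 1.5 is a made-up overhead for merging
        // the delta into the existing view.
        double incrementalCost = estimated * 1.5;
        double fullCost = totalRows;

        System.out.println("incremental chosen, factor 1.0: "
                + preferIncremental(incrementalCost, fullCost, 1.0));
        System.out.println("incremental chosen, factor 0.1: "
                + preferIncremental(incrementalCost, fullCost, 0.1));
    }
}
```

Under these assumptions the uniform estimate is 910,000 rows, although only 1,000 rows were written after write ID 1. The inflated estimate makes the incremental plan look more expensive than the full rebuild until the factor scales it down.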