[ https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-20382:
-------------------------------------------
    Attachment: HIVE-20382.patch

> Materialized views: Introduce heuristic to favour incremental rebuild
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20382
>                 URL: https://issues.apache.org/jira/browse/HIVE-20382
>             Project: Hive
>          Issue Type: Improvement
>          Components: Materialized views
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-20382.patch
>
>
> Currently, we do not expose stats over ROW__ID.writeId to the optimizer (this
> should be fixed by HIVE-20313). Even if we did, we always assume a uniform
> distribution of column values, which can easily lead to overestimating the
> number of rows read when we filter on ROW__ID.writeId for materialized views
> (think of a large transaction for MV creation followed by small ones for
> incremental maintenance). This overestimation can prevent incremental view
> maintenance from being triggered, because the cost of the incremental plan is
> overestimated (we think we will read more rows than we actually do). This
> could be fixed by introducing histograms that better reflect the distribution
> of column values.
> Until both fixes are implemented, we will use a config variable that
> multiplies the estimated cost of the rebuild plan and hence can favour
> incremental rebuild over full rebuild.
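To make the overestimation concrete, here is a toy Python sketch, not Hive code. Assumptions: the row counts per writeId, the simple cost model (a full rebuild reads every source row once; an incremental rebuild reads the delta and pays a per-row MERGE overhead), and the names MERGE_OVERHEAD and INCREMENTAL_FACTOR are all hypothetical. The factor is assumed to scale the incremental plan's cost, so a value below 1 favours it. The real optimizer uses its own cost model.

```python
"""Toy illustration (hypothetical numbers and cost model): a uniform
estimate over ROW__ID.writeId overestimates the delta an incremental
rebuild reads, and a cost multiplier can tip the choice back."""
from collections import Counter

# Hypothetical source table: one large transaction (writeId 1) that the MV
# was created from, then nine small transactions (writeIds 2..10).
rows_per_write_id = Counter({1: 1_000_000})
for wid in range(2, 11):
    rows_per_write_id[wid] = 1_000

TOTAL_ROWS = sum(rows_per_write_id.values())  # 1,009,000
LAST_REBUILD_WRITE_ID = 1  # MV is assumed up to date as of writeId 1

MERGE_OVERHEAD = 2.0      # hypothetical per-row cost of merging the delta
INCREMENTAL_FACTOR = 0.1  # hypothetical knob; < 1 favours incremental


def uniform_estimate(counts: Counter, min_exclusive: int) -> float:
    """Rows with writeId > min_exclusive, assuming every distinct writeId
    holds the same number of rows (uniform distribution)."""
    distinct = sorted(counts)
    selected = sum(1 for w in distinct if w > min_exclusive)
    return sum(counts.values()) * selected / len(distinct)


def histogram_estimate(counts: Counter, min_exclusive: int) -> float:
    """Same predicate, estimated from an exact per-value frequency histogram."""
    return float(sum(c for w, c in counts.items() if w > min_exclusive))


def choose_rebuild(delta_rows: float, factor: float = 1.0) -> str:
    """Pick the cheaper plan under the toy cost model; `factor` scales
    only the incremental plan's estimated cost."""
    full_cost = float(TOTAL_ROWS)
    incremental_cost = delta_rows * MERGE_OVERHEAD * factor
    return "incremental" if incremental_cost < full_cost else "full"


if __name__ == "__main__":
    est_uniform = uniform_estimate(rows_per_write_id, LAST_REBUILD_WRITE_ID)
    est_hist = histogram_estimate(rows_per_write_id, LAST_REBUILD_WRITE_ID)
    print("uniform:", est_uniform, choose_rebuild(est_uniform))
    print("uniform + factor:", choose_rebuild(est_uniform, INCREMENTAL_FACTOR))
    print("histogram:", est_hist, choose_rebuild(est_hist))
```

Under these assumptions the uniform estimate is 908,100 rows against an actual delta of 9,000, so the incremental plan looks more expensive than a full rebuild. Scaling its cost by the factor (or using the histogram estimate) flips the decision back to incremental.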



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
