[ https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-20382:
-------------------------------------------
    Description: 
Currently, we do not expose stats over ROW__ID.writeId to the optimizer (this
should be fixed by HIVE-20313). Even if we did, we always assume a uniform
distribution of column values, which can easily lead to overestimating the
number of rows read when we filter on ROW__ID.writeId for materialized views
(think of a large transaction for MV creation followed by small ones for
incremental maintenance). This overestimation can prevent incremental view
maintenance from being triggered, because the cost of the incremental plan is
overestimated (we think we will read more rows than we actually do). This could
be fixed by introducing histograms that better reflect the distribution of
column values.
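To make the overestimation concrete, the following Java sketch compares a
uniform-distribution estimate with the true row count for the filter
ROW__ID.writeId > lastRebuildWriteId. It is a toy model, not Hive's estimator:
the class WriteIdSelectivity, its methods, and the row counts (one
1,000,000-row transaction at writeId 1, then ten 100-row transactions at
writeIds 2 to 11) are hypothetical.

```java
import java.util.TreeMap;

// Toy model (hypothetical): compares a uniform-distribution row estimate for
// "ROW__ID.writeId > lastRebuildWriteId" with the true count.
final class WriteIdSelectivity {

    // Uniform assumption: every writeId in [minWriteId, maxWriteId] is taken
    // to hold the same share of the table's rows.
    static long uniformEstimate(long totalRows, long minWriteId, long maxWriteId,
                                long lastRebuildWriteId) {
        long distinctWriteIds = maxWriteId - minWriteId + 1;
        // Number of writeIds in range that satisfy writeId > lastRebuildWriteId.
        long firstMatching = Math.max(lastRebuildWriteId + 1, minWriteId);
        long matchingWriteIds = Math.max(0, maxWriteId - firstMatching + 1);
        return totalRows * matchingWriteIds / distinctWriteIds;
    }

    // True count: sum the rows written by each writeId strictly greater than
    // lastRebuildWriteId.
    static long actualRows(TreeMap<Long, Long> rowsPerWriteId, long lastRebuildWriteId) {
        long sum = 0;
        for (long rows : rowsPerWriteId.tailMap(lastRebuildWriteId, false).values()) {
            sum += rows;
        }
        return sum;
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> rowsPerWriteId = new TreeMap<>();
        rowsPerWriteId.put(1L, 1_000_000L);  // large transaction (MV creation)
        for (long writeId = 2; writeId <= 11; writeId++) {
            rowsPerWriteId.put(writeId, 100L);  // small incremental transactions
        }
        long totalRows = 0;
        for (long rows : rowsPerWriteId.values()) {
            totalRows += rows;
        }
        long lastRebuildWriteId = 1L;  // MV is up to date as of writeId 1
        System.out.println("uniform estimate = " + uniformEstimate(
                totalRows, rowsPerWriteId.firstKey(), rowsPerWriteId.lastKey(), lastRebuildWriteId));
        System.out.println("actual rows      = " + actualRows(rowsPerWriteId, lastRebuildWriteId));
    }
}
```

Running it prints an estimate of 910000 rows against 1000 actual rows: the
single large transaction dominates the data, but the uniform assumption spreads
its rows evenly across all eleven writeIds.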

Until both fixes are implemented, we will use a config variable whose value
multiplies the estimated cost of the rebuild plan, which lets us favour
incremental rebuild over full rebuild.
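A minimal sketch of how such a multiplier could bias the choice between the two
plans, assuming (as an illustration only) that the factor scales the estimated
cost of the incremental plan and is set below 1; scaling the full-rebuild cost
by its reciprocal would be equivalent. The class RebuildChooser, the method
choose, and the cost figures are hypothetical and do not reflect Hive's actual
config name or cost model.

```java
// Hypothetical sketch: a cost multiplier that biases the choice toward
// incremental rebuild. Not Hive's actual cost model or config name.
final class RebuildChooser {

    enum Strategy { INCREMENTAL, FULL }

    // incrementalFactor stands in for the config variable described in the
    // issue; values below 1 make the incremental plan look cheaper.
    static Strategy choose(double incrementalCost, double fullCost, double incrementalFactor) {
        if (incrementalFactor <= 0) {
            throw new IllegalArgumentException("incrementalFactor must be positive");
        }
        double adjustedIncrementalCost = incrementalCost * incrementalFactor;
        return adjustedIncrementalCost <= fullCost ? Strategy.INCREMENTAL : Strategy.FULL;
    }

    public static void main(String[] args) {
        double incrementalCost = 1_500_000.0;  // inflated by the row overestimate
        double fullCost = 1_200_000.0;
        for (double factor : new double[] {1.0, 0.1}) {
            System.out.println("factor " + factor + " -> "
                    + choose(incrementalCost, fullCost, factor));
        }
    }
}
```

With a factor of 1.0 the full rebuild wins (1,500,000 > 1,200,000); with 0.1
the adjusted incremental cost drops to 150,000 and the incremental rebuild is
chosen, even though the underlying row estimate is unchanged.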

> Materialized views: Introduce heuristic to favour incremental rebuild
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20382
>                 URL: https://issues.apache.org/jira/browse/HIVE-20382
>             Project: Hive
>          Issue Type: Improvement
>          Components: Materialized views
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)