[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573974#comment-16573974
 ] 

Hive QA commented on HIVE-20332:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934851/HIVE-20332.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14870 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13107/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13107/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934851 - PreCommit-HIVE-Build

> Materialized views: Introduce heuristic on selectivity over ROW__ID to favour 
> incremental rebuild
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20332
>                 URL: https://issues.apache.org/jira/browse/HIVE-20332
>             Project: Hive
>          Issue Type: Improvement
>          Components: Materialized views
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-20332.01.patch, HIVE-20332.01.patch, 
> HIVE-20332.patch
>
>
> Currently, we do not expose stats over {{ROW\_\_ID.writeId}} to the optimizer 
> (this should be fixed by HIVE-20313). Even if we did, we always assume a 
> uniform distribution of the column values, which can easily lead to 
> overestimating the number of rows read when we filter on 
> {{ROW\_\_ID.writeId}} for materialized views (think of one large transaction 
> for MV creation followed by small ones for incremental maintenance). This 
> overestimation can prevent incremental view maintenance from being triggered, 
> because the cost of the incremental plan is overestimated (we think we will 
> read more rows than we actually do). This could be fixed by introducing 
> histograms that better reflect the distribution of the column values.
> Until both fixes are implemented, we will use a config variable that sets 
> the selectivity of filter conditions on {{ROW\_\_ID}} during cost 
> calculation. Setting that variable to a low value will favour incremental 
> rebuild over full rebuild.
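
The arithmetic behind the description can be illustrated with a small sketch. This is not Hive code: the function names, the row counts per write ID, and the selectivity value 0.001 are all hypothetical, chosen only to show how a uniform-distribution assumption inflates the row estimate for a filter on ROW__ID.writeId, and how a fixed low selectivity brings the estimate back down.

```python
# Illustrative only: names and numbers are assumptions, not Hive's implementation.

def uniform_estimate(rows_per_write_id, last_write_id):
    """Rows estimated for `writeId > last_write_id` when every distinct
    write ID is assumed to hold the same number of rows."""
    total_rows = sum(rows_per_write_id.values())
    distinct_ids = len(rows_per_write_id)
    matching_ids = sum(1 for w in rows_per_write_id if w > last_write_id)
    return total_rows * matching_ids / distinct_ids


def actual_rows(rows_per_write_id, last_write_id):
    """Rows that really satisfy `writeId > last_write_id`."""
    return sum(n for w, n in rows_per_write_id.items() if w > last_write_id)


def heuristic_estimate(rows_per_write_id, selectivity):
    """Rows estimated when the filter selectivity is fixed by configuration."""
    return sum(rows_per_write_id.values()) * selectivity


# One large transaction creates the MV, then small ones maintain it.
rows = {1: 1_000_000, 2: 100, 3: 100, 4: 100, 5: 100}

uniform = uniform_estimate(rows, last_write_id=4)
actual = actual_rows(rows, last_write_id=4)
heuristic = heuristic_estimate(rows, selectivity=0.001)
```

With these made-up numbers, the uniform estimate is several orders of magnitude above the true row count, while the fixed low selectivity lands much closer to it, which is the effect that makes the incremental plan look cheaper than a full rebuild.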



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
