[ 
https://issues.apache.org/jira/browse/DRILL-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6442:
---------------------------------------
    Labels: ready-to-commit  (was: )

> Adjust Hbase disk cost & row count estimation when filter push down is applied
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-6442
>                 URL: https://issues.apache.org/jira/browse/DRILL-6442
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>
> Disk cost for an HBase scan is calculated based on the scan size in bytes.
> {noformat}
> float diskCost = scanSizeInBytes * ((columns == null || columns.isEmpty())
>     ? 1
>     : columns.size() / statsCalculator.getColsPerRow());
> {noformat}
> Scan size in bytes is estimated using {{TableStatsCalculator}} with the help 
> of sampling.
> When we estimate size for the first time (before applying filter push down), 
> we sample random rows. When estimating again after filter push down, 
> we sample only rows that satisfy the filter condition. The average row size 
> can therefore be higher after filter push down than before. Unfortunately, 
> since disk cost depends on these calculations, a plan with filter push down 
> can be given a higher cost than one without it. 
> Possible enhancements:
> 1. Currently the default row count is 1 million, but if sampling returns 
> fewer rows than expected, the query will not return more rows than that 
> number. We can use this number instead of the default row count to 
> achieve better cost estimates.
> 2. When filter push down is applied, the row count is reduced by half in order 
> to ensure the plan with filter push down has a lower cost. The same should be 
> done for disk cost as well.
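
The problem and the two enhancements described above can be illustrated with a minimal sketch. This is not the Drill implementation: the class {{HBaseCostSketch}}, its method names, and the sample figures (row sizes of 100 and 150 bytes, a sample limit of 100 rows) are hypothetical. Only the 1 million default row count, the halving factor, and the overall shape of the disk cost formula come from the description. The column ratio is computed in floating point here, which is an assumption of the sketch.

```java
class HBaseCostSketch {
    // Default row count cited in the issue, used when sampling gives no tighter bound.
    static final long DEFAULT_ROW_COUNT = 1_000_000L;

    // Enhancement 1 (sketch): a sample that comes back short of the requested
    // limit bounds the number of rows the scan can return, so use it directly.
    static long estimateRowCount(long sampledRows, long sampleLimit) {
        return sampledRows < sampleLimit ? sampledRows : DEFAULT_ROW_COUNT;
    }

    // Disk cost shaped like the formula quoted in the issue. The column ratio
    // uses float division here, which is an assumption of this sketch.
    static float diskCost(float scanSizeInBytes, int projectedColumns,
                          int colsPerRow, boolean filterPushedDown) {
        float columnRatio = (projectedColumns <= 0 || colsPerRow <= 0)
            ? 1f
            : (float) projectedColumns / colsPerRow;
        // Enhancement 2 (sketch): apply the same 0.5 factor that the issue says
        // is already applied to the row count when a filter is pushed down.
        float pushDownFactor = filterPushedDown ? 0.5f : 1f;
        return pushDownFactor * scanSizeInBytes * columnRatio;
    }

    public static void main(String[] args) {
        // Random sample before push down: hypothetical average row size of 100 bytes.
        float before = diskCost(DEFAULT_ROW_COUNT * 100f, 0, 10, false);
        // Sample of qualifying rows after push down: hypothetical 150 bytes per row.
        float afterUnadjusted = diskCost(DEFAULT_ROW_COUNT * 150f, 0, 10, false);
        float afterHalved = diskCost(DEFAULT_ROW_COUNT * 150f, 0, 10, true);
        // Sampling asked for 100 rows but only 40 qualified.
        long boundedRows = estimateRowCount(40, 100);
        float afterBoth = diskCost(boundedRows * 150f, 0, 10, true);

        System.out.println("before push down:          " + before);
        System.out.println("push down, unadjusted:     " + afterUnadjusted);
        System.out.println("push down, halved:         " + afterHalved);
        System.out.println("push down, bounded+halved: " + afterBoth);
    }
}
```

With these hypothetical figures, the unadjusted push-down cost exceeds the cost without push down, while either adjustment brings it below.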



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
