Arina Ielchiieva created DRILL-6442:
---------------------------------------

             Summary: Adjust HBase disk cost & row count estimation when filter push down is applied
                 Key: DRILL-6442
                 URL: https://issues.apache.org/jira/browse/DRILL-6442
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.13.0
            Reporter: Arina Ielchiieva
            Assignee: Arina Ielchiieva
             Fix For: 1.14.0


Disk cost for an HBase scan is calculated based on the scan size in bytes.

{noformat}
float diskCost = scanSizeInBytes * ((columns == null || columns.isEmpty()) ? 1 : columns.size() / statsCalculator.getColsPerRow());
{noformat}
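
For example (hypothetical numbers, and assuming the column ratio is evaluated as a floating-point division): with an estimated scan size of 100 MB, 3 projected columns and 10 columns per row on average, the disk cost comes out to 100 MB * 3 / 10 = 30 MB.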

The scan size in bytes is estimated using {{TableStatsCalculator}}, which relies on sampling.
When the size is estimated for the first time (before filter push down is applied), the sample consists of random rows. When the size is re-estimated after filter push down, the sample consists only of rows that satisfy the filter condition. As a result, the average row size can come out higher after filter push down than before. Unfortunately, since disk cost depends on these estimates, the plan with filter push down can end up with a higher cost than the plan without it.
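
To illustrate, a minimal sketch of how the inversion can happen (all numbers and names below are hypothetical and not taken from the Drill code base; the {{diskCost}} helper only mirrors the shape of the formula quoted above):

{noformat}
// Hypothetical illustration only, not the actual Drill implementation.
public class FilterPushDownCostSketch {

    // Full scan size scaled by the fraction of columns the query touches,
    // same shape as the formula quoted above.
    static float diskCost(long scanSizeInBytes, int projectedColumns, float colsPerRow) {
        return scanSizeInBytes * (projectedColumns / colsPerRow);
    }

    public static void main(String[] args) {
        long rowCount = 1_000_000L;

        // Before push down: the sample is made of random rows -> smaller average row size.
        long scanSizeBefore = rowCount * 80;    // assume 80 bytes per row on average

        // After push down: the sample contains only rows matching the filter,
        // which happen to be wider -> larger average row size, same row count estimate.
        long scanSizeAfter = rowCount * 120;    // assume 120 bytes per row on average

        System.out.println("disk cost without push down: " + diskCost(scanSizeBefore, 3, 10f));
        System.out.println("disk cost with push down:    " + diskCost(scanSizeAfter, 3, 10f));

        // The pushed-down scan reads less data at run time, yet its estimated disk cost
        // is higher, so the planner may pick the plan without filter push down.
    }
}
{noformat}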





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
