[ 
https://issues.apache.org/jira/browse/DRILL-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6557:
---------------------------------
    Labels: ready-to-commit  (was: )

> Use size in bytes during Hive statistics calculation if present
> ---------------------------------------------------------------
>
>                 Key: DRILL-6557
>                 URL: https://issues.apache.org/jira/browse/DRILL-6557
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>
> Drill considers Hive statistics valid only if they contain both the number 
> of rows and the size in bytes. If at least one of them is absent, statistics 
> are calculated from the input splits' size in bytes. This means that we fetch 
> all input splits even though we might not need some of them after planning 
> optimizations (e.g. partition pruning). However, if the number of rows is 
> missing but the size in bytes is present, there is no need to fetch all input 
> splits, since their size in bytes will be the same as in the statistics. 
> Skipping the fetch would improve planning time, since fetching input splits 
> is a rather costly operation.
> This Jira aims to:
>  1. check whether size in bytes is present in the stats before fetching 
> input splits, and use it if present;
>  2. add a trace log message suggesting running the ANALYZE command before 
> running queries if statistics are unavailable and Drill had to fetch all 
> input splits;
>  3. do minor refactoring / cleanup in the HiveMetadataProvider class.
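
The decision order described above can be sketched roughly as follows. This is
a minimal illustration, not the actual HiveMetadataProvider code: the class
HiveStatsSketch, the method resolve, the splitSizeFetcher callback and the
1024-byte average record size are all hypothetical names and values chosen for
the example. The parameter keys "numRows" and "totalSize" are the ones Hive
uses for table and partition statistics.

```java
import java.util.Map;
import java.util.function.LongSupplier;

/** Illustrative sketch only; not Drill's HiveMetadataProvider implementation. */
final class HiveStatsSketch {
    // Hive's parameter keys for row count and data size.
    static final String NUM_ROWS = "numRows";
    static final String TOTAL_SIZE = "totalSize";
    // Assumption: average record size used to estimate rows from bytes.
    static final long ASSUMED_RECORD_SIZE = 1024;

    static final class Stats {
        final long numRows;
        final long sizeInBytes;

        Stats(long numRows, long sizeInBytes) {
            this.numRows = numRows;
            this.sizeInBytes = sizeInBytes;
        }
    }

    /** Returns the parameter as a positive long, or -1 if absent or unusable. */
    static long positiveLong(Map<String, String> params, String key) {
        String value = params.get(key);
        if (value == null) {
            return -1;
        }
        try {
            long parsed = Long.parseLong(value.trim());
            return parsed > 0 ? parsed : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Resolves statistics, calling the costly splitSizeFetcher only when
     * the size in bytes is missing from the Hive parameters.
     */
    static Stats resolve(Map<String, String> params, LongSupplier splitSizeFetcher) {
        long rows = positiveLong(params, NUM_ROWS);
        long size = positiveLong(params, TOTAL_SIZE);

        if (size < 0) {
            // Stand-in for a trace-level log call in a real logger.
            System.err.println("TRACE: Hive statistics unavailable, fetching all "
                + "input splits; consider running ANALYZE before querying.");
            size = splitSizeFetcher.getAsLong();
        }
        if (rows < 0) {
            // Estimate the row count from the size instead of fetching splits.
            rows = Math.max(1, size / ASSUMED_RECORD_SIZE);
        }
        return new Stats(rows, size);
    }
}
```

With this ordering, a table that reports only "totalSize" never triggers the
split fetch; the fetch and the trace message occur only when the size is
missing as well.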



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
