[ https://issues.apache.org/jira/browse/DRILL-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pritesh Maker updated DRILL-6557:
---------------------------------
    Labels: ready-to-commit  (was: )

> Use size in bytes during Hive statistics calculation if present
> ---------------------------------------------------------------
>
>                 Key: DRILL-6557
>                 URL: https://issues.apache.org/jira/browse/DRILL-6557
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>
> Drill considers Hive statistics valid if they contain both the number of
> rows and the size in bytes. If at least one of them is absent, statistics
> are calculated from the size in bytes of the input splits. This means that
> we fetch all input splits, even though we might not need some of them after
> planning optimizations (e.g. partition pruning).
> However, if the number of rows is missing but the size in bytes is present,
> there is no need to fetch all input splits, since their size in bytes will
> be the same as in the statistics. Skipping the fetch would improve planning
> time, since fetching input splits is a rather costly operation.
> This Jira aims to:
> 1. check for the presence of size in bytes in the stats before fetching
> input splits, and use it if present;
> 2. add a trace-level log message suggesting to use the ANALYZE command
> before running queries if statistics are unavailable and Drill had to fetch
> all input splits;
> 3. do minor refactoring / cleanup in the HiveMetadataProvider class.
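
The decision described in points 1 and 2 can be sketched roughly as follows. This is a minimal illustration only, not the actual HiveMetadataProvider code: the class HiveStatsSketch, the method resolveSize, the Result and Source types, and the log wording are all hypothetical names made up for this example. It uses java.util.logging so that it stays self-contained, and it leaves out row-count handling entirely. The input splits are modeled as a lazy supplier so that the costly fetch happens only when the size in bytes is missing from the statistics.

```java
import java.util.function.LongSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; names and structure are assumptions, not Drill's API.
public class HiveStatsSketch {
    private static final Logger LOG =
        Logger.getLogger(HiveStatsSketch.class.getName());

    /** Records where the size in bytes came from. */
    enum Source { TABLE_STATS, INPUT_SPLITS }

    static final class Result {
        final long sizeInBytes;
        final Source source;

        Result(long sizeInBytes, Source source) {
            this.sizeInBytes = sizeInBytes;
            this.source = source;
        }
    }

    /**
     * Returns the size in bytes from the statistics when it is present
     * (here: greater than zero, an assumption of this sketch). Otherwise
     * invokes the supplier, which stands in for the costly operation of
     * fetching all input splits and summing their sizes.
     */
    static Result resolveSize(long statsSizeInBytes, LongSupplier inputSplitsSize) {
        if (statsSizeInBytes > 0) {
            // Size is known from the statistics: no need to fetch input splits.
            return new Result(statsSizeInBytes, Source.TABLE_STATS);
        }
        // FINEST is used here as the closest java.util.logging level to "trace".
        LOG.log(Level.FINEST,
            "Statistics unavailable, fetching all input splits. "
            + "Consider running ANALYZE before querying.");
        return new Result(inputSplitsSize.getAsLong(), Source.INPUT_SPLITS);
    }

    public static void main(String[] args) {
        // Size present in the stats: the supplier must never be called.
        Result fromStats = resolveSize(2048L, () -> {
            throw new IllegalStateException("input splits must not be fetched");
        });
        System.out.println(fromStats.source + " " + fromStats.sizeInBytes);

        // Size missing: fall back to the (simulated) input splits.
        Result fromSplits = resolveSize(-1L, () -> 4096L);
        System.out.println(fromSplits.source + " " + fromSplits.sizeInBytes);
    }
}
```

Passing the split size as a supplier rather than a precomputed value is what keeps the fast path cheap: when the statistics already carry the size, the supplier is never evaluated.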