[jira] [Updated] (IMPALA-5096) Use parquet::Statistics for min/max aggregates when only a subset of scan columns have stats

Alexander Behm (JIRA) Fri, 17 Mar 2017 14:31:01 -0700

     [ https://issues.apache.org/jira/browse/IMPALA-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexander Behm updated IMPALA-5096:
-----------------------------------
    Labels: parquet performance ramp-up  (was: )

> Use parquet::Statistics for min/max aggregates when only a subset of scan 
> columns have stats
> --------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5096
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5096
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>              Labels: parquet, performance, ramp-up
>
> If only some of the scanned columns have parquet::Statistics, we can still 
> use the stats of the columns that have them, at the cost of extra work in 
> the scanner. For the columns with stats, we can populate the scanner's 
> template tuple with the stats values and avoid scanning/materializing those 
> columns. The columns without stats still need to be scanned.
> Also consider how the various optimizations in IMPALA-4986 will interact. For 
> example,
> {code}
> select min(string_col), count(*) from parquet_table
> {code}
> Can we still safely apply any of the optimizations?
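
To make the proposal above concrete, here is a minimal sketch in Python rather than Impala's backend C++. It is not Impala code: `ColumnStats`, `RowGroup`, `plan_row_group` and `min_max` are hypothetical names, the template tuple is modeled as a plain dict, and the stats are assumed to be exact and correctly ordered for the column type. Per row group, columns with stats go into the template tuple and only the remaining columns are scanned.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ColumnStats:
    """Hypothetical stand-in for the min/max pair of parquet::Statistics."""
    min_value: object
    max_value: object


@dataclass
class RowGroup:
    """Hypothetical row group: column data plus optional per-column stats."""
    values: Dict[str, list]
    stats: Dict[str, Optional[ColumnStats]]


def plan_row_group(row_group: RowGroup,
                   columns: List[str]) -> Tuple[Dict[str, tuple], List[str]]:
    """Split the requested columns into stats-backed ones and ones to scan."""
    template_tuple: Dict[str, tuple] = {}
    to_scan: List[str] = []
    for col in columns:
        col_stats = row_group.stats.get(col)
        if col_stats is None:
            to_scan.append(col)  # no stats: the column must be read
        else:
            template_tuple[col] = (col_stats.min_value, col_stats.max_value)
    return template_tuple, to_scan


def min_max(row_groups: List[RowGroup],
            columns: List[str]) -> Dict[str, tuple]:
    """Compute (min, max) per column, scanning only columns without stats."""
    result: Dict[str, tuple] = {}

    def merge(col: str, lo, hi) -> None:
        if col in result:
            cur_lo, cur_hi = result[col]
            result[col] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            result[col] = (lo, hi)

    for rg in row_groups:
        template_tuple, to_scan = plan_row_group(rg, columns)
        for col, (lo, hi) in template_tuple.items():
            merge(col, lo, hi)  # taken from stats, column data untouched
        for col in to_scan:
            vals = [v for v in rg.values[col] if v is not None]  # skip NULLs
            if vals:
                merge(col, min(vals), max(vals))
    return result
```

Calling `min_max(row_groups, ["int_col", "string_col"])` on row groups where each column has stats in some groups but not others reads column data only where stats are missing. The sketch deliberately leaves out `count(*)`: whether a row count can be combined with stats-backed min/max values is the open question raised in the description.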



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
