[
https://issues.apache.org/jira/browse/IMPALA-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Csaba Ringhofer updated IMPALA-5095:
------------------------------------
Labels: parquet performance ramp-up (was: parquet perfomance ramp-up)
> Use parquet::Statistics for simple min/max aggregates
> -----------------------------------------------------
>
> Key: IMPALA-5095
> URL: https://issues.apache.org/jira/browse/IMPALA-5095
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Alexander Behm
> Priority: Major
> Labels: parquet, performance, ramp-up
> Attachments: Parquet stats for evaluating aggregates.pdf
>
>
> {code}
> select min(int_col), max(bigint_col) from parquet_table;
> select min(int_col), max(bigint_col) from parquet_table group by partition_col;
> select min(int_col), max(int_col) from parquet_table; -- slightly trickier case: int_col is referenced twice
> {code}
> The slot values for int_col and bigint_col can be directly filled in from the
> parquet::Statistics, assuming stats are available for both columns. No
> columns need to be scanned/materialized.
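
As a rough illustration of the idea above (not Impala's actual backend code), the min/max for a column can be folded together from per-row-group statistics without reading any data pages. The ColumnChunkStats class and min_max_from_stats function below are hypothetical names standing in for the min/max fields of parquet::Statistics, and the values are assumed to be plain integers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ColumnChunkStats:
    """Hypothetical stand-in for the min/max fields of parquet::Statistics."""
    min_value: Optional[int]  # None means the writer did not record a min
    max_value: Optional[int]  # None means the writer did not record a max


def min_max_from_stats(
    chunks: Iterable[ColumnChunkStats],
) -> Optional[Tuple[int, int]]:
    """Combine per-row-group stats into one (min, max) pair.

    Returns None when there are no chunks or when any chunk lacks stats,
    in which case the caller would have to fall back to a regular scan.
    """
    lo: Optional[int] = None
    hi: Optional[int] = None
    for stats in chunks:
        if stats.min_value is None or stats.max_value is None:
            return None  # stats missing: the optimization does not apply
        lo = stats.min_value if lo is None else min(lo, stats.min_value)
        hi = stats.max_value if hi is None else max(hi, stats.max_value)
    if lo is None or hi is None:
        return None
    return lo, hi
```

With two row groups whose stats are (3, 9) and (-2, 5), the function returns (-2, 9). A real implementation would also need to handle row groups that contain only NULLs, which this sketch ignores.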
> This JIRA focuses on implementing this optimization in the simple case where
> all scanned columns feed into min/max aggregates and where all columns have
> parquet::Statistics. Those conditions can be relaxed, but should be addressed
> separately.
> This optimization opportunity must be detected by the planner and is not
> applicable when there are scan predicates.
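
The applicability conditions listed above could be checked along the following lines. This is only a sketch of the rule as stated in the description, not the planner's real logic: ScanSummary and can_use_stats_only are hypothetical names, and the sketch assumes that partition columns (such as partition_col in the second query) are not counted among the scanned columns.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ScanSummary:
    """Hypothetical summary of what a planner knows about one Parquet scan."""
    scanned_columns: FrozenSet[str]          # columns read from the files
    aggregates: Tuple[Tuple[str, str], ...]  # (function name, column) pairs
    columns_with_stats: FrozenSet[str]       # columns that have min/max stats
    has_scan_predicates: bool


def can_use_stats_only(scan: ScanSummary) -> bool:
    """Return True if the simple case from the description applies."""
    if scan.has_scan_predicates:
        return False  # not applicable when there are scan predicates
    agg_columns = set()
    for func, column in scan.aggregates:
        if func.lower() not in ("min", "max"):
            return False  # only min/max aggregates qualify
        agg_columns.add(column)
    if not agg_columns:
        return False
    # Every scanned column must feed a min/max aggregate and must have stats.
    # A column referenced twice (min and max of int_col) collapses to one entry.
    return (scan.scanned_columns == agg_columns
            and agg_columns <= scan.columns_with_stats)
```

For the third query in the description, the aggregates would be ("min", "int_col") and ("max", "int_col") over a single scanned column, so the check passes as long as int_col has statistics and there are no scan predicates.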