[jira] [Commented] (IMPALA-5095) Use parquet::Statistics for simple min/max aggregates

Alexander Behm (JIRA) Fri, 17 Mar 2017 15:02:52 -0700

    [ https://issues.apache.org/jira/browse/IMPALA-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930805#comment-15930805 ]


Alexander Behm commented on IMPALA-5095:
----------------------------------------

The last, trickier case (int_col referenced twice) can be solved by returning 
two rows from the scan: one with the int_col slot set to the min value, and 
another with the int_col slot set to the max value.
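
A minimal sketch of the two-row idea, written in Python for brevity rather than in Impala's C++ backend. The names ColumnStats, rows_from_stats and min_max are hypothetical and are not Impala or Parquet APIs. The sketch assumes every row group carries min/max statistics for int_col. Each row group contributes two rows, and the unchanged min()/max() aggregation on top of the scan still produces the right answer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical stand-in for the min/max kept in parquet::Statistics."""
    min_value: int
    max_value: int


def rows_from_stats(row_groups: Iterable[ColumnStats]) -> Iterator[Tuple[int]]:
    """Emit two single-slot rows per row group instead of scanning int_col."""
    for stats in row_groups:
        yield (stats.min_value,)  # row whose int_col slot holds the min
        yield (stats.max_value,)  # row whose int_col slot holds the max


def min_max(rows: Iterable[Tuple[int]]) -> Tuple[int, int]:
    """Ordinary min/max aggregation over the int_col slot (needs >= 1 row)."""
    values = [row[0] for row in rows]
    return min(values), max(values)


# Made-up statistics for three row groups, for illustration only.
row_groups = [ColumnStats(3, 40), ColumnStats(-7, 12), ColumnStats(5, 99)]
result = min_max(rows_from_stats(row_groups))
```

Because both rows pass through the same slot, the aggregation node needs no special handling for the case where one column feeds both min() and max().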

> Use parquet::Statistics for simple min/max aggregates
> -----------------------------------------------------
>
>                 Key: IMPALA-5095
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5095
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>              Labels: parquet, perfomance, ramp-up
>
> {code}
> select min(int_col), max(bigint_col) from parquet_table;
> select min(int_col), max(bigint_col) from parquet_table group by partition_col;
> -- A little trickier because int_col is referenced twice:
> select min(int_col), max(int_col) from parquet_table;
> {code}
> The slot values for int_col and bigint_col can be directly filled in from the 
> parquet::Statistics, assuming stats are available for both columns. No 
> columns need to be scanned/materialized.
> This JIRA focuses on implementing this optimization in the simple case where 
> all scanned columns feed into min/max aggregates and where all columns have 
> parquet::Statistics. Those conditions can be relaxed, but should be addressed 
> separately.
> This optimization opportunity must be detected by the planner and is not 
> applicable when there are scan predicates.
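
The conditions listed in the description can be sketched as a single eligibility check. This is an illustration in Python, not Impala planner code: the function name, the (function, column) representation of aggregates, and the set of columns with statistics are all assumptions made for the example. Partition columns are assumed not to be part of the scanned file columns.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: each aggregate is a (function, column) pair.
Aggregate = Tuple[str, str]


def stats_only_scan_applies(
    scanned_columns: Set[str],
    aggregates: Iterable[Aggregate],
    columns_with_stats: Set[str],
    has_scan_predicates: bool,
) -> bool:
    """Return True only in the simple case the issue describes."""
    if has_scan_predicates:
        return False  # not applicable when there are scan predicates
    aggregates = list(aggregates)
    if not aggregates:
        return False
    if any(func not in ("min", "max") for func, _ in aggregates):
        return False
    agg_columns = {col for _, col in aggregates}
    # Every scanned column must feed a min/max aggregate and have statistics.
    return scanned_columns <= agg_columns and scanned_columns <= columns_with_stats


aggs = [("min", "int_col"), ("max", "bigint_col")]
cols = {"int_col", "bigint_col"}
simple_case = stats_only_scan_applies(cols, aggs, cols, has_scan_predicates=False)
with_predicates = stats_only_scan_applies(cols, aggs, cols, has_scan_predicates=True)
missing_stats = stats_only_scan_applies(cols, aggs, {"int_col"}, has_scan_predicates=False)
```

Here only simple_case qualifies. The other two calls fall outside the simple case, so a normal scan would be needed.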


