[ https://issues.apache.org/jira/browse/IMPALA-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930805#comment-15930805 ]
Alexander Behm commented on IMPALA-5095:
----------------------------------------

The last tricky case can be solved by returning two rows from the scan, one with the int_col slot having the min value, and another row with int_col having the max value.

> Use parquet::Statistics for simple min/max aggregates
> -----------------------------------------------------
>
>                 Key: IMPALA-5095
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5095
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>              Labels: parquet, perfomance, ramp-up
>
> {code}
> select min(int_col), max(bigint_col) from parquet_table;
> select min(int_col), max(bigint_col) from parquet_table group by partition_col;
> select min(int_col), max(int_col) from parquet_table; <--- case a little trickier because int_col refd twice
> {code}
> The slot values for int_col and bigint_col can be directly filled in from the
> parquet::Statistics, assuming stats are available for both columns. No
> columns need to be scanned/materialized.
> This JIRA focuses on implementing this optimization in the simple case where
> all scanned columns feed into min/max aggregates and where all columns have
> parquet::Statistics. Those conditions can be relaxed, but should be addressed
> separately.
> This optimization opportunity must be detected by the planner and is not
> applicable when there are scan predicates.
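
The idea of filling slots from statistics, including the two-row approach for a column that feeds both min() and max(), can be sketched roughly as follows. This is only an illustrative Python model, not Impala backend code: `ColumnStats`, `RowGroup`, and `stats_rows` are hypothetical names, the statistics are assumed to be plain integer min/max pairs, and NULL handling and empty row groups are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical stand-in for the min/max fields of parquet::Statistics."""
    min_value: int
    max_value: int


# A row group is modelled as: column name -> stats (None when stats are absent).
RowGroup = Dict[str, Optional[ColumnStats]]


def stats_rows(row_groups: List[RowGroup], columns: List[str]) -> List[Dict[str, int]]:
    """Return two synthetic rows per row group instead of scanning column data.

    The first row carries each column's min, the second each column's max.
    A regular min()/max() aggregation over these rows yields the same result
    as over the real data, even if one column feeds both min() and max().
    """
    # Assumed precondition from the issue: every scanned column has stats.
    for rg in row_groups:
        for col in columns:
            if rg.get(col) is None:
                raise ValueError(f"no statistics for {col}; a regular scan is needed")

    rows: List[Dict[str, int]] = []
    for rg in row_groups:
        rows.append({col: rg[col].min_value for col in columns})
        rows.append({col: rg[col].max_value for col in columns})
    return rows


# Made-up statistics for two row groups of a hypothetical parquet_table.
row_groups: List[RowGroup] = [
    {"int_col": ColumnStats(3, 40), "bigint_col": ColumnStats(-7, 900)},
    {"int_col": ColumnStats(-2, 15), "bigint_col": ColumnStats(10, 1200)},
]

rows = stats_rows(row_groups, ["int_col", "bigint_col"])

# select min(int_col), max(int_col) from parquet_table;
min_int = min(r["int_col"] for r in rows)
max_int = max(r["int_col"] for r in rows)

# select min(int_col), max(bigint_col) from parquet_table;
max_bigint = max(r["bigint_col"] for r in rows)
```

Emitting both a min row and a max row lets the unchanged aggregation above the scan pick the right value from a single int_col slot, which is why the doubly referenced column needs no special casing in this sketch.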