[ https://issues.apache.org/jira/browse/IMPALA-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Behm updated IMPALA-5096:
-----------------------------------
    Labels: parquet performance ramp-up  (was: )

> Use parquet::Statistics for min/max aggregates when only a subset of scan columns have stats
> --------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5096
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5096
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>              Labels: parquet, performance, ramp-up
>
> If some columns do not have parquet::Statistics, then it is still possible to
> use the stats of those columns that do have them, but with more effort. For
> those columns that have stats, we can populate the scanner's template tuple
> with the stats values, and avoid scanning/materializing those columns. We
> still need to scan the columns that do not have stats.
> Also consider how the various optimizations in IMPALA-4986 will interact. For
> example,
> {code}
> select min(string_col), count(*) from parquet_table
> {code}
> Can we still safely apply any of the optimizations?
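
The per-column fallback described in the issue can be illustrated with a small sketch. This is not Impala code: `ColumnChunk`, `row_group_mins`, and `table_mins` are hypothetical names, a `stats_min` of `None` stands in for a missing parquet::Statistics minimum, and values are assumed to be non-null integers in non-empty row groups. It only shows the idea of taking the minimum from stats where they exist and reading values where they do not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ColumnChunk:
    """Hypothetical stand-in for one column chunk in a Parquet row group."""
    values: List[int]                # decoded values (costly to read in a real scanner)
    stats_min: Optional[int] = None  # None models a missing parquet::Statistics min


def row_group_mins(row_group: Dict[str, ColumnChunk]) -> Tuple[Dict[str, int], List[str]]:
    """Return (per-column min, names of columns that had to be scanned)."""
    mins: Dict[str, int] = {}
    scanned: List[str] = []
    for name, chunk in row_group.items():
        if chunk.stats_min is not None:
            # Stats present: use the stored value, as a template tuple slot would.
            mins[name] = chunk.stats_min
        else:
            # Stats absent: fall back to reading the column's values.
            mins[name] = min(chunk.values)
            scanned.append(name)
    return mins, scanned


def table_mins(row_groups: List[Dict[str, ColumnChunk]]) -> Dict[str, int]:
    """Combine per-row-group minima into table-wide minima."""
    result: Dict[str, int] = {}
    for row_group in row_groups:
        mins, _ = row_group_mins(row_group)
        for name, value in mins.items():
            result[name] = min(result.get(name, value), value)
    return result
```

In this sketch only the columns listed in `scanned` would need to be read from disk. How this interacts with `count(*)` and the other IMPALA-4986 optimizations is left open here, as it is in the issue.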