[ https://issues.apache.org/jira/browse/IMPALA-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amogh Margoor resolved IMPALA-9873.
-----------------------------------
    Resolution: Fixed

> Skip decoding of non-materialised columns in Parquet
> ----------------------------------------------------
>
>                 Key: IMPALA-9873
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9873
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Amogh Margoor
>            Priority: Major
>
> This is a first milestone for lazy materialization in parquet, focusing on
> avoiding decompression and decoding of columns.
> * Identify columns referenced by predicates and runtime row filters and
> determine what order the columns need to be materialised in. Probably we want
> to evaluate static predicates before runtime filters to match current
> behaviour.
> * Rework this loop so that it alternates between materialising columns and
> evaluating predicates:
> https://github.com/apache/impala/blob/052129c/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1110
> * We probably need to keep track of filtered rows using a new data structure,
> e.g. bitmap
> * We need to then check that bitmap at each step to see if we skip
> materialising part or all of the following columns. E.g. if the first N rows
> were pruned, we can skip forward the remaining readers N rows.
> * This part may be a little tricky - there is the risk of adding overhead
> compared to the current code.
> * It is probably OK to just materialise the partition columns to start off
> with - avoiding materialising those is not going to buy that much.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
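
The steps in the description (alternate between materialising a column and
evaluating its predicate, track pruned rows in a bitmap, and skip readers
forward over pruned rows) can be illustrated with a minimal standalone sketch.
This is not Impala's scanner code: ColumnReader, FilterColumn, DecodeSelected
and LazyScan are hypothetical names invented for this example, columns are
plain int64 vectors, and "decoding" is modelled as a counted vector read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a Parquet column reader. "Decoding" a value is
// modelled as a vector read; decoded_count records how much work was done.
struct ColumnReader {
  std::vector<int64_t> encoded;
  std::size_t pos = 0;            // index of the next row in this column
  std::size_t decoded_count = 0;  // values actually decoded so far

  int64_t DecodeNext() {
    assert(pos < encoded.size());
    ++decoded_count;
    return encoded[pos++];
  }
  // Advance past n rows without decoding them.
  void Skip(std::size_t n) { pos += n; }
};

// A column referenced by a predicate (static predicate or runtime filter).
struct FilterColumn {
  ColumnReader* reader;
  std::function<bool(int64_t)> predicate;
};

// Decodes only the rows whose bit is set in `selected` and passes each value
// to on_value(row, value). A run of pruned rows costs a single Skip() call.
template <typename Fn>
void DecodeSelected(ColumnReader& reader, std::vector<bool>& selected,
                    Fn on_value) {
  std::size_t pending_skip = 0;
  for (std::size_t row = 0; row < selected.size(); ++row) {
    if (!selected[row]) {
      ++pending_skip;
      continue;
    }
    if (pending_skip > 0) {
      reader.Skip(pending_skip);
      pending_skip = 0;
    }
    on_value(row, reader.DecodeNext());
  }
  if (pending_skip > 0) reader.Skip(pending_skip);  // keep readers in sync
}

// Alternates between materialising a filter column and evaluating its
// predicate, then materialises the remaining columns for surviving rows only.
// `filters` must already be in the desired evaluation order.
std::vector<std::vector<int64_t>> LazyScan(std::vector<FilterColumn>& filters,
                                           std::vector<ColumnReader*>& others,
                                           std::vector<bool>& selected) {
  for (FilterColumn& f : filters) {
    DecodeSelected(*f.reader, selected, [&](std::size_t row, int64_t v) {
      if (!f.predicate(v)) selected[row] = false;  // prune this row
    });
  }
  std::vector<std::vector<int64_t>> out;
  for (ColumnReader* r : others) {
    std::vector<int64_t> values;
    DecodeSelected(*r, selected,
                   [&](std::size_t, int64_t v) { values.push_back(v); });
    out.push_back(values);
  }
  return out;
}
```

In this sketch the bitmap is checked for every row of every column, which is
the kind of per-row overhead the description warns about. A real
implementation would likely work on ranges or batches, and would also keep the
values of filter columns that appear in the output, which this sketch discards.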