Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12065 )

Change subject: WIP: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................

WIP: IMPALA-5843: Use page index in Parquet files to skip pages

Initial prototype of page filtering. Some details are smelly, but
conceptually it is getting into shape.

The HdfsParquetScanner reads and evaluates the page index. First, we
determine the row ranges we are interested in. Based on those row
ranges, we determine the filtered pages for each column that we are
reading. We still issue one ScanRange per column chunk, but we specify
sub-ranges that store the filtered pages, i.e. we don't read the whole
column chunk, only fractions of it.

Pages are not aligned across column chunks, i.e. page #2 of column A
might store completely different rows than page #2 of column B. This
means we need row-skipping logic when we read the data pages. This
logic is implemented in BaseScalarColumnReader and ScalarColumnReader.
Collection column readers know nothing about page filtering.

I also extended the decoders with value-skipping functionality.

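
To illustrate the row-range-to-page mapping described above, here is a
minimal standalone sketch. It is not the code in this patch: RowRange,
SelectPages and RowsToSkipAtPageStart are hypothetical names, and the
only input assumed is the first row index of each page, which the
Parquet offset index provides per page (first_row_index).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: an inclusive range of row indexes within a row group.
struct RowRange {
  int64_t first;
  int64_t last;
};

// Returns the indexes of the pages that overlap at least one row range.
// 'page_first_rows' holds the first row index of each page of one column
// chunk, in ascending order; 'num_rows' is the row count of the row group.
std::vector<int> SelectPages(const std::vector<int64_t>& page_first_rows,
                             int64_t num_rows,
                             const std::vector<RowRange>& ranges) {
  std::vector<int> pages;
  for (std::size_t i = 0; i < page_first_rows.size(); ++i) {
    const int64_t first = page_first_rows[i];
    // A page ends right before the next page starts; the last page ends
    // at the end of the row group.
    const int64_t last =
        (i + 1 < page_first_rows.size() ? page_first_rows[i + 1] : num_rows) - 1;
    for (const RowRange& r : ranges) {
      if (first <= r.last && r.first <= last) {
        pages.push_back(static_cast<int>(i));
        break;
      }
    }
  }
  return pages;
}

// Number of leading rows a reader has to skip in a selected page before it
// reaches the first row of 'range'. Zero if the page starts inside the range.
int64_t RowsToSkipAtPageStart(int64_t page_first_row, const RowRange& range) {
  return std::max<int64_t>(0, range.first - page_first_row);
}
```

With a 300-row row group and the row range [120, 160], a column whose
pages start at rows 0, 100 and 200 selects only page 1 and skips 20
rows in it, while a column whose pages start at rows 0 and 150 selects
pages 0 and 1 and skips 120 rows in page 0. This is why the same page
number can need different amounts of skipping in different columns.
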
TODOs:
* add unit tests (BE tests) wherever possible
* fix the smelly code parts
* implement row-skipping in MaterializeValueBatchRepeatedDefLevel()
* add counters about filtered pages
* generate files by Impala and Parquet-MR for EE tests
* test with nested types
* performance measurements

Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-bool-decoder.cc
M be/src/exec/parquet/parquet-bool-decoder.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-level-decoder.h
M be/src/exprs/literal.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
22 files changed, 790 insertions(+), 47 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/12065/1
--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>