Hello Lars Volker,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8267

to look at the new patch set (#8).

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
......................................................................

IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Switch the decoders to using more batch-oriented interfaces. As an
intermediate step, this change doesn't make the interfaces of LevelDecoder
or DictDecoder batch-oriented, only the lower-level utility classes.

The next step would be to change those interfaces to be batch-oriented
and make the corresponding optimisations in parquet. This could deliver
much larger perf improvements than the current patch.

The high-level changes are:
* BitReader -> BatchedBitReader, which is built to unpack runs of 32
  bit-packed values efficiently.
* RleDecoder -> RleBatchDecoder, which exposes the repeated and literal
  runs to the caller and uses BatchedBitReader to unpack literal runs
  efficiently.
* Dict decoding uses RleBatchDecoder to decode repeated runs efficiently
  and uses the BitPacking utilities to unpack and encode in a single step.

Also removes an older benchmark that isn't very interesting, since the
batch-oriented approach to encoding and decoding is so much faster than
the value-by-value approach.

Testing:
* Ran core tests.
* Updated unit tests to exercise new code.
* Added test coverage for the deprecated bit-packed level encoding to
  check that it still works (there was no coverage previously).

Perf:
Single-node benchmarks showed a performance gain of a few percent.
16-node cluster benchmarks only showed a gain for TPC-H nested.
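
For readers unfamiliar with the run-oriented approach described in the high-level changes, here is a minimal sketch of decoding a Parquet-style RLE/bit-packed hybrid buffer one run at a time. It is not the code in this patch: the names UnpackBatch and DecodeRuns are hypothetical, and the sketch assumes bit widths of 1 to 32 and literal runs that are always padded to whole groups of 8 values. A repeated run turns into a single fill and a literal run into a single batched unpack, instead of one call per value.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala's BatchedBitReader): unpacks 'num_values'
// values of 'bit_width' bits each, packed LSB-first, from 'in' into 'out'.
// Precondition: 'in' holds at least ceil(num_values * bit_width / 8) bytes
// and 1 <= bit_width <= 32.
inline void UnpackBatch(const uint8_t* in, int bit_width, uint64_t num_values,
                        uint32_t* out) {
  const uint64_t mask = (1ull << bit_width) - 1;
  uint64_t buffer = 0;     // bits read from 'in' but not yet consumed
  int bits_in_buffer = 0;
  for (uint64_t i = 0; i < num_values; ++i) {
    while (bits_in_buffer < bit_width) {
      buffer |= static_cast<uint64_t>(*in++) << bits_in_buffer;
      bits_in_buffer += 8;
    }
    out[i] = static_cast<uint32_t>(buffer & mask);
    buffer >>= bit_width;
    bits_in_buffer -= bit_width;
  }
}

// Hypothetical run-at-a-time decoder. Decodes exactly 'num_values' values
// into 'out'. Returns false if the input is malformed or too short.
inline bool DecodeRuns(const std::vector<uint8_t>& buf, int bit_width,
                       size_t num_values, std::vector<uint32_t>* out) {
  if (bit_width < 1 || bit_width > 32) return false;
  out->clear();
  size_t pos = 0;
  while (out->size() < num_values) {
    // Each run starts with a ULEB128 varint header (at most 5 bytes here).
    uint64_t header = 0;
    for (int shift = 0;; shift += 7) {
      if (pos >= buf.size() || shift > 28) return false;
      const uint8_t b = buf[pos++];
      header |= static_cast<uint64_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) break;
    }
    const uint64_t remaining = num_values - out->size();
    if (header & 1) {
      // Literal run: (header >> 1) groups of 8 bit-packed values.
      const uint64_t run_len = (header >> 1) * 8;
      const uint64_t run_bytes = run_len * bit_width / 8;
      if (run_len == 0 || run_bytes > buf.size() - pos) return false;
      std::vector<uint32_t> tmp(run_len);
      UnpackBatch(buf.data() + pos, bit_width, run_len, tmp.data());
      pos += run_bytes;
      const uint64_t n = std::min<uint64_t>(run_len, remaining);
      out->insert(out->end(), tmp.begin(),
                  tmp.begin() + static_cast<std::ptrdiff_t>(n));
    } else {
      // Repeated run: (header >> 1) copies of one value stored in
      // ceil(bit_width / 8) little-endian bytes.
      const uint64_t run_len = header >> 1;
      const size_t value_bytes = (bit_width + 7) / 8;
      if (run_len == 0 || value_bytes > buf.size() - pos) return false;
      uint32_t value = 0;
      for (size_t i = 0; i < value_bytes; ++i) {
        value |= static_cast<uint32_t>(buf[pos++]) << (8 * i);
      }
      const uint64_t n = std::min<uint64_t>(run_len, remaining);
      out->insert(out->end(), static_cast<size_t>(n), value);
    }
  }
  return true;
}
```

As an example input, the bytes 0x0A 0x04 0x03 0x88 0xC6 0xFA with a bit width of 3 describe a repeated run of five 4s followed by one literal group holding the values 0 through 7. The sketch copies each literal run through a temporary vector for clarity; a tuned decoder would unpack directly into the caller's buffer.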

Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
---
M be/src/benchmarks/CMakeLists.txt
M be/src/benchmarks/bit-packing-benchmark.cc
D be/src/benchmarks/rle-benchmark.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
D be/src/experiments/bit-stream-utils.8byte.h
D be/src/experiments/bit-stream-utils.8byte.inline.h
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/alltypes_agg_bitpacked_def_levels.parquet
M tests/query_test/test_scanners.py
19 files changed, 1,149 insertions(+), 946 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/8267/8
--
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>