Andrew Sherman has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/11582 )
Change subject: IMPALA-6658: improve Parquet RLE for low bit widths
......................................................................

IMPALA-6658: improve Parquet RLE for low bit widths

RleEncoder buffers values in its own cache to detect run lengths that can be efficiently encoded. When a run is detected, it is written with an indicator byte that encodes the length of the run, so an encoded run always has an overhead of at least one byte. This means that for single-bit values, encoding 8 values as a run is inefficient.

Change RleEncoder to have the ability to use run lengths other than 8. A new parameter to the constructor (min_run_length) allows test callers (only) to set the minimum run length. By default, RleEncoder will now use run length encoding for runs of length 16 for single-bit values. All other bit widths will use the existing length 8 runs.

Internally, RleEncoder must buffer more values so that the longer runs can be detected. The internal buffer "buffered_values_" is larger and is now a circular buffer so that the first 8 bytes of the buffer can be separately flushed to BitWriter.

Testing:
All end-to-end and unit tests pass.
The unit test rle-test is enhanced to run all tests against RleEncoders using all possible values of min_run_length. In addition, rle-test is refactored so that the Rle tests are in a class that inherits from ::testing::Test so that a SetUp() method can be used. The Overflow test is enhanced to be more exhaustive (while still completing in a second or two).
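The overhead argument above can be checked with a small cost model. This is only a sketch, assuming the run layout of the Parquet RLE/bit-packing hybrid encoding (a varint indicator holding the run length shifted left by one, followed by the repeated value padded to whole bytes). The helper names VarintBytes, RleRunBytes and BitPackedPayloadBytes are hypothetical and are not taken from rle-encoding.h. The bit-packed figure counts payload bytes only and ignores the bit-packed indicator byte, which is shared across many groups of 8 values.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes a ULEB128 varint needs for v.
constexpr std::size_t VarintBytes(std::uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Hypothetical helper: bytes for one repeated run, assuming the indicator is
// varint(run_length << 1) followed by the value rounded up to whole bytes.
constexpr std::size_t RleRunBytes(std::uint64_t run_length, int bit_width) {
  return VarintBytes(run_length << 1) +
         static_cast<std::size_t>((bit_width + 7) / 8);
}

// Hypothetical helper: payload bytes for the same values stored bit-packed
// (the shared bit-packed indicator byte is deliberately not counted).
constexpr std::size_t BitPackedPayloadBytes(std::uint64_t count, int bit_width) {
  return static_cast<std::size_t>(
      (count * static_cast<std::uint64_t>(bit_width) + 7) / 8);
}

// Single-bit values: a run of 8 costs 2 bytes as RLE but 1 byte bit-packed.
static_assert(RleRunBytes(8, 1) == 2, "indicator byte + value byte");
static_assert(BitPackedPayloadBytes(8, 1) == 1, "8 bits fit in one byte");

// Single-bit values: at 16 the two representations cost the same.
static_assert(RleRunBytes(16, 1) == 2, "indicator byte + value byte");
static_assert(BitPackedPayloadBytes(16, 1) == 2, "16 bits fit in two bytes");

// Two-bit values: a run of 8 already breaks even.
static_assert(RleRunBytes(8, 2) == 2, "indicator byte + value byte");
static_assert(BitPackedPayloadBytes(8, 2) == 2, "16 bits fit in two bytes");
```

Under these assumptions, a run of single-bit values only stops costing more than its bit-packed form at 16 values, while at bit width 2 a run of 8 already breaks even. That matches the defaults described above.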
Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76
---
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
2 files changed, 499 insertions(+), 255 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/11582/7
--
To view, visit http://gerrit.cloudera.org:8080/11582
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76
Gerrit-Change-Number: 11582
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>