Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11582 )
Change subject: IMPALA-6658: improve Parquet RLE for low bit widths ...................................................................... Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/11582/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11582/2//COMMIT_MSG@20 PS2, Line 20: By default RleEncoder will now use run length encoding for runs of : length 24 for single bit values, and of length 16 for 2 bit wide values. : All other bit widths will use the existing length 8 runs. About the solution in general: If I do not misunderstand something then it will lead to worse encoding if <24/16 repeated runs alternate with larger repeated runs. bit width 1: repeated lengths: 24 16 24 16 24 bytes needed with old solution: 2 2 2 2 2 bytes needed with new solution: 2 3 2 3 2 bit width 2: repeated lengths: 16 8 16 8 16 bytes needed with old solution: 2 2 2 2 2 bytes needed with new solution: 2 3 2 3 2 This could be avoided by using smaller min_repeated_run_length if the last run was a repeated run: 16 for 1 bit width, 8 for 2 bit width (and all other bit widths). http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h@211 PS2, Line 211: mu;tiple typo: multiple http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h@475 PS2, Line 475: // updates num_buffered_values This comment does not seam true here. http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h@549 PS2, Line 549: we typo -- To view, visit http://gerrit.cloudera.org:8080/11582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76 Gerrit-Change-Number: 11582 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Comment-Date: Wed, 17 Oct 2018 22:24:17 +0000 Gerrit-HasComments: Yes