Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9403 )
Change subject: IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner ...................................................................... Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/exec/parquet-column-readers.cc@673 PS3, Line 673: switch (page_encoding_) { > nit: space before ( Done http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/exec/parquet-column-readers.cc@1304 PS3, Line 1304: } > Convert to using Substitute() while you're here - we generally prefer using Done http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/util/rle-encoding.h@138 PS3, Line 138: /// Returns the number of consumed values or 0 if an error occurred. > Can you add some tests to be/src/util/rle-test.cc for the new interface? Th Done http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/util/rle-encoding.h@674 PS3, Line 674: for (int i = 0; i < num_repeats_to_set; ++i) { > Yes, please do add the benchmark. The old benchmark was comparing two old i Done http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/data/README@160 PS3, Line 160: Parquet v1 file with RLE encoded boolean column "b" and int column "i". > How was it created? Parquet-mr? Yes, it was created with parquet-mr (parquet-cli's convert-csv command). I had to modify parquet-mr to force RLE encoding for booleans, because by default it will use bit packed for Parquet v1. Parquet v2 uses RLE by default, but Impala does not read v2 data pages at the moment. http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test File testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test: http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test@4 PS3, Line 4: select count(*) from rle_encoded_bool where b; > Thanks for the tip, I have created such a table, and something is actually The new test will catch if rle encoded booleans are returned in the wrong order. http://gerrit.cloudera.org:8080/#/c/9403/3/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/9403/3/tests/query_test/test_scanners.py@656 PS3, Line 656: self.client.execute(("""CREATE TABLE {0}.rle_encoded_bool (b boolean, i int) > Nit: trailing """ can go on same line. Done -- To view, visit http://gerrit.cloudera.org:8080/9403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4644bf8cf5d2b7238b05076407fbf78ab5d2c14f Gerrit-Change-Number: 9403 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 13 Mar 2018 13:59:39 +0000 Gerrit-HasComments: Yes