Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16893 )
Change subject: IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages
......................................................................

Patch Set 3:

(6 comments)

A few nits, but the code looks good to me overall.

http://gerrit.cloudera.org:8080/#/c/16893/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16893/3//COMMIT_MSG@10
PS3, Line 10: old PLAIN/PLAIN_DICTIONARY values.
Maybe you could emphasise that the data is still encoded the same way.

http://gerrit.cloudera.org:8080/#/c/16893/3//COMMIT_MSG@10
PS3, Line 10: PLAIN/
PLAIN is the new way AFAIK, so we use PLAIN for the dictionary page and RLE_DICTIONARY for the data pages. The old way was to use PLAIN_DICTIONARY everywhere, which meant PLAIN encoding for the dictionary page and RLE-encoded dict keys for the data pages.

http://gerrit.cloudera.org:8080/#/c/16893/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/16893/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@92
PS3, Line 92: use
nit: maybe write_new_parquet_dictionary_encodings to be more explicit?

http://gerrit.cloudera.org:8080/#/c/16893/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@881
PS3, Line 881: current_encoding_
I wonder if the code would be cleaner and less error-prone if 'current_encoding_' stored the actual encoding. We could probably move this 'if' to the place where we set 'current_encoding_'.

http://gerrit.cloudera.org:8080/#/c/16893/3/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/16893/3/be/src/exec/parquet/parquet-column-readers.cc@326
PS3, Line 326: so
nit: to

http://gerrit.cloudera.org:8080/#/c/16893/3/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/16893/3/testdata/data/README@593
PS3, Line 593:
Is the newline intentional?

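To illustrate the COMMIT_MSG@10 comment, here is a rough sketch of how the old and new labels relate. It is not taken from the patch: the enum is a local stand-in listing only the three Parquet encodings discussed here, the helper names are hypothetical, and the flag parameter just borrows the name suggested in the Line 92 comment.

```cpp
#include <cassert>

// Local stand-in for the Parquet encoding enum (assumption: only the three
// values discussed in this review are listed, numbered as in parquet.thrift).
enum class Encoding { PLAIN = 0, PLAIN_DICTIONARY = 2, RLE_DICTIONARY = 8 };

enum class PageKind { DICTIONARY_PAGE, DATA_PAGE };

// Hypothetical writer-side helper: which encoding label to record for a page
// of a dictionary-encoded column. The bytes written are the same either way;
// only the label differs.
inline Encoding DictColumnPageLabel(PageKind kind,
                                    bool write_new_parquet_dictionary_encodings) {
  if (!write_new_parquet_dictionary_encodings) {
    // Old way: PLAIN_DICTIONARY is used for both page kinds.
    return Encoding::PLAIN_DICTIONARY;
  }
  // New way: PLAIN for the dictionary page, RLE_DICTIONARY for data pages.
  return kind == PageKind::DICTIONARY_PAGE ? Encoding::PLAIN
                                           : Encoding::RLE_DICTIONARY;
}

// Hypothetical reader-side helper: a data page carries RLE-encoded dictionary
// keys under either label, so both can go through the same decode path.
inline bool IsDictEncodedDataPage(Encoding label) {
  return label == Encoding::PLAIN_DICTIONARY ||
         label == Encoding::RLE_DICTIONARY;
}

// Small self-check of the mapping described above.
inline void CheckLabelMapping() {
  assert(DictColumnPageLabel(PageKind::DATA_PAGE, true) ==
         Encoding::RLE_DICTIONARY);
  assert(DictColumnPageLabel(PageKind::DATA_PAGE, false) ==
         Encoding::PLAIN_DICTIONARY);
}
```

If 'current_encoding_' held the value returned by something like DictColumnPageLabel() at the point where it is set, the later 'if' at Line 881 might not be needed, but that depends on how the rest of the writer uses the member.
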
--
To view, visit http://gerrit.cloudera.org:8080/16893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90942022edcd5d96c720a1bde53879e50394660a
Gerrit-Change-Number: 16893
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Jan 2021 19:21:47 +0000
Gerrit-HasComments: Yes