Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12004 )
Change subject: IMPALA-7889: Write new logical types in Parquet ...................................................................... Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc@973 PS5, Line 973: parquet::SchemaElement& col_schema = file_metadata_.schema[i + 1]; > Maybe choose a better name here? I choose 'col_schema' - I can look for a new name if others do not like it. 'schema_element' would be more exact for someone who know Parquet well, but for other people 'col_schema' is more informative in my opinion (+ it is shorter). http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.h File be/src/exec/parquet/parquet-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.h@62 PS5, Line 62: static void FillSchemaElement(const ColumnType& col_type, > We usually put input parameters first, then output, i.e. move node to the e Done http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc File be/src/exec/parquet/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@153 PS5, Line 153: return Status::OK();; > This can be in the anonymous namespace, too Done http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@283 PS5, Line 283: ErrorMsg msg(TErrorCode::PARQUET_INCOMPATIBLE_DECIMAL, filename, schema_element.name, > Move to the anonymous namespace. Pass output parameters by pointer and move Done http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@292 PS5, Line 292: th' > Can you think of a better name for the schema element than "node"? Done http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@297 PS5, Line 297: parquet::LogicalType logical_type; > Should this be a case statement instead? I think that the special handling needed for strings causes switch/case to loose its benefits. I can do it in the next change if you still prefer it. http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@310 PS5, Line 310: col_schema->__set_type_length(ParquetPlainEncoder::DecimalSize(col_type)); > Move else to previous line. Done http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@311 PS5, Line 311: col_schema->__set_scale(col_type.scale); > Please add a brief comment that VARCHAR and CHAR are always UTF8 annotated, Done http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@362 PS5, Line 362: > Add function comment (here and for all other new functions Done http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@376 PS5, Line 376: @staticmethod > Maybe you can clean up some of the other uses of os.walk() in this file? Done - test_def_level_encoding uses os.walk a bit differently (call be/util/parquet-reader instead of python functions), so I did not change it. http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@394 PS5, Line 394: assert val is None > Can you replace the values with the names from parquet.thrift? Done http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@397 PS5, Line 397: def _check_no_logical_type(self, schemas, column_name): > The other types have to be None, right? If so, maybe create a helper _check Done http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@432 PS5, Line 432: lf > nit: once Done http://gerrit.cloudera.org:8080/#/c/12004/5/tests/util/get_parquet_metadata.py File tests/util/get_parquet_metadata.py: http://gerrit.cloudera.org:8080/#/c/12004/5/tests/util/get_parquet_metadata.py@181 PS5, Line 181: for f in files: > Add (as some uses of this in test_insert_parquet.py do): Done -- To view, visit http://gerrit.cloudera.org:8080/12004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91 Gerrit-Change-Number: 12004 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Comment-Date: Wed, 05 Dec 2018 15:26:20 +0000 Gerrit-HasComments: Yes