Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12004 )

Change subject: IMPALA-7889: Write new logical types in Parquet
......................................................................


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc@973
PS5, Line 973:     parquet::SchemaElement& col_schema = file_metadata_.schema[i 
+ 1];
> Maybe choose a better name here?
I choose 'col_schema' -  I can look for a new name if others do not like it. 
'schema_element' would be more exact for someone who know Parquet well, but for 
other people 'col_schema' is more informative in my opinion (+ it is shorter).


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.h
File be/src/exec/parquet/parquet-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.h@62
PS5, Line 62:   static void FillSchemaElement(const ColumnType& col_type,
> We usually put input parameters first, then output, i.e. move node to the e
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@153
PS5, Line 153:   return Status::OK();;
> This can be in the anonymous namespace, too
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@283
PS5, Line 283:     ErrorMsg msg(TErrorCode::PARQUET_INCOMPATIBLE_DECIMAL, 
filename, schema_element.name,
> Move to the anonymous namespace. Pass output parameters by pointer and move
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@292
PS5, Line 292: th'
> Can you think of a better name for the schema element than "node"?
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@297
PS5, Line 297:   parquet::LogicalType logical_type;
> Should this be a case statement instead?
I think that the special handling needed for strings causes switch/case to 
loose its benefits. I can do it in the next change if you still prefer it.


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@310
PS5, Line 310:     
col_schema->__set_type_length(ParquetPlainEncoder::DecimalSize(col_type));
> Move else to previous line.
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@311
PS5, Line 311:     col_schema->__set_scale(col_type.scale);
> Please add a brief comment that VARCHAR and CHAR are always UTF8 annotated,
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@362
PS5, Line 362:
> Add function comment (here and for all other new functions
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@376
PS5, Line 376:   @staticmethod
> Maybe you can clean up some of the other uses of os.walk() in this file?
Done - test_def_level_encoding uses os.walk a bit differently (call 
be/util/parquet-reader instead of python functions), so I did not change it.


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@394
PS5, Line 394:         assert val is None
> Can you replace the values with the names from parquet.thrift?
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@397
PS5, Line 397:   def _check_no_logical_type(self, schemas, column_name):
> The other types have to be None, right? If so, maybe create a helper _check
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@432
PS5, Line 432: lf
> nit: once
Done


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/util/get_parquet_metadata.py
File tests/util/get_parquet_metadata.py:

http://gerrit.cloudera.org:8080/#/c/12004/5/tests/util/get_parquet_metadata.py@181
PS5, Line 181:     for f in files:
> Add (as some uses of this in test_insert_parquet.py do):
Done



--
To view, visit http://gerrit.cloudera.org:8080/12004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91
Gerrit-Change-Number: 12004
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Dec 2018 15:26:20 +0000
Gerrit-HasComments: Yes

Reply via email to