Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12247 )
Change subject: IMPALA-5051: Add INT64 timestamp write support in Parquet
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/12247/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12247/8//COMMIT_MSG@39
PS8, Line 39: without conversion to UTC
Wouldn't it be better to convert to UTC? You write later that old readers also assume that the data is written in UTC.


http://gerrit.cloudera.org:8080/#/c/12247/8//COMMIT_MSG@39
PS8, Line 39: tha
nit: the


http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/hdfs-parquet-table-writer.cc@579
PS8, Line 579: result_
What about deleting the member 'result_' and having it here only as a local variable? As it stands, it looks strange that we invoke a member function here and pass a member variable to it. Also, passing the member introduces aliasing inside ConvertTimestamp() that might prevent some compiler optimizations, but honestly I'm not really sure.


http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/parquet-metadata-utils.h
File be/src/exec/parquet/parquet-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/parquet-metadata-utils.h@60
PS8, Line 60: Return
nit: Returns


http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/parquet-metadata-utils.cc@142
PS8, Line 142:       /// converted_type is not set because Impala always writes timestamps without UTC
Does Parquet-MR also write INT64 timestamps un-normalized?


http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/runtime/timestamp-value.inline.h@154
PS8, Line 154:   kudu::int128_t nanos128 =
:     static_cast<kudu::int128_t>(unixtime_seconds) * NANOS_PER_SEC
:     + time_.fractional_seconds();
:
:   if (nanos128 < std::numeric_limits<int64_t>::min()
:       || nanos128 > std::numeric_limits<int64_t>::max()) return false;
I think we can still avoid using int128_t.


http://gerrit.cloudera.org:8080/#/c/12247/8/testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test:

http://gerrit.cloudera.org:8080/#/c/12247/8/testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test@102
PS8, Line 102: ---- QUERY
: create table int96_nanos (ts timestamp) stored as parquet;
: ====
: ---- QUERY
: # Insert edge values as "normal" int96 timestamps that can represent all values.
: set parquet_timestamp_type=INT96_NANOS;
: insert into int96_nanos values
nit: you don't need to start a new QUERY block for each query when you don't check the results. E.g. it could be:

====
---- QUERY
create table...
set ...
insert into ...
create table ...
====



--
To view, visit http://gerrit.cloudera.org:8080/12247
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib41ad532ec902ed5a9a1528513726eac1c11441f
Gerrit-Change-Number: 12247
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Feb 2019 15:35:07 +0000
Gerrit-HasComments: Yes
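
Addendum to the comment on be/src/runtime/timestamp-value.inline.h@154: one possible way to do the range check with plain int64_t arithmetic is sketched below. This is only an illustration, not the code from the patch. The helper name SecondsAndNanosToInt64 and the constant kNanosPerSec are hypothetical stand-ins, and the sketch assumes the fractional part is always in [0, 10^9), with negative timestamps expressed as a floored seconds value plus a non-negative fraction.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the NANOS_PER_SEC constant quoted in the review.
constexpr int64_t kNanosPerSec = 1000000000LL;

// Computes seconds * 1e9 + frac_nanos without a 128-bit intermediate.
// Assumption (not taken from the patch): 0 <= frac_nanos < kNanosPerSec.
// Returns false if the result does not fit into int64_t.
inline bool SecondsAndNanosToInt64(int64_t seconds, int64_t frac_nanos,
                                   int64_t* result) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();

  if (seconds >= 0) {
    // Reject values whose whole-second part alone already overflows.
    if (seconds > kMax / kNanosPerSec) return false;
    const int64_t base = seconds * kNanosPerSec;
    // base + frac_nanos overflows iff base > kMax - frac_nanos.
    if (base > kMax - frac_nanos) return false;
    *result = base + frac_nanos;
    return true;
  }

  // Negative seconds: rewrite s * 1e9 + f as (s + 1) * 1e9 + (f - 1e9), so
  // the product stays in range even for the smallest representable second.
  if (seconds + 1 < kMin / kNanosPerSec) return false;
  const int64_t base = (seconds + 1) * kNanosPerSec;  // in [kMin/1e9 * 1e9, 0]
  const int64_t adjust = frac_nanos - kNanosPerSec;   // in [-1e9, -1]
  // base + adjust underflows iff base < kMin - adjust.
  if (base < kMin - adjust) return false;
  *result = base + adjust;
  return true;
}
```

The negative branch matters only for the single second just below INT64_MIN / 10^9, where seconds * 10^9 by itself underflows but the sum with a large enough fraction is still representable. The int128_t version accepts those values, so an int64_t replacement probably should as well.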