Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12247 )
Change subject: IMPALA-5051: Add INT64 timestamp write support in Parquet ...................................................................... Patch Set 9: Code-Review+1 (5 comments) http://gerrit.cloudera.org:8080/#/c/12247/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12247/8//COMMIT_MSG@39 PS8, Line 39: without conversion to UTC > The main problem with normalizing to UTC from Impala's standpoint is the pe OK, thanks for the explanation for both of you! http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/hdfs-parquet-table-writer.cc@579 PS8, Line 579: result_ > result_ shouldn't be a local variable, as we return a pointer to it. This i Ah, I completely missed that. http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/parquet-metadata-utils.cc File be/src/exec/parquet/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/exec/parquet/parquet-metadata-utils.cc@142 PS8, Line 142: /// converted_type is not set because Impala always writes timestamps without UTC > That's true, but it has a metadata field to tell the caller the desired sem Sorry, I was really thinking about Hive and Spark. But it doesn't really matter, since now we have this 'isAdjustedToUTC' field. And Zoltan also said in one of his comments that "'pure' TIMESTAMP and TIMESTAMP WITHOUT TIME ZONE shall not be normalized to UTC" http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/12247/8/be/src/runtime/timestamp-value.inline.h@154 PS8, Line 154: // TODO: consider optimizing this (IMPALA-8268) : kudu::int128_t nanos128 = : static_cast<kudu::int128_t>(unixtime_seconds) * NANOS_PER_SEC : + time_.fractional_seconds(); : : if (nanos128 < std::numeric_limits<int64_t>::min() > I created a ticket for benchmarking and optimizing these new functions: IMP OK, thanks! http://gerrit.cloudera.org:8080/#/c/12247/8/testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test File testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test: http://gerrit.cloudera.org:8080/#/c/12247/8/testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test@102 PS8, Line 102: ---- QUERY : create table int96_nanos (ts timestamp) stored as parquet; : ==== : ---- QUERY : # Insert edge values as "normal" int96 timestamps that can represent all values. : set parquet_timestamp_type=INT96_NANOS; : insert into int96_nanos values > Thanks for the tip! You're welcome! -- To view, visit http://gerrit.cloudera.org:8080/12247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib41ad532ec902ed5a9a1528513726eac1c11441f Gerrit-Change-Number: 12247 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com> Gerrit-Comment-Date: Fri, 01 Mar 2019 15:43:45 +0000 Gerrit-HasComments: Yes