Hello Zoltan Borok-Nagy, Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#19). Change subject: IMPALA-7521: Speed up sub-second unix time->TimestampValue conversions ...................................................................... IMPALA-7521: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split to date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part wit boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. Benchmarks show 2x - 6x speedup for the measured functions. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - TimestampValue::UtcFromUnixTimeMillis was added - currently this is only used in tests but it will be useful for IMPALA-5050 - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This function was only used in tests until now but it will be useful for IMPALA-5050. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. Note that only single threaded benchmarks were added because I do not expect any difference in the multi threaded case. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 378 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/19 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 19 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>