Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#11).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
......................................................................

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to boost::posix_time::ptime)
by first splitting the input into seconds and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(), and
then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS
and TIMESTAMP_MICROS to the parquet scanner" - reading these types
will run micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in convert-timestamp-benchmark.cc
  it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in thw multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 369 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/11
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

Reply via email to