[ https://issues.apache.org/jira/browse/IMPALA-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618048#comment-16618048 ]
ASF subversion and git services commented on IMPALA-5050:
---------------------------------------------------------

Commit 2ee8caeb3053dfa2c434c680ffb2ac756627ee38 in impala's branch refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=2ee8cae ]

IMPALA-7521: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert sub-second unix time to TimestampValue in several steps. TimestampValue is split into date_ and time_, similarly to boost::posix_time::ptime. The old conversion first split the input into a seconds part and a sub-seconds part. It then converted the seconds part with boost::posix_time::from_time_t() and finally added the sub-seconds part to the resulting timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the sub-second part to time_ instead of adding it to a ptime. This is safe because the sub-second part is in [0, 1 sec) and time_ then holds a whole number of seconds below 24 hours, so the sum cannot overflow into a different day or timezone.

Benchmarks show a 2x-6x speedup for the measured functions.

The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner". Reading these types will run a micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added. It is currently used only in tests, but it will be useful for IMPALA-5050.
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the UTC->local conversion depending on the flag use_local_tz_for_unix_timestamp_conversions, for consistency with other similar functions. This function was only used in tests until now, but it will be useful for IMPALA-5050.
- When convert-timestamp-benchmark.cc detects a result mismatch, it now prints the non-equal values.
- Benchmarks were added for microsecond and nanosecond conversions. Only single-threaded benchmarks were added, because I do not expect any difference in the multi-threaded case.
- DCHECKs were added to TimestampValue::Validate to ensure that time_ is in [0, 24 hours).

Testing:
- timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already well covered.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Reviewed-on: http://gerrit.cloudera.org:8080/11183
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-5050
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5050
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Lars Volker
>            Assignee: Csaba Ringhofer
>            Priority: Major
>
> This requires updating {{parquet.thrift}} to a version that includes the
> {{TIMESTAMP_MICROS}} logical type.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
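The two fast paths described in the IMPALA-7521 commit message (the UTC path that splits one integer directly into date_ and time_, and the non-UTC path that converts whole seconds and then adds the sub-second remainder to time_) can be illustrated with a simplified, self-contained C++ sketch. This is not Impala's code. SimpleTimestamp, FloorDiv, FloorMod, UtcFromUnixMicros, LocalFromUnixSeconds and LocalFromUnixMicros are hypothetical names. Time is kept as a plain day count plus nanoseconds of day instead of boost types, and a fixed UTC offset stands in for the CCTZ timezone lookup.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified timestamp: days since 1970-01-01 (role of date_)
// and nanoseconds since midnight (role of time_). Impala's real
// TimestampValue uses boost::gregorian::date and time_duration instead.
struct SimpleTimestamp {
  int64_t days;
  int64_t nanos_of_day;  // expected to stay in [0, 24h)
};

constexpr int64_t kSecondsPerDay = 86400;
constexpr int64_t kMicrosPerSecond = 1000000;
constexpr int64_t kNanosPerMicro = 1000;
constexpr int64_t kNanosPerSecond = 1000000000;
constexpr int64_t kMicrosPerDay = kSecondsPerDay * kMicrosPerSecond;
constexpr int64_t kNanosPerDay = kSecondsPerDay * kNanosPerSecond;

// Floor division (rounds toward negative infinity), so pre-epoch inputs
// still produce a non-negative remainder.
int64_t FloorDiv(int64_t a, int64_t b) {
  int64_t q = a / b;
  if ((a % b != 0) && ((a < 0) != (b < 0))) --q;
  return q;
}

int64_t FloorMod(int64_t a, int64_t b) { return a - FloorDiv(a, b) * b; }

// UTC fast path (sketch): split a single integer directly into
// day number and time of day, with no intermediate seconds value.
SimpleTimestamp UtcFromUnixMicros(int64_t unix_micros) {
  int64_t days = FloorDiv(unix_micros, kMicrosPerDay);
  int64_t micros_of_day = FloorMod(unix_micros, kMicrosPerDay);
  return {days, micros_of_day * kNanosPerMicro};
}

// Hypothetical stand-in for the CCTZ conversion, which works on whole
// seconds: here just a fixed UTC offset.
SimpleTimestamp LocalFromUnixSeconds(int64_t unix_seconds,
                                     int64_t utc_offset_seconds) {
  int64_t local = unix_seconds + utc_offset_seconds;
  return {FloorDiv(local, kSecondsPerDay),
          FloorMod(local, kSecondsPerDay) * kNanosPerSecond};
}

// Non-UTC path (sketch): convert whole seconds, then add the sub-second
// part to the time of day instead of doing full timestamp arithmetic.
SimpleTimestamp LocalFromUnixMicros(int64_t unix_micros,
                                    int64_t utc_offset_seconds) {
  int64_t seconds = FloorDiv(unix_micros, kMicrosPerSecond);
  int64_t sub_second_micros = FloorMod(unix_micros, kMicrosPerSecond);  // [0, 1s)
  SimpleTimestamp ts = LocalFromUnixSeconds(seconds, utc_offset_seconds);
  // The time of day is a whole number of seconds <= 86399 s here, so adding
  // less than one second cannot reach 24h or change the day.
  ts.nanos_of_day += sub_second_micros * kNanosPerMicro;
  assert(ts.nanos_of_day >= 0 && ts.nanos_of_day < kNanosPerDay);
  return ts;
}
```

With a zero offset the two paths agree: UtcFromUnixMicros(-1) and LocalFromUnixMicros(-1, 0) both yield day -1 with a time of day of 23:59:59.999999, which is why floor division rather than C++'s truncating division is used for the split.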