[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#15). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - TimestampValue::UtcFromUnixTimeMillis was added - currently this is only used in tests but it will be useful for IMPALA-5050 - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This function was only used in tests until now but it will be useful for IMPALA-5050. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. Note that only single threaded benchmarks were added because I do not expect any difference in the multi threaded case. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 388 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/15 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 14: Code-Review+1 (6 comments) Thanks, LGTM. Couple cosmetic things: http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@581 PS14, Line 581: TimestampValue sec_split_utc_from_unix_time_micros( : const int64_t& unix_time_micros) { nit: one line. Here and below, where applicable. http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@875 PS14, Line 875: // Note that the test data contains only whole seconds. The comment is out of date. http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@902 PS14, Line 902: // Note that the test data contains only whole seconds. The comment is out of date. http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@929 PS14, Line 929: // Note that the test data contains only whole seconds. The comment is out of date. http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@936 PS14, Line 936: nit: extra space http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/runtime/timestamp-test.cc@280 PS14, Line 280: offset_millis Probably this should be called 'offset_seconds' or just 'offsets' -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 03 Sep 2018 13:04:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 12: (10 comments) http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@875 PS12, Line 875: Note that the test data contains only whole seconds. > Why not add sub-second microsec test values? Done http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@899 PS12, Line 899: Benchmark FromUnixTimeNanos improvement in IMPALA-7417. : // Note that the test data contains only whole seconds. > Why not add sub-second nanosec test values? Done http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@917 PS12, Line 917: sec split > Shouldn't this be "day split"? The first division in nano conversion is splitting nano to second and (sub-second) nanosecond parts. This is why I still consider it do be sec split. The majority of the speedup in nano + double functions comes from replacing "temp += boost::posix_time::nanoseconds(...);" with "result.time_ += boost::posix_time::nanoseconds(...);". The former has to care about overflowing to another day, while the later does not. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@939 PS12, Line 939: (old) > "sec split (old)" ? see line 917 http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@940 PS12, Line 940: (new) > "day split (new)" ? see line 917 http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@238 PS12, Line 238: TimestampValue > const TimestampValue& Done http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@249 PS12, Line 249: 10 > I could be mistaken, but shouldn't this diff be 0 or 1? It can be more than a microsec due to double's lower precision at large numbers. The biggest one I saw was 7, so I have set the limit to 8 + added a comment about this. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@290 PS12, Line 290: EXPECT_EQ(result, TestFromSubSecondFunctionsInner( : seconds + sign * offset, millis - 1000 * sign * offset)); > EXPECT_EQ() is not really necessary here. It checks that I have removed TestFromSubSecondFunctionsInner, and call FromUnixTimeNanos directly in this loop, as that is the only functions where this shifting matters. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@295 PS12, Line 295: if (expected == nullptr) EXPECT_FALSE(result.HasDate()); : else EXPECT_EQ(expected, result.ToString()); > I would also move these lines after L284 for readability. Good idea. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h@315 PS12, Line 315: GRANURALITY > Typo: GRANULARITY. Done -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 31 Aug 2018 15:48:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#14). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - TimestampValue::UtcFromUnixTimeMillis was added - currently this is only used in tests but it will be useful for IMPALA-5050 - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This function was only used in tests until now but it will be useful for IMPALA-5050. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. Note that only single threaded benchmarks were added because I do not expect any difference in the multi threaded case. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 397 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/14 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#13). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - TimestampValue::UtcFromUnixTimeMillis was added - currently this is only used in tests but it will be useful for IMPALA-5050 - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This function was only used in tests until now but it will be useful for IMPALA-5050. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. Note that only single threaded benchmarks were added because I do not expect any difference in the multi threaded case. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 397 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/13 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 12: (14 comments) http://gerrit.cloudera.org:8080/#/c/11183/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11183/12//COMMIT_MSG@43 PS12, Line 43: thw typo: the http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@932 PS11, Line 932: > I added the test originally to check if the rounding has added any regressi Thanks for the explanation, I'm okay with keeping it. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@875 PS12, Line 875: Note that the test data contains only whole seconds. Why not add sub-second microsec test values? http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@899 PS12, Line 899: Benchmark FromUnixTimeNanos improvement in IMPALA-7417. : // Note that the test data contains only whole seconds. Why not add sub-second nanosec test values? http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@917 PS12, Line 917: sec split Shouldn't this be "day split"? http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@939 PS12, Line 939: (old) "sec split (old)" ? http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@940 PS12, Line 940: (new) "day split (new)" ? http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@238 PS12, Line 238: TimestampValue const TimestampValue& http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@249 PS12, Line 249: 10 I could be mistaken, but shouldn't this diff be 0 or 1? 'expected' is set to 'from_millis' in L267 and has millisec precision. 'from_double's time member is set to milliseconds + sub-nanos (<1 nano). So when they are both converted to microseconds their diff should be 0 (or maybe 1), right? http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@290 PS12, Line 290: EXPECT_EQ(result, TestFromSubSecondFunctionsInner( : seconds + sign * offset, millis - 1000 * sign * offset)); EXPECT_EQ() is not really necessary here. It checks that (seconds*1000 + millis) calculated on L258 is the same as ((seconds + sign * offset)*1000 + (millis - 1000 * sign * offset)) which is always true. I would change TestFromSubSecondFunctionsInner() to return void and just call TestFromSubSecondFunctionsInner() here in a loop without EXPECT_EQ(). Calculations in L257-259 can be moved to L284 and 'result' can be passed to TestFromSubSecondFunctionsInner() as a parameter. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@295 PS12, Line 295: if (expected == nullptr) EXPECT_FALSE(result.HasDate()); : else EXPECT_EQ(expected, result.ToString()); I would also move these lines after L284 for readability. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@884 PS12, Line 884: // A few test with sub-millisec precision. : auto tz = TimezoneDatabase::GetUtcTimezone(); : EXPECT_EQ("1970-01-01 00:00:00.01000", : TimestampValue::UtcFromUnixTimeMicros(1).ToString()); : EXPECT_EQ("1969-12-31 23:59:59.99000", : TimestampValue::UtcFromUnixTimeMicros(-1).ToString()); : : EXPECT_EQ("1970-01-01 00:00:00.01000", : TimestampValue::FromUnixTimeMicros(1, tz).ToString()); : EXPECT_EQ("1969-12-31 23:59:59.99000", : TimestampValue::FromUnixTimeMicros(-1, tz).ToString()); : : EXPECT_EQ("1970-01-01 00:00:00.1", : TimestampValue::FromUnixTimeNanos(0, 1, tz).ToString()); : EXPECT_EQ("1969-12-31 23:59:59.9", : TimestampValue::FromUnixTimeNanos(0, -1, tz).ToString()); Thanks for adding these. http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h@315 PS12, Line 315: GRANURALITY Typo: GRANULARITY.
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 12: (15 comments) http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG@10 PS11, Line 10: (which is split do date_ and time_ similarly to > nit: please wrap at 70 characters, here and below. Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@578 PS11, Line 578: boost::posix_time::nanoseconds(t > It probably wouldn't make a difference but maybe you should call boost::pos Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@631 PS11, Line 631: cctz_optimized_unix_time_to_utc_ptime(unix_time_whole); > The original implementation had: That constant was not accessible from here and I thought that the results would be the same, so why not use NANOS_PER_SEC. It turned that I was wrong and the result is slightly different in some cases (+- 1 nanosec in analytic average tests). http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@635 PS11, Line 635: : TimestampValue from_subsecond_unix_time_new( : const double& unix_time) { : const double ONE_BILLIONTH = 0.1; : int64_t unix_time_whole = unix_time; : int64_t nanos = (unix_time - unix_time_whole) / ONE_BILLIONTH; : > The implementation of TimestampValue::FromSubsecondUnixTime() was changed i Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@923 PS11, Line 923: : // Benchmark FromSubsecondUnixTime before and after IMPALA-7417. > Are the rounding rules still different? No, it should be the same in patch set 12. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@932 PS11, Line 932: > This benchmark and the previous bm_utc_from_unix_time_nanos benchmark measu I added the test originally to check if the rounding has added any regression. As the rounding was replaced with truncation this is no longer relevant. I would still keep the test because the results are different (5.57 vs 3.97 improvement) and the the benchmark turned out to be useful for catching correctness issues. So I would prefer to keep the test but I can remove it if you want to keep this file more compact. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@235 PS11, Line 235: The fraction part of 'nanos' can e > You mean "'nanos' can express sub-nanoseconds"? I hope that it is clearer with the new comment. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@240 PS11, Line 240: ue:: > Maybe use MILLIS_PER_SEC instead? My impression is that 1000 is more readable than MILLIS_PER_SEC, while NANOS_PER_SEC is more readable than 1000*1000*1000 (and 10 is by far the worst), so this inconsistency is intentional. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@282 PS11, Line 282: p > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@280 PS11, Line 280: / Checks that all sub-second From*UnixTime gives the same result and that the result : // is the same as 'expected'. : // If 'expected' is nullptr then the result expected to be invalid (out of range). : void TestFromSubSecondFunctions(int64_t seconds, int millis, const cha > Why do we need this extra test? And how is this 'opposite sign'? Please cla I have extended the test to check the same timestamp with different seconds / millis, so that > 1000 millis are also tested. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@284 PS11, Line 284: Ti > nit: missing space after 'if' Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@848 PS11, Line 848: > How about adding some extra tests for the micros/nanos unix time conversion Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@189 PS11, Line 189: !time.is_negative() > Instead this, you could use !time.is_negative() Done http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@190 PS11, Line 190: time.total_nanoseconds() <
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/488/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 27 Aug 2018 16:51:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#12). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - TimestampValue::UtcFromUnixTimeMillis was added - currently this is only used in tests but it will be useful for IMPALA-5050 - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This function was only used in tests until now but it will be useful for IMPALA-5050. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. Note that only single threaded benchmarks were added because I do not expect any difference in thw multi threaded case. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 399 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/12 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: (15 comments) Thanks! Another batch of comments: http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG@10 PS11, Line 10: (which is split do date_ and time_ similarly to boost::posix_time::ptime) nit: please wrap at 70 characters, here and below. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@578 PS11, Line 578: boost::posix_time::seconds(time) It probably wouldn't make a difference but maybe you should call boost::posix_time::nanoseconds(time*NANOS_PER_SEC) here to be consistent with TimestampValue::UtcFromUnixTimeTicks(). http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@631 PS11, Line 631: temp += boost::posix_time::nanoseconds((unix_time - unix_time_whole) * NANOS_PER_SEC); The original implementation had: temp += boost::posix_time::nanoseconds((unix_time - unix_time_whole) / ONE_BILLIONTH); Any reason you changed it? http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@635 PS11, Line 635: TimestampValue from_subsecond_unix_time_new( : const double& unix_time) { : int64_t unix_time_whole = round(unix_time); : int64_t nanos = round((unix_time - unix_time_whole)*NANOS_PER_SEC); : return TimestampValue::FromUnixTimeNanos( : unix_time_whole, nanos, TimezoneDatabase::GetUtcTimezone()); : } The implementation of TimestampValue::FromSubsecondUnixTime() was changed in patch-set #11. Please change the implementation here too. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@923 PS11, Line 923: // Note that the test data contains only whole seconds. The results could be slightly : // different for sub-second times due to the different rounding rules. Are the rounding rules still different? http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@932 PS11, Line 932: Benchmark from_subsecond_unix_time("FromSubsecondUnixTime"); This benchmark and the previous bm_utc_from_unix_time_nanos benchmark measures very similar code. Do we need both? http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@235 PS11, Line 235: Double can express sub-nanoseconds You mean "'nanos' can express sub-nanoseconds"? I'm not really sure what is the role of 'nanos' in this test. Could you clarify in the comment? http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@240 PS11, Line 240: 1000 Maybe use MILLIS_PER_SEC instead? http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@282 PS11, Line 282: nit: extra space http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@280 PS11, Line 280: // Try the same timestamp with opposite sign. : seconds += millis < 0 ? -1 : 1; : millis += millis < 0 ? 1000 : -1000; : EXPECT_EQ(result, TestFromSubSecondFunctionsInner(seconds, millis)); Why do we need this extra test? And how is this 'opposite sign'? Please clarify in the comment. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@284 PS11, Line 284: if nit: missing space after 'if' http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@848 PS11, Line 848: Test subsecond unix time conversion for non edge cases. How about adding some extra tests for the micros/nanos unix time conversion functions to make sure that micros/nanos precision is handled correctly. http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@189 PS11, Line 189: time.total_nanoseconds() >= 0 Instead this, you could use !time.is_negative() http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@190 PS11, Line 190: time.total_nanoseconds() < NANOS_PER_DAY maybe use (time.hours() < 24) ? http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@328 PS11, Line 328: unix_time unix_time_ticks -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Aug 2018 12:22:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: The last build failure was caused by a hung test that seemed unrelated. I have created a jira about it: IMPALA-7485 "test_spilling_naaj hung on jenkins" -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Aug 2018 10:09:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3080/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Aug 2018 09:01:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3077/ -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Aug 2018 05:18:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/467/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Aug 2018 20:59:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: (18 comments) http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG@35 PS9, Line 35: This function was : only used in tests until > Why? Is it used for testing only? If so, please add a sentence explaining Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@258 PS9, Line 258: < > nit: space is missing before '<<' operator Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@561 PS9, Line 561: GRANULARITY > typo: GRANULARITY. Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@574 PS9, Line 574: Sp > nit: remove extra spaces. Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc@27 PS9, Line 27: #include "runtime/timestamp-value.inline.h" : #include "runtime/tuple-row.h" > nit: order alphabetically. Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@229 PS9, Line 229: UtcToLoc > You should validate the time value here (call Validate() or whatever functi Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@295 PS9, Line 295: > Fix the comment or move the time-validation logic below to a separate funct Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@298 PS9, Line 298: ti > else if Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@303 PS9, Line 303: Time's validity is only checked in debug builds. : /// TODO: This could be also checked in release, but I a > Maybe instead of DCHECK calls, you should call SetToInvalidDateTime() here? I am unsure about the future of Validate() - with the exception of cases when the timestamp is created with memcpy from an unchecked source (e.g. Parquet reading), it may be better to use a DCHECK instead of SetToInvalidDateTime() - generally the caller logic should ensure that only valid arguments are used. I have moved the check to a separate function. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@314 PS9, Line 314: > typo: GRANULARITY Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327 PS9, Line 327: > I think, this should be called 'unix_time_ticks' Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327 PS9, Line 327: > Maybe this should be called UtcFromUnixTimeTicks(). Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@153 PS9, Line 153: TimestampValue TimestampValue::UnixTimeToLocal( : time_t unix_time, const Timezone& local_tz) { > Fix comment. I have fixed the comment in .h and removed the comment from .cc. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@168 PS9, Line 168: T > nit: insert empty line after } Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@169 PS9, Line 169: > nit: missing space between ) and { Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34 PS9, Line 34: 64_t unix > should be called 'unix_time_ticks' Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34 PS9, Line 34: UtcFromUnixTime > should be called UtcFromUnixTimeTicks(). Done http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@61 PS9, Line 61: unix_ > Is calling round() here and below intentional? The previous version of From I have removed the rounds because it lead to slightly (+-1 nanosec) different results in a test. I think that using round() leads more to precise results but I am trying to avoid adding functional changes in this commit. (It would be probably the best to remove direct double<->timestamp conversion and replace it with decimal<->timestamp, see IMPALA-7472. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3077/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 Aug 2018 20:28:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#11). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - TimestampValue::UtcFromUnixTimeMillis was added - currently this is only used in tests but it will be useful for IMPALA-5050 - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This function was only used in tests until now but it will be useful for IMPALA-5050. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. Note that only single threaded benchmarks were added because I do not expect any difference in thw multi threaded case. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 369 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/11 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3067/ -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 17:58:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 9: (18 comments) Did a quick first-round review. I will look at the tests the next time. http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG@35 PS9, Line 35: This should have : no visible side-effects. Why? Is it used for testing only? If so, please add a sentence explaining it. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc File be/src/benchmarks/convert-timestamp-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@258 PS9, Line 258: << nit: space is missing before '<<' operator http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@561 PS9, Line 561: GRANURALITY typo: GRANULARITY. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@574 PS9, Line 574: nit: remove extra spaces. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc@27 PS9, Line 27: #include "runtime/tuple-row.h" : #include "runtime/timestamp-value.inline.h" nit: order alphabetically. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@229 PS9, Line 229: set_time You should validate the time value here (call Validate() or whatever function time-validation ends up in) http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@295 PS9, Line 295: /// Sets both date and time to invalid if date is outside the valid range. Fix the comment or move the time-validation logic below to a separate function. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@298 PS9, Line 298: if else if http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@303 PS9, Line 303: DCHECK_GE(time_.total_nanoseconds(), 0); : DCHECK_LT(time_.total_nanoseconds(), NANOS_PER_DAY); Maybe instead of DCHECK calls, you should call SetToInvalidDateTime() here? Again, it might make sense to move time-validation code to a separate function. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@314 PS9, Line 314: GRANURALITY typo: GRANULARITY http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327 PS9, Line 327: unix_time I think, this should be called 'unix_time_ticks' http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327 PS9, Line 327: UtcFromUnixTime Maybe this should be called UtcFromUnixTimeTicks(). http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@153 PS9, Line 153: /// Return a ptime representation of the given Unix time (seconds since the Unix epoch). : /// The time zone of the resulting ptime is 'local_tz'. This is called by UnixTimeToPtime. Fix comment. http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@168 PS9, Line 168: } nit: insert empty line after } http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@169 PS9, Line 169: ){ nit: missing space between ) and { http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34 PS9, Line 34: UtcFromUnixTime should be called UtcFromUnixTimeTicks(). http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34 PS9, Line 34: unix_time should be called 'unix_time_ticks' http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@61 PS9, Line 61: round Is calling round() here and below intentional? The previous version of FromSubsecondUnixTime() just casts unix_time to int64_t. Is this change addressed in the commit message? -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 16:23:17 +
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 10: The test has hit IMPALA-7459 - the commit that causes it was reverted, so a rebase should help. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 14:43:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3067/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 14:41:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3066/ -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 14:01:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/447/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 11:22:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 9: -Code-Review (1 comment) http://gerrit.cloudera.org:8080/#/c/11183/8/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: http://gerrit.cloudera.org:8080/#/c/11183/8/be/src/runtime/timestamp-value.inline.h@67 PS8, Line 67: const Timezone& local_tz) { The test failure in patch set 8 was caused by hitting this DCHECK for very small (<<1 nanosec) negative unix times. floor() truncated these values to -1, but unix_time - unix_time_whole was 1.0 (instead of 1-very_small_fractional _number) due the to the decreased preciison at 1.0 compared to 0.0. This would also be a problem when adding the sub-second value to time_, because adding a whole sec can step to a different day/timezone rule. The solution was to use rounding to get seconds, calculate nanoseconds from unix_time-unix_time_whole without any expectations, and give the results to FromUnixTimeNanos() , which can handle negative or >=1 sec values. This is probably a bit sub-optimal, but I do not want to put too much effort to this function. A benchmark was added that shows that this function is still faster than it used to be. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 11:12:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3066/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Aug 2018 10:49:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#9). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - TimestampValue::UtcFromUnixTimeMillis was added - currently this is only used in tests but it will be useful for IMPALA-5050 - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This should have no visible side-effects. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. Note that only single threaded benchmarks were added because I do not expect any difference in thw multi threaded case. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 365 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/9 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 8: Code-Review-2 The test failure is a bit weird: expr-test crashes at "TestCast("cast(-2.2250738585072020E-308 as double)", -2.2250738585072020e-308);", which should not be affected by this change. The crash occurs reliably both on my desktop and on jenkins. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 Aug 2018 11:59:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3029/ -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 17 Aug 2018 19:34:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/389/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 17 Aug 2018 17:09:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 6: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/388/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 17 Aug 2018 17:04:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 8: Sorry for the large number of patches, I had a hard time fixing compilation errors (one of them only occurred in clean release builds, so not in my dirty local release build...) -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 17 Aug 2018 16:46:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11183 ) Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3029/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 17 Aug 2018 16:36:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#7). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This should have no visible side-effects. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 312 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/7 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11183 to look at the new patch set (#6). Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions .. IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions Impala used to convert from sub-second unix time to TimestampValue (which is split do date_ and time_ similarly to boost::posix_time::ptime) by first splitting the input into seconds and sub-seconds part, converting the seconds part with boost::posix_time::from_time_t(), and then adding the sub-seconds part to this timestamp. Different tricks are used to speed up different functions: - UTC functions that expect a single integer as input can split it into date_ and time_ directly. - Non-UTC functions need seconds for timezone conversion, because CCTZ expects time points as seconds. These were optimized by adding the subsecond part to time_ instead of adding it to a ptime. This can be done safely because the sub-second part is between [0, 1 sec), so it cannot overflow into a different day or timezone. The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner" - reading these types will run micro/milli->TimestampValue conversion for every row. Other changes: - Some functions were moved from .h to .inline.h. - FromUnixTimeMicros was changed to do the utc->local conversion depending on flag use_local_tz_for_unix_timestamp_conversions to be consistent with other similar functions. This should have no visible side-effects. - When a result mismatch is detected in convert-timestamp-benchmark.cc it now prints non-equal values. - Benchmarks were added for micro + nano conversions. - DCHECKs were added to TimeStampValue::Validate to ensure that time_ is between [0, 24 hour). Testing: - timestamp-test.cc was extended to give better coverage for sub-second conversions. Edge cases were already covered pretty well. Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa --- M be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-orc-scanner.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h 9 files changed, 304 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/6 -- To view, visit http://gerrit.cloudera.org:8080/11183 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa Gerrit-Change-Number: 11183 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins