Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#3).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
......................................................................

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations and copies, and might hit a lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expressions that touch
this function can be inefficient when evaluated over millions of rows.

This patch reimplements TimestampValue::ToString by supplying a default
DateTimeFormatContext if no pattern is specified. "yyyy-MM-dd HH:mm:ss"
is picked as the default format if the time_ component does not have
fractional seconds. Otherwise, "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" is picked
as the default format. The chosen DateTimeFormatContext is then passed to
TimestampParser::Format along with date_ and time_ to be formatted into
the string representation. The integer-to-string conversion in
TimestampParser::Format is replaced with FastInt32ToBufferLeft.

This patch gives a more than 10X performance improvement for CAST
timestamp to string and for FROM_UNIXTIME without a date-time pattern, as
shown by the following benchmark (modified from expr-benchmark), before
and after the patch.
Before the patch:

Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:                Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                              (relative) (relative) (relative)
--------------------------------------------------------------------------------------------------------------
                                 literal               37.4     38.6     39.1         1X         1X         1X
                   cast(now() as string)                2.2     2.28     2.31    0.0589X    0.0591X    0.0591X
  from_unixtime(0,'yyyy-MM-dd HH:mm:ss')               5.94     6.18     6.23     0.159X      0.16X     0.159X
           from_unixtime(0,'yyyy-MM-dd')               11.5     11.8     11.9     0.308X     0.305X     0.304X
                        from_unixtime(0)               2.26     2.35     2.39    0.0606X     0.061X    0.0612X

After the patch:

Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:                Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                              (relative) (relative) (relative)
--------------------------------------------------------------------------------------------------------------
                                 literal               37.9       38     38.4         1X         1X         1X
                   cast(now() as string)               29.3     29.3     29.4     0.773X      0.77X     0.767X
  from_unixtime(0,'yyyy-MM-dd HH:mm:ss')               33.6     33.6     33.8     0.888X     0.884X      0.88X
           from_unixtime(0,'yyyy-MM-dd')               49.9     49.9     50.3      1.32X      1.31X      1.31X
                        from_unixtime(0)               33.1     33.2     33.5     0.875X     0.872X     0.873X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates the numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidies up expr-benchmark so that it
clears its MemPool between benchmark iterations and does not run out of
memory.

Testing:
- Pass core tests.
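
For readers who want a concrete picture of the approach described above, here is a minimal standalone sketch. It is not the code in this patch: SimpleTimestamp, AppendPadded and ToStringSketch are hypothetical names, and the hand-written digit loop merely stands in for FastInt32ToBufferLeft. It only illustrates the two ideas from the commit message, namely choosing the default layout based on whether fractional seconds are present, and writing digits straight into a fixed buffer instead of going through stringstream.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the date_ and time_ components of a timestamp.
struct SimpleTimestamp {
  int year, month, day;
  int hour, minute, second;
  int32_t nanos;  // fractional part of the second, 0..999999999
};

// Writes `value` as exactly `width` zero-padded decimal digits starting at
// `dst` and returns the position just past the last digit written.
inline char* AppendPadded(char* dst, uint32_t value, int width) {
  for (int i = width - 1; i >= 0; --i) {
    dst[i] = static_cast<char>('0' + value % 10);
    value /= 10;
  }
  return dst + width;
}

// Formats into a stack buffer, so the only heap allocation is the returned
// std::string. Assumes all fields are non-negative and within range.
std::string ToStringSketch(const SimpleTimestamp& ts) {
  char buf[29];  // "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" is 29 characters long
  char* p = buf;
  p = AppendPadded(p, ts.year, 4);
  *p++ = '-';
  p = AppendPadded(p, ts.month, 2);
  *p++ = '-';
  p = AppendPadded(p, ts.day, 2);
  *p++ = ' ';
  p = AppendPadded(p, ts.hour, 2);
  *p++ = ':';
  p = AppendPadded(p, ts.minute, 2);
  *p++ = ':';
  p = AppendPadded(p, ts.second, 2);
  // Default format selection: append ".SSSSSSSSS" only when the timestamp
  // has fractional seconds.
  if (ts.nanos != 0) {
    *p++ = '.';
    p = AppendPadded(p, ts.nanos, 9);
  }
  return std::string(buf, p);
}
```

With this sketch, a timestamp without fractional seconds yields the 19-character form, and any non-zero nanosecond value yields the full 29-character form with nine fractional digits.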

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
13 files changed, 209 insertions(+), 130 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/3
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>