Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
......................................................................

IMPALA-10984: Improve performance of TimestampValue::ToString

TimestampValue::ToString was implemented by concatenating the results of
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using a stringstream. This involves
multiple string allocations and copies, and might hit a lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expressions that touch
this function can be inefficient when evaluated over millions of rows.

This patch reimplements TimestampValue::ToString by supplying a default
DateTimeFormatContext if no pattern was specified. "yyyy-MM-dd HH:mm:ss"
is picked as the default format if the time_ component does not have
fractional seconds. Otherwise, "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" is picked
as the default format. The chosen DateTimeFormatContext is then passed
to TimestampParser::Format along with date_ and time_ to be formatted
into the string representation.

This patch gives more than 2X performance improvement for CAST of
timestamp to string and for FROM_UNIXTIME without a date-time pattern,
as shown by the following benchmark (modified from expr-benchmark),
before and after the patch.

Before the patch:

Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:                Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                              (relative) (relative) (relative)
--------------------------------------------------------------------------------------------------------------
                                 literal               37.4     38.6     39.1         1X         1X         1X
                   cast(now() as string)                2.2     2.28     2.31    0.0589X    0.0591X    0.0591X
  from_unixtime(0,'yyyy-MM-dd HH:mm:ss')               5.94     6.18     6.23     0.159X      0.16X     0.159X
           from_unixtime(0,'yyyy-MM-dd')               11.5     11.8     11.9     0.308X     0.305X     0.304X
                        from_unixtime(0)               2.26     2.35     2.39    0.0606X     0.061X    0.0612X

After the patch:

Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:                Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                              (relative) (relative) (relative)
--------------------------------------------------------------------------------------------------------------
                                 literal               37.2     37.5     38.1         1X         1X         1X
                   cast(now() as string)               5.65     5.65     5.67     0.152X      0.15X     0.149X
  from_unixtime(0,'yyyy-MM-dd HH:mm:ss')                6.3      6.3     6.37     0.169X     0.168X     0.167X
           from_unixtime(0,'yyyy-MM-dd')               11.7     11.9       12     0.315X     0.318X     0.314X
                        from_unixtime(0)               6.23     6.23      6.3     0.167X     0.166X     0.165X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates the numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidies up expr-benchmark a bit so that
it clears its MemPool between benchmark iterations and does not run out
of memory.

Testing:
- Pass core tests.

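For readers who want a feel for the default-format selection described in the commit message, here is a minimal sketch. It is not the code in this patch: SimpleTimestamp and FormatDefault are hypothetical names, and snprintf into a stack buffer stands in for DateTimeFormatContext and TimestampParser::Format. It only illustrates choosing between the two default patterns based on whether fractional seconds are present, and writing the result with a single string construction instead of stringstream concatenation.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the date_ and time_ components of a timestamp.
// These are NOT Impala's actual types.
struct SimpleTimestamp {
  int year, month, day;
  int hour, minute, second;
  int32_t nanos;  // fractional seconds, expected range 0..999999999
};

// Assumed helper: picks the default pattern as the commit message describes.
// No fractional seconds -> "yyyy-MM-dd HH:mm:ss"
// Otherwise             -> "yyyy-MM-dd HH:mm:ss.SSSSSSSSS"
std::string FormatDefault(const SimpleTimestamp& ts) {
  char buf[64];  // comfortably larger than the 29 characters of the long form
  int len;
  if (ts.nanos == 0) {
    len = std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d %02d:%02d:%02d",
                        ts.year, ts.month, ts.day,
                        ts.hour, ts.minute, ts.second);
  } else {
    len = std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d %02d:%02d:%02d.%09d",
                        ts.year, ts.month, ts.day,
                        ts.hour, ts.minute, ts.second,
                        static_cast<int>(ts.nanos));
  }
  // snprintf returns a negative value on error and the untruncated length
  // otherwise; guard both cases before building the string.
  if (len < 0) return std::string();
  if (len >= static_cast<int>(sizeof(buf))) len = sizeof(buf) - 1;
  return std::string(buf, len);
}
```

Formatting into a fixed-size buffer and constructing the result once avoids the intermediate strings that the old stringstream-based path produced, which is the same general idea the patch relies on.
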
Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-value.cc
6 files changed, 119 insertions(+), 86 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/1
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>