Hello Qifan Chen, Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#5).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
......................................................................

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating the results of
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using a stringstream. This involves
multiple string allocations and copies, and might hit a lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expressions that touch
this function can be inefficient when they are evaluated for millions of
rows.

This patch adds the method TimestampValue::ToStringVal and reimplements
TimestampValue::ToString by supplying a default DateTimeFormatContext if
no pattern was specified. "yyyy-MM-dd HH:mm:ss" is picked as the default
format if the time_ component does not have fractional seconds.
Otherwise, "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" is picked as the default
format. The chosen DateTimeFormatContext is then passed to
TimestampParser::Format along with date_ and time_ to be formatted into
the string representation. The int-to-string conversion in
TimestampParser::Format is replaced with FastInt32ToBufferLeft.

We ran a set of expression benchmarks on a machine with an Intel(R)
Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives > 10X performance
improvement for CAST timestamp to string and for FROM_UNIXTIME without a
date-time pattern. The detailed results before and after the patch
follow.
Before the patch:

FromUnixCodegen:
Function                                  10%ile  50%ile  90%ile     10%ile     50%ile     90%ile
                                                                 (relative) (relative) (relative)
-------------------------------------------------------------------------------------------------
literal                                     36.7      37    37.3         1X         1X         1X
cast(now() as string)                       2.31    2.31    2.33    0.0628X    0.0623X    0.0626X
cast(now() as string format 'Y .SSSSS')     16.9    17.5    17.5     0.459X     0.472X     0.471X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss')       6.3     6.3    6.37     0.171X      0.17X     0.171X
from_unixtime(0,'yyyy-MM-dd')               11.8    11.8      12      0.32X      0.32X     0.322X
from_unixtime(0)                            2.36     2.4     2.4    0.0644X    0.0648X    0.0644X

After the patch:

FromUnixCodegen:
Function                                  10%ile  50%ile  90%ile     10%ile     50%ile     90%ile
                                                                 (relative) (relative) (relative)
-------------------------------------------------------------------------------------------------
literal                                     37.7    38.1    38.4         1X         1X         1X
cast(now() as string)                       29.9    30.1    30.2     0.794X      0.79X     0.787X
cast(now() as string format 'Y .SSSSS')     61.1    61.3    61.6      1.62X      1.61X      1.61X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss')      33.6    33.8    34.2     0.892X     0.887X     0.892X
from_unixtime(0,'yyyy-MM-dd')               50.5    50.6    50.9      1.34X      1.33X      1.33X
from_unixtime(0)                              34    34.2    34.5     0.902X     0.896X     0.898X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates the numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidies up expr-benchmark a bit so that
it clears its MemPool between benchmark iterations and does not run out
of memory.

Testing:
- Pass core tests.
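
For illustration only, the idea described above (choose one of two fixed
default formats depending on whether fractional seconds are present, then
write the digits straight into a buffer instead of going through a
stringstream) can be sketched as follows. This is not the code in the
patch: SimpleTimestamp, AppendPadded and ToStringSketch are hypothetical
names, and the sketch assumes non-negative fields with a year in the
range 0..9999.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the date_ and time_ components of a timestamp.
struct SimpleTimestamp {
  int year, month, day;
  int hour, minute, second;
  int32_t nanos;  // fractional seconds, 0..999999999
};

// Writes `value` as exactly `width` zero-padded decimal digits starting at
// `out` and returns the position just past the last digit written.
// Assumes `value` is non-negative and fits in `width` digits.
inline char* AppendPadded(char* out, int32_t value, int width) {
  for (int i = width - 1; i >= 0; --i) {
    out[i] = static_cast<char>('0' + value % 10);
    value /= 10;
  }
  return out + width;
}

// Formats into a stack buffer, so the only heap allocation is the returned
// std::string. Produces "yyyy-MM-dd HH:mm:ss" when nanos == 0 and
// "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" otherwise.
std::string ToStringSketch(const SimpleTimestamp& ts) {
  char buf[29];  // 19 chars for the base form + '.' + 9 fractional digits
  char* p = buf;
  p = AppendPadded(p, ts.year, 4);
  *p++ = '-';
  p = AppendPadded(p, ts.month, 2);
  *p++ = '-';
  p = AppendPadded(p, ts.day, 2);
  *p++ = ' ';
  p = AppendPadded(p, ts.hour, 2);
  *p++ = ':';
  p = AppendPadded(p, ts.minute, 2);
  *p++ = ':';
  p = AppendPadded(p, ts.second, 2);
  if (ts.nanos != 0) {
    // Fractional seconds present: use the longer default format.
    *p++ = '.';
    p = AppendPadded(p, ts.nanos, 9);
  }
  return std::string(buf, static_cast<std::size_t>(p - buf));
}
```

With this sketch, the benchmark's baseline literal (2012-01-01
09:10:11.123456789) takes the 29-character path, while a timestamp with
no fractional seconds takes the 19-character path.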

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/kudu-util-ir.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/service/client-request-state.cc
M be/src/util/min-max-filter.cc
22 files changed, 316 insertions(+), 213 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/5
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>