Csaba Ringhofer has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
......................................................................

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating the results of
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string through a stringstream. This involves
multiple string allocations and copies, and might contend on a lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expressions that
call this function can therefore be inefficient when they are
evaluated for millions of rows.

This patch adds the method TimestampValue::ToStringVal and reimplements
TimestampValue::ToString so that it supplies a default
DateTimeFormatContext when no pattern is specified. "yyyy-MM-dd HH:mm:ss"
is picked as the default format if the time_ component has no fractional
seconds. Otherwise, "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" is picked as the
default format. The chosen DateTimeFormatContext is then passed to
TimestampParser::Format along with date_ and time_ to produce the string
representation. The integer-to-string conversion in
TimestampParser::Format is replaced with FastInt32ToBufferLeft.
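
The idea can be illustrated with a minimal, self-contained sketch. This is
not the Impala code: SimpleTimestamp, AppendPadded and FormatTimestamp are
hypothetical names, and the digit writer is a plain stand-in for
FastInt32ToBufferLeft. It only shows how the default format can be chosen
from the fractional seconds and written into a fixed buffer, so the result
string is built once and no stringstream is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the date_ and time_ members of a timestamp.
struct SimpleTimestamp {
  uint32_t year, month, day;
  uint32_t hour, minute, second;
  uint32_t nanos;  // fractional seconds, 0..999999999
};

// Writes `value` as exactly `width` zero-padded decimal digits starting at
// `out` and returns the position just past the last digit.
inline char* AppendPadded(char* out, uint32_t value, int width) {
  for (int i = width - 1; i >= 0; --i) {
    out[i] = static_cast<char>('0' + value % 10);
    value /= 10;
  }
  return out + width;
}

// Formats as "yyyy-MM-dd HH:mm:ss" when nanos is zero, and as
// "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" otherwise.
std::string FormatTimestamp(const SimpleTimestamp& ts) {
  assert(ts.year <= 9999 && ts.nanos <= 999999999);
  char buf[29];  // 19 chars of date and time, plus '.' and 9 digits
  char* p = buf;
  p = AppendPadded(p, ts.year, 4);
  *p++ = '-';
  p = AppendPadded(p, ts.month, 2);
  *p++ = '-';
  p = AppendPadded(p, ts.day, 2);
  *p++ = ' ';
  p = AppendPadded(p, ts.hour, 2);
  *p++ = ':';
  p = AppendPadded(p, ts.minute, 2);
  *p++ = ':';
  p = AppendPadded(p, ts.second, 2);
  if (ts.nanos != 0) {
    *p++ = '.';
    p = AppendPadded(p, ts.nanos, 9);
  }
  // The only heap use is the construction of the returned string.
  return std::string(buf, static_cast<size_t>(p - buf));
}
```

Calling FormatTimestamp with nanos set to 0 yields the 19-character form,
and any non-zero nanos value yields the 29-character form.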

We ran a set of expression benchmarks on a machine with an Intel(R)
Core(TM) i7-4790 CPU @ 3.60GHz. This patch gives a more than 10X
performance improvement for CAST of a timestamp to string and for
FROM_UNIXTIME without a date-time pattern. Detailed results before and
after the patch follow.

Before the patch:
FromUnixCodegen:               Function   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                   (relative) (relative) (relative)
---------------------------------------------------------------------------------------------------
                                literal     36.7       37     37.3         1X         1X         1X
                  cast(now() as string)     2.31     2.31     2.33    0.0628X    0.0623X    0.0626X
cast(now() as string format 'Y .SSSSS')     16.9     17.5     17.5     0.459X     0.472X     0.471X
 from_unixtime(0,'yyyy-MM-dd HH:mm:ss')      6.3      6.3     6.37     0.171X      0.17X     0.171X
          from_unixtime(0,'yyyy-MM-dd')     11.8     11.8       12      0.32X      0.32X     0.322X
                       from_unixtime(0)     2.36      2.4      2.4    0.0644X    0.0648X    0.0644X

After the patch:
FromUnixCodegen:               Function   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                   (relative) (relative) (relative)
---------------------------------------------------------------------------------------------------
                                literal     37.7     38.1     38.4         1X         1X         1X
                  cast(now() as string)     29.9     30.1     30.2     0.794X      0.79X     0.787X
cast(now() as string format 'Y .SSSSS')     61.1     61.3     61.6      1.62X      1.61X      1.61X
 from_unixtime(0,'yyyy-MM-dd HH:mm:ss')     33.6     33.8     34.2     0.892X     0.887X     0.892X
          from_unixtime(0,'yyyy-MM-dd')     50.5     50.6     50.9      1.34X      1.33X      1.33X
                       from_unixtime(0)       34     34.2     34.5     0.902X     0.896X     0.898X
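
As a quick check of the "more than 10X" claim, the 50%ile figures from the
two tables can be divided row by row. The snippet below is only an
illustrative calculation: Row, Speedup, kRows and PrintSpeedups are
hypothetical names, and it assumes that higher figures are better, as the
before and after tables imply.

```cpp
#include <cassert>
#include <cstdio>

// One benchmark row: 50%ile figure before and after the patch.
struct Row {
  const char* expr;
  double before;
  double after;
};

// Ratio of the after-patch figure to the before-patch figure.
inline double Speedup(const Row& r) {
  assert(r.before > 0.0);
  return r.after / r.before;
}

// 50%ile values copied from the two tables above.
const Row kRows[] = {
    {"cast(now() as string)", 2.31, 30.1},
    {"cast(now() as string format 'Y .SSSSS')", 17.5, 61.3},
    {"from_unixtime(0,'yyyy-MM-dd HH:mm:ss')", 6.3, 33.8},
    {"from_unixtime(0,'yyyy-MM-dd')", 11.8, 50.6},
    {"from_unixtime(0)", 2.4, 34.2},
};

// Prints one line per expression with its speedup factor.
void PrintSpeedups() {
  for (const Row& r : kRows) {
    std::printf("%-42s %6.2fX\n", r.expr, Speedup(r));
  }
}
```

The two expressions without a date-time pattern come out at roughly 13X and
14X, while the expressions with an explicit pattern improve by roughly 3.5X
to 5.4X.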

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates the numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidies up expr-benchmark so that it
clears its MemPool between benchmark iterations and does not run out
of memory.
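
Why clearing between iterations matters can be shown with a toy example.
The sketch below does not use Impala's MemPool: BumpArena and RunBenchmark
are hypothetical, and the arena is a simple fixed-capacity bump allocator.
It only demonstrates that results allocated in every iteration accumulate
until the pool is exhausted unless the pool is reset after each iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity bump allocator standing in for a memory pool.
class BumpArena {
 public:
  explicit BumpArena(size_t capacity) : buf_(capacity), used_(0) {}

  // Returns `n` bytes from the arena, or nullptr if it is exhausted.
  void* Allocate(size_t n) {
    if (n > buf_.size() - used_) return nullptr;
    void* p = buf_.data() + used_;
    used_ += n;
    return p;
  }

  // Makes the whole capacity available again without freeing the buffer.
  void Clear() { used_ = 0; }

  size_t used() const { return used_; }

 private:
  std::vector<char> buf_;
  size_t used_;
};

// Runs `iterations` rounds, each making `allocs` allocations of `bytes`
// bytes. Returns false as soon as an allocation fails.
bool RunBenchmark(BumpArena* arena, int iterations, int allocs, size_t bytes,
                  bool clear_between) {
  for (int i = 0; i < iterations; ++i) {
    for (int j = 0; j < allocs; ++j) {
      if (arena->Allocate(bytes) == nullptr) return false;
    }
    if (clear_between) arena->Clear();
  }
  return true;
}
```

With a 1024-byte arena and ten 64-byte allocations per iteration, the run
completes when the arena is cleared after each iteration and fails during
the second iteration when it is not.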

Testing:
- Passed core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Reviewed-on: http://gerrit.cloudera.org:8080/17980
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Csaba Ringhofer <csringho...@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/date-parse-util.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M be/src/util/min-max-filter.cc
20 files changed, 261 insertions(+), 159 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Csaba Ringhofer: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
