Riza Suminto has uploaded this change for review. (
http://gerrit.cloudera.org:8080/17980 )


Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
......................................................................

IMPALA-10984: Improve performance of TimestampValue::ToString

TimestampValue::ToString was implemented by concatenating the results of
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using a stringstream. This involves
multiple string allocations and copies, and might hit a lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expressions that
touch this function can be inefficient when they are evaluated for
millions of rows.
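
To illustrate the pattern described above, here is a minimal sketch that
uses only the standard library. DateToString, TimeToString, and
ToStringViaStream are hypothetical stand-ins, not the actual boost or
Impala code. Each helper returns a temporary string, and the stringstream
adds its own buffer plus a final copy in str(). Whether a given temporary
reaches the heap depends on the standard library's small-string
optimization.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical stand-in for boost::gregorian::to_iso_extended_string.
std::string DateToString(int year, int month, int day) {
  char buf[48];  // generously sized for any int values
  int n = std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d", year, month, day);
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, n);  // temporary string for the date part
}

// Hypothetical stand-in for boost::posix_time::to_simple_string.
std::string TimeToString(int hour, int minute, int second) {
  char buf[48];  // generously sized for any int values
  int n = std::snprintf(buf, sizeof(buf), "%02d:%02d:%02d", hour, minute, second);
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, n);  // temporary string for the time part
}

// Concatenates the two temporaries through a stringstream.
std::string ToStringViaStream(int year, int month, int day,
                              int hour, int minute, int second) {
  std::stringstream ss;  // owns its own growable buffer
  ss << DateToString(year, month, day) << ' '
     << TimeToString(hour, minute, second);
  return ss.str();  // copies the stream buffer into the result
}
```

Calling ToStringViaStream(2012, 1, 1, 9, 10, 11) yields
"2012-01-01 09:10:11", but only after building two intermediate strings
and a stream buffer for a single row.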

This patch reimplements TimestampValue::ToString by supplying a default
DateTimeFormatContext if no pattern was specified. "yyyy-MM-dd HH:mm:ss"
is picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" is
picked as the default format. The chosen DateTimeFormatContext is then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation.
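
The format selection above can be sketched as follows. This is an
illustration under stated assumptions, not the patch itself:
SimpleTimestamp and FormatDefault are hypothetical names, and snprintf
into a stack buffer stands in for DateTimeFormatContext and
TimestampParser::Format. The point is only that the layout is chosen by
whether the fractional part is zero, and that the result is written once
into a fixed-size buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a timestamp: a calendar date plus the
// time of day expressed in nanoseconds since midnight.
struct SimpleTimestamp {
  int year;
  int month;
  int day;
  int64_t nanos_of_day;
};

std::string FormatDefault(const SimpleTimestamp& ts) {
  const int64_t kNanosPerSecond = 1000000000LL;
  const int64_t secs = ts.nanos_of_day / kNanosPerSecond;
  const int64_t frac = ts.nanos_of_day % kNanosPerSecond;
  const int hour = static_cast<int>(secs / 3600);
  const int minute = static_cast<int>((secs % 3600) / 60);
  const int second = static_cast<int>(secs % 60);

  char buf[128];  // generously sized; the longest expected output is 29 chars
  int n;
  if (frac == 0) {
    // Corresponds to "yyyy-MM-dd HH:mm:ss".
    n = std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d %02d:%02d:%02d",
                      ts.year, ts.month, ts.day, hour, minute, second);
  } else {
    // Corresponds to "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" (always 9 digits).
    n = std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d %02d:%02d:%02d.%09lld",
                      ts.year, ts.month, ts.day, hour, minute, second,
                      static_cast<long long>(frac));
  }
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, n);  // single copy into the result string
}
```

With a time of day of 09:10:11 and 123456789 nanoseconds on 2012-01-01,
FormatDefault returns "2012-01-01 09:10:11.123456789"; with a zero
fractional part it returns "2012-01-01 09:10:11".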

This patch gives a more than 2X performance improvement (roughly 2.5X to
2.7X at the 50th percentile) for CAST timestamp to string and
FROM_UNIXTIME without a date-time pattern, as shown by the following
benchmark (modified from expr-benchmark), run before and after the
patch.

Before the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:              Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                            (relative) (relative) (relative)
------------------------------------------------------------------------------------------------------------
                               literal               37.4     38.6     39.1         1X         1X         1X
                 cast(now() as string)                2.2     2.28     2.31    0.0589X    0.0591X    0.0591X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss')               5.94     6.18     6.23     0.159X      0.16X     0.159X
         from_unixtime(0,'yyyy-MM-dd')               11.5     11.8     11.9     0.308X     0.305X     0.304X
                      from_unixtime(0)               2.26     2.35     2.39    0.0606X     0.061X    0.0612X

After the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:              Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
                                                                            (relative) (relative) (relative)
------------------------------------------------------------------------------------------------------------
                               literal               37.2     37.5     38.1         1X         1X         1X
                 cast(now() as string)               5.65     5.65     5.67     0.152X      0.15X     0.149X
from_unixtime(0,'yyyy-MM-dd HH:mm:ss')                6.3      6.3     6.37     0.169X     0.168X     0.167X
         from_unixtime(0,'yyyy-MM-dd')               11.7     11.9       12     0.315X     0.318X     0.314X
                      from_unixtime(0)               6.23     6.23      6.3     0.167X     0.166X     0.165X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates the numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidies up expr-benchmark a bit by
clearing its MemPool between benchmark iterations so that it does not
run out of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-value.cc
6 files changed, 119 insertions(+), 86 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/1
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
