[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-09-03 Thread Csaba Ringhofer (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#15).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to
boost::posix_time::ptime) by first splitting the input into seconds
and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(),
and then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read
TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet
scanner" - reading these types will run
micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in
  convert-timestamp-benchmark.cc it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in the multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 388 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/15
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 15
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-09-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 14: Code-Review+1

(6 comments)

Thanks, LGTM. Couple cosmetic things:

http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@581
PS14, Line 581: TimestampValue sec_split_utc_from_unix_time_micros(
  : const int64_t& unix_time_micros) {
nit: one line. Here and below, where applicable.


http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@875
PS14, Line 875: // Note that the test data contains only whole seconds.
The comment is out of date.


http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@902
PS14, Line 902: // Note that the test data contains only whole seconds.
The comment is out of date.


http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@929
PS14, Line 929: // Note that the test data contains only whole seconds.
The comment is out of date.


http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/benchmarks/convert-timestamp-benchmark.cc@936
PS14, Line 936:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/11183/14/be/src/runtime/timestamp-test.cc@280
PS14, Line 280: offset_millis
Probably this should be called 'offset_seconds' or just 'offsets'



--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 03 Sep 2018 13:04:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-31 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@875
PS12, Line 875: Note that the test data contains only whole seconds.
> Why not add sub-second microsec test values?
Done


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@899
PS12, Line 899: Benchmark FromUnixTimeNanos improvement in IMPALA-7417.
  :   // Note that the test data contains only whole seconds.
> Why not add sub-second nanosec test values?
Done


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@917
PS12, Line 917: sec split
> Shouldn't this be "day split"?
The first division in nano conversion is splitting nano to second and 
(sub-second) nanosecond parts. This is why I still consider it do be sec split.

The majority of the speedup in nano + double functions comes from replacing  
"temp += boost::posix_time::nanoseconds(...);" with "result.time_ += 
boost::posix_time::nanoseconds(...);". The former has to care about overflowing 
to another day, while the later does not.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@939
PS12, Line 939: (old)
> "sec split (old)" ?
see line 917


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@940
PS12, Line 940: (new)
> "day split (new)" ?
see line 917


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@238
PS12, Line 238: TimestampValue
> const TimestampValue&
Done


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@249
PS12, Line 249: 10
> I could be mistaken, but shouldn't this diff be 0 or 1?
It can be more than a microsec due to double's lower precision at large 
numbers. The biggest one I saw was 7, so I have set the limit to 8 + added a 
comment about this.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@290
PS12, Line 290:  EXPECT_EQ(result, TestFromSubSecondFunctionsInner(
  :   seconds + sign * offset, millis - 1000 * sign * 
offset));
> EXPECT_EQ() is not really necessary here. It checks that
I have removed TestFromSubSecondFunctionsInner, and call FromUnixTimeNanos 
directly in this loop, as that is the only functions where this shifting 
matters.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@295
PS12, Line 295:  if (expected == nullptr) EXPECT_FALSE(result.HasDate());
  :   else EXPECT_EQ(expected, result.ToString());
> I would also move these lines after L284 for readability.
Good idea.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h@315
PS12, Line 315: GRANURALITY
> Typo: GRANULARITY.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 31 Aug 2018 15:48:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-31 Thread Csaba Ringhofer (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#14).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to
boost::posix_time::ptime) by first splitting the input into seconds
and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(),
and then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read
TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet
scanner" - reading these types will run
micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in
  convert-timestamp-benchmark.cc it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in the multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 397 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/14
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-31 Thread Csaba Ringhofer (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#13).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to
boost::posix_time::ptime) by first splitting the input into seconds
and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(),
and then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read
TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet
scanner" - reading these types will run
micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in
  convert-timestamp-benchmark.cc it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in the multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 397 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/13
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-31 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 12:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/11183/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11183/12//COMMIT_MSG@43
PS12, Line 43: thw
typo: the


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@932
PS11, Line 932:
> I added the test originally to check if the rounding has added any regressi
Thanks for the explanation, I'm okay with keeping it.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@875
PS12, Line 875: Note that the test data contains only whole seconds.
Why not add sub-second microsec test values?


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@899
PS12, Line 899: Benchmark FromUnixTimeNanos improvement in IMPALA-7417.
  :   // Note that the test data contains only whole seconds.
Why not add sub-second nanosec test values?


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@917
PS12, Line 917: sec split
Shouldn't this be "day split"?


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@939
PS12, Line 939: (old)
"sec split (old)" ?


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/benchmarks/convert-timestamp-benchmark.cc@940
PS12, Line 940: (new)
"day split (new)" ?


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@238
PS12, Line 238: TimestampValue
const TimestampValue&


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@249
PS12, Line 249: 10
I could be mistaken, but shouldn't this diff be 0 or 1?

'expected' is set to 'from_millis' in L267 and has millisec precision.

'from_double's time member is set to milliseconds + sub-nanos (<1 nano).

So when they are both converted to microseconds their diff should be 0 (or 
maybe 1), right?


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@290
PS12, Line 290:  EXPECT_EQ(result, TestFromSubSecondFunctionsInner(
  :   seconds + sign * offset, millis - 1000 * sign * 
offset));
EXPECT_EQ() is not really necessary here. It checks that
(seconds*1000 + millis) calculated on L258 is the same as
((seconds + sign * offset)*1000 + (millis - 1000 * sign * offset))
which is always true.

I would change TestFromSubSecondFunctionsInner() to return void and just call 
TestFromSubSecondFunctionsInner() here in a loop without EXPECT_EQ().

Calculations in L257-259 can be moved to L284 and 'result' can be passed to 
TestFromSubSecondFunctionsInner() as a parameter.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@295
PS12, Line 295:  if (expected == nullptr) EXPECT_FALSE(result.HasDate());
  :   else EXPECT_EQ(expected, result.ToString());
I would also move these lines after L284 for readability.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-test.cc@884
PS12, Line 884:   // A few test with sub-millisec precision.
  :   auto tz = TimezoneDatabase::GetUtcTimezone();
  :   EXPECT_EQ("1970-01-01 00:00:00.01000",
  :   TimestampValue::UtcFromUnixTimeMicros(1).ToString());
  :   EXPECT_EQ("1969-12-31 23:59:59.99000",
  :   TimestampValue::UtcFromUnixTimeMicros(-1).ToString());
  :
  :   EXPECT_EQ("1970-01-01 00:00:00.01000",
  :   TimestampValue::FromUnixTimeMicros(1, tz).ToString());
  :   EXPECT_EQ("1969-12-31 23:59:59.99000",
  :   TimestampValue::FromUnixTimeMicros(-1, tz).ToString());
  :
  :   EXPECT_EQ("1970-01-01 00:00:00.1",
  :   TimestampValue::FromUnixTimeNanos(0, 1, tz).ToString());
  :   EXPECT_EQ("1969-12-31 23:59:59.9",
  :   TimestampValue::FromUnixTimeNanos(0, -1, tz).ToString());
Thanks for adding these.


http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/11183/12/be/src/runtime/timestamp-value.h@315
PS12, Line 315: GRANURALITY
Typo: GRANULARITY.



[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 12:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG@10
PS11, Line 10: (which is split do date_ and time_ similarly to
> nit: please wrap at 70 characters, here and below.
Done


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@578
PS11, Line 578: boost::posix_time::nanoseconds(t
> It probably wouldn't make a difference but maybe you should call boost::pos
Done


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@631
PS11, Line 631: cctz_optimized_unix_time_to_utc_ptime(unix_time_whole);
> The original implementation had:
That constant was not accessible from here and I thought that the results would 
be the same, so why not use NANOS_PER_SEC. It turned that I was wrong and the 
result is slightly different in some cases (+- 1 nanosec in analytic average 
tests).


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@635
PS11, Line 635:
  : TimestampValue from_subsecond_unix_time_new(
  : const double& unix_time) {
  :   const double ONE_BILLIONTH = 0.1;
  :   int64_t unix_time_whole = unix_time;
  :   int64_t nanos = (unix_time - unix_time_whole) / ONE_BILLIONTH;
  :
> The implementation of TimestampValue::FromSubsecondUnixTime() was changed i
Done


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@923
PS11, Line 923:
  :   // Benchmark FromSubsecondUnixTime before and after 
IMPALA-7417.
> Are the rounding rules still different?
No, it should be the same in patch set 12.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@932
PS11, Line 932:
> This benchmark and the previous bm_utc_from_unix_time_nanos benchmark measu
I added the test originally to check if the rounding has added any regression. 
As the rounding was replaced with truncation this is no longer relevant. I 
would still keep the test because the results are different (5.57 vs 3.97 
improvement) and the the benchmark turned out to be useful for catching 
correctness issues.

So I would prefer to keep the test but I can remove it if you want to keep this 
file more compact.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@235
PS11, Line 235: The fraction part of 'nanos' can e
> You mean "'nanos' can express sub-nanoseconds"?
I hope that it is clearer with the new comment.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@240
PS11, Line 240: ue::
> Maybe use MILLIS_PER_SEC instead?
My impression is that 1000 is more readable than MILLIS_PER_SEC, while 
NANOS_PER_SEC is more readable than 1000*1000*1000 (and 10 is by far 
the worst), so this inconsistency is intentional.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@282
PS11, Line 282: p
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@280
PS11, Line 280: / Checks that all sub-second From*UnixTime gives the same 
result and that the result
  : // is the same as 'expected'.
  : // If 'expected' is nullptr then the result expected to be 
invalid (out of range).
  : void TestFromSubSecondFunctions(int64_t seconds, int millis, 
const cha
> Why do we need this extra test? And how is this 'opposite sign'? Please cla
I have extended the test to check the same timestamp with different seconds / 
millis, so that > 1000 millis are also tested.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@284
PS11, Line 284: Ti
> nit: missing space after 'if'
Done


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@848
PS11, Line 848:
> How about adding some extra tests for the micros/nanos unix time conversion
Done


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@189
PS11, Line 189:  !time.is_negative()
> Instead this, you could use !time.is_negative()
Done


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@190
PS11, Line 190: time.total_nanoseconds() < 

[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/488/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 27 Aug 2018 16:51:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-27 Thread Csaba Ringhofer (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#12).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to
boost::posix_time::ptime) by first splitting the input into seconds
and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(),
and then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read
TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet
scanner" - reading these types will run
micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in
  convert-timestamp-benchmark.cc it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in thw multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 399 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/12
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11:

(15 comments)

Thanks! Another batch of comments:

http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11183/11//COMMIT_MSG@10
PS11, Line 10: (which is split do date_ and time_ similarly to 
boost::posix_time::ptime)
nit: please wrap at 70 characters, here and below.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@578
PS11, Line 578: boost::posix_time::seconds(time)
It probably wouldn't make a difference but maybe you should call 
boost::posix_time::nanoseconds(time*NANOS_PER_SEC) here to be consistent with 
TimestampValue::UtcFromUnixTimeTicks().


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@631
PS11, Line 631: temp += boost::posix_time::nanoseconds((unix_time - 
unix_time_whole) * NANOS_PER_SEC);
The original implementation had:

temp += boost::posix_time::nanoseconds((unix_time - unix_time_whole) / 
ONE_BILLIONTH);

Any reason you changed it?


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@635
PS11, Line 635: TimestampValue from_subsecond_unix_time_new(
  : const double& unix_time) {
  :   int64_t unix_time_whole = round(unix_time);
  :   int64_t nanos = round((unix_time - 
unix_time_whole)*NANOS_PER_SEC);
  :   return TimestampValue::FromUnixTimeNanos(
  :   unix_time_whole, nanos, 
TimezoneDatabase::GetUtcTimezone());
  : }
The implementation of TimestampValue::FromSubsecondUnixTime() was changed in 
patch-set #11. Please change the implementation here too.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@923
PS11, Line 923: // Note that the test data contains only whole seconds. The 
results could be slightly
  :   // different for sub-second times due to the different 
rounding rules.
Are the rounding rules still different?


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/benchmarks/convert-timestamp-benchmark.cc@932
PS11, Line 932: Benchmark from_subsecond_unix_time("FromSubsecondUnixTime");
This benchmark and the previous bm_utc_from_unix_time_nanos benchmark measures 
very similar code. Do we need both?


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@235
PS11, Line 235: Double can express sub-nanoseconds
You mean "'nanos' can express sub-nanoseconds"?

I'm not really sure what is the role of 'nanos' in this test. Could you clarify 
in the comment?


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@240
PS11, Line 240: 1000
Maybe use MILLIS_PER_SEC instead?


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@282
PS11, Line 282:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@280
PS11, Line 280:  // Try the same timestamp with opposite sign.
  :   seconds += millis < 0 ? -1 : 1;
  :   millis += millis < 0 ?  1000 : -1000;
  :   EXPECT_EQ(result, TestFromSubSecondFunctionsInner(seconds, 
millis));
Why do we need this extra test? And how is this 'opposite sign'? Please clarify 
in the comment.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@284
PS11, Line 284: if
nit: missing space after 'if'


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-test.cc@848
PS11, Line 848: Test subsecond unix time conversion for non edge cases.
How about adding some extra tests for the micros/nanos unix time conversion 
functions to make sure that micros/nanos precision is handled correctly.


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@189
PS11, Line 189:  time.total_nanoseconds() >= 0
Instead this, you could use !time.is_negative()


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@190
PS11, Line 190: time.total_nanoseconds() < NANOS_PER_DAY
maybe use (time.hours() < 24) ?


http://gerrit.cloudera.org:8080/#/c/11183/11/be/src/runtime/timestamp-value.h@328
PS11, Line 328: unix_time
unix_time_ticks



--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF

[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Aug 2018 12:22:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-24 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11:

The last build failure was caused by a hung test that seemed unrelated. I have 
created a jira about it: IMPALA-7485
"test_spilling_naaj hung on jenkins"


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Aug 2018 10:09:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3080/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Aug 2018 09:01:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3077/


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Aug 2018 05:18:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/467/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Aug 2018 20:59:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG@35
PS9, Line 35: This function was
:   only used in tests until
> Why? Is it used for testing only? If so, please  add a sentence explaining
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@258
PS9, Line 258:  <
> nit: space is missing before '<<' operator
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@561
PS9, Line 561: GRANULARITY
> typo: GRANULARITY.
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@574
PS9, Line 574: Sp
> nit: remove extra spaces.
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc@27
PS9, Line 27: #include "runtime/timestamp-value.inline.h"
: #include "runtime/tuple-row.h"
> nit: order alphabetically.
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@229
PS9, Line 229: UtcToLoc
> You should validate the time value here (call Validate() or whatever functi
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@295
PS9, Line 295:
> Fix the comment or move the time-validation logic below to a separate funct
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@298
PS9, Line 298: ti
> else if
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@303
PS9, Line 303: Time's validity is only checked in debug builds.
 :   /// TODO: This could be also checked in release, but I a
> Maybe instead of DCHECK calls, you should call SetToInvalidDateTime() here?
I am unsure about the future of Validate() - with the exception of cases when 
the timestamp is created with memcpy from an unchecked source (e.g. Parquet 
reading), it may be better to use a DCHECK instead of SetToInvalidDateTime() - 
generally the caller logic should ensure that only valid arguments are used.

I have moved the check to a separate function.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@314
PS9, Line 314:
> typo: GRANULARITY
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327
PS9, Line 327:
> I think, this should be called 'unix_time_ticks'
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327
PS9, Line 327:
> Maybe this should be called UtcFromUnixTimeTicks().
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@153
PS9, Line 153: TimestampValue TimestampValue::UnixTimeToLocal(
 : time_t unix_time, const Timezone& local_tz) {
> Fix comment.
I have fixed the comment in .h and removed the comment from .cc.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@168
PS9, Line 168: T
> nit: insert empty line after }
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@169
PS9, Line 169:
> nit: missing space between ) and {
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34
PS9, Line 34: 64_t unix
> should be called 'unix_time_ticks'
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34
PS9, Line 34: UtcFromUnixTime
> should be called UtcFromUnixTimeTicks().
Done


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@61
PS9, Line 61: unix_
> Is calling round() here and below intentional? The previous version of From
I have removed the rounds because it lead to slightly (+-1 nanosec) different 
results in a test. I think that using round() leads more to precise results but 
I am trying to avoid adding functional changes in this commit. (It would be 
probably the best to remove direct double<->timestamp conversion and replace it 
with decimal<->timestamp, see IMPALA-7472.



--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit 

[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3077/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 Aug 2018 20:28:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-23 Thread Csaba Ringhofer (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#11).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to boost::posix_time::ptime)
by first splitting the input into seconds and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(), and
then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS
and TIMESTAMP_MICROS to the parquet scanner" - reading these types
will run micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in convert-timestamp-benchmark.cc
  it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in thw multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 369 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/11
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3067/


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 17:58:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 9:

(18 comments)

Did a quick first-round review. I will look at the tests the next time.

http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11183/9//COMMIT_MSG@35
PS9, Line 35: This should have
:   no visible side-effects.
Why? Is it used for testing only? If so, please  add a sentence explaining it.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@258
PS9, Line 258: <<
nit: space is missing before '<<' operator


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@561
PS9, Line 561: GRANURALITY
typo: GRANULARITY.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/benchmarks/convert-timestamp-benchmark.cc@574
PS9, Line 574:
nit: remove extra spaces.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/exec/hdfs-orc-scanner.cc@27
PS9, Line 27: #include "runtime/tuple-row.h"
: #include "runtime/timestamp-value.inline.h"
nit: order alphabetically.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@229
PS9, Line 229: set_time
You should validate the time value here (call Validate() or whatever function 
time-validation ends up in)


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@295
PS9, Line 295: /// Sets both date and time to invalid if date is outside the 
valid range.
Fix the comment or move the time-validation logic below to a separate function.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@298
PS9, Line 298: if
else if


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@303
PS9, Line 303: DCHECK_GE(time_.total_nanoseconds(), 0);
 :   DCHECK_LT(time_.total_nanoseconds(), NANOS_PER_DAY);
Maybe instead of DCHECK calls, you should call SetToInvalidDateTime() here?

Again, it might make sense to move time-validation code to a separate function.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@314
PS9, Line 314: GRANURALITY
typo: GRANULARITY


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327
PS9, Line 327: unix_time
I think, this should be called 'unix_time_ticks'


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.h@327
PS9, Line 327: UtcFromUnixTime
Maybe this should be called UtcFromUnixTimeTicks().


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@153
PS9, Line 153: /// Return a ptime representation of the given Unix time 
(seconds since the Unix epoch).
 : /// The time zone of the resulting ptime is 'local_tz'. This is 
called by UnixTimeToPtime.
Fix comment.


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@168
PS9, Line 168: }
nit: insert empty line after }


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.cc@169
PS9, Line 169: ){
nit: missing space between ) and {


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34
PS9, Line 34: UtcFromUnixTime
should be called UtcFromUnixTimeTicks().


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@34
PS9, Line 34: unix_time
should be called 'unix_time_ticks'


http://gerrit.cloudera.org:8080/#/c/11183/9/be/src/runtime/timestamp-value.inline.h@61
PS9, Line 61: round
Is calling round() here and below intentional? The previous version of 
FromSubsecondUnixTime() just casts unix_time to int64_t.

Is this change addressed in the commit message?



--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 16:23:17 +

[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 10:

The test has hit IMPALA-7459 - the commit that causes it was reverted, so a 
rebase should help.


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 14:43:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3067/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 14:41:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3066/


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 14:01:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/447/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 11:22:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 9: -Code-Review

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11183/8/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/11183/8/be/src/runtime/timestamp-value.inline.h@67
PS8, Line 67: const Timezone& local_tz) {
The test failure in patch set 8 was caused by hitting this DCHECK for very 
small (<<1 nanosec) negative unix times. floor() truncated these values to -1, 
but unix_time - unix_time_whole was 1.0 (instead of 1-very_small_fractional 
_number) due the to the decreased preciison at 1.0 compared to 0.0. This would 
also be a problem when adding the sub-second value to time_, because adding a 
whole sec can step to a different day/timezone rule.

The solution was to use rounding to get seconds, calculate nanoseconds from 
unix_time-unix_time_whole without any expectations, and give the results to 
FromUnixTimeNanos() , which can handle negative or >=1 sec values. This is 
probably a bit sub-optimal, but I do not want to put too much effort to this 
function. A benchmark was added that shows that this function is still faster 
than it used to be.



--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 11:12:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3066/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Aug 2018 10:49:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-22 Thread Csaba Ringhofer (Code Review)
Hello Attila Jeges, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#9).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to boost::posix_time::ptime)
by first splitting the input into seconds and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(), and
then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS
and TIMESTAMP_MICROS to the parquet scanner" - reading these types
will run micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This should have
  no visible side-effects.
- When a result mismatch is detected in convert-timestamp-benchmark.cc
  it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in thw multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 365 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/9
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-21 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 8: Code-Review-2

The test failure is a bit weird: expr-test crashes at 
"TestCast("cast(-2.2250738585072020E-308 as double)", 
-2.2250738585072020e-308);", which should not be affected by this change. The 
crash occurs reliably both on my desktop and on jenkins.


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 21 Aug 2018 11:59:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3029/


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 17 Aug 2018 19:34:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/389/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 17 Aug 2018 17:09:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 6:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/388/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 17 Aug 2018 17:04:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 8:

Sorry for the large number of patches, I had a hard time fixing compilation 
errors (one of them only occurred in clean release builds, so not in my dirty 
local release build...)


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 17 Aug 2018 16:46:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11183 )

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3029/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 17 Aug 2018 16:36:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-17 Thread Csaba Ringhofer (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#7).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to boost::posix_time::ptime)
by first splitting the input into seconds and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(), and
then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS
and TIMESTAMP_MICROS to the parquet scanner" - reading these types
will run micro/milli->TimestampValue conversion for every row.

Other changes:
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This should have
  no visible side-effects.
- When a result mismatch is detected in convert-timestamp-benchmark.cc
  it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 312 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/7
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

2018-08-17 Thread Csaba Ringhofer (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11183

to look at the new patch set (#6).

Change subject: IMPALA-7417: Speed up sub-second unix time->TimestampValue 
conversions
..

IMPALA-7417: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split do date_ and time_ similarly to boost::posix_time::ptime)
by first splitting the input into seconds and sub-seconds part,
converting the seconds part with boost::posix_time::from_time_t(), and
then adding the sub-seconds part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

The main motivation is IMPALA-5050: "Add support to read TIMESTAMP_MILLIS
and TIMESTAMP_MICROS to the parquet scanner" - reading these types
will run micro/milli->TimestampValue conversion for every row.

Other changes:
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This should have
  no visible side-effects.
- When a result mismatch is detected in convert-timestamp-benchmark.cc
  it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
9 files changed, 304 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11183/6
--
To view, visit http://gerrit.cloudera.org:8080/11183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Gerrit-Change-Number: 11183
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins