[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..

IMPALA-5014: Part 1: Round when casting string to decimal

In this patch we implement rounding when casting string to decimal if
DECIMAL_V2 is enabled. The backend method that parses strings and
converts them to decimals is refactored to make it easier to understand.

Testing:
- Added some BE tests.

Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Reviewed-on: http://gerrit.cloudera.org:8080/8774
Reviewed-by: Taras Bobrovytsky 
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/atod-benchmark.cc
M be/src/benchmarks/overflow-benchmark.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/decimal-util.h
M be/src/util/string-parser.cc
M be/src/util/string-parser.h
M tests/query_test/test_decimal_casting.py
12 files changed, 245 insertions(+), 104 deletions(-)

Approvals:
  Taras Bobrovytsky: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 5
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 4
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 11:39:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 4: Code-Review+2

Fixed the issue with the test. The test did not expected there to be rounding 
when converting string to decimal, that's why it was failing. Forwarding the +2.


--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 4
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 07:57:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1648/


--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 4
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 07:58:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 3: Code-Review-1

Looks like there's a Python test we need to update...

03:52:25 ]  TestDecimalCasting.test_underflow[cast_from: string | decimal_type: 
(6, 3) | exec_option: {'decimal_v2': 'true'}]
03:52:25 ] [gw0] linux2 -- Python 2.7.12 
/home/ubuntu/Impala/bin/../infra/python/env/bin/python
03:52:25 ] query_test/test_decimal_casting.py:166: in test_underflow
03:52:25 ] self._assert_decimal_result(cast, res, expected_val)
03:52:25 ] query_test/test_decimal_casting.py:80: in _assert_decimal_result
03:52:25 ] assert expected == actual, "Cast: {0}, Expected: {1}, Actual: 
{2}".format(cast,\
03:52:25 ] E   AssertionError: Cast: select cast('-85.0817' as Decimal(6,3)), 
Expected: -85.081, Actual: -85.082
03:52:25 ] E   assert Decimal('-85.081') == Decimal('-85.082')


--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 05:54:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1645/


--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 03:52:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1645/


--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 00:15:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 3:

(2 comments)

Thanks for the review, Zach!

http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc@145
PS3, Line 145:   auto ret = StringToDecimal4(s, len, type_precision, 
type_scale, false, result);
> Will we ever want to implement rounding in the scanner?
The plan is to not implement rounding in the scanner because it should never be 
necessary. We do not expect decimals in text files to have more digits after 
the dot than the scale of the decimal data type. If this is detected (which is 
an underflow), the scanner errors out.


http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h@243
PS3, Line 243: // There are a maximum number of digits to the left of 
the dot. We round by
> I finally understand this comment.
Nice :)



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 00:14:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 00:10:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 3:

(2 comments)

Let me give the math one last check..

http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc@145
PS3, Line 145:   auto ret = StringToDecimal4(s, len, type_precision, 
type_scale, false, result);
Will we ever want to implement rounding in the scanner?


http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h@243
PS3, Line 243: // There are a maximum number of digits to the left of 
the dot. We round by
I finally understand this comment.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 22 Dec 2017 00:07:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191
PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == 
type_precision)) {
> Saving the truncated digit may be costly. The first_truncated_digit, along
I didn't think of that.  Now I see your point.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 21 Dec 2017 21:10:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191
PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == 
type_precision)) {
> I just thought it would be simpler to just always save the first truncated
Saving the truncated digit may be costly. The first_truncated_digit, along with 
the check on line 191 should be completely codegened away if round is false.

This function is used in our scanners for parsing decimals in text files. I 
think it would be a waste to make the scanners slower by saving the 
first_truncated_digit needlessly.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 21 Dec 2017 21:05:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191
PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == 
type_precision)) {
> We are saving the first truncated digit here. We save the truncated digit i
I just thought it would be simpler to just always save the first truncated 
digit - then the DCHECK on 248 is unnecessary.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 21 Dec 2017 20:54:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-15 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc@302
PS2, Line 302:   StringToAllDecimals("0.5", 5, 0,
> More interesting test cases:
Done


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191
PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == 
type_precision)) {
> This doesn't need to be conditional on round, in fact I think leaving that
We are saving the first truncated digit here. We save the truncated digit in 
order to be able to round if necessary. If round is false, there is no point in 
saving the first truncated digit.

Why do you think it's better for it to not be conditional on round?

Also, which DCHECK are you talking about? the one on line 194?


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@238
PS2, Line 238: // There are less than maximum number of digits to the 
left of the dot.
> Don't you mean "There are more than the maximum number of digits to the rig
We already know that there are too many digits to the right of the dot at this 
point because of comment on line 232. (which means we will be doing rounding 
for sure if round=true)

We are now checking on line 237 what's going on to the left of the dot. In this 
case, the number looks like this: x. with let's say prec=3, scale=1. If the 
number looked like this xx., the condition on line 237 would be false. (In 
other words, we are checking if the part of the number to the left of the dot 
is "full" or not on line 237).

If it's full, then we need to look at the first truncated digit. Otherwise, we 
can round without having to look at the first truncated digit.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@239
PS2, Line 239: value = DecimalUtil::ScaleDownAndRound(value, shift, 
round);
> After thinking on this some more, rounding away from zero is symmetric with
Agreed. No changes made.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@243
PS2, Line 243: // There are a maximum number of digits to the left of 
the dot. We attempt
> to the right of the dot?
No, to the left. See comment on line 238.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@248
PS2, Line 248: DCHECK(first_truncated_digit == 0 || round);
> I found this DCHECK confusing until I realized saving the truncated digit w
It's still not clear to me why it shouldn't be conditional on round. Look at my 
comment above.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@250
PS2, Line 250: value += (first_truncated_digit >= 5);
> This rounds towards positive infinity which isn't consistent with our round
Correct. We negate at the very bottom on line 266. So we are rounding away from 
zero.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@262
PS2, Line 262:   DCHECK_EQ(truncated_digit_count, 0);
> It would be nice to have a DCHECK that this can't overflow.  While that is
Done


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@391
PS2, Line 391:   int digits_after_dot_count, int type_precision, int8_t 
exponent, T* value,
> type_precision is unused in this function
Nice catch, removed.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 15 Dec 2017 22:09:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-13 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@239
PS2, Line 239: value = DecimalUtil::ScaleDownAndRound(value, shift, 
round);
> ScaleDownAndRound implements signed rounding to do rounding away from zero,
After thinking on this some more, rounding away from zero is symmetric with 
respect to positive and negative numbers, so it doesn't matter if the number 
value is negated before or after this call.  Either way works correctly.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 13 Dec 2017 16:24:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-12 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc@302
PS2, Line 302:   StringToAllDecimals("0.5", 5, 0,
More interesting test cases:

0.0 as 6,5
1.0 as 6.5
0.00 as 6,5
1.00 as 6,5


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191
PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == 
type_precision)) {
This doesn't need to be conditional on round, in fact I think leaving that 
condition out makes reading the DCHECK below simpler.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@238
PS2, Line 238: // There are less than maximum number of digits to the 
left of the dot.
Don't you mean "There are more than the maximum number of digits to the right 
of the dot." ?


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@239
PS2, Line 239: value = DecimalUtil::ScaleDownAndRound(value, shift, 
round);
ScaleDownAndRound implements signed rounding to do rounding away from zero, so 
value should be negated if is_negative is true.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@243
PS2, Line 243: // There are a maximum number of digits to the left of 
the dot. We attempt
to the right of the dot?


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@248
PS2, Line 248: DCHECK(first_truncated_digit == 0 || round);
I found this DCHECK confusing until I realized saving the truncated digit was 
conditional on round, which I don't think it needs to be.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@250
PS2, Line 250: value += (first_truncated_digit >= 5);
This rounds towards positive infinity which isn't consistent with our rounding 
behavior in ScaleDownAndRound, which rounds away from zero.

Edit: Actually, since the value isn't negated yet, this is correct.

Where to actually do the negation appears to be a bit of a conundrum.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@262
PS2, Line 262:   DCHECK_EQ(truncated_digit_count, 0);
It would be nice to have a DCHECK that this can't overflow.  While that is true 
because of the structure of the if conditions, there are enough of them to 
invite uncertainty in the readers mind by the time we get here.


http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@391
PS2, Line 391:   int digits_after_dot_count, int type_precision, int8_t 
exponent, T* value,
type_precision is unused in this function



-- 
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 13 Dec 2017 00:35:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 2: Code-Review+1

(1 comment)

Not sure if Zach wants to take a look - I know he implemented some of the 
earlier rounding.

http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@219
PS1, Line 219: iffere
> Unfortunately DCHECK_GE does not work with int128_t. Added comment.
Ah I always forget that.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Dec 2017 16:05:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8774/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8774/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5014: Part 1: Round when casting string to decimal
> What's left for later parts?
There's still decimal -> timestamp. It's not going to be a large patch, but I 
think it's better to separate them to make it easier to review.


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/exec/hdfs-scanner-ir.cc@143
PS1, Line 143: Decimal4Value IrStringToDecimal4(const char* s, int len, int 
type_precision,
> Should we also be switching to rounding instead of truncation for text scan
Yes, on line 146, the result can't be PARSE_UNDERFLOW. So there is no point in 
trying to round.


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@122
PS1, Line 122:   static inline void ApplyExponent(int total_digits_count,
> Should this be private? I don't think it makes sense for anything external
Done


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@123
PS1, Line 123:   int digits_after_dot_count, int type_precision, int8_t 
exponent, T& value,
> Our convention is to pass by pointer if it's modified.
Done


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@125
PS1, Line 125:
> nit unnecessary vertical whitespace
Done


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@127
PS1, Line 127:   // Ex: 0.1e3 (which at this point would have precision == 
1 and scale == 1), the
> Thanks for all the examples, this helped a lot when reviewing it.
Many of them were there already. I just moved this code into a separate 
function to make it easier to reason.


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@141
PS1, Line 141: if (*precision < *scale) *precision = *scale;
> Can this be moved into the else branch? If *scale is zero, this shouldn't b
Good catch. Moved.

If scale is zero, then precision has to be negative for this to be triggered. 
This should not happen.


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@204
PS1, Line 204: &
> Remove the reference here? We want value semantics anyway.
Done


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@219
PS1, Line 219: DCHECK
> DCHECK_GE?
Unfortunately DCHECK_GE does not work with int128_t. Added comment.


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@268
PS1, Line 268: maximumum
> maximum
Done


http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@620
PS1, Line 620: &
> Related to my other comment, passing a char value by reference is weird sin
Done. Passing by value now.



--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Dec 2017 03:57:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8774 )

Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..

IMPALA-5014: Part 1: Round when casting string to decimal

In this patch we implement rounding when casting string to decimal if
DECIMAL_V2 is enabled. The backend method that parses strings and
converts them to decimals is refactored to make it easier to understand.

Testing:
- Added some BE tests.

Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
---
M be/src/benchmarks/atod-benchmark.cc
M be/src/benchmarks/overflow-benchmark.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/decimal-util.h
M be/src/util/string-parser.cc
M be/src/util/string-parser.h
11 files changed, 235 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/8774/2
--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal

2017-12-05 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8774


Change subject: IMPALA-5014: Part 1: Round when casting string to decimal
..

IMPALA-5014: Part 1: Round when casting string to decimal

In this patch we implement rounding when casting string to decimal if
DECIMAL_V2 is enabled. The backend method that parses strings and
converts them to decimals is refactored to make it easier to understand.

Testing:
- Added some BE tests.

Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
---
M be/src/benchmarks/atod-benchmark.cc
M be/src/benchmarks/overflow-benchmark.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/decimal-util.h
M be/src/util/string-parser.cc
M be/src/util/string-parser.h
11 files changed, 233 insertions(+), 101 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/8774/1
--
To view, visit http://gerrit.cloudera.org:8080/8774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Gerrit-Change-Number: 8774
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky