[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. IMPALA-5014: Part 1: Round when casting string to decimal In this patch we implement rounding when casting string to decimal if DECIMAL_V2 is enabled. The backend method that parses strings and converts them to decimals is refactored to make it easier to understand. Testing: - Added some BE tests. Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Reviewed-on: http://gerrit.cloudera.org:8080/8774 Reviewed-by: Taras BobrovytskyTested-by: Impala Public Jenkins --- M be/src/benchmarks/atod-benchmark.cc M be/src/benchmarks/overflow-benchmark.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/text-converter.inline.h M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/decimal-util.h M be/src/util/string-parser.cc M be/src/util/string-parser.h M tests/query_test/test_decimal_casting.py 12 files changed, 245 insertions(+), 104 deletions(-) Approvals: Taras Bobrovytsky: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 5 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 11:39:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 4: Code-Review+2 Fixed the issue with the test. The test did not expected there to be rounding when converting string to decimal, that's why it was failing. Forwarding the +2. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 07:57:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1648/ -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 07:58:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 3: Code-Review-1 Looks like there's a Python test we need to update... 03:52:25 ] TestDecimalCasting.test_underflow[cast_from: string | decimal_type: (6, 3) | exec_option: {'decimal_v2': 'true'}] 03:52:25 ] [gw0] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python 03:52:25 ] query_test/test_decimal_casting.py:166: in test_underflow 03:52:25 ] self._assert_decimal_result(cast, res, expected_val) 03:52:25 ] query_test/test_decimal_casting.py:80: in _assert_decimal_result 03:52:25 ] assert expected == actual, "Cast: {0}, Expected: {1}, Actual: {2}".format(cast,\ 03:52:25 ] E AssertionError: Cast: select cast('-85.0817' as Decimal(6,3)), Expected: -85.081, Actual: -85.082 03:52:25 ] E assert Decimal('-85.081') == Decimal('-85.082') -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 05:54:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1645/ -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 03:52:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1645/ -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 00:15:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 3: (2 comments) Thanks for the review, Zach! http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc File be/src/exec/hdfs-scanner-ir.cc: http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc@145 PS3, Line 145: auto ret = StringToDecimal4(s, len, type_precision, type_scale, false, result); > Will we ever want to implement rounding in the scanner? The plan is to not implement rounding in the scanner because it should never be necessary. We do not expect decimals in text files to have more digits after the dot than the scale of the decimal data type. If this is detected (which is an underflow), the scanner errors out. http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h@243 PS3, Line 243: // There are a maximum number of digits to the left of the dot. We round by > I finally understand this comment. Nice :) -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 00:14:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 00:10:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 3: (2 comments) Let me give the math one last check.. http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc File be/src/exec/hdfs-scanner-ir.cc: http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/exec/hdfs-scanner-ir.cc@145 PS3, Line 145: auto ret = StringToDecimal4(s, len, type_precision, type_scale, false, result); Will we ever want to implement rounding in the scanner? http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/3/be/src/util/string-parser.h@243 PS3, Line 243: // There are a maximum number of digits to the left of the dot. We round by I finally understand this comment. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 22 Dec 2017 00:07:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191 PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == type_precision)) { > Saving the truncated digit may be costly. The first_truncated_digit, along I didn't think of that. Now I see your point. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 21 Dec 2017 21:10:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191 PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == type_precision)) { > I just thought it would be simpler to just always save the first truncated Saving the truncated digit may be costly. The first_truncated_digit, along with the check on line 191 should be completely codegened away if round is false. This function is used in our scanners for parsing decimals in text files. I think it would be a waste to make the scanners slower by saving the first_truncated_digit needlessly. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 21 Dec 2017 21:05:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191 PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == type_precision)) { > We are saving the first truncated digit here. We save the truncated digit i I just thought it would be simpler to just always save the first truncated digit - then the DCHECK on 248 is unnecessary. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 21 Dec 2017 20:54:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc File be/src/runtime/decimal-test.cc: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc@302 PS2, Line 302: StringToAllDecimals("0.5", 5, 0, > More interesting test cases: Done http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191 PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == type_precision)) { > This doesn't need to be conditional on round, in fact I think leaving that We are saving the first truncated digit here. We save the truncated digit in order to be able to round if necessary. If round is false, there is no point in saving the first truncated digit. Why do you think it's better for it to not be conditional on round? Also, which DCHECK are you talking about? the one on line 194? http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@238 PS2, Line 238: // There are less than maximum number of digits to the left of the dot. > Don't you mean "There are more than the maximum number of digits to the rig We already know that there are too many digits to the right of the dot at this point because of comment on line 232. (which means we will be doing rounding for sure if round=true) We are now checking on line 237 what's going on to the left of the dot. In this case, the number looks like this: x. with let's say prec=3, scale=1. If the number looked like this xx., the condition on line 237 would be false. (In other words, we are checking if the part of the number to the left of the dot is "full" or not on line 237). If it's full, then we need to look at the first truncated digit. Otherwise, we can round without having to look at the first truncated digit. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@239 PS2, Line 239: value = DecimalUtil::ScaleDownAndRound(value, shift, round); > After thinking on this some more, rounding away from zero is symmetric with Agreed. No changes made. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@243 PS2, Line 243: // There are a maximum number of digits to the left of the dot. We attempt > to the right of the dot? No, to the left. See comment on line 238. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@248 PS2, Line 248: DCHECK(first_truncated_digit == 0 || round); > I found this DCHECK confusing until I realized saving the truncated digit w It's still not clear to me why it shouldn't be conditional on round. Look at my comment above. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@250 PS2, Line 250: value += (first_truncated_digit >= 5); > This rounds towards positive infinity which isn't consistent with our round Correct. We negate at the very bottom on line 266. So we are rounding away from zero. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@262 PS2, Line 262: DCHECK_EQ(truncated_digit_count, 0); > It would be nice to have a DCHECK that this can't overflow. While that is Done http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@391 PS2, Line 391: int digits_after_dot_count, int type_precision, int8_t exponent, T* value, > type_precision is unused in this function Nice catch, removed. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 15 Dec 2017 22:09:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@239 PS2, Line 239: value = DecimalUtil::ScaleDownAndRound(value, shift, round); > ScaleDownAndRound implements signed rounding to do rounding away from zero, After thinking on this some more, rounding away from zero is symmetric with respect to positive and negative numbers, so it doesn't matter if the number value is negated before or after this call. Either way works correctly. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 13 Dec 2017 16:24:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc File be/src/runtime/decimal-test.cc: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/runtime/decimal-test.cc@302 PS2, Line 302: StringToAllDecimals("0.5", 5, 0, More interesting test cases: 0.0 as 6,5 1.0 as 6.5 0.00 as 6,5 1.00 as 6,5 http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@191 PS2, Line 191: } else if (UNLIKELY(round && total_digits_count == type_precision)) { This doesn't need to be conditional on round, in fact I think leaving that condition out makes reading the DCHECK below simpler. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@238 PS2, Line 238: // There are less than maximum number of digits to the left of the dot. Don't you mean "There are more than the maximum number of digits to the right of the dot." ? http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@239 PS2, Line 239: value = DecimalUtil::ScaleDownAndRound(value, shift, round); ScaleDownAndRound implements signed rounding to do rounding away from zero, so value should be negated if is_negative is true. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@243 PS2, Line 243: // There are a maximum number of digits to the left of the dot. We attempt to the right of the dot? http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@248 PS2, Line 248: DCHECK(first_truncated_digit == 0 || round); I found this DCHECK confusing until I realized saving the truncated digit was conditional on round, which I don't think it needs to be. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@250 PS2, Line 250: value += (first_truncated_digit >= 5); This rounds towards positive infinity which isn't consistent with our rounding behavior in ScaleDownAndRound, which rounds away from zero. Edit: Actually, since the value isn't negated yet, this is correct. Where to actually do the negation appears to be a bit of a conundrum. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@262 PS2, Line 262: DCHECK_EQ(truncated_digit_count, 0); It would be nice to have a DCHECK that this can't overflow. While that is true because of the structure of the if conditions, there are enough of them to invite uncertainty in the readers mind by the time we get here. http://gerrit.cloudera.org:8080/#/c/8774/2/be/src/util/string-parser.h@391 PS2, Line 391: int digits_after_dot_count, int type_precision, int8_t exponent, T* value, type_precision is unused in this function -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 13 Dec 2017 00:35:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 2: Code-Review+1 (1 comment) Not sure if Zach wants to take a look - I know he implemented some of the earlier rounding. http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@219 PS1, Line 219: iffere > Unfortunately DCHECK_GE does not work with int128_t. Added comment. Ah I always forget that. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Dec 2017 16:05:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. Patch Set 1: (11 comments) http://gerrit.cloudera.org:8080/#/c/8774/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8774/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-5014: Part 1: Round when casting string to decimal > What's left for later parts? There's still decimal -> timestamp. It's not going to be a large patch, but I think it's better to separate them to make it easier to review. http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/exec/hdfs-scanner-ir.cc File be/src/exec/hdfs-scanner-ir.cc: http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/exec/hdfs-scanner-ir.cc@143 PS1, Line 143: Decimal4Value IrStringToDecimal4(const char* s, int len, int type_precision, > Should we also be switching to rounding instead of truncation for text scan Yes, on line 146, the result can't be PARSE_UNDERFLOW. So there is no point in trying to round. http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@122 PS1, Line 122: static inline void ApplyExponent(int total_digits_count, > Should this be private? I don't think it makes sense for anything external Done http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@123 PS1, Line 123: int digits_after_dot_count, int type_precision, int8_t exponent, T& value, > Our convention is to pass by pointer if it's modified. Done http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@125 PS1, Line 125: > nit unnecessary vertical whitespace Done http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@127 PS1, Line 127: // Ex: 0.1e3 (which at this point would have precision == 1 and scale == 1), the > Thanks for all the examples, this helped a lot when reviewing it. Many of them were there already. I just moved this code into a separate function to make it easier to reason. http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@141 PS1, Line 141: if (*precision < *scale) *precision = *scale; > Can this be moved into the else branch? If *scale is zero, this shouldn't b Good catch. Moved. If scale is zero, then precision has to be negative for this to be triggered. This should not happen. http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@204 PS1, Line 204: & > Remove the reference here? We want value semantics anyway. Done http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@219 PS1, Line 219: DCHECK > DCHECK_GE? Unfortunately DCHECK_GE does not work with int128_t. Added comment. http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@268 PS1, Line 268: maximumum > maximum Done http://gerrit.cloudera.org:8080/#/c/8774/1/be/src/util/string-parser.h@620 PS1, Line 620: & > Related to my other comment, passing a char value by reference is weird sin Done. Passing by value now. -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Dec 2017 03:57:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Taras Bobrovytsky has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8774 ) Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. IMPALA-5014: Part 1: Round when casting string to decimal In this patch we implement rounding when casting string to decimal if DECIMAL_V2 is enabled. The backend method that parses strings and converts them to decimals is refactored to make it easier to understand. Testing: - Added some BE tests. Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d --- M be/src/benchmarks/atod-benchmark.cc M be/src/benchmarks/overflow-benchmark.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/text-converter.inline.h M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/decimal-util.h M be/src/util/string-parser.cc M be/src/util/string-parser.h 11 files changed, 235 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/8774/2 -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5014: Part 1: Round when casting string to decimal
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8774 Change subject: IMPALA-5014: Part 1: Round when casting string to decimal .. IMPALA-5014: Part 1: Round when casting string to decimal In this patch we implement rounding when casting string to decimal if DECIMAL_V2 is enabled. The backend method that parses strings and converts them to decimals is refactored to make it easier to understand. Testing: - Added some BE tests. Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d --- M be/src/benchmarks/atod-benchmark.cc M be/src/benchmarks/overflow-benchmark.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/text-converter.inline.h M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/decimal-util.h M be/src/util/string-parser.cc M be/src/util/string-parser.h 11 files changed, 233 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/8774/1 -- To view, visit http://gerrit.cloudera.org:8080/8774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d Gerrit-Change-Number: 8774 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky