[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 11 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 15 Aug 2021 20:40:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strtod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accuracy for speed. On benchmarking on more than 1 million rows where string is cast to double, it is found that new patch is on par with the earlier algorithm. Results: W/O library: Fetched 1222386 row(s) in 32.10s With library: Fetched 1222386 row(s) in 31.71s Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Reviewed-on: http://gerrit.cloudera.org:8080/17389 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 4 files changed, 161 insertions(+), 69 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 12 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 11 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 15 Aug 2021 14:28:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7394/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 11 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 15 Aug 2021 14:28:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 10: > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7383/ These seems unrelated due to unable to connect to impalad. I ran the pre-review tests later and it passed: https://jenkins.impala.io/job/pre-review-test/1081/ -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 10 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 13 Aug 2021 20:22:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7383/ -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 10 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Aug 2021 14:31:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7383/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 10 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Aug 2021 08:19:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 10 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Aug 2021 08:19:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 9: Code-Review+2 Thanks Amogh for doing this improvement! LGTM! -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Aug 2021 08:18:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 9: Code-Review+1 Looks good to me! -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Aug 2021 00:29:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 9: Code-Review+1 +1 from my side too, great work! -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 10 Aug 2021 17:27:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 9: Code-Review+1 (1 comment) LGTM, I can upgrade my +1 to +2 if this patch looks OK to Qifan. http://gerrit.cloudera.org:8080/#/c/17389/9/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/9/be/src/util/string-parser.h@528 PS9, Line 528: *result = PARSE_UNDERFLOW; Do we have a test for PARSE_UNDERFLOW? -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 10 Aug 2021 12:42:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9268/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 10 Aug 2021 11:25:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 9: > > (1 comment) > So '-' would reach as an empty string here - the reason being signs > are stripped off before sending it to the library. But anyways I > think you are talking about pointer to one past the end of array. > So AFAIK C++ standard allows pointer to point to either any > elements of array or one past the end of elements - thats how STL > Iterator end() are implemented too. But I think dereferencing such > pointer is an undefined behaviour. So probably we cannot check if > incoming string is already null-terminated and should always take > care of making a copy and null-terminating it. This is taken care of now. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 10 Aug 2021 11:05:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strtod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accuracy for speed. On benchmarking on more than 1 million rows where string is cast to double, it is found that new patch is on par with the earlier algorithm. Results: W/O library: Fetched 1222386 row(s) in 32.10s With library: Fetched 1222386 row(s) in 31.71s Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 4 files changed, 161 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/9 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 8: > (1 comment) So '-' would reach as an empty string here - the reason being signs are stripped off before sending it to the library. But anyways I think you are talking about pointer to one past the end of array. So AFAIK C++ standard allows pointer to point to either any elements of array or one past the end of elements - thats how STL Iterator end() are implemented too. But I think dereferencing such pointer is an undefined behaviour. So probably we cannot check if incoming string is already null-terminated and should always take care of making a copy and null-terminating it. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 06 Aug 2021 14:23:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508 PS7, Line 508: s[len]) > I see. Thanks for the details. So those conditions do not determine ELSE i. Okay. Thanks for the explanation. Since parse_number() parses a prefix of max length from input 'p' to a double, we would have to make sure it can do so with our input 's' even when we take the ELASE branch. For example, if the input 's' is "-" of length 1, will we crash in parse_numer() at line 1141? 1133 WARN_UNUSED 1134 really_inline const char * parse_number(const char *p, double *outDouble) { 1135 const char *pinit = p; 1136 bool found_minus = (*p == '-'); 1137 bool negative = false; 1138 if (found_minus) { 1139 ++p; 1140 negative = true; 1141 if (!is_integer(*p)) { // a negative sign must be followed by an integer 1142 return nullptr; 1143 } 1144 } It sounds like we would have to convert to null-terminating string first, unless we modify parse_number()? -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 12 Jul 2021 13:56:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508 PS7, Line 508: s[len]) > The idea is to find the conditions to go to the ELSE branch quickly (withou I see. Thanks for the details. So those conditions do not determine ELSE i.e., they don't determine if string copy is not required. We can have strings satisfying those conditions and still needing copy (for e.g., when they are not null-terminated). And also we want some strings that do not satisfy above conditions to goto ELSE branch. For instance a null-terminated invalid input "INVALID_NUM", we want it to goto ELSE branch and not create copy of it. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 11 Jul 2021 15:58:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 8: (3 comments) http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc File be/src/util/string-parser-test.cc: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc@535 PS7, Line 535: TestStringToFloatPreprocess(".43256e4", "0.43256e4"); > Preprocess function doesn't handle sign either '+' or '-', so have added it I see. Done. http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508 PS7, Line 508: s[len]) > I didn't understand the conditions above, but if s ='99' (which satisfies c The idea is to find the conditions to go to the ELSE branch quickly (without looking ahead too much). The JSON specs on numeric values are as follows (from https://datatracker.ietf.org/doc/html/rfc7159), on which the fast_double_parser is based. Numeric values that cannot be represented in the grammar below (such as Infinity and NaN) are not permitted. number = [ minus ] int [ frac ] [ exp ] decimal-point = %x2E ; . digit1-9 = %x31-39 ; 1-9 e = %x65 / %x45; e E exp = e [ minus / plus ] 1*DIGIT frac = decimal-point 1*DIGIT int = zero / ( digit1-9 *DIGIT ) minus = %x2D ; - plus = %x2B; + zero = %x30; 0 Thus, the conditions to do so are 1). minus digit1-9 2). digit1-9 3). 0 frac Sorry I did not spell out these conditions precisely in the first attempt. http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@541 PS7, Line 541: return (T)(negative ? -val : val); > It will be reached for UNDERFLOW and OVERFLOW. Done -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 11 Jul 2021 12:55:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9067/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 09 Jul 2021 23:33:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc File be/src/util/string-parser-test.cc: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc@535 PS7, Line 535: TestStringToFloatPreprocess(".43256e4", "0.43256e4"); > May add a couple tests with negative values, such as Preprocess function doesn't handle sign either '+' or '-', so have added it under Invalid input. http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508 PS7, Line 508: s[len]) > nit. I wonder if the condition to go without preprocessing (the ElSE branch I didn't understand the conditions above, but if s ='99' (which satisfies condition 2 i believe), it can goto any of the branches based on what s[2] is. if s[2] is null then it will goto true branch and if s[2] is a digit then to false (ELSE) branch. http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@541 PS7, Line 541: return (T)(negative ? -val : val); > nit. This code can not be reached. It will be reached for UNDERFLOW and OVERFLOW. http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@584 PS7, Line 584: > nit. This may not be accurate since the code removes leading '0's before an Agreed. Changed the wordings. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 09 Jul 2021 23:11:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strtod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accuracy for speed. On benchmarking on more than 1 million rows where string is cast to double, it is found that new patch is on par with the earlier algorithm. Results: W/O library: Fetched 1222386 row(s) in 32.10s With library: Fetched 1222386 row(s) in 31.71s Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 4 files changed, 167 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/8 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 7: (4 comments) Looks good to me and thanks a lot for handling well-formed string cases! http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc File be/src/util/string-parser-test.cc: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc@535 PS7, Line 535: TestStringToFloatPreprocess(".43256e4", "0.43256e4"); May add a couple tests with negative values, such as -0.1 -.555 -.12 -11 http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508 PS7, Line 508: s[len]) nit. I wonder if the condition to go without preprocessing (the ElSE branch) is one of the following. 1. '0.' 2. [1-9]+ I was not able to map the above two conditions to the line at 508. http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@541 PS7, Line 541: return (T)(negative ? -val : val); nit. This code can not be reached. http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@584 PS7, Line 584: before '.' nit. This may not be accurate since the code removes leading '0's before any digits. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 09 Jul 2021 21:35:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9065/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 09 Jul 2021 19:40:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@463 PS6, Line 463: FAILURE and 0 is retur > If we can change it to std::string(), we can take advantage of the fast_dou As discussed in other thread, changing it to std::string will need changes in many places even to other StringTo* functions. So we will retain this signature. http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@498 PS6, Line 498: of remaning string not parsed by li > We can pass the input std::string as 's' and get rid off 'len', This may not be required as only long strings will use std::string now and they would not need any preprocessing. http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@538 PS6, Line 538: : retur > The new signature of this function becomes This may not be required as only long strings will use std::string now and they would not need any preprocessing. http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: DCHECK(trailing_ > In a message at Jun 3 you mentioned "So I will use VAC for shorter strings Have made the changes for the same now. For shorter string we will use stack and for longer string we will use std::string (malloc) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@550 PS6, Line 550: > This would probably prevent "return value optimization", as there would be This code has changed now. We take care of not making a copy for null-terminated string at some other place. http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@552 PS6, Line 552: (val == std::numeric_limits::infinity())) { > Do you plan to resolve this comment? This has been removed. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 09 Jul 2021 19:33:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strtod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accuracy for speed. On benchmarking on more than 1 million rows where string is cast to double, it is found that new patch is on par with the earlier algorithm. Results: W/O library: Fetched 1222386 row(s) in 32.10s With library: Fetched 1222386 row(s) in 31.71s Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 4 files changed, 164 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/7 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (3 comments) I am happy with the code as is, but seems like there are still some unresolved comment threads in 'string-parser.h'. Can we resolve (or mark as Done / won't resolve) each before moving forward? http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: std::string res; > Results: In a message at Jun 3 you mentioned "So I will use VAC for shorter strings to avoid malloc and for larger strings we can use malloc + strtod." Is there still a plan to do VLA for short strings? http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@550 PS6, Line 550: > Here before copying the test of the string, make the following check and re This would probably prevent "return value optimization", as there would be multiple return statements with different objects. http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@552 PS6, Line 552: // string has `move` defined, which would be used instead of copy. > nit: I don't think we need this comment. And I think the situation is even Do you plan to resolve this comment? -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 07 Jul 2021 08:38:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: +2 Since Zolta has +1. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Jul 2021 22:55:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: Code-Review+2 Hi Amogh, Thanks a lot for digging into the code and find out another use case. Yes, I agree that we have to pay the construction cost if the string version is constructed and passed to the new code. In that case, let us just pay the cost in the new code. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Jul 2021 22:52:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: > (4 comments) > > It is great to know that Impala can achieve 926 MB/s conversion > rate and very attempting to get the best from fast_double_parser():-) > > The key is not to populate a new std::string when the original > input conforms to the requirements of the library (well formed > null-terminated string via string::c_str() in constant speed), > which should be true in most cases. > > Throughout the code base of Impala, I was able to find only the > following call that needs the service of converting string to > double which makes the above idea feasible. > > 346 static bool ParseProbability(const string& prob_str, bool* > should_execute) { > 347 StringParser::ParseResult parse_result; > 348 double probability = StringParser::StringToFloat( > 349 prob_str.c_str(), prob_str.size(), &parse_result); > 350 if (parse_result != StringParser::PARSE_SUCCESS || > 351 probability < 0.0 || probability > 1.0) { > 352 return false; > 353 } > 354 // +1L ensures probability of 0.0 and 1.0 work as expected. > 355 *should_execute = rand() < probability * (RAND_MAX + 1L); > 356 return true; > 357 } Hi Qifan, I got late to the comment. So the other important code path which can lead to non-null terminated strings are due to the cast: 'select cast("0.454" as double)' or 'select cast(x as double) from foo' etc. The code path will pass through CastFunctions::CastToDoubleVal generated via Macro: #define CAST_FROM_STRING(num_type, native_type, string_parser_fn) \ num_type CastFunctions::CastTo##num_type(FunctionContext* ctx, const StringVal& val) { \ if (val.is_null) return num_type::null(); \ StringParser::ParseResult result; \ num_type ret; \ ret.val = StringParser::string_parser_fn( \ reinterpret_cast(val.ptr), val.len, &result); \ if (UNLIKELY(result != StringParser::PARSE_SUCCESS)) return num_type::null(); \ return ret; \ } this code can probably be frequently used based on usage of cast by client/customer. But the point you are making is valid that well formed null-terminated string need no extra processing and should directly be passed to library function. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 05 Jul 2021 14:34:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (4 comments) It is great to know that Impala can achieve 926 MB/s conversion rate and very attempting to get the best from fast_double_parser():-) The key is not to populate a new std::string when the original input conforms to the requirements of the library (well formed null-terminated string via string::c_str() in constant speed), which should be true in most cases. Throughout the code base of Impala, I was able to find only the following call that needs the service of converting string to double which makes the above idea feasible. 346 static bool ParseProbability(const string& prob_str, bool* should_execute) { 347 StringParser::ParseResult parse_result; 348 double probability = StringParser::StringToFloat( 349 prob_str.c_str(), prob_str.size(), &parse_result); 350 if (parse_result != StringParser::PARSE_SUCCESS || 351 probability < 0.0 || probability > 1.0) { 352 return false; 353 } 354 // +1L ensures probability of 0.0 and 1.0 work as expected. 355 *should_execute = rand() < probability * (RAND_MAX + 1L); 356 return true; 357 } http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@463 PS6, Line 463: const char* s, int len If we can change it to std::string(), we can take advantage of the fast_double_parser to the fullest. http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@498 PS6, Line 498: StringToFloatPreprocess(s, i, len); We can pass the input std::string as 's' and get rid off 'len', On return, if processed_str.length() is 0, then the original input confirms to fast_double_parser() and we can make use of the original directly. http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@538 PS6, Line 538: const char* s, int offset, : int len The new signature of this function becomes static inline std::string StringToFloatPreprocess(const string& s, int offset) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@550 PS6, Line 550: Here before copying the test of the string, make the following check and return an empty string object to signal that the original input is good to go with fast_double_parser. if (i==offset && res.length() == 0) { return string(); } -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 03 Jun 2021 17:55:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: > I think it would be nice to see a comparision between > fast_double_parser and strtod() for short strings. > > If we can measure a significant benefit for fast_double_parser we > can use it for short strings with a VLA, and use strtod() for long > strings with heap allocation (in this case we don't need to > preprocess the string). > > If we can't measure a difference between the two then maybe we > should just use strtod() since it's more robust and doesn't need > any preprocessing of the input. There was no observable difference found when running through Impala as that function would be small fraction of entire runtime. However I benchmarked the individual functions and also added malloc and VAC operations to check the difference in the benchmark. This is the result: === trial 1 === fast_double_parser_malloc 397.00 MB/s fast_double_parser_VAC 738.19 MB/s fast_double_parser 926.76 MB/s strtod 110.63 MB/s === trial 2 === fast_double_parser_malloc 395.90 MB/s fast_double_parser_VAC 694.98 MB/s fast_double_parser 926.33 MB/s strtod 110.42 MB/s So we see that malloc and strtod are significantly slower and in bigger data sets it may start showing difference. tcmalloc however might be faster than above but still will incur some penalty. So I will use VAC for shorter strings to avoid malloc and for larger strings we can use malloc + strtod. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 03 Jun 2021 12:31:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: I think it would be nice to see a comparision between fast_double_parser and strtod() for short strings. If we can measure a significant benefit for fast_double_parser we can use it for short strings with a VLA, and use strtod() for long strings with heap allocation (in this case we don't need to preprocess the string). If we can't measure a difference between the two then maybe we should just use strtod() since it's more robust and doesn't need any preprocessing of the input. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 27 May 2021 10:31:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: > Agree that the performance numbers are OK WRT strings of 20 or 30 > bytes long. > > Would it be possible to make another performance by calling > strtod() instead? > > According to the performance numbers documented here > (https://github.com/lemire/fast_double_parser), fast_double is > about 9.4x faster than strtod, tested with Apple Clang. > > I just curious how fast fast_double is in Impala, compared to > strtod(). So for both 20 and 30 length string fast_double_parser also uses strtod so not any noticeable difference observed. On checking code, they specify that more than 19 digits will generally cause overflow and are highly unlikely to occur so they just send it to strtod(). Just that even for strtod we need a null terminated string. So I think short string optimisation should work well here as longer strings are not likely. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 26 May 2021 17:58:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: Agree that the performance numbers are OK WRT strings of 20 or 30 bytes long. Would it be possible to make another performance by calling strtod() instead? According to the performance numbers documented here (https://github.com/lemire/fast_double_parser), fast_double is about 9.4x faster than strtod, tested with Apple Clang. I just curious how fast fast_double is in Impala, compared to strtod(). -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 24 May 2021 01:05:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: std::string res; > Original input string in StringToFloatInternal() is not null-terminated str Results: 3 Runs with 20 length string: With std::string Fetched 110 row(s) in 30.54s Fetched 110 row(s) in 30.68s Fetched 110 row(s) in 30.62s With VLA: Fetched 110 row(s) in 30.60s Fetched 110 row(s) in 30.64s Fetched 110 row(s) in 30.56s 3 Runs with 30 length string: - With std::string Fetched 110 row(s) in 33.85s Fetched 110 row(s) in 33.89s Fetched 110 row(s) in 34.19s With VLA: Fetched 110 row(s) in 33.78s Fetched 110 row(s) in 33.71s Fetched 110 row(s) in 33.57s Summary: I think we are fine with performance. In real system hopefully we don't see too many tcmalloc contention due to this. Couple of things will help us avoid malloc in current scenario: 1. Strings less than 16 length due to Small string optimisation. 2. Based on Qifan's suggestion, we can avoid creating copy of string for some cases where we have non-integer terminated string as original input. Other alternative (also suggested by Zoltan) is to do VLA for string of less than 256 bytes and above that we can use strtod. I am ok with either approach. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 21:37:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: std::string res; > Stack overflow is sensitive to the depth of the stack and the size of the c Original input string in StringToFloatInternal() is not null-terminated string and was figured out during testing. That is also one of the reason for the copying things into null-terminated string. I am checking perf penalty with a million casts requiring heap allocation. if we see perf issues, we can use VLA for shorter strings and heap for longer ones. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 16:02:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: std::string res; > Yeah, I think it'd be good to see measurements for longer float literals. W Stack overflow is sensitive to the depth of the stack and the size of the current frame. Assume in most cases, the input string conforms to the format that the fast double parser library can handle properly, then we may not need to build the string object at all. Thus, we can use the following form of signature: inline bool stringToFloatPreprocess(char*, int offset, int len, string* preprocessed_result); We return string (in preprocessed_result) only when the input does not conform. In good case, we do not construct a string object and we can feed the original input string in StringToFloatInternal() to the fast parser library, if the input string is null-terminated. In seems this is the case as the only use of the StringToFloatInternal() is called indirectly here throughout Impala BE. 343 static bool ParseProbability(const string& prob_str, bool* should_execute) { 344 StringParser::ParseResult parse_result; 345 double probability = StringParser::StringToFloat( 346 prob_str.c_str(), prob_str.size(), &parse_result); 347 if (parse_result != StringParser::PARSE_SUCCESS || 348 probability < 0.0 || probability > 1.0) { 349 return false; 350 } 351 // +1L ensures probability of 0.0 and 1.0 work as expected. 352 *should_execute = rand() < probability * (RAND_MAX + 1L); 353 return true; 354 } -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 15:27:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: std::string res; > I had checked Small String Optimization and had read till 24 bytes it would Yeah, I think it'd be good to see measurements for longer float literals. We are using TCMalloc in Impala which should be fairly performant so it's possible that the perf is not too bad. We can also have an 'if' in the code. If strlen > 1 KiB use strtod(), otherwise use fast_double_parser with a Variable-length array, or a fixed-sized 1 KiB array? -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 11:24:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: std::string res; > I wonder if the performance numbers are still the same, given that now we n I had checked Small String Optimization and had read till 24 bytes it would not do allocation which appeared reasonable. But i rechecked after your comment and 24 bytes seem to be only for clang. I wrote a small program to check for our g++ and it avoids allocation till 15 bytes only. Program: #include #include #include // replace operator new and delete to log allocations void* operator new(std::size_t n) { std::cout << "[Allocating " << n << " bytes]"; return malloc(n); } void operator delete(void* p) throw() { free(p); } int main() { for (size_t i = 0; i < 30; ++i) { std::cout << i << ": " << std::string(i, '=') << std::endl; } } is 15 good enough length ? otherwise dynamic allocation will make things slow. My assumption is input won't be long enough to cause stack overflow. I am not sure if we have some limitation on the string being passed to StringToFloat. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 10:49:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (2 comments) Thanks for the changes! http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545 PS6, Line 545: std::string res; I wonder if the performance numbers are still the same, given that now we need to dynamically allocate memory from the heap. (Though for short strings the standard library implementation being used avoids allocation by using Small String Optimization) http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@552 PS6, Line 552: // string has `move` defined, which would be used instead of copy. nit: I don't think we need this comment. And I think the situation is even better as in this case Named Return Value Optimization kicks in which even avoids moving: https://en.cppreference.com/w/cpp/language/copy_elision -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 09:29:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: Code-Review+1 (4 comments) Looks great! http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@538 PS6, Line 538: const char* s, int offset, : int le nit. Maybe we could use two arguments (instead of three): char* start and char* end, where 'start' is s + offset, and 'end' is 'start' + len -1 (inclusive) on entry into this function. In the body, we set char* pos = start, and work with *pos and *(pos+1). In this way, we increment on pos only to avoid s+i in several places. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493 PS5, Line 493: ng decimal > Thats a good catch. I have changed it to string and code is much simpler no ack :-). http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501 PS5, Line 501: > 'pos' is not going to be 'loop constant' which can be pulled out as it need I see. See my comment to the new version of the code. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524 PS5, Line 524: esult = PARSE_OVERFLOW; > Actually, when function underflows it doesn't return NaN. It returns 0 inst Done -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 21 May 2021 01:16:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8764/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 May 2021 22:45:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG@12 PS5, Line 12: doesn't : sacrifise the accuracy for speed. > Could you please add your measurements here as well? Done. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493 PS5, Line 493: ng decimal > When len is super long, c_str may overflow the stack. Maybe std::string or Thats a good catch. I have changed it to string and code is much simpler now. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501 PS5, Line 501: > (s+i) is repeatedly computed in this loop. May precompute it as pos=s+i in 'pos' is not going to be 'loop constant' which can be pulled out as it needs to be incremented along with 'i'. So keeping just 1 counter for loop instead of 2. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501 PS5, Line 501: : if (UNLIKELY(s_end == nullptr)) { : // Determine if it is an overflow case : if (UNLIKELY(!std::isfinite(val))) { : *result = PARSE_OVERFLOW; : } else if (UNLIKELY(val == 0)) { : *result = PARSE_UNDERFLOW; : } else { : *result = PARSE_FAILURE; : return 0; : } : return (T)(negative ? -val : val); : } : : // check if trailing characters are present and are all white spaces : int trailing_len = processed_str.length() - (int)(s_end - c_str); : i > Maybe we could put this into its own function, stg like 'NormalizeFloatStri Done...added a new private function and test for it. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524 PS5, Line 524: esult = PARSE_OVERFLOW; > Should we also test std::isnan(val) and set PARSE_UNDERFLOW flag here? Actually, when function underflows it doesn't return NaN. It returns 0 instead. Handled that case. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 May 2021 22:30:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strtod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accuracy for speed. On benchmarking on more than 1 million rows where string is cast to double, it is found that new patch is on par with the earlier algorithm. Results: W/O library: Fetched 1222386 row(s) in 32.10s With library: Fetched 1222386 row(s) in 31.71s Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 4 files changed, 122 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/6 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 5: (3 comments) Looks great! http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493 PS5, Line 493: len - i + 2 When len is super long, c_str may overflow the stack. Maybe std::string or some other buffer class can be used instead. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501 PS5, Line 501: s + i (s+i) is repeatedly computed in this loop. May precompute it as pos=s+i in advance. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524 PS5, Line 524: *result = PARSE_OVERFLOW; Should we also test std::isnan(val) and set PARSE_UNDERFLOW flag here? -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 15:42:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 5: Code-Review+1 (2 comments) It's great to hear that the library has great perf. LGTM, only had smaller nits. http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG@12 PS5, Line 12: doesn't : sacrifise the accuracy for speed. Could you please add your measurements here as well? http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501 PS5, Line 501: while(i + 1 < len && *(s + i) == '0' && : fast_double_parser::is_integer(*(s + i + 1))) { : i++; : } : : // Library doesn't handle strings starting with '.' like '.56' or '-.56'. : // Preprocessing such strings by prefixing it with '0' : if (*(s + i) == '.') { : c_len = len - i + 1; : c_str[0] = '0'; // Prefix '0' : memcpy(c_str + 1, s + i, c_len - 1); : c_str[c_len] = '\0'; : } else { : c_len = len - i; : memcpy(c_str, s + i, c_len); : c_str[c_len] = '\0'; : } Maybe we could put this into its own function, stg like 'NormalizeFloatString()' and add a few unit tests for it in string-parser-test.cc. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 14:14:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 5: Benchmarking: Tested it on more than million rows casted from string to double. Results shows both algorithms took almost same time: W/O library: Fetched 1222386 row(s) in 32.10s With library: Fetched 1222386 row(s) in 31.71s With library timing is slightly better around half a second but it can be a noise. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 21:17:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8741/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 15:27:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h@495 PS4, Line 495: // Library function doesn't handle leading 0s like '1','-.9','-001' > What do you mean by 'doesn't handle'? Falls back to strtod(), returns nullp It would return nullptr. I added that as a part of comment now. Exhaustive test for all these cases already exist in string-parser-test.cc and these cases were identified through them. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 15:05:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strtod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accuracy for speed. Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 4 files changed, 73 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/5 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h@495 PS4, Line 495: // Library function doesn't handle leading 0s like '1','-.9','-001' What do you mean by 'doesn't handle'? Falls back to strtod(), returns nullptr, or returns wrong results? Could you please add tests for such values? -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 May 2021 11:11:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8710/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 12 May 2021 22:29:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@10 PS3, Line 10: strto > strtod done. http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@13 PS3, Line 13: accuracy > accuracy done. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 12 May 2021 22:08:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strtod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accuracy for speed. Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 4 files changed, 72 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/4 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 3: Code-Review+1 (2 comments) LGTM! http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@10 PS3, Line 10: strod strtod http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@13 PS3, Line 13: accurancy accuracy -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 3 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 12 May 2021 16:08:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8700/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 3 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 11 May 2021 11:59:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@13 PS2, Line 13: speed > It would be nice to have some measurements, for example by adding a large n Sure, will try to measure it. Theoretically, this should not improve performance but just improve the accuracy without perf penalty. http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@17 PS2, Line 17: 2. Ran existing Backend tests for correctness. > Can you add some tests to https://github.com/apache/impala/blob/master/be/s I have added them now. Thanks for the suggestion. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 3 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 11 May 2021 11:44:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accurancy for speed. Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/exprs/expr-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 3 files changed, 39 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/3 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 3 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@13 PS2, Line 13: speed It would be nice to have some measurements, for example by adding a large number of casts to a query with many rows so that the cost of casts become measurable. http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@17 PS2, Line 17: 2. Ran existing Backend tests for correctness. Can you add some tests to https://github.com/apache/impala/blob/master/be/src/exprs/expr-test.cc#L3221 ? Having some EE tests is useful, but expression are more widely covered in expr-test.cc -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 2 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 04 May 2021 13:16:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 2: FYI: There are few test failures i saw in pre-review job which i would need to fix. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 2 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 04 May 2021 10:42:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8679/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 2 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 03 May 2021 18:13:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17389 Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library StringToFloatInternal is used to parse string into float. It had logic to ensure it is faster than standard functions like strod in many cases, but it was not as accurate. We are replacing it by a third party library named fast_double_parser which is both fast and doesn't sacrifise the accurancy for speed. Testing: 1. Added test to check for accuracy improvement. 2. Ran existing Backend tests for correctness. Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 --- M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/values.test 2 files changed, 24 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/2 -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 2 Gerrit-Owner: Amogh Margoor