[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 11
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 15 Aug 2021 20:40:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strtod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accuracy for speed. On benchmarking on more than
1 million rows where string is cast to double, it is found that new
patch is on par with the earlier algorithm.

Results:
W/O library: Fetched 1222386 row(s) in 32.10s
With library: Fetched 1222386 row(s) in 31.71s

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Reviewed-on: http://gerrit.cloudera.org:8080/17389
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
4 files changed, 161 insertions(+), 69 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 12
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 11
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 15 Aug 2021 14:28:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7394/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 11
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 15 Aug 2021 14:28:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-13 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 10:

> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7383/

These seems unrelated due to unable to connect to impalad. I ran the pre-review 
tests later and it passed: https://jenkins.impala.io/job/pre-review-test/1081/


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 13 Aug 2021 20:22:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7383/


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Aug 2021 14:31:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7383/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Aug 2021 08:19:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Aug 2021 08:19:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 9: Code-Review+2

Thanks Amogh for doing this improvement! LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Aug 2021 08:18:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-10 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 9: Code-Review+1

Looks good to me!


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Aug 2021 00:29:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-10 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 9: Code-Review+1

+1 from my side too, great work!


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 10 Aug 2021 17:27:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-10 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 9: Code-Review+1

(1 comment)

LGTM, I can upgrade my +1 to +2 if this patch looks OK to Qifan.

http://gerrit.cloudera.org:8080/#/c/17389/9/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/9/be/src/util/string-parser.h@528
PS9, Line 528: *result = PARSE_UNDERFLOW;
Do we have a test for PARSE_UNDERFLOW?



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 10 Aug 2021 12:42:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9268/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 10 Aug 2021 11:25:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-10 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 9:

> > (1 comment)
 > So '-' would reach as an empty string here - the reason being signs
 > are stripped off before sending it to the library. But anyways I
 > think you are talking about pointer to one past the end of array.
 > So AFAIK C++ standard allows pointer to point to either any
 > elements of array or one past the end of elements - thats how STL
 > Iterator end() are implemented too. But I think dereferencing such
 > pointer is an undefined behaviour. So probably we cannot check if
 > incoming string is already null-terminated and should always take
 > care of making a copy and null-terminating it.

This is taken care of now.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 10 Aug 2021 11:05:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-10 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strtod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accuracy for speed. On benchmarking on more than
1 million rows where string is cast to double, it is found that new
patch is on par with the earlier algorithm.

Results:
W/O library: Fetched 1222386 row(s) in 32.10s
With library: Fetched 1222386 row(s) in 31.71s

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
4 files changed, 161 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/9
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-08-06 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 8:

> (1 comment)
So '-' would reach as an empty string here - the reason being signs are 
stripped off before sending it to the library. But anyways I think you are 
talking about pointer to one past the end of array. So AFAIK C++ standard 
allows pointer to point to either any elements of array or one past the end of 
elements - thats how STL Iterator end() are implemented too. But I think 
dereferencing such pointer is an undefined behaviour. So probably we cannot 
check if incoming string is already null-terminated and should always take care 
of making a copy and null-terminating it.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 8
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 06 Aug 2021 14:23:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-12 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508
PS7, Line 508: s[len])
> I see. Thanks for the details. So those conditions do not determine ELSE i.
Okay. Thanks for the explanation.

Since parse_number() parses a prefix of max length from input 'p' to a double, 
we would have to make sure it can do so with our input 's' even when we take 
the ELASE branch.  For example, if the input 's' is "-" of length 1, will we 
crash in parse_numer() at line 1141?

1133 WARN_UNUSED  
1134 really_inline const char * parse_number(const char *p, double *outDouble) {
1135   const char *pinit = p;   
1136   bool found_minus = (*p == '-'); 
1137   bool negative = false;  
1138   if (found_minus) {
1139 ++p;   
  
1140 negative = true; 
1141 if (!is_integer(*p)) { // a negative sign must be followed by an 
integer
1142   return nullptr;  
 
1143 } 
1144   }

It sounds like we would have to convert to null-terminating string first, 
unless we modify parse_number()?



-- 
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 8
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 12 Jul 2021 13:56:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-11 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508
PS7, Line 508: s[len])
> The idea is to find the conditions to go to the ELSE branch quickly (withou
I see. Thanks for the details. So those conditions do not determine ELSE i.e., 
they don't determine if string copy is not required. We can have strings 
satisfying those conditions and still needing copy (for e.g., when they are not 
null-terminated). And also we want some strings that do not satisfy above 
conditions to goto ELSE branch. For instance a null-terminated invalid input 
"INVALID_NUM", we want it to goto ELSE branch and not create copy of it.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 8
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 11 Jul 2021 15:58:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-11 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc
File be/src/util/string-parser-test.cc:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc@535
PS7, Line 535:   TestStringToFloatPreprocess(".43256e4", "0.43256e4");
> Preprocess function doesn't handle sign either '+' or '-', so have added it
I see. Done.


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508
PS7, Line 508: s[len])
> I didn't understand the conditions above, but if s ='99' (which satisfies c
The idea is to find the conditions to go to the ELSE branch quickly (without 
looking ahead too much).

The JSON specs on numeric values are as follows (from 
https://datatracker.ietf.org/doc/html/rfc7159), on which the fast_double_parser 
is based.

Numeric values that cannot be represented in the grammar below (such
   as Infinity and NaN) are not permitted.

  number = [ minus ] int [ frac ] [ exp ]

  decimal-point = %x2E   ; .

  digit1-9 = %x31-39 ; 1-9

  e = %x65 / %x45; e E

  exp = e [ minus / plus ] 1*DIGIT

  frac = decimal-point 1*DIGIT

  int = zero / ( digit1-9 *DIGIT )

  minus = %x2D   ; -

  plus = %x2B; +

  zero = %x30; 0

Thus, the conditions to do so are

1). minus digit1-9
2). digit1-9
3). 0 frac

Sorry I did not spell out these conditions precisely in the first attempt.


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@541
PS7, Line 541: return (T)(negative ? -val : val);
> It will be reached for UNDERFLOW and OVERFLOW.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 8
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 11 Jul 2021 12:55:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9067/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 8
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 09 Jul 2021 23:33:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-09 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc
File be/src/util/string-parser-test.cc:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc@535
PS7, Line 535:   TestStringToFloatPreprocess(".43256e4", "0.43256e4");
> May add a couple tests with negative values, such as
Preprocess function doesn't handle sign either '+' or '-', so have added it 
under Invalid input.


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508
PS7, Line 508: s[len])
> nit. I wonder if the condition to go without preprocessing (the ElSE branch
I didn't understand the conditions above, but if s ='99' (which satisfies 
condition 2 i believe), it can goto any of the branches based on what s[2] is. 
if s[2] is null then it will goto true branch and if s[2] is a digit then to 
false (ELSE) branch.


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@541
PS7, Line 541: return (T)(negative ? -val : val);
> nit. This code can not be reached.
It will be reached for UNDERFLOW and OVERFLOW.


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@584
PS7, Line 584:
> nit. This may not be accurate since the code removes leading '0's before an
Agreed. Changed the wordings.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 8
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 09 Jul 2021 23:11:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-09 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strtod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accuracy for speed. On benchmarking on more than
1 million rows where string is cast to double, it is found that new
patch is on par with the earlier algorithm.

Results:
W/O library: Fetched 1222386 row(s) in 32.10s
With library: Fetched 1222386 row(s) in 31.71s

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
4 files changed, 167 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/8
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 8
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-09 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 7:

(4 comments)

Looks good to me and thanks a lot for handling well-formed string cases!

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc
File be/src/util/string-parser-test.cc:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser-test.cc@535
PS7, Line 535:   TestStringToFloatPreprocess(".43256e4", "0.43256e4");
May add a couple tests with negative values, such as

-0.1
-.555
-.12
-11


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@508
PS7, Line 508: s[len])
nit. I wonder if the condition to go without preprocessing (the ElSE branch) is 
one of the following.

1. '0.'
2. [1-9]+

I was not able to map the above two conditions to the line at 508.


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@541
PS7, Line 541: return (T)(negative ? -val : val);
nit. This code can not be reached.


http://gerrit.cloudera.org:8080/#/c/17389/7/be/src/util/string-parser.h@584
PS7, Line 584: before '.'
nit. This may not be accurate since the code removes leading '0's before any 
digits.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 09 Jul 2021 21:35:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9065/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 09 Jul 2021 19:40:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-09 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@463
PS6, Line 463: FAILURE and 0 is retur
> If we can change it to std::string(), we can take advantage of the fast_dou
As discussed in other thread, changing it to std::string will need changes in 
many places even to other StringTo* functions. So we will retain this signature.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@498
PS6, Line 498: of remaning string not parsed by li
> We can pass the input std::string as 's' and get rid off 'len',
This may not be required as only long strings will use std::string now and they 
would not need any preprocessing.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@538
PS6, Line 538:
 : retur
> The new signature of this function becomes
This may not be required as only long strings will use std::string now and they 
would not need any preprocessing.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: DCHECK(trailing_
> In a message at Jun 3 you mentioned "So I will use VAC for shorter strings
Have made the changes for the same now. For shorter string we will use stack 
and for longer string  we will use std::string  (malloc)


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@550
PS6, Line 550:
> This would probably prevent "return value optimization", as there would be
This code has changed now. We take care of not making a copy for 
null-terminated string at some other place.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@552
PS6, Line 552: (val == std::numeric_limits::infinity())) {
> Do you plan to resolve this comment?
This has been removed.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 09 Jul 2021 19:33:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-09 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strtod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accuracy for speed. On benchmarking on more than
1 million rows where string is cast to double, it is found that new
patch is on par with the earlier algorithm.

Results:
W/O library: Fetched 1222386 row(s) in 32.10s
With library: Fetched 1222386 row(s) in 31.71s

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
4 files changed, 164 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/7
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-07 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(3 comments)

I am happy with the code as is, but seems like there are still some unresolved 
comment threads in 'string-parser.h'. Can we resolve (or mark as Done / won't 
resolve) each before moving forward?

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> Results:
In a message at Jun 3 you mentioned "So I will use VAC for shorter strings to 
avoid malloc and for larger strings we can use malloc + strtod."

Is there still a plan to do VLA for short strings?


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@550
PS6, Line 550:
> Here before copying the test of the string, make the following check and re
This would probably prevent "return value optimization", as there would be 
multiple return statements with different objects.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@552
PS6, Line 552: // string has `move` defined, which would be used instead of 
copy.
> nit: I don't think we need this comment. And I think the situation is even
Do you plan to resolve this comment?



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 07 Jul 2021 08:38:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-06 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

+2 Since Zolta has +1.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Jul 2021 22:55:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-06 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6: Code-Review+2

Hi Amogh,

Thanks a lot for digging into the code and find out another use case. Yes, I 
agree that we have to pay the construction cost if the string version is 
constructed and passed to the new code. In that case, let us just pay the cost 
in the new code.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Jul 2021 22:52:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-07-05 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

> (4 comments)
 >
 > It is great to know that Impala can achieve 926 MB/s conversion
 > rate and very attempting to get the best from fast_double_parser():-)
 >
 > The key is not to populate a new std::string when the original
 > input conforms to the requirements of the library (well formed
 > null-terminated string via string::c_str() in constant speed),
 > which should be true in most cases.
 >
 > Throughout the code base of Impala, I was able to find only the
 > following call that needs the service of converting string to
 > double which makes the above idea feasible.
 >
 > 346 static bool ParseProbability(const string& prob_str, bool*
 > should_execute) {
 > 347   StringParser::ParseResult parse_result;
 > 348   double probability = StringParser::StringToFloat(
 > 349   prob_str.c_str(), prob_str.size(), &parse_result);
 > 350   if (parse_result != StringParser::PARSE_SUCCESS ||
 > 351   probability < 0.0 || probability > 1.0) {
 > 352 return false;
 > 353   }
 > 354   // +1L ensures probability of 0.0 and 1.0 work as expected.
 > 355   *should_execute = rand() < probability * (RAND_MAX + 1L);
 > 356   return true;
 > 357 }

Hi Qifan, I got late to the comment. So the other important code path which can 
lead to non-null terminated strings are due to the cast: 'select cast("0.454" 
as double)' or 'select cast(x as double) from foo' etc. The code path will pass 
through CastFunctions::CastToDoubleVal generated via Macro:
#define CAST_FROM_STRING(num_type, native_type, string_parser_fn) \
  num_type CastFunctions::CastTo##num_type(FunctionContext* ctx, const 
StringVal& val) { \
if (val.is_null) return num_type::null(); \
StringParser::ParseResult result; \
num_type ret; \
ret.val = StringParser::string_parser_fn( \
reinterpret_cast(val.ptr), val.len, &result); \
if (UNLIKELY(result != StringParser::PARSE_SUCCESS)) return 
num_type::null(); \
return ret; \
  }

this code can probably be frequently used based on usage of cast by 
client/customer. But the point you are making is valid that well formed 
null-terminated string need no extra processing and should directly be passed 
to library function.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 05 Jul 2021 14:34:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-06-03 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(4 comments)

It is great to know that Impala can achieve 926 MB/s conversion rate and very 
attempting to get the best from fast_double_parser():-)

The key is not to populate a new std::string when the original input conforms 
to the requirements of the library (well formed null-terminated string via 
string::c_str() in constant speed), which should be true in most cases.

Throughout the code base of Impala, I was able to find only the following call 
that needs the service of converting string to double which makes the above 
idea feasible.

346 static bool ParseProbability(const string& prob_str, bool* should_execute) 
{   
347   StringParser::ParseResult parse_result; 
348   double probability = StringParser::StringToFloat(
349   prob_str.c_str(), prob_str.size(), &parse_result);
 
350   if (parse_result != StringParser::PARSE_SUCCESS ||  
351   probability < 0.0 || probability > 1.0) {
352 return false;   
   
353   }   
354   // +1L ensures probability of 0.0 and 1.0 work as expected.
355   *should_execute = rand() < probability * (RAND_MAX + 1L);
356   return true;
357 }

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@463
PS6, Line 463: const char* s, int len
If we can change it to std::string(), we can take advantage of the 
fast_double_parser to the fullest.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@498
PS6, Line 498: StringToFloatPreprocess(s, i, len);
We can pass the input std::string as 's' and get rid off 'len',

On return, if processed_str.length() is 0, then the original input confirms to 
fast_double_parser() and we can make use of the original directly.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@538
PS6, Line 538: const char* s, int offset,
 :   int len
The new signature of this function becomes

static inline std::string StringToFloatPreprocess(const string& s, int offset)


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@550
PS6, Line 550:
Here before copying the test of the string, make the following check and return 
an empty string object to signal that the original input is good to go with 
fast_double_parser.

if (i==offset && res.length() == 0) {
  return string();
}



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 03 Jun 2021 17:55:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-06-03 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

> I think it would be nice to see a comparision between
 > fast_double_parser and strtod() for short strings.
 >
 > If we can measure a significant benefit for fast_double_parser we
 > can use it for short strings with a VLA, and use strtod() for long
 > strings with heap allocation (in this case we don't need to
 > preprocess the string).
 >
 > If we can't measure a difference between the two then maybe we
 > should just use strtod() since it's more robust and doesn't need
 > any preprocessing of the input.

There was no observable difference found when running through Impala as that 
function would be small fraction of entire runtime. However I benchmarked the 
individual functions and also added malloc and VAC operations to check the 
difference in the benchmark. This is the result:

=== trial 1 ===
fast_double_parser_malloc  397.00 MB/s
fast_double_parser_VAC  738.19 MB/s
fast_double_parser  926.76 MB/s
strtod  110.63 MB/s


=== trial 2 ===
fast_double_parser_malloc  395.90 MB/s
fast_double_parser_VAC  694.98 MB/s
fast_double_parser  926.33 MB/s
strtod  110.42 MB/s

So we see that malloc and strtod are significantly slower and in bigger data 
sets it may start showing difference. tcmalloc however might be faster than 
above but still will incur some penalty. So I will use VAC for shorter strings 
to avoid malloc and for larger strings we can use malloc + strtod.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 03 Jun 2021 12:31:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

I think it would be nice to see a comparision between fast_double_parser and 
strtod() for short strings.

If we can measure a significant benefit for fast_double_parser we can use it 
for short strings with a VLA, and use strtod() for long strings with heap 
allocation (in this case we don't need to preprocess the string).

If we can't measure a difference between the two then maybe we should just use 
strtod() since it's more robust and doesn't need any preprocessing of the input.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 27 May 2021 10:31:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-26 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

> Agree that the performance numbers are OK WRT strings of 20 or 30
 > bytes long.
 >
 > Would it be possible to make another performance by calling
 > strtod() instead?
 >
 > According to the performance numbers documented here
 > (https://github.com/lemire/fast_double_parser), fast_double is
 > about 9.4x faster than strtod, tested with Apple Clang.
 >
 > I just curious how fast fast_double is in Impala, compared to
 > strtod().

So for both 20 and 30 length string fast_double_parser also uses strtod so not 
any noticeable difference observed. On checking code, they specify that more 
than 19 digits will generally cause overflow and are highly unlikely to occur 
so they just send it to strtod(). Just that even for strtod we need a null 
terminated string. So I think short string optimisation should work well here 
as longer strings are not likely.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 26 May 2021 17:58:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-23 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

Agree that the performance numbers are OK WRT strings of 20 or 30 bytes long.

Would it be possible to make another performance by calling strtod() instead?

According to the performance numbers documented here 
(https://github.com/lemire/fast_double_parser), fast_double is about 9.4x 
faster than strtod, tested with Apple Clang.

I just curious how fast fast_double is in Impala, compared to strtod().


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 24 May 2021 01:05:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> Original input string in StringToFloatInternal() is not null-terminated str
Results:

3 Runs with 20 length string:


With std::string
Fetched 110 row(s) in 30.54s
Fetched 110 row(s) in 30.68s
Fetched 110 row(s) in 30.62s


With VLA:
Fetched 110 row(s) in 30.60s
Fetched 110 row(s) in 30.64s
Fetched 110 row(s) in 30.56s



3 Runs with 30 length string:
-

With std::string
Fetched 110 row(s) in 33.85s
Fetched 110 row(s) in 33.89s
Fetched 110 row(s) in 34.19s

With VLA:
Fetched 110 row(s) in 33.78s
Fetched 110 row(s) in 33.71s
Fetched 110 row(s) in 33.57s

Summary:
I think we are fine with performance. In real system hopefully we don't see too 
many tcmalloc contention due to this. Couple of things will help us avoid 
malloc in current scenario:
1. Strings less than 16 length due to Small string optimisation.
2. Based on Qifan's suggestion, we can avoid creating copy of string for some 
cases where we have non-integer terminated string as original input.

Other alternative (also suggested by Zoltan) is to do VLA for string of less 
than 256 bytes and above that we can use strtod.

I am ok with either approach.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 21:37:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> Stack overflow is sensitive to the depth of the stack and the size of the c
Original input string in StringToFloatInternal() is not null-terminated string 
and was figured out during testing. That is also one of the reason for the 
copying things into null-terminated string.
I am checking perf penalty with a million casts requiring heap allocation. if 
we see perf issues, we can use VLA for shorter strings and heap for longer ones.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 16:02:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> Yeah, I think it'd be good to see measurements for longer float literals. W
Stack overflow is sensitive to the depth of the stack and the size of the 
current frame.

Assume in most cases, the input string conforms to the format that the fast 
double parser library can handle properly, then we may not need to build the 
string object at all.

Thus, we can use the following form of signature:

inline bool stringToFloatPreprocess(char*, int offset, int len, string* 
preprocessed_result);

We return string (in preprocessed_result) only when the input does not conform.

In good case, we do not construct a string object and we can feed the original 
input string in StringToFloatInternal() to the fast parser library, if the 
input string is null-terminated. In seems this is the case as the only use of 
the StringToFloatInternal() is called indirectly here throughout Impala BE.

343 static bool ParseProbability(const string& prob_str, bool* should_execute) {
344   StringParser::ParseResult parse_result;  
345   double probability = StringParser::StringToFloat(
346   prob_str.c_str(), prob_str.size(), &parse_result);
347   if (parse_result != StringParser::PARSE_SUCCESS ||   
348   probability < 0.0 || probability > 1.0) {   
349 return false;
350   } 
   
351   // +1L ensures probability of 0.0 and 1.0 work as expected.
352   *should_execute = rand() < probability * (RAND_MAX + 1L);
353   return true;  
   
354 }



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 15:27:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> I had checked Small String Optimization and had read till 24 bytes it would
Yeah, I think it'd be good to see measurements for longer float literals. We 
are using TCMalloc in Impala which should be fairly performant so it's possible 
that the perf is not too bad.

We can also have an 'if' in the code. If strlen > 1 KiB use strtod(), otherwise 
use fast_double_parser with a Variable-length array, or a fixed-sized 1 KiB 
array?



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 11:24:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> I wonder if the performance numbers are still the same, given that now we n
I had checked Small String Optimization and had read till 24 bytes it would not 
do allocation which appeared reasonable. But i rechecked after your comment and 
24 bytes seem to be only for clang. I wrote a small program to check for our 
g++ and it avoids allocation till 15 bytes only. Program:

#include 
#include 
#include 

// replace operator new and delete to log allocations
void* operator new(std::size_t n) {
std::cout << "[Allocating " << n << " bytes]";
return malloc(n);
}
void operator delete(void* p) throw() {
free(p);
}

int main() {
for (size_t i = 0; i < 30; ++i) {
std::cout << i << ": " << std::string(i, '=') << std::endl;
}
}

is 15 good enough length ? otherwise dynamic allocation will make things slow. 
My assumption is input won't be long enough to cause stack overflow. I am not 
sure if we have some limitation on the string being passed to StringToFloat.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 10:49:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(2 comments)

Thanks for the changes!

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
I wonder if the performance numbers are still the same, given that now we need 
to dynamically allocate memory from the heap. (Though for short strings the 
standard library implementation being used avoids allocation by using Small 
String Optimization)


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@552
PS6, Line 552: // string has `move` defined, which would be used instead of 
copy.
nit: I don't think we need this comment. And I think the situation is even 
better as in this case Named Return Value Optimization kicks in which even 
avoids moving:
https://en.cppreference.com/w/cpp/language/copy_elision



-- 
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 09:29:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-20 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6: Code-Review+1

(4 comments)

Looks great!

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@538
PS6, Line 538: const char* s, int offset,
 :   int le
nit.

Maybe we could use two arguments (instead of three): char* start and char* end, 
where 'start' is s + offset, and 'end' is 'start' + len  -1 (inclusive) on 
entry into this function.

In the body, we set char* pos = start, and work with *pos and *(pos+1).

In this way, we increment on pos only to avoid s+i in several places.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493
PS5, Line 493: ng decimal
> Thats a good catch. I have changed it to string and code is much simpler no
ack :-).


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501
PS5, Line 501:
> 'pos' is not going to be 'loop constant' which can be pulled out as it need
I see. See my comment to the new version of the code.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524
PS5, Line 524: esult = PARSE_OVERFLOW;
> Actually, when function underflows it doesn't return NaN. It returns 0 inst
Done



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 01:16:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8764/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 20 May 2021 22:45:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-20 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG@12
PS5, Line 12: doesn't
: sacrifise the accuracy for speed.
> Could you please add your measurements here as well?
Done.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493
PS5, Line 493: ng decimal
> When len is super long, c_str may overflow the stack. Maybe std::string or
Thats a good catch. I have changed it to string and code is much simpler now.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501
PS5, Line 501:
> (s+i) is repeatedly computed in this loop. May precompute it as pos=s+i in
'pos' is not going to be 'loop constant' which can be pulled out as it needs to 
be incremented along with 'i'. So keeping just 1 counter for loop instead of 2.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501
PS5, Line 501:
 : if (UNLIKELY(s_end == nullptr)) {
 :   // Determine if it is an overflow case
 :   if (UNLIKELY(!std::isfinite(val))) {
 : *result = PARSE_OVERFLOW;
 :   } else if (UNLIKELY(val == 0)) {
 : *result = PARSE_UNDERFLOW;
 :   } else {
 : *result = PARSE_FAILURE;
 : return 0;
 :   }
 :   return (T)(negative ? -val : val);
 : }
 :
 : // check if trailing characters are present and are all 
white spaces
 : int trailing_len = processed_str.length() - (int)(s_end - 
c_str);
 : i
> Maybe we could put this into its own function, stg like 'NormalizeFloatStri
Done...added a new private function and test for it.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524
PS5, Line 524: esult = PARSE_OVERFLOW;
> Should we also test std::isnan(val) and set PARSE_UNDERFLOW flag here?
Actually, when function underflows it doesn't return NaN. It returns 0 instead. 
Handled that case.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 20 May 2021 22:30:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-20 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strtod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accuracy for speed. On benchmarking on more than
1 million rows where string is cast to double, it is found that new
patch is on par with the earlier algorithm. Results:
W/O library: Fetched 1222386 row(s) in 32.10s
With library: Fetched 1222386 row(s) in 31.71s

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
4 files changed, 122 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/6
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 5:

(3 comments)

Looks great!

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493
PS5, Line 493: len - i + 2
When len is super long, c_str may overflow the stack. Maybe std::string or some 
other buffer class can be used instead.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501
PS5, Line 501: s + i
(s+i) is repeatedly computed in this loop. May precompute it as pos=s+i in 
advance.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524
PS5, Line 524: *result = PARSE_OVERFLOW;
Should we also test std::isnan(val) and set PARSE_UNDERFLOW flag here?



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 15:42:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 5: Code-Review+1

(2 comments)

It's great to hear that the library has great perf. LGTM, only had smaller nits.

http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG@12
PS5, Line 12: doesn't
: sacrifise the accuracy for speed.
Could you please add your measurements here as well?


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501
PS5, Line 501: while(i + 1 < len && *(s + i) == '0' &&
 : fast_double_parser::is_integer(*(s + i + 1))) {
 :   i++;
 : }
 :
 : // Library doesn't handle strings starting with '.' like 
'.56' or '-.56'.
 : // Preprocessing such strings by prefixing it with '0'
 : if (*(s + i) == '.') {
 :   c_len = len - i + 1;
 :   c_str[0] = '0'; // Prefix '0'
 :   memcpy(c_str + 1, s + i, c_len - 1);
 :   c_str[c_len] = '\0';
 : } else {
 :   c_len = len - i;
 :   memcpy(c_str, s + i, c_len);
 :   c_str[c_len] = '\0';
 : }
Maybe we could put this into its own function, stg like 
'NormalizeFloatString()' and add a few unit tests for it in 
string-parser-test.cc.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 14:14:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-18 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 5:

Benchmarking:
Tested it on more than million rows casted from string to double.
Results shows both algorithms took almost same time:

W/O library: Fetched 1222386 row(s) in 32.10s
With library: Fetched 1222386 row(s) in 31.71s

With library timing is slightly better around half a second but it can be a 
noise.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 May 2021 21:17:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8741/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 May 2021 15:27:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-18 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h@495
PS4, Line 495: // Library function doesn't handle leading 0s like 
'1','-.9','-001'
> What do you mean by 'doesn't handle'? Falls back to strtod(), returns nullp
It would return nullptr. I added that as a part of comment now. Exhaustive test 
for all these cases already exist in string-parser-test.cc and these cases were 
identified through them.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 May 2021 15:05:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-18 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strtod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accuracy for speed.

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
4 files changed, 73 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/5
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/4/be/src/util/string-parser.h@495
PS4, Line 495: // Library function doesn't handle leading 0s like 
'1','-.9','-001'
What do you mean by 'doesn't handle'? Falls back to strtod(), returns nullptr, 
or returns wrong results?

Could you please add tests for such values?



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 May 2021 11:11:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8710/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 12 May 2021 22:29:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-12 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@10
PS3, Line 10: strto
> strtod
done.


http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@13
PS3, Line 13: accuracy
> accuracy
done.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 12 May 2021 22:08:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-12 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strtod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accuracy for speed.

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
4 files changed, 72 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/4
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 3: Code-Review+1

(2 comments)

LGTM!

http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@10
PS3, Line 10: strod
strtod


http://gerrit.cloudera.org:8080/#/c/17389/3//COMMIT_MSG@13
PS3, Line 13: accurancy
accuracy



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 12 May 2021 16:08:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8700/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 11 May 2021 11:59:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-11 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@13
PS2, Line 13: speed
> It would be nice to have some measurements, for example by adding a large n
Sure, will try to measure it. Theoretically, this should not improve 
performance but just improve the accuracy without perf penalty.


http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@17
PS2, Line 17: 2. Ran existing Backend tests for correctness.
> Can you add some tests to https://github.com/apache/impala/blob/master/be/s
I have added them now. Thanks for the suggestion.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 11 May 2021 11:44:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-11 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accurancy for speed.

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/exprs/expr-test.cc
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
3 files changed, 39 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/3
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@13
PS2, Line 13: speed
It would be nice to have some measurements, for example by adding a large 
number of casts to a query with many rows so that the cost of casts become 
measurable.


http://gerrit.cloudera.org:8080/#/c/17389/2//COMMIT_MSG@17
PS2, Line 17: 2. Ran existing Backend tests for correctness.
Can you add some tests to 
https://github.com/apache/impala/blob/master/be/src/exprs/expr-test.cc#L3221 ?

Having some EE tests is useful, but expression are more widely covered in 
expr-test.cc



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 04 May 2021 13:16:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-04 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 2:

FYI: There are few test failures i saw in pre-review job which i would need to 
fix.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 04 May 2021 10:42:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8679/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 03 May 2021 18:13:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-03 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17389


Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..

IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library

StringToFloatInternal is used to parse string into float. It had logic
to ensure it is faster than standard functions like strod in many
cases, but it was not as accurate. We are replacing it by a third
party library named fast_double_parser which is both fast and doesn't
sacrifise the accurancy for speed.

Testing:
1. Added test to check for accuracy improvement.
2. Ran existing Backend tests for correctness.

Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
---
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/values.test
2 files changed, 24 insertions(+), 67 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17389/2
--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor