Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21038 )

Change subject: IMPALA-12426: New built-in prettyprint functions for bytes and 
duration.
......................................................................


Patch Set 4:

(4 comments)

Appreciate all the comments.

http://gerrit.cloudera.org:8080/#/c/21038/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/21038/4/be/src/exprs/string-functions-ir.cc@1894
PS4, Line 1894: bool is_null, const T& val
> nit: 'is_null' does not seem necessary to pass as a separate parameter. We
Done

Originally this function supported DecimalVal since the prettyprint functions 
accepted decimals as input, but I later modified the prettyprint functions to 
only take integers.  I modified this function to only take a val input which 
must have is_null and val members.


http://gerrit.cloudera.org:8080/#/c/21038/4/be/src/exprs/string-functions-ir.cc@1902
PS4, Line 1902: StringVal result(context, fmt_str.size());
              :   if (UNLIKELY(result.is_null)) return StringVal::null();
              :   uint8_t* ptr = result.ptr;
              :   memcpy(ptr, fmt_str.c_str(), fmt_str.size());
> nit: It seems that these lines of code can be replaced with StringVal::Copy
These lines of code perform the same functionality as StringVal::CopyFrom, but 
this particular implementation was copied from other instances were build-in 
functions need to allocate additional memory to store a string result and is 
much more common than calling StringVal::CopyFrom.  I dug into the differences, 
and this implementation should run faster since it uses memcpy instead of 
std::copy.  Since built-in functions have the potential to be called billions 
of time during execution of a single query, even minor speed improvements can 
add up to huge differences in query execution time.

TL;DR - this implementation is faster, less safe, and repeats code, but for 
this use case, that is the better implementation.

I'm open to changing my mind though if there are other reasons I am not seeing 
why StringVal::CopyFrom is the better choice.


http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml@1183
PS4, Line 1183: PRETTYPRINT_MEMORY
> nit: Maybe we can rename it to 'prettyprint_size' or 'prettyprint_bytes'?
Done

I changed the name to prettyprint_bytes.


http://gerrit.cloudera.org:8080/#/c/21038/4/testdata/workloads/functional-query/queries/QueryTest/prettyprint-duration.test
File 
testdata/workloads/functional-query/queries/QueryTest/prettyprint-duration.test:

http://gerrit.cloudera.org:8080/#/c/21038/4/testdata/workloads/functional-query/queries/QueryTest/prettyprint-duration.test@1
PS4, Line 1: ====
> Is there corresponding .py code that calls the new tests?
Done

There is now.  I was under the impression that all *.test files were 
automatically picked up and executed.



--
To view, visit http://gerrit.cloudera.org:8080/21038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913
Gerrit-Change-Number: 21038
Gerrit-PatchSet: 4
Gerrit-Owner: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Yifan Zhang <chinazhangyi...@163.com>
Gerrit-Reviewer: Zihao Ye <eyiz...@163.com>
Gerrit-Comment-Date: Tue, 20 Feb 2024 20:28:45 +0000
Gerrit-HasComments: Yes

Reply via email to