Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/21038 )
Change subject: IMPALA-12426: New built-in prettyprint functions for bytes and duration. ...................................................................... Patch Set 4: (4 comments) Appreciate all the comments. http://gerrit.cloudera.org:8080/#/c/21038/4/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21038/4/be/src/exprs/string-functions-ir.cc@1894 PS4, Line 1894: bool is_null, const T& val > nit: 'is_null' does not seem necessary to pass as a separate parameter. We Done Originally this function supported DecimalVal since the prettyprint functions accepted decimals as input, but I later modified the prettyprint functions to only take integers. I modified this function to only take a val input which must have is_null and val members. http://gerrit.cloudera.org:8080/#/c/21038/4/be/src/exprs/string-functions-ir.cc@1902 PS4, Line 1902: StringVal result(context, fmt_str.size()); : if (UNLIKELY(result.is_null)) return StringVal::null(); : uint8_t* ptr = result.ptr; : memcpy(ptr, fmt_str.c_str(), fmt_str.size()); > nit: It seems that these lines of code can be replaced with StringVal::Copy These lines of code perform the same functionality as StringVal::CopyFrom, but this particular implementation was copied from other instances were build-in functions need to allocate additional memory to store a string result and is much more common than calling StringVal::CopyFrom. I dug into the differences, and this implementation should run faster since it uses memcpy instead of std::copy. Since built-in functions have the potential to be called billions of time during execution of a single query, even minor speed improvements can add up to huge differences in query execution time. TL;DR - this implementation is faster, less safe, and repeats code, but for this use case, that is the better implementation. I'm open to changing my mind though if there are other reasons I am not seeing why StringVal::CopyFrom is the better choice. http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml File docs/topics/impala_string_functions.xml: http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml@1183 PS4, Line 1183: PRETTYPRINT_MEMORY > nit: Maybe we can rename it to 'prettyprint_size' or 'prettyprint_bytes'? Done I changed the name to prettyprint_bytes. http://gerrit.cloudera.org:8080/#/c/21038/4/testdata/workloads/functional-query/queries/QueryTest/prettyprint-duration.test File testdata/workloads/functional-query/queries/QueryTest/prettyprint-duration.test: http://gerrit.cloudera.org:8080/#/c/21038/4/testdata/workloads/functional-query/queries/QueryTest/prettyprint-duration.test@1 PS4, Line 1: ==== > Is there corresponding .py code that calls the new tests? Done There is now. I was under the impression that all *.test files were automatically picked up and executed. -- To view, visit http://gerrit.cloudera.org:8080/21038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913 Gerrit-Change-Number: 21038 Gerrit-PatchSet: 4 Gerrit-Owner: Jason Fehr <jf...@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Yifan Zhang <chinazhangyi...@163.com> Gerrit-Reviewer: Zihao Ye <eyiz...@163.com> Gerrit-Comment-Date: Tue, 20 Feb 2024 20:28:45 +0000 Gerrit-HasComments: Yes