Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16908 )

Change subject: IMPALA-2019(Part-1): Provide UTF-8 support in length, substring 
and reverse functions
......................................................................


Patch Set 9:

(3 comments)

LGTM. I'd like a few more tests and a comment, but otherwise I'm ready to +2.

http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/expr-test.cc@10542
PS9, Line 10542: TEST_P(ExprTest, Utf8Test) {
Some of the characters below are more than 2 bytes long in UTF-8, right? It 
would be helpful to add a comment giving the number of bytes in each of the 
characters you used.
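
For illustration, here is a minimal standalone sketch of how the byte length 
of a UTF-8 character follows from its lead byte. Utf8SeqLen and 
Utf8CodePointCount are hypothetical helpers, not functions from the patch, and 
the sketch assumes valid UTF-8 input. Non-ASCII bytes are written as hex 
escapes so the byte count is visible in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the patch): number of bytes in the UTF-8
// sequence that starts with lead byte 'b'. Returns 0 for a continuation
// byte or an invalid lead byte.
inline std::size_t Utf8SeqLen(unsigned char b) {
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: e.g. U+00F1 (C3 B1)
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: e.g. U+4F60 (E4 BD A0)
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: e.g. U+1F600 (F0 9F 98 80)
  return 0;
}

// Hypothetical helper: counts code points by counting every byte that is
// not a continuation byte (10xxxxxx). Assumes 's' is valid UTF-8.
inline std::size_t Utf8CodePointCount(const std::string& s) {
  std::size_t n = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++n;
  }
  return n;
}
```

With these helpers, "ma\xC3\xB1" "ana" (the word from the reverse test) is 7 
bytes but 6 code points. The literal is split so that the hex escape does not 
swallow the following 'a'.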


http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/expr-test.cc@10596
PS9, Line 10596:   TestStringValue("utf8_reverse('mañana')", "anañam");
Can we add tests for a couple of grapheme clusters, where we reverse the 
code points rather than the clusters? There are a couple of examples here: 
https://exploringjs.com/impatient-js/ch_unicode.html#grapheme-clusters-the-real-characters

I tried them in impala-shell and it seems to have the expected behavior - 
https://drive.google.com/file/d/1iPXFAOtkRE5OPc014i6hARfig7W8xmpw/view?usp=sharing
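
To make the distinction concrete, here is a minimal standalone sketch of 
code-point-wise reversal. ReverseCodePoints is a hypothetical function written 
for this illustration, not the patch's Utf8Reverse, and it assumes valid UTF-8 
input. It keeps the bytes of each code point in order but knows nothing about 
grapheme clusters, so a base letter and a following combining mark end up 
swapped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch (not the patch's implementation): reverses the order
// of code points while keeping each code point's bytes in their original
// order. Assumes 'in' is valid UTF-8.
inline std::string ReverseCodePoints(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  std::size_t end = in.size();
  while (end > 0) {
    // Walk back over continuation bytes (10xxxxxx) to find the lead byte.
    std::size_t start = end - 1;
    while (start > 0 &&
           (static_cast<unsigned char>(in[start]) & 0xC0) == 0x80) {
      --start;
    }
    out.append(in, start, end - start);
    end = start;
  }
  return out;
}
```

For example, "e\xCC\x81" "x" is 'e' followed by U+0301 (combining acute accent) 
and then 'x'. Reversing by code point gives "x\xCC\x81" "e", so the accent now 
follows 'x' instead of 'e'. A cluster-aware reversal would keep the accent with 
the 'e'.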


http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/string-functions-ir.cc@496
PS9, Line 496: StringVal StringFunctions::Utf8Reverse(FunctionContext* context, 
const StringVal& str) {
Please add a comment noting that this reverses code points only (not grapheme 
clusters), and that this is consistent with other systems.



--
To view, visit http://gerrit.cloudera.org:8080/16908
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 20:31:03 +0000
Gerrit-HasComments: Yes
