Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16908 )
Change subject: IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions
......................................................................


Patch Set 9:

(3 comments)

LGTM. I'd like a few more tests and a comment, but otherwise I'm ready to +2.

http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/expr-test.cc@10542
PS9, Line 10542: TEST_P(ExprTest, Utf8Test) {
Some of the characters below are > 2 bytes, right? It would be helpful to add a comment mentioning the number of bytes in each of the characters you used.


http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/expr-test.cc@10596
PS9, Line 10596:   TestStringValue("utf8_reverse('mañana')", "anañam");
Can we add tests for a couple of grapheme clusters, where we reverse the codepoints instead of the clusters? There are a couple of examples here: https://exploringjs.com/impatient-js/ch_unicode.html#grapheme-clusters-the-real-characters

I tried them in impala-shell and it seems to have the expected behavior: https://drive.google.com/file/d/1iPXFAOtkRE5OPc014i6hARfig7W8xmpw/view?usp=sharing


http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16908/9/be/src/exprs/string-functions-ir.cc@496
PS9, Line 496: StringVal StringFunctions::Utf8Reverse(FunctionContext* context, const StringVal& str) {
Please add a comment that this reverses codepoints only, and that this is consistent with other systems.
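
To make the last two comments concrete, here is a minimal standalone sketch of codepoint-wise reversal. It is not the code in the patch: Utf8SeqLen and ReverseCodepoints are hypothetical names, and the sketch assumes that invalid lead bytes are treated as single-byte units. It shows why a base letter followed by a combining mark comes out with the mark first.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with lead byte `b`. Invalid lead bytes (including stray continuation
// bytes) are treated as 1-byte units; that is an assumption of this sketch.
static size_t Utf8SeqLen(unsigned char b) {
  if (b < 0x80) return 1;            // ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // e.g. U+00F1 (n with tilde): C3 B1
  if ((b & 0xF0) == 0xE0) return 3;  // e.g. most CJK characters
  if ((b & 0xF8) == 0xF0) return 4;  // e.g. most emoji
  return 1;
}

// Hypothetical helper: reverse a UTF-8 string codepoint by codepoint.
// The bytes inside each codepoint keep their order; grapheme clusters
// (base character plus combining marks) are NOT kept together.
static std::string ReverseCodepoints(const std::string& in) {
  std::string out(in.size(), '\0');
  size_t read = 0;
  size_t write = in.size();
  while (read < in.size()) {
    size_t len = Utf8SeqLen(static_cast<unsigned char>(in[read]));
    // Clamp a sequence that is truncated at the end of the input.
    if (len > in.size() - read) len = in.size() - read;
    write -= len;
    std::memcpy(&out[write], in.data() + read, len);
    read += len;
  }
  return out;
}
```

With these helpers, reversing 'mañana' yields 'anañam' as in the existing test, while reversing 'e' followed by U+0301 (combining acute accent) yields the accent followed by 'e', so the cluster is split.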
-- 
To view, visit http://gerrit.cloudera.org:8080/16908
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 20:31:03 +0000
Gerrit-HasComments: Yes