Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16908 )
Change subject: IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16908/7/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16908/7/be/src/exprs/string-functions-ir.cc@498
PS7, Line 498: StringVal StringFunctions::Utf8Reverse(FunctionContext* context, const StringVal& str) {
We might need to be careful with reverse, because I think reversing the Unicode characters directly isn't quite correct for characters that are combined together.

https://github.com/mbrubeck/unicode-reverse/blob/master/src/lib.rs
https://unicode.org/reports/tr29/#Default_Grapheme_Cluster_Table


http://gerrit.cloudera.org:8080/#/c/16908/7/fe/src/main/java/org/apache/impala/catalog/ScalarType.java
File fe/src/main/java/org/apache/impala/catalog/ScalarType.java:

http://gerrit.cloudera.org:8080/#/c/16908/7/fe/src/main/java/org/apache/impala/catalog/ScalarType.java@47
PS7, Line 47:   private boolean isUtf8_ = false;
I was thinking about how to interpret this flag. I think isUtf8_ == false can mean either of:
* The expression has legacy string semantics.
* The expression's behavior is the same with or without UTF-8 semantics.

If you agree, we should add that to this comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/16908
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Jan 2021 17:57:30 +0000
Gerrit-HasComments: Yes
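
Illustration for the Utf8Reverse comment (Line 498) above: a minimal, standalone sketch of why reversing code point by code point can detach a combining mark from its base character. This is not Impala code. The helper names (Utf8Len, SplitCodePoints, IsCombiningMark, ReverseCodePoints, ReverseClusters) are hypothetical, the input is assumed to be valid UTF-8, and only U+0300..U+036F is treated as combining, which is far narrower than the grapheme cluster rules in UAX #29.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence starting with lead byte `b`.
// Assumes valid UTF-8; an unexpected byte is treated as length 1.
size_t Utf8Len(unsigned char b) {
  if (b < 0x80) return 1;
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 1;
}

// Splits `s` into one substring per code point.
std::vector<std::string> SplitCodePoints(const std::string& s) {
  std::vector<std::string> out;
  for (size_t i = 0; i < s.size();) {
    size_t n = Utf8Len(static_cast<unsigned char>(s[i]));
    if (n > s.size() - i) n = s.size() - i;  // truncated tail
    out.push_back(s.substr(i, n));
    i += n;
  }
  return out;
}

// Simplification: only U+0300..U+036F (Combining Diacritical Marks) counts.
// Those code points are all encoded in two bytes.
bool IsCombiningMark(const std::string& cp) {
  if (cp.size() != 2) return false;
  unsigned char b0 = static_cast<unsigned char>(cp[0]);
  unsigned char b1 = static_cast<unsigned char>(cp[1]);
  if ((b0 & 0xE0) != 0xC0) return false;
  unsigned value = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
  return value >= 0x300 && value <= 0x36F;
}

// Reverses the order of code points; bytes inside each code point keep
// their order. Combining marks end up in front of the wrong character.
std::string ReverseCodePoints(const std::string& s) {
  std::vector<std::string> cps = SplitCodePoints(s);
  std::string out;
  for (auto it = cps.rbegin(); it != cps.rend(); ++it) out += *it;
  return out;
}

// Groups each base code point with the combining marks that follow it,
// then reverses the order of those groups.
std::string ReverseClusters(const std::string& s) {
  std::vector<std::string> clusters;
  for (const std::string& cp : SplitCodePoints(s)) {
    if (IsCombiningMark(cp) && !clusters.empty()) {
      clusters.back() += cp;
    } else {
      clusters.push_back(cp);
    }
  }
  std::string out;
  for (auto it = clusters.rbegin(); it != clusters.rend(); ++it) out += *it;
  return out;
}

// Usage: "a" followed by "e" + U+0301 (combining acute accent).
void Example() {
  const std::string s = "ae\xCC\x81";
  assert(ReverseCodePoints(s) == "\xCC\x81" "ea");  // accent detached from 'e'
  assert(ReverseClusters(s) == "e\xCC\x81" "a");    // accent stays on 'e'
}
```

A real fix would need the full grapheme cluster rules from the UAX #29 table linked above (or a library that implements them); the sketch only shows the shape of the problem.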