Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16908 )

Change subject: IMPALA-2019(Part-1): Provide UTF-8 support in length, substring 
and reverse functions
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16908/7/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16908/7/be/src/exprs/string-functions-ir.cc@498
PS7, Line 498: StringVal StringFunctions::Utf8Reverse(FunctionContext* context, 
const StringVal& str) {
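We might need to be careful with reverse, because I think reversing the Unicode 
code points directly isn't quite correct for characters that are built from 
several code points combined together (e.g. a base letter followed by a 
combining accent): the accent would end up attached to a different letter.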

https://github.com/mbrubeck/unicode-reverse/blob/master/src/lib.rs

https://unicode.org/reports/tr29/#Default_Grapheme_Cluster_Table
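
To illustrate the concern, here is a minimal standalone sketch (not Impala code, 
and not the patch's Utf8Reverse). It compares a plain code-point reverse with one 
that keeps combining marks attached to the preceding base character. All function 
names are hypothetical, and as a simplifying assumption only the Combining 
Diacritical Marks block (U+0300..U+036F) is treated as "combining"; the full 
grapheme cluster rules in UAX #29 cover many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Byte length of a UTF-8 sequence, derived from its lead byte.
// Assumption of this sketch: invalid lead bytes count as one-byte units.
static size_t Utf8SeqLen(unsigned char lead) {
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  return 1;
}

// Decodes the code point stored in s[pos, pos + len).
static uint32_t DecodeCodePoint(const std::string& s, size_t pos, size_t len) {
  assert(len >= 1 && pos + len <= s.size());
  unsigned char lead = static_cast<unsigned char>(s[pos]);
  if (len == 1) return lead;
  uint32_t cp = lead & (0xFFu >> (len + 1));  // strip the length prefix bits
  for (size_t i = 1; i < len; ++i) {
    cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3Fu);
  }
  return cp;
}

// Simplification: only U+0300..U+036F is recognized as a combining mark.
static bool IsCombiningMark(uint32_t cp) {
  return cp >= 0x0300 && cp <= 0x036F;
}

// Splits s into units of one code point each. If attach_marks is true,
// a combining mark is appended to the unit before it instead.
static std::vector<std::string> SplitUnits(const std::string& s,
                                           bool attach_marks) {
  std::vector<std::string> units;
  size_t pos = 0;
  while (pos < s.size()) {
    size_t len = Utf8SeqLen(static_cast<unsigned char>(s[pos]));
    if (pos + len > s.size()) len = s.size() - pos;  // truncated tail
    uint32_t cp = DecodeCodePoint(s, pos, len);
    if (attach_marks && IsCombiningMark(cp) && !units.empty()) {
      units.back().append(s, pos, len);
    } else {
      units.emplace_back(s, pos, len);
    }
    pos += len;
  }
  return units;
}

// Concatenates the units of s in reverse order.
static std::string ReverseUnits(const std::string& s, bool attach_marks) {
  std::vector<std::string> units = SplitUnits(s, attach_marks);
  std::string out;
  out.reserve(s.size());
  for (auto it = units.rbegin(); it != units.rend(); ++it) out += *it;
  return out;
}

// Reverses code point by code point; combining marks change their base.
std::string ReverseByCodePoint(const std::string& s) {
  return ReverseUnits(s, false);
}

// Reverses while keeping each combining mark after its original base.
std::string ReverseByCluster(const std::string& s) {
  return ReverseUnits(s, true);
}
```

For the input "x", "o", U+0301, "z" (an "o" with a combining acute accent), the 
code-point reverse yields "z", U+0301, "o", "x", so the accent now follows "z". 
The cluster-aware variant yields "z", "o", U+0301, "x", which keeps the accent on 
the "o".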


http://gerrit.cloudera.org:8080/#/c/16908/7/fe/src/main/java/org/apache/impala/catalog/ScalarType.java
File fe/src/main/java/org/apache/impala/catalog/ScalarType.java:

http://gerrit.cloudera.org:8080/#/c/16908/7/fe/src/main/java/org/apache/impala/catalog/ScalarType.java@47
PS7, Line 47:   private boolean isUtf8_ = false;
I was thinking about how to interpret this. I think isUtf8_ == false can mean 
either of the following:

* The expression has legacy string semantics
* The expression's behavior is the same with or without UTF-8 semantics.

If you agree, we should add that to this comment.



--
To view, visit http://gerrit.cloudera.org:8080/16908
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Jan 2021 17:57:30 +0000
Gerrit-HasComments: Yes
