Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17785 )

Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion 
functions
......................................................................


Patch Set 6:

(7 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17785/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17785/6//COMMIT_MSG@9
PS6, Line 9: string functions doing case conversion
nit. case conversion string functions


http://gerrit.cloudera.org:8080/#/c/17785/6//COMMIT_MSG@11
PS6, Line 11: unicode
nit Unicode


http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@366
PS6, Line 366: if (wc_bytes == 0) break;
Per https://en.cppreference.com/w/cpp/string/multibyte/mbtowc, on  return value:

If s is not a null pointer, returns the number of bytes that are contained in 
the multibyte character or -1 if the first bytes pointed to by s do not form a 
valid multibyte character or ?0? if s is pointing at the null charcter '\0'.

Since '\0' is valid in Unicode, we should continue instead of break.


http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@370
PS6, Line 370: ;
nit. 0xFFFD.


http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@371
PS6, Line 371:       wc_bytes = 3;
nit. should it be 4?


http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@408
PS6, Line 408: iswspace
nit. UNLIKELY()


http://gerrit.cloudera.org:8080/#/c/17785/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
File 
testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test:

http://gerrit.cloudera.org:8080/#/c/17785/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@183
PS6, Line 183: ---- QUERY
             : set utf8_mode=false;
             : select upper('abcd áäèü'), lower('ABCD ÁÄÈÜ'), initcap('abcd 
áäèü ABCD ÁÄÈÜ');
             : ---- RESULTS: RAW_STRING
             : 'ABCD áäèü','abcd ÁÄÈÜ','Abcd áäèü Abcd ÁÄÈÜ'
             : ---- TYPES
             : STRING,STRING,STRING
             : ====
             : ---- QUERY
             : set utf8_mode=true;
             : select upper('abcd áäèü'), lower('ABCD ÁÄÈÜ'), initcap('abcd 
áäèü ABCD ÁÄÈÜ');
             : ---- RESULTS: RAW_STRING
             : 'ABCD ÁÄÈÜ','abcd áäèü','Abcd Áäèü Abcd Áäèü'
             : ---- TYPES
             : STRING,STRING,STRING
             : ====
good tests.



--
To view, visit http://gerrit.cloudera.org:8080/17785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Gerrit-Change-Number: 17785
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Fri, 04 Feb 2022 18:50:47 +0000
Gerrit-HasComments: Yes

Reply via email to