Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17785 )
Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion functions ...................................................................... Patch Set 6: (7 comments) Looks good! http://gerrit.cloudera.org:8080/#/c/17785/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17785/6//COMMIT_MSG@9 PS6, Line 9: string functions doing case conversion nit. case conversion string functions http://gerrit.cloudera.org:8080/#/c/17785/6//COMMIT_MSG@11 PS6, Line 11: unicode nit Unicode http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@366 PS6, Line 366: if (wc_bytes == 0) break; Per https://en.cppreference.com/w/cpp/string/multibyte/mbtowc, on return value: If s is not a null pointer, returns the number of bytes that are contained in the multibyte character or -1 if the first bytes pointed to by s do not form a valid multibyte character or ?0? if s is pointing at the null charcter '\0'. Since '\0' is valid in Unicode, we should continue instead of break. http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@370 PS6, Line 370: ; nit. 0xFFFD. http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@371 PS6, Line 371: wc_bytes = 3; nit. should it be 4? http://gerrit.cloudera.org:8080/#/c/17785/6/be/src/exprs/string-functions-ir.cc@408 PS6, Line 408: iswspace nit. UNLIKELY() http://gerrit.cloudera.org:8080/#/c/17785/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test File testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test: http://gerrit.cloudera.org:8080/#/c/17785/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@183 PS6, Line 183: ---- QUERY : set utf8_mode=false; : select upper('abcd áäèü'), lower('ABCD ÁÄÈÜ'), initcap('abcd áäèü ABCD ÁÄÈÜ'); : ---- RESULTS: RAW_STRING : 'ABCD áäèü','abcd ÁÄÈÜ','Abcd áäèü Abcd ÁÄÈÜ' : ---- TYPES : STRING,STRING,STRING : ==== : ---- QUERY : set utf8_mode=true; : select upper('abcd áäèü'), lower('ABCD ÁÄÈÜ'), initcap('abcd áäèü ABCD ÁÄÈÜ'); : ---- RESULTS: RAW_STRING : 'ABCD ÁÄÈÜ','abcd áäèü','Abcd Áäèü Abcd Áäèü' : ---- TYPES : STRING,STRING,STRING : ==== good tests. -- To view, visit http://gerrit.cloudera.org:8080/17785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd Gerrit-Change-Number: 17785 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Fri, 04 Feb 2022 18:50:47 +0000 Gerrit-HasComments: Yes