Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17785

to look at the new patch set (#7).

Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion functions
......................................................................

IMPALA-2019(part-4): Add UTF-8 support for case conversion functions

There are 3 builtin case conversion string functions: upper(), lower(),
and initcap(). Previously they only converted English alphabetic
characters. This patch adds support for Unicode characters.

Case conversion has many corner cases that depend on the locale and
context. E.g.

1) Case conversion is locale-sensitive.
Turkish has 4 letter "I"s. English has only two, a lowercase dotted i
and an uppercase dotless I. Turkish has lowercase and uppercase forms of
both dotted and dotless I. So simply converting "i" to "I" for upper
case is wrong in Turkish:
+-------+--------+---------+
|       | Dotted | Dotless |
+-------+--------+---------+
| Upper | İ      | I       |
+-------+--------+---------+
| Lower | i      | ı       |
+-------+--------+---------+

2) Case conversion may change a string's length.
The German word "grüßen" should be converted to "GRÜSSEN" in upper case:
the letter "ß" should be converted to "SS".

3) Case conversion is context-sensitive.
The Greek word "ὈΔΥΣΣΕΎΣ" should be converted to "ὀδυσσεύς", where the
Greek letter "Σ" is converted to "σ" or to "ς", depending on its
position in the word.

This patch uses the simple per-character wchar_t conversions
std::towupper and std::towlower, so the above cases are not handled. We
will try to support them in follow-up JIRAs.

Test:
 - Add BE unit tests and e2e tests.

Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
5 files changed, 212 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/17785/7
--
To view, visit http://gerrit.cloudera.org:8080/17785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Gerrit-Change-Number: 17785
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
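
For readers of the commit message above, the following is a minimal sketch of what a "simple" per-code-point conversion with std::towupper can look like, and why it cannot turn "ß" into "SS". It is not the code from this patch: the names SeqLen, AppendUtf8 and Utf8ToUpperSimple are hypothetical, the input is assumed to be mostly valid UTF-8 (invalid or truncated bytes are copied through unchanged), and wchar_t is assumed to be 32 bits wide, as on Linux.

```cpp
#include <cstddef>
#include <cstdint>
#include <cwctype>
#include <string>

// Hypothetical helper: length of the UTF-8 sequence introduced by `lead`,
// or 0 if `lead` is not a valid lead byte.
size_t SeqLen(unsigned char lead) {
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  return 0;
}

// Hypothetical helper: appends code point `cp` to `out` encoded as UTF-8.
void AppendUtf8(uint32_t cp, std::string* out) {
  if (cp < 0x80) {
    out->push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out->push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out->push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out->push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Sketch of a one-to-one upper-case conversion: decode each code point,
// map it with std::towupper, and re-encode it. Each input character
// yields exactly one output character, so mappings such as "ß" -> "SS"
// are out of reach, and the result depends on the current C locale.
std::string Utf8ToUpperSimple(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  size_t i = 0;
  while (i < in.size()) {
    unsigned char lead = static_cast<unsigned char>(in[i]);
    size_t len = SeqLen(lead);
    if (len == 0 || i + len > in.size()) {
      // Invalid lead byte or truncated sequence: copy the byte through.
      out.push_back(in[i]);
      ++i;
      continue;
    }
    // Payload bits of the lead byte: 7, 5, 4 or 3 bits for len 1..4.
    uint32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
    for (size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (in[i + k] & 0x3F);
    }
    i += len;
    AppendUtf8(static_cast<uint32_t>(std::towupper(static_cast<wint_t>(cp))),
               &out);
  }
  return out;
}
```

Calling Utf8ToUpperSimple("straße") leaves the "ß" in place because std::towupper has no single-character upper-case form to return for it. Whether other non-ASCII letters such as "ü" are converted depends on the LC_CTYPE locale that is active when the function runs. The Turkish and Greek cases from the commit message are likewise outside what a one-to-one, context-free mapping can express.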