Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16909 )
Change subject: IMPALA-5675: Support UTF-8 Varchar and Char types ...................................................................... Patch Set 16: (7 comments) Resolving some comments. Still WIP. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/anyval-util.h File be/src/exprs/anyval-util.h: http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/anyval-util.h@310 PS14, Line 310: sv->len = (type.is_utf8 && type.type == TYPE_CHAR)? : FindUtf8PosForward(reinterpret_cast<const char*>(slot), type.char_len * 4, type.char_len) : : type.char_len; : return; > Maybe separate the two cases to make the code cleaner. Done http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h File be/src/runtime/types.h: http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@73 PS14, Line 73: /// TYPE_VARCHAR, TYPE_FIXED_UDA_INTERMEDIATE. : /// 'char_len' is the lo > Elsewhere in the code, it seems 'len' is interpreted as length in bytes for Done. Split this into 'char_len' and 'byte_len'. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@119 PS14, Line 119: > Add a comment on the unit of 'len'. Done. I renamed this to char_len with a comment. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@129 PS14, Line 129: > Add a comment on the unit of 'len'. Done. I renamed this to char_len with a comment. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@297 PS14, Line 297: function for > It will be good if we pay the cost of storing the length in bytes once (com Done. I added a 'byte_len' field for this, and renamed this to GetMaxByteLen(). http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test File testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test: http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test@4 PS14, Line 4: '李小龙' > Consider adding literals with a combination of different utf8 lengths, such Done http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test@4 PS14, Line 4: '李小龙' > May consider use single quote as double quote is for delimitated identifier Done -- To view, visit http://gerrit.cloudera.org:8080/16909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f Gerrit-Change-Number: 16909 Gerrit-PatchSet: 16 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 25 May 2022 03:55:30 +0000 Gerrit-HasComments: Yes