Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16909 )

Change subject: IMPALA-5675: Support UTF-8 Varchar and Char types
......................................................................


Patch Set 16:

(7 comments)

Resolving some comments. Still WIP.

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/anyval-util.h
File be/src/exprs/anyval-util.h:

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/anyval-util.h@310
PS14, Line 310: sv->len = (type.is_utf8 && type.type == TYPE_CHAR)?
              :             FindUtf8PosForward(reinterpret_cast<const 
char*>(slot), type.char_len * 4, type.char_len)
              :             : type.char_len;
              :         return;
> Maybe separate the two cases to make the code cleaner.
Done


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h
File be/src/runtime/types.h:

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@73
PS14, Line 73: /// TYPE_VARCHAR, TYPE_FIXED_UDA_INTERMEDIATE.
             :   /// 'char_len' is the lo
> Elsewhere in the code, it seems 'len' is interpreted as length in bytes for
Done. Split this into 'char_len' and 'byte_len'.


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@119
PS14, Line 119:
> Add a comment on the unit of 'len'.
Done. I renamed this to char_len with a comment.


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@129
PS14, Line 129:
> Add a comment on the unit of 'len'.
Done. I renamed this to char_len with a comment.


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/types.h@297
PS14, Line 297: function for
> It will be good if we pay the cost of storing the length in bytes once (com
Done. I added a 'byte_len' field for this, and renamed this to GetMaxByteLen().


http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test
File 
testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test:

http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test@4
PS14, Line 4: '李小龙'
> Consider adding literals with a combination of different utf8 lengths, such
Done


http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test@4
PS14, Line 4: '李小龙'
> May consider use single quote as double quote is for delimitated identifier
Done



--
To view, visit http://gerrit.cloudera.org:8080/16909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
Gerrit-Change-Number: 16909
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 May 2022 03:55:30 +0000
Gerrit-HasComments: Yes

Reply via email to