Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16909 )

Change subject: IMPALA-5675: Support UTF-8 Varchar and Char types
......................................................................


Patch Set 15:

(8 comments)

Thanks Qifan's feedbacks! I'm reviving this thread since all the other UTF-8 
works on string functions have finished. I'll address the other comments in 
next patch sets.

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/cast-functions-ir.cc@166
PS14, Line 166: llint
> It would be nice if we do not pay the new check on UTF8 mode here. That is,
I think it won't have impact since codegen will remove the check on UTF8 mode, 
i.e. replacing ctx->impl()->GetConstFnAttr(...) with the actual values.

https://github.com/apache/impala/blob/6ea15409b879a1286e72848defdda8d5d8568c19/be/src/codegen/llvm-codegen.cc#L995


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/cast-functions-ir.cc@192
PS14, Line 192: nific
> Same comment as above. That is, we keep the original code.
same as above.


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/string-value.h@45
PS14, Line 45:   /// The length of the string in bytes regardless whether it is 
binary or UTF-8 string.
> May append "regardless whether it is binary or UTF8".
Done


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/bit-util.h@133
PS14, Line 133: >
> Should be ">=" as 0x00 is a single byte UTF8 character.
Oops! Done.


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/bit-util.h@138
PS14, Line 138: }
> May add a DCHECK(false) here.
The switch-clause has a default branch. I think adding a DCHECK(false) will be 
a dead code.


http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/string-util.cc
File be/src/util/string-util.cc:

http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/string-util.cc@96
PS14, Line 96:  DCHECK_GE(index, 0);
             :   int pos = 0;
> These two lines can be removed.
Nice! Done


http://gerrit.cloudera.org:8080/#/c/16909/14/fe/src/main/java/org/apache/impala/catalog/Type.java
File fe/src/main/java/org/apache/impala/catalog/Type.java:

http://gerrit.cloudera.org:8080/#/c/16909/14/fe/src/main/java/org/apache/impala/catalog/Type.java@278
PS14, Line 278: public static TColumnType addUtf8Markers(TColumnType container, 
boolean isUtf8) {
              :     for (TTypeNode t : container.types) {
              :       if (t.type == TTypeNodeType.SCALAR) {
              :         t.scalar_type.setIs_utf8(isUtf8);
              :       }
              :     }
              :     return container;
              :   }
> Sounds like this should be a method on TColumnType.
TColumnType is a thrift generated class with only has basic setter/getter 
methods. So we can't move it to TColumnType class.
Renamed the method.


http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@484
PS14, Line 484: kudu;
> May consider other table format such as PARQUET, ORC etc.
Yeah, this is a test only for kudu because kudu doesn't support CHAR type yet. 
Tests for other table formats are in utf8-chars*.test triggered by 
tests/query_test/test_utf8_strings.py.



--
To view, visit http://gerrit.cloudera.org:8080/16909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
Gerrit-Change-Number: 16909
Gerrit-PatchSet: 15
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 May 2022 05:49:50 +0000
Gerrit-HasComments: Yes

Reply via email to