Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16909 )
Change subject: IMPALA-5675: Support UTF-8 Varchar and Char types ...................................................................... Patch Set 15: (8 comments) Thanks Qifan's feedbacks! I'm reviving this thread since all the other UTF-8 works on string functions have finished. I'll address the other comments in next patch sets. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/cast-functions-ir.cc@166 PS14, Line 166: llint > It would be nice if we do not pay the new check on UTF8 mode here. That is, I think it won't have impact since codegen will remove the check on UTF8 mode, i.e. replacing ctx->impl()->GetConstFnAttr(...) with the actual values. https://github.com/apache/impala/blob/6ea15409b879a1286e72848defdda8d5d8568c19/be/src/codegen/llvm-codegen.cc#L995 http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/exprs/cast-functions-ir.cc@192 PS14, Line 192: nific > Same comment as above. That is, we keep the original code. same as above. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/string-value.h File be/src/runtime/string-value.h: http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/runtime/string-value.h@45 PS14, Line 45: /// The length of the string in bytes regardless whether it is binary or UTF-8 string. > May append "regardless whether it is binary or UTF8". Done http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/bit-util.h File be/src/util/bit-util.h: http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/bit-util.h@133 PS14, Line 133: > > Should be ">=" as 0x00 is a single byte UTF8 character. Oops! Done. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/bit-util.h@138 PS14, Line 138: } > May add a DCHECK(false) here. The switch-clause has a default branch. I think adding a DCHECK(false) will be a dead code. http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/string-util.cc File be/src/util/string-util.cc: http://gerrit.cloudera.org:8080/#/c/16909/14/be/src/util/string-util.cc@96 PS14, Line 96: DCHECK_GE(index, 0); : int pos = 0; > These two lines can be removed. Nice! Done http://gerrit.cloudera.org:8080/#/c/16909/14/fe/src/main/java/org/apache/impala/catalog/Type.java File fe/src/main/java/org/apache/impala/catalog/Type.java: http://gerrit.cloudera.org:8080/#/c/16909/14/fe/src/main/java/org/apache/impala/catalog/Type.java@278 PS14, Line 278: public static TColumnType addUtf8Markers(TColumnType container, boolean isUtf8) { : for (TTypeNode t : container.types) { : if (t.type == TTypeNodeType.SCALAR) { : t.scalar_type.setIs_utf8(isUtf8); : } : } : return container; : } > Sounds like this should be a method on TColumnType. TColumnType is a thrift generated class with only has basic setter/getter methods. So we can't move it to TColumnType class. Renamed the method. http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: http://gerrit.cloudera.org:8080/#/c/16909/14/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@484 PS14, Line 484: kudu; > May consider other table format such as PARQUET, ORC etc. Yeah, this is a test only for kudu because kudu doesn't support CHAR type yet. Tests for other table formats are in utf8-chars*.test triggered by tests/query_test/test_utf8_strings.py. -- To view, visit http://gerrit.cloudera.org:8080/16909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f Gerrit-Change-Number: 16909 Gerrit-PatchSet: 15 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 19 May 2022 05:49:50 +0000 Gerrit-HasComments: Yes