Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16909

to look at the new patch set (#5).

Change subject: [WIP] IMPALA-5675: Support UTF-8 Varchar and Char
......................................................................

[WIP] IMPALA-5675: Support UTF-8 Varchar and Char

This patch adds support for UTF-8 aware varchar and char types. In UTF-8 mode, when truncating UTF-8 varchar(N) and char(N) strings, lengths are counted in UTF-8 characters instead of bytes, so the resulting string has up to N characters.

Tuple memory layout changes:
A char(N) slot occupies 4 * N bytes if it is a UTF-8 type (set in the FE during analysis), because a UTF-8 character is encoded in 1 to 4 bytes. The slot stores up to N UTF-8 characters, and the remaining bytes are padded with whitespace. When converting char(N) to other string types, we recalculate the actual length. We can optimize this in later patches, e.g. store the UTF-8 length in the slot, or handle UTF-8 char(N) the same way as varchar(N), i.e. reallocate the string space and store only the pointer and length in the slot.

TODO: correct codegen for varchar in text-converter.cc

Tests:
- Add tests for reading char(N) and varchar(N) columns in UTF8_MODE.
Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/cast-functions-ir.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.inline.h
M be/src/runtime/types.h
M be/src/service/fe-support.cc
M be/src/service/hs2-util.cc
M be/src/util/CMakeLists.txt
M be/src/util/string-util-test.cc
M be/src/util/string-util.cc
M be/src/util/string-util.h
M be/src/util/tuple-row-compare.cc
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TypeDef.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M testdata/datasets/functional/functional_schema_template.sql
M tests/query_test/test_utf8_strings.py
37 files changed, 318 insertions(+), 89 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/16909/5
--
To view, visit http://gerrit.cloudera.org:8080/16909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
Gerrit-Change-Number: 16909
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>