Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16909

to look at the new patch set (#6).

Change subject: [WIP] IMPALA-5675: Support UTF-8 Varchar and Char
......................................................................

[WIP] IMPALA-5675: Support UTF-8 Varchar and Char

This patch addes support for UTF-8 aware varchar and char types. In
UTF-8 mode, when truncating UTF-8 varchar(N) and char(N) strings,
lengths will be counted by UTF-8 characters instead of bytes. So the
result string will have up to N characters.

Tuple memory layout changes:
A char(N) slot will occupy 4 * N bytes if it's a UTF-8 type (set in FE
in analyzing). Because a UTF-8 character can be encoded into 1~4 bytes.
The slot will store up to N UTF-8 characters. The remaining bytes will
be padded by whitespace. When converting char(N) to other string types,
we re-calculate the actual length. We can optimize this in later
patches, e.g. store the UTF-8 length in the slot, or deal with UTF-8
char(N) by the same way as varchar(N), i.e. reallocate the string space
and just store the pointer and length in the slot.

TODO: correct codegen for varchar in text-converter.cc
TODO: Create varchar, char columns on text files with longer strings.
Verify truncating.

Tests:
 - Add tests for reading char(N) and varchar(N) columns in UTF8_MODE.

Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/cast-functions-ir.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.inline.h
M be/src/runtime/types.h
M be/src/service/fe-support.cc
M be/src/service/hs2-util.cc
M be/src/util/CMakeLists.txt
M be/src/util/string-util-test.cc
M be/src/util/string-util.cc
M be/src/util/string-util.h
M be/src/util/tuple-row-compare.cc
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TypeDef.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M testdata/datasets/functional/functional_schema_template.sql
M tests/query_test/test_utf8_strings.py
37 files changed, 316 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/16909/6
--
To view, visit http://gerrit.cloudera.org:8080/16909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
Gerrit-Change-Number: 16909
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

Reply via email to