Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27
PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to 
VARCHAR
             : columns and marked the truncation code with a TODO that 
references
             : that Jira.
I don't expect any additional sorting or predicate issues outside of any that 
may already exist. VARCHAR is effectively a STRING (which Kudu has had for some 
time) with a length limit.
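
To make the length-limit point concrete, here is a minimal sketch of why a
limit counted in bytes and a limit counted in UTF-8 characters can disagree for
multi-byte data. It is not the truncation code from this patch; the helper
names and the sample value are hypothetical and only illustrate the general
behavior.

```python
def truncate_by_bytes(value: str, limit: int) -> bytes:
    """Cut the UTF-8 encoding at `limit` bytes; may split a multi-byte character."""
    return value.encode("utf-8")[:limit]


def truncate_by_chars(value: str, limit: int) -> bytes:
    """Keep at most `limit` code points, then encode as UTF-8."""
    return value[:limit].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: 5 characters, 6 bytes (the second character is 2 bytes).
sample = "h\u00e9llo"

by_bytes = truncate_by_bytes(sample, 2)  # cuts the 2-byte character in half
by_chars = truncate_by_chars(sample, 2)  # keeps 2 whole characters (3 bytes)

print(len(by_bytes), is_valid_utf8(by_bytes))
print(len(by_chars), is_valid_utf8(by_chars))
```

For pure ASCII input both helpers return the same bytes, so the difference only
shows up with non-ASCII data.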

Impala warns users the UTF-8 functionality is effectively undefined: 
https://impala.apache.org/docs/build/html/topics/impala_string.html

> For full support in all Impala subsystems, restrict string values to the 
> ASCII character set. Although some UTF-8 character data can be stored in 
> Impala and retrieved through queries, UTF-8 strings containing non-ASCII 
> characters are not guaranteed to work properly in combination with many SQL 
> aspects, including but not limited to:

- String manipulation functions.
- Comparison operators.
- The ORDER BY clause.
- Values in partition key columns.

If these edge cases and tests look important, we should prioritize UTF-8 
functionality as a whole in Impala.

Note: Hive, Parquet, and ORC all support UTF-8.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte 
characters
             :   and tested that length truncation resolve that issue.
> If this test is very hard to integrate into the Impala environment, then I
I will look at implementing it. The main challenge is inserting data directly 
via a Kudu client, given that Impala doesn't support UTF-8 strings.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
> What is the current state of min/max runtime filters for varchars? Are they
I am under the impression they should work (given that VARCHAR values are just 
strings), but this needs tests. Thomas would likely know.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 30 Mar 2020 21:39:45 +0000
Gerrit-HasComments: Yes
