Daniel Becker has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are 
inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   
testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A 
testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A 
testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 383 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/5
--
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>

Reply via email to