Daniel Becker has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18999 )
Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted ...................................................................... IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 383 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/5 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>