[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. For example: CREATE TABLE impala_char_insert (s STRING); -- all values are CHAR(N) with different N, but all will use the biggest N INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" AS CHAR(2))), (CAST("123" AS CHAR(3))); SELECT length(s) FROM impala_char_insert; 3 3 3 -- if inserted individually, the result is SELECT length(s) FROM impala_char_insert; 1 2 3 This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. Note that the behaviour in Hive is different from both behaviours in Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from CHAR values when they are cast to other types, which is also lossy. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we would therefore run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both are set to true and the query contains set operation(s), an error is returned. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Reviewed-on: http://gerrit.cloudera.org:8080/18999 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test M tests/query_test/test_chars.py 13 files changed, 331 insertions(+), 10 deletions(-) Approvals: Csaba Ringhofer: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 16 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 15: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 11 Jul 2023 15:48:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13517/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 11 Jul 2023 11:51:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13515/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 14 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 11 Jul 2023 11:49:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13516/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 13 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 11 Jul 2023 11:48:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. For example: CREATE TABLE impala_char_insert (s STRING); -- all values are CHAR(N) with different N, but all will use the biggest N INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" AS CHAR(2))), (CAST("123" AS CHAR(3))); SELECT length(s) FROM impala_char_insert; 3 3 3 -- if inserted individually, the result is SELECT length(s) FROM impala_char_insert; 1 2 3 This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. Note that the behaviour in Hive is different from both behaviours in Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from CHAR values when they are cast to other types, which is also lossy. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we would therefore run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both are set to true and the query contains set operation(s), an error is returned. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test M tests/query_test/test_chars.py 13 files changed, 331 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/13 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 13 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. For example: CREATE TABLE impala_char_insert (s STRING); -- all values are CHAR(N) with different N, but all will use the biggest N INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" AS CHAR(2))), (CAST("123" AS CHAR(3))); SELECT length(s) FROM impala_char_insert; 3 3 3 -- if inserted individually, the result is SELECT length(s) FROM impala_char_insert; 1 2 3 This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. Note that the behaviour in Hive is different from both behaviours in Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from CHAR values when they are cast to other types, which is also lossy. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we would therefore run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both are set to true and the query contains set operation(s), an error is returned. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test M tests/query_test/test_chars.py 13 files changed, 332 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/14 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 14 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 15: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9491/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 11 Jul 2023 11:29:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 11 Jul 2023 11:28:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. For example: CREATE TABLE impala_char_insert (s STRING); -- all values are CHAR(N) with different N, but all will use the biggest N INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" AS CHAR(2))), (CAST("123" AS CHAR(3))); SELECT length(s) FROM impala_char_insert; 3 3 3 -- if inserted individually, the result is SELECT length(s) FROM impala_char_insert; 1 2 3 This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. Note that the behaviour in Hive is different from both behaviours in Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from CHAR values when they are cast to other types, which is also lossy. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we would therefore run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both are set to true and the query contains set operation(s), an error is returned. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test M tests/query_test/test_chars.py 13 files changed, 331 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/15 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13483/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 06 Jul 2023 14:58:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. For example: CREATE TABLE impala_char_insert (s STRING); -- all values are CHAR(N) with different N, but all will use the biggest N INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" AS CHAR(2))), (CAST("123" AS CHAR(3))); SELECT length(s) FROM impala_char_insert; 3 3 3 -- if inserted individually, the result is SELECT length(s) FROM impala_char_insert; 1 2 3 This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. Note that the behaviour in Hive is different from both behaviours in Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from CHAR values when they are cast to other types, which is also lossy. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we would therefore run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test M tests/query_test/test_chars.py 13 files changed, 315 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/11 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13477/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 04 Jul 2023 13:02:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. For example: CREATE TABLE impala_char_insert (s STRING); -- all values are CHAR(N) with different N, but all will use the biggest N INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" AS CHAR(2))), (CAST("123" AS CHAR(3))); SELECT length(s) FROM impala_char_insert; 3 3 3 -- if inserted individually, the result is SELECT length(s) FROM impala_char_insert; 1 2 3 This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. Note that the behaviour in Hive is different from both behaviours in Impala: Hive implicitly removes trailing spaces from CHAR values when they are cast to other types, which is also lossy. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we would therefore run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test M tests/query_test/test_chars.py 13 files changed, 340 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/10 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/18999/10/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/18999/10/be/src/service/query-options.h@53 PS10, Line 53: TImpalaQueryOptions::VALUES_STMT_AVOID_LOSSY_CHAR_PADDING + 1); \ line too long (93 > 90) -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 04 Jul 2023 12:39:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13475/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 04 Jul 2023 11:36:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. For example: CREATE TABLE impala_char_insert (s STRING); -- all values are CHAR(N) with different N, but all will use the biggest N INSERT OVERWRITE impala_char_insert VALUES (CAST("1" AS CHAR(1))), (CAST("12" AS CHAR(2))), (CAST("123" AS CHAR(3))); SELECT length(s) FROM impala_char_insert; 3 3 3 -- if inserted individually, the result is SELECT length(s) FROM impala_char_insert; 1 2 3 This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. Note that the behaviour in Hive is different from both behaviours in Impala: Hive implicitly removes trailing spaces from CHAR values when they are cast to other types, which is also lossy. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we would therefore run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test M tests/query_test/test_chars.py 13 files changed, 339 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/9 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@8 PS8, Line 8: : If, in a VALUES clause, for the same column all of the values are CHAR : types but not all are of the same length, the common type chosen is : CHAR(max(lengths)). This means that shorter values are padded with : spaces. If the destination column is not CHAR but VARCHAR or STRING, : this produces different results than if the values in the column are : inserted individually, in separate statements. This behaviour is : suboptimal because information is lost. > An example could make this clearer. Done http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@17 PS8, Line 17: > I would prefer to add CHAR to the name somehow as it is only applied for th Done http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@26 PS8, Line 26: > Can you add a note about already different behavior in CHAR(N) padding comp I mentioned Hive. If it is important I can find out what Postgres does. http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift@646 PS8, Line 646: > nit: false Done http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@200 PS8, Line 200: > This is no longer true, right? Right, removed this from the comment. http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java File fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java: http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java@146 PS8, Line 146: : : : : : : : : : : : : : : : : > Calling this before analyze looks a bit convoluted. Done http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test File testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test: http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test@32 PS8, Line 32: > Can you also add longer string that will be truncated? If the value is longer than the destination field, it results in an error. I added a test case for that. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 04 Jul 2023 11:13:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/18999/9/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/18999/9/be/src/service/query-options.h@53 PS9, Line 53: TImpalaQueryOptions::VALUES_STMT_AVOID_LOSSY_CHAR_PADDING + 1); \ line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/18999/9/be/src/service/query-options.h@298 PS9, Line 298: VALUES_STMT_AVOID_LOSSY_CHAR_PADDING, TQueryOptionLevel::REGULAR) \ line too long (129 > 90) -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 04 Jul 2023 11:14:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@8 PS8, Line 8: : If, in a VALUES clause, for the same column all of the values are CHAR : types but not all are of the same length, the common type chosen is : CHAR(max(lengths)). This means that shorter values are padded with : spaces. If the destination column is not CHAR but VARCHAR or STRING, : this produces different results than if the values in the column are : inserted individually, in separate statements. This behaviour is : suboptimal because information is lost. An example could make this clearer. http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@17 PS8, Line 17: VALUES_STMT_NON_LOSSY_COMMON_TYPE I would prefer to add CHAR to the name somehow as it is only applied for that type. e.g. VALUES_STMT_AVOID_LOSSY_CHAR_PADDING http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@26 PS8, Line 26: We choose VARCHAR instead of STRING as the common type because VARCHAR Can you add a note about already different behavior in CHAR(N) padding compared to Hive? Postgres could be also mentioned. http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift@646 PS8, Line 646: 0 nit: false http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@200 PS8, Line 200: his is only used This is no longer true, right? http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java File fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java: http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java@146 PS8, Line 146: if (analyzer.getQueryCtx().client_request : .query_options.values_stmt_non_lossy_common_type) { : // We suppress any AnalysisException thrown by castIfAllChars() because it is the : // responsibility of super.analyze() to validate the query and report errors. Any : // error that results in castIfAllChars() throwing an exception should also trigger : // an exception in super.analyze(), but if there are multiple errors, the exception : // thrown by castIfAllChars() may not correspond to the error first encountered by : // super.analyze(). To keep error reporting consistent, we always choose the : // exception thrown by super.analyze() and suppress exceptions thrown by : // castIfAllChars(). : try { : castIfAllChars(analyzer); : } catch (AnalysisException e) { : ex = e; : } : } : super.analyze(analyzer); Calling this before analyze looks a bit convoluted. It would be nicer to do this during the usual analyses, e.g. create function in SetOperationStmt like use EnforceNonLossyCommonType() which could return true for ValuesStmt with values_stmt_non_lossy_common_type. This bool could be passed to castToSetOpCompatibleTypes() which looks to best place for this logic for me. http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test File testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test: http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test@32 PS8, Line 32: (cast("12" as char(2))); Can you also add longer string that will be truncated? + the same for varchars -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Mon, 03 Jul 2023 13:10:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13449/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 30 Jun 2023 14:02:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 382 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/8 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13438/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 29 Jun 2023 17:12:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/13431/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 29 Jun 2023 12:00:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 382 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/7 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12705/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 29 Mar 2023 08:39:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 383 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/6 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12127/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Mon, 09 Jan 2023 17:15:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 383 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/5 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11417/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 23 Sep 2022 12:38:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 382 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/4 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11411/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 22 Sep 2022 14:43:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11410/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 22 Sep 2022 14:43:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 375 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/2 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 374 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/3 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa