[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16383 )
Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
..
IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files

Currently Impala checks the file metadata field 'hive.acid.version' to decide whether a file has the full ACID schema. There are cases where Hive does not set this value for full ACID files, e.g. query-based compactions. So it's more robust to check the schema elements instead of the metadata field.

Also, Hive sometimes writes the schema with different letter cases, e.g. originalTransaction vs. originaltransaction, so we should compare the column names in a case-insensitive way.

Testing:
* added test for full ACID compaction
* added test_full_acid_schema_without_file_metadata_tag to test a full ACID file without the metadata field 'hive.acid.version'

Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Reviewed-on: http://gerrit.cloudera.org:8080/16383
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M testdata/data/README
A testdata/data/full_acid_schema_but_no_acid_version.orc
M testdata/workloads/functional-query/queries/QueryTest/acid-compaction.test
M tests/query_test/test_acid.py
7 files changed, 88 insertions(+), 27 deletions(-)

Approvals:
Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Gerrit-Change-Number: 16383
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
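The schema-based check described in the commit message can be sketched in C++, the language of be/src/exec. This is a minimal, hypothetical illustration, not the code from orc-metadata-utils.cc. Its assumptions:

* The function names EqualsIgnoreCase and HasFullAcidSchema are invented here.
* The input is the list of top-level field names of the file's root struct, in file order.
* Only names are compared, not types.
* Files follow the Hive full ACID (ACIDv2) layout of six top-level fields: operation, originalTransaction, bucket, rowId, currentTransaction and row.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: ASCII case-insensitive equality, so that e.g.
// "originalTransaction" and "originaltransaction" compare equal.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Cast to unsigned char: std::tolower is undefined for negative char values.
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i]))) {
      return false;
    }
  }
  return true;
}

// Hypothetical check: decides "full ACID" from the schema alone, without
// consulting the 'hive.acid.version' metadata field. 'top_level_names' is
// assumed to hold the field names of the file's root struct, in file order.
bool HasFullAcidSchema(const std::vector<std::string>& top_level_names) {
  // Assumed Hive ACIDv2 top-level layout; the user columns live under "row".
  static const std::array<std::string, 6> kAcidFields = {
      "operation", "originalTransaction", "bucket",
      "rowId", "currentTransaction", "row"};
  if (top_level_names.size() != kAcidFields.size()) return false;
  for (std::size_t i = 0; i < kAcidFields.size(); ++i) {
    // Compare by position and name, ignoring letter case.
    if (!EqualsIgnoreCase(top_level_names[i], kAcidFields[i])) return false;
  }
  return true;
}
```

Some example calls:

* HasFullAcidSchema({"operation", "originaltransaction", "bucket", "rowid", "currenttransaction", "row"}) returns true.
* A plain table schema such as {"id", "name"} returns false.

Because this sketch looks only at the schema, it would still recognize a compacted file that lacks 'hive.acid.version'.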
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 5: Verified+1 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 22:27:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7065/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 17:25:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 5: Code-Review+2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 17:05:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 4: Code-Review+2 Carry +2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 17:04:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6377/ DRY_RUN=false -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 17:05:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16383 to look at the new patch set (#4).

Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
..
IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files

Currently Impala checks the file metadata field 'hive.acid.version' to decide whether a file has the full ACID schema. There are cases where Hive does not set this value for full ACID files, e.g. query-based compactions. So it's more robust to check the schema elements instead of the metadata field.

Also, Hive sometimes writes the schema with different letter cases, e.g. originalTransaction vs. originaltransaction, so we should compare the column names in a case-insensitive way.

Testing:
* added test for full ACID compaction
* added test_full_acid_schema_without_file_metadata_tag to test a full ACID file without the metadata field 'hive.acid.version'

Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M testdata/data/README
A testdata/data/full_acid_schema_but_no_acid_version.orc
M testdata/workloads/functional-query/queries/QueryTest/acid-compaction.test
M tests/query_test/test_acid.py
7 files changed, 88 insertions(+), 27 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16383/4

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Gerrit-Change-Number: 16383
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6374/ -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 16:33:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7059/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 11:32:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 2: Code-Review+2 (1 comment) Carry +2 http://gerrit.cloudera.org:8080/#/c/16383/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16383/1//COMMIT_MSG@20 PS1, Line 20: * added test_full_acid_schema_without_file_metadata_tag to test full : ACID file without metadata 'hi > I would prefer to have an automatic test with a specific file, as Hive may Done -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 11:21:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 3: Code-Review+2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 11:22:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6374/ DRY_RUN=false -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 01 Sep 2020 11:22:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16383 to look at the new patch set (#2).

Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
..
IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files

Currently Impala checks the file metadata field 'hive.acid.version' to decide whether a file has the full ACID schema. There are cases where Hive does not set this value for full ACID files, e.g. query-based compactions. So it's more robust to check the schema elements instead of the metadata field.

Also, Hive sometimes writes the schema with different letter cases, e.g. originalTransaction vs. originaltransaction, so we should compare the column names in a case-insensitive way.

Testing:
* added test for full ACID compaction
* added test_full_acid_schema_without_file_metadata_tag to test a full ACID file without the metadata field 'hive.acid.version'

Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M testdata/data/README
M testdata/workloads/functional-query/queries/QueryTest/acid-compaction.test
M tests/query_test/test_acid.py
6 files changed, 88 insertions(+), 27 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16383/2

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Gerrit-Change-Number: 16383
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 1: Code-Review+2 (1 comment) The patch seems good to me, my only concern is about losing test coverage in the future. http://gerrit.cloudera.org:8080/#/c/16383/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16383/1//COMMIT_MSG@20 PS1, Line 20: * tested manually on a file that has ACIDv2 schema, but : 'hive.acid.version' is missing I would prefer to have an automatic test with a specific file, as Hive may set "hive.acid.version" during query-based compaction in the future, but should still be able to handle files written by older versions. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 31 Aug 2020 12:01:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16383 ) Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7034/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Gerrit-Change-Number: 16383 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 28 Aug 2020 16:38:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16383 )
Change subject: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
..
IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files

Currently Impala checks the file metadata field 'hive.acid.version' to decide whether a file has the full ACID schema. There are cases where Hive does not set this value for full ACID files, e.g. query-based compactions. So it's more robust to check the schema elements instead of the metadata field.

Also, Hive sometimes writes the schema with different letter cases, e.g. originalTransaction vs. originaltransaction, so we should compare the column names in a case-insensitive way.

Testing:
* added test for full ACID compaction
* tested manually on a file that has the ACIDv2 schema but no 'hive.acid.version' metadata field

Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M testdata/workloads/functional-query/queries/QueryTest/acid-compaction.test
4 files changed, 67 insertions(+), 27 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16383/1

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Gerrit-Change-Number: 16383
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy