[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
..

IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

The full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema, but that would introduce several complexities and corner cases in column resolution. Also, in query results the heading of the above user column would be "row.i", star expansion would have to be modified, etc.

Because of that, in the catalog I create the inverse of the above schema:

{
  "row__id":
  {
    "operation": 0,
    "originalTransaction": 1,
    "bucket": 536870912,
    "rowId": 0,
    "currentTransaction": 1
  },
  "i": 1
}

This way very little modification is needed in the frontend, and the hidden columns can easily be retrieved via 'SELECT row__id.*' when we need them for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. In the backend, OrcSchemaResolver::ResolveColumn() needs some extra juggling to retrieve the table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files.
Newly added tests:
* specific queries about hidden columns (full-acid-rowid.test)
* SHOW CREATE TABLE (show-create-table-full-acid.test)
* DESCRIBE [FORMATTED] TABLE (describe-path.test)
* INSERT should be forbidden (acid-negative.test)
* added tests for column masking (ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Reviewed-on: http://gerrit.cloudera.org:8080/15395
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/common/logging.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A
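The schema inversion and the two path translations described in the commit message can be illustrated with a small Python model. It is a hypothetical sketch, not Impala's Path.getAbsolutePath() or OrcSchemaResolver::ResolveColumn(): rows are plain dicts, schema paths are lists of child indices, the table is assumed to have no partition columns, and row__id is assumed to be column 0 of the table schema. The helper names are made up for illustration.

```python
# Hypothetical model of the full ACID schema inversion; not Impala code.
# Assumptions: rows are dicts, schema paths are lists of child indices,
# the table has no partition columns, and row__id is table column 0.

# Hidden ACID fields, in file-schema order (file indices 0..4).
ACID_FIELDS = ["operation", "originalTransaction", "bucket", "rowId",
               "currentTransaction"]
# The "row" struct holding the user columns is file field 5.
ROW_FIELD_IDX = len(ACID_FIELDS)


def file_row_to_table_row(file_row):
    """Reshape a file row into the catalog shape: row__id plus user columns."""
    table_row = {"row__id": {name: file_row[name] for name in ACID_FIELDS}}
    # User columns move from under "row" to the top level.
    table_row.update(file_row["row"])
    return table_row


def table_path_to_file_path(table_path):
    """Map a table schema path to a file schema path (getAbsolutePath-like)."""
    first, rest = table_path[0], list(table_path[1:])
    if first == 0:
        # row__id.<field> is a top-level field of the file. row__id itself
        # has no single file counterpart in this model.
        return rest if rest else None
    # User column k (k >= 1) lives at row.<k - 1> in the file.
    return [ROW_FIELD_IDX, first - 1] + rest


def file_path_to_table_path(file_path):
    """Inverse mapping, the direction ResolveColumn() needs in this model."""
    first, rest = file_path[0], list(file_path[1:])
    if first == ROW_FIELD_IDX:
        # row.<j> is user column j + 1; the bare "row" struct is not a
        # table column.
        return [rest[0] + 1] + rest[1:] if rest else None
    # A top-level ACID field is reached through row__id in the table.
    return [0, first] + rest


if __name__ == "__main__":
    row = {"operation": 0, "originalTransaction": 1, "bucket": 536870912,
           "rowId": 0, "currentTransaction": 1, "row": {"i": 1}}
    print(file_row_to_table_row(row))
    print(table_path_to_file_path([1]))     # column i -> row.i: [5, 0]
    print(table_path_to_file_path([0, 3]))  # row__id.rowId -> rowId: [3]
```

In this model the user column i is [1] in the table schema and [5, 0] (row.i) in the file schema, while row__id.rowId is [0, 3] in the table and [3] in the file. Keeping the two mappings exact inverses of each other is what lets the frontend think in table paths while the scanner reads file paths.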
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 02 Apr 2020 12:01:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 12: Code-Review+2 -- Gerrit-PatchSet: 12 Gerrit-Comment-Date: Thu, 02 Apr 2020 07:32:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 12: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5602/ DRY_RUN=false -- Gerrit-PatchSet: 12 Gerrit-Comment-Date: Thu, 02 Apr 2020 07:32:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 11: Code-Review+2 -- Gerrit-PatchSet: 11 Gerrit-Comment-Date: Wed, 01 Apr 2020 23:18:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5679/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-PatchSet: 11 Gerrit-Comment-Date: Wed, 01 Apr 2020 09:38:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 11: Code-Review+1 (2 comments) Thanks for the comments! I'd rather wait for a +2 from someone else than give it myself. Carrying +1. http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@112 PS10, Line 112: Analyzer.ensureTableNotTransactional(table_, "ALTER TABLE"); > Is it the time to remove this TODO? Yes, thanks, done. http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test File testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test: http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@7 PS10, Line 7: as select * from functional_orc_def.decimal_tiny; > Will this introduce the same failure in S3 tests like IMPALA-9345 found? Be We talked about it offline. The conclusion was to add SkipIfS3.hive to this test. -- Gerrit-PatchSet: 11 Gerrit-Comment-Date: Wed, 01 Apr 2020 09:01:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/11/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/11/testdata/bin/generate-schema-statements.py@321 PS11, Line 321: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/11/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/11/tests/query_test/test_scanners_fuzz.py@306 PS11, Line 306: n flake8: E129 visually indented line with same indent as next logical line -- Gerrit-PatchSet: 11 Gerrit-Comment-Date: Wed, 01 Apr 2020 08:57:53 + Gerrit-HasComments: Yes
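For reference, the E129 warning that flake8 reports comes from pycodestyle and fires when the continuation line of a wrapped condition is visually aligned at the same indent as the block body that follows. The two functions below are hypothetical and are not the flagged lines from generate-schema-statements.py or test_scanners_fuzz.py; the first has the E129 shape, and the second uses the "extra indentation on the conditional continuation line" fix that PEP 8 suggests.

```python
def in_bounds_flagged(row, col, size):
    # E129: the continuation line is aligned just inside the opening
    # parenthesis, which puts it at the same indent as the if-body.
    if (0 <= row < size and
        0 <= col < size):
        return True
    return False


def in_bounds_fixed(row, col, size):
    # Fix: indent the continuation line further so the condition no
    # longer lines up with the body of the if.
    if (0 <= row < size and
            0 <= col < size):
        return True
    return False
```

Both functions behave identically; only the layout differs, so either extra indentation or joining the condition onto one line is enough to silence the warning.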
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#11). Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema ..
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 10: Code-Review+1 (3 comments) The other part of the patch looks good to me as well. I just need to verify that test_create_table_like_file_orc in test_ddl.py passes in S3 tests, since this patch adds some data load statements to it. Feel free to carry over my +1 and Tim's as a +2. http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@112 PS10, Line 112: // TODO: remove it or keep it by the end of this change request. Is it the time to remove this TODO? http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test File testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test: http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@7 PS10, Line 7: as select * from functional_orc_def.decimal_tiny; Will this introduce the same failure in S3 tests like IMPALA-9345 found? Because Yarn and HiveServer2 are not launched in S3 tests (IMPALA-9365). Maybe we should skip the test on non-hdfs envs as IMPALA-9345 does. Otherwise, do you mind running the S3 tests once before merging this patch? http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@181 PS8, Line 181: self.run_stmt_in_hive("insert into %s.%s select * from %s.%s" % (fuzz_db, : fuzz_table, src_db, src_table)) > I checked and it seems fine to me.
Note that in this CR I changed the data Thanks for the verification! -- Gerrit-PatchSet: 10 Gerrit-Comment-Date: Wed, 01 Apr 2020 03:55:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5658/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-PatchSet: 10 Gerrit-Comment-Date: Tue, 31 Mar 2020 17:13:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: (7 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG@7 PS8, Line 7: IMPALA-9042 > Change this to IMPALA-9484? Done http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@194 PS8, Line 194: "hive.acid.version" > nit: make this a constant? Done http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@197 PS8, Line 197: but file " : "is not > nit: explain that it doesn't have metadata "hive.acid.version" = "2"? Done http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87 PS8, Line 87: DCHECK(ValidateFullAcidFileSchema().ok()); // Should have already been validated. > I like the idea, but I run into name clashes because we define KUDU_HEADERS Thanks for reviewing it so quickly. I added the DCHECK_OK macro to common/logging.h. http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py@213 PS8, Line 213: base_table = "functional_orc_def.complextypestbl_non_transactional" > Do we have test coverage on reading nested types from full-ACID partitioned Thanks for catching this. I created the ACID version of this test. I think I'll keep the current modifications as well to have coverage for non-ACID ORC tables. 
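The scanner-side checks discussed in this round, namely the "hive.acid.version" file metadata (which should be "2") and the file schema validation done by ValidateFullAcidFileSchema(), can be sketched in Python as below. This is a hypothetical illustration, not Impala's C++ code: the function name, the error strings, and the representation of the file's top-level fields as (name, type) tuples are assumptions. The expected field list follows the Hive full ACID ORC layout shown in the commit message (int/bigint ACID columns followed by the "row" struct).

```python
# Hypothetical sketch of the two checks; not Impala's scanner code.
HIVE_ACID_VERSION_KEY = "hive.acid.version"
EXPECTED_ACID_VERSION = "2"

# Top-level fields of a full ACID ORC file as (name, type) pairs.
EXPECTED_TOP_LEVEL_FIELDS = [
    ("operation", "int"),
    ("originalTransaction", "bigint"),
    ("bucket", "int"),
    ("rowId", "bigint"),
    ("currentTransaction", "bigint"),
    ("row", "struct"),
]


def check_full_acid_file(user_metadata, top_level_fields):
    """Return None if the file looks like full ACID v2, else an error string.

    user_metadata: dict of the ORC file's user metadata (str -> str).
    top_level_fields: list of (name, type) pairs of the file's root struct.
    """
    version = user_metadata.get(HIVE_ACID_VERSION_KEY)
    if version != EXPECTED_ACID_VERSION:
        return ('table is full ACID, but the file does not have metadata '
                '"%s" = "%s" (found %r)'
                % (HIVE_ACID_VERSION_KEY, EXPECTED_ACID_VERSION, version))
    if len(top_level_fields) != len(EXPECTED_TOP_LEVEL_FIELDS):
        return ("expected %d top-level fields, found %d"
                % (len(EXPECTED_TOP_LEVEL_FIELDS), len(top_level_fields)))
    for (name, kind), (exp_name, exp_kind) in zip(
            top_level_fields, EXPECTED_TOP_LEVEL_FIELDS):
        if name != exp_name or kind != exp_kind:
            return ("expected field %s:%s, found %s:%s"
                    % (exp_name, exp_kind, name, kind))
    return None
```

Checking the version metadata first gives a clearer error for files that were simply not written in full ACID format, before any per-field schema mismatch is reported.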
http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py@207 PS8, Line 207: functional_orc_def > I think we should get the db name by Done http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@181 PS8, Line 181: self.run_stmt_in_hive("insert into %s.%s select * from %s.%s" % (fuzz_db, : fuzz_table, src_db, src_table)) > I'm not sure if this copies the data of complextypestbl correctly since I p I checked and it seems fine to me. Note that in this CR I changed the data loading of 'complextypestbl'. We don't use the nullable/nonnullable.orc files for it (we only use them for complextypestbl_non_transactional). -- Gerrit-PatchSet: 8 Gerrit-Comment-Date: Tue, 31 Mar 2020 16:05:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/bin/generate-schema-statements.py@321 PS10, Line 321: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/10/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/10/tests/query_test/test_scanners_fuzz.py@306 PS10, Line 306: n flake8: E129 visually indented line with same indent as next logical line -- Gerrit-PatchSet: 10 Gerrit-Comment-Date: Tue, 31 Mar 2020 16:04:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#10). Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema ..