[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#3). Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. IMPALA-9042: Milestone 1: properly scan files that has full ACID schema Full ACID row format looks like this: { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} } User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema but that would introduce several complexities and corner cases in column resolution. Also in query results the heading of the above user column would be "row.i". Star expansion should also be modified, etc. Because of that in the Catalog I create the exact opposite of the above schema: { "row__id": { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1 } "i": 1 } This way very little modification is needed in the frontend. And the hidden columns can be easily retrieved via 'SELECT row__id.*' when we need those for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. Also in the backend we need some extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the table schema path from the file schema path. Testing: I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files. Newly added Tests: * specific queries about hidden columns (full-acid-rowid.test) * SHOW CREATE TABLE (show-create-table-full-acid.test) * INSERT should be forbidden (acid-negative.test) * added tests for column masking ( ranger_column_masking_complex_types.test) TODO: * Currently ALTER TABLE is enabled => assess consequences * make all tests green in exhaustive Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/bin/generate-schema-statements.py M testdata/datasets/README M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test A testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test M tests/authorization/test_ranger.py M tests/metadata/test_show_create_table.py M tests/query_test/test_acid.py M tests/query_test/test_mt_dop.py M tests/query_test/test_scanners.py M tests/query_test/test_scanners_fuzz.py 43 files changed, 1,184 insertions(+), 517 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/15395/3 -- To view, vis
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/bin/generate-schema-statements.py@319 PS3, Line 319: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py@197 PS3, Line 197: . flake8: E501 line too long (91 > 90 characters) http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py@282 PS3, Line 282: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py@300 PS3, Line 300: n flake8: E129 visually indented line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 17 Mar 2020 17:15:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 3: (7 comments) Thanks for reviewing it. Most of the changes are related to tests. For new reviewers: I think it's worth to start with PS2 because it contains the essence of this CR. http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG@63 PS2, Line 63: TODO: > Also need tests on column masking since we also have some hacks in path res Updated test_ranger.py http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc@185 PS2, Line 185: if (scan_node_->hdfs_table()->IsFullAcid() && !schema_resolver_->IsFullAcid()) { > I think this is too strict. The test on file schema can be false positive. Done http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@101 PS2, Line 101: DCHECK(table_idx >= num_part_cols); > I think this can be a DCHECK since partition columns should be skipped by h Done http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@106 PS2, Line 106: > Could you add a DCHECK that the resulted index won't overflow? Done http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@234 PS2, Line 234: root_->getSubtype(0)->getKind() != orc::TypeKind::INT || : root_->getSubtype(1)->getKind() ! > I think we should also check other fields and their types. Done http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@847 PS2, Line 847: HIVEFULLACIDWRITE > We don't have write support yet. Is this required somewhere? We need it to create tables. http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py@1297 PS2, Line 1297: self.client.execute("create table %s.%s like tpch.lineitem stored as orc" % (db, tbl)) > Do we need this? Doesn't the table being translated to EXTERNAL table? Removed it. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 17 Mar 2020 17:18:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5510/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 17 Mar 2020 18:01:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 3: (6 comments) Thanks for working on this. Left some comments, mostly questions to get a better understanding of how ACID works in Impala. Later I will look at the tests as well. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@110 PS3, Line 110: file_idx -= num_part_cols; nit: This and the same @115 could be moved outside the if. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@118 PS3, Line 118: table_idx += 1 + num_part_cols; Could you explain the values here? In this case the file_idx should be the original table_idx? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a297 PS3, Line 297: Removing this mean that we support write operations on full ACID tables? Do we really need to remove this one? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@224 PS3, Line 224: ensureTableNotFullAcid There are now two ensureTableNotFullAcid methods, that only differ in the parameters and the error msg thrown. Is this method and the INSERT_ONLY_ACID_TABLE_SUPPORTED_ERROR_MSG message used anymore? Can they be removed? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@237 PS3, Line 237: String nit: maybe change this string to enum? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@238 PS3, Line 238: throws AnalysisException { nit: missing space -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 18 Mar 2020 17:34:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 3: (13 comments) I think the approach makes sense. Thanks for adding so many tests. http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG@58 PS3, Line 58: * SHOW CREATE TABLE (show-create-table-full-acid.test) Can you also add describe and describe formatted tests? I assume these show hide row__id. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@181 PS3, Line 181: VLOG_QUERY << filename(); I assume you're going to remove the log lines? http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@185 PS3, Line 185: if (scan_node_->hdfs_table()->IsFullAcid() && !schema_resolver_->IsFullAcid()) { This is going to be addressed at some point, right? So we can scan original files too. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h File be/src/exec/orc-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h@46 PS3, Line 46: bool IsFullAcid() const; Maybe IsFileFullAcid()? Since it's going to get confusing in future about whether this is referring to the file or the table. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@94 PS3, Line 94: if (table_idx == num_part_cols + 5) { I'd find it clearer if it was a named constant instead of the magic number http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@101 PS3, Line 101: DCHECK DCHECK_GE http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231 PS3, Line 231: bool OrcSchemaResolver::IsFullAcid() const { Can you add a brief comment on the approach? It seems to be basically "if it walks like a duck and quacks like a duck, it's a duck". http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h@333 PS3, Line 333: IsFullAcid IsFullAcidTable? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@94 PS3, Line 94: // analyzer.ensureTableNotTransactional(table_, "ALTER TABLE"); ? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Path.java@473 PS3, Line 473: convertToFullAcid convertToFullAcidFilePath? http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test File testdata/workloads/functional-query/queries/QueryTest/acid-negative.test: http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test@1 PS3, Line 1: # Why commented out? http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test File testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test: http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test@16 PS3, Line 16: QUERY Can you also add a LABELS section to make sure the labels for the ROW__ID components are sane. Any maybe also a test that selects individual elements of row__id. http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test@17 PS3, Line 17: select row__id.*, * from functional_orc_def.alltypestiny; Do we expect the values to be deterministic across data loads? -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 18 Mar 2020 18:08:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#4). Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. IMPALA-9042: Milestone 1: properly scan files that has full ACID schema Full ACID row format looks like this: { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} } User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema but that would introduce several complexities and corner cases in column resolution. Also in query results the heading of the above user column would be "row.i". Star expansion should also be modified, etc. Because of that in the Catalog I create the exact opposite of the above schema: { "row__id": { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1 } "i": 1 } This way very little modification is needed in the frontend. And the hidden columns can be easily retrieved via 'SELECT row__id.*' when we need those for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. Also in the backend we need some extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the table schema path from the file schema path. Testing: I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files. Newly added Tests: * specific queries about hidden columns (full-acid-rowid.test) * SHOW CREATE TABLE (show-create-table-full-acid.test) * DESCRIBE [FORMATTED] TABLE (describe-path.test) * INSERT should be forbidden (acid-negative.test) * added tests for column masking ( ranger_column_masking_complex_types.test) TODO: * make all tests green in exhaustive Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/bin/generate-schema-statements.py M testdata/datasets/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test M testdata/workloads/functional-query/queries/QueryTest/acid.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/describe-path.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test A testdata/workloads/function
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/15395/4/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/4/testdata/bin/generate-schema-statements.py@319 PS4, Line 319: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py@197 PS4, Line 197: . flake8: E501 line too long (91 > 90 characters) http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py@283 PS4, Line 283: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py@301 PS4, Line 301: n flake8: E129 visually indented line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Mar 2020 16:32:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 4: (21 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG@58 PS3, Line 58: * specific queries about hidden columns (full-acid-rowid.test) > Can you also add describe and describe formatted tests? I assume these show Added tests to 'describe-path.test'. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@181 PS3, Line 181: bool is_file_full_acid = reader_->hasMetadataValue("hive.acid.version") && > I assume you're going to remove the log lines? Done http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@185 PS3, Line 185: return Status(Substitute("Error: Table is in full ACID format, but file " > This is going to be addressed at some point, right? So we can scan original Yes, "original files" are tracked with IMPALA-9515. Added TODO comment here. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h File be/src/exec/orc-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h@46 PS3, Line 46: private: > Maybe IsFileFullAcid()? Since it's going to get confusing in future about w Removed this function. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@94 PS3, Line 94: if (table_idx == num_part_cols + FILE_INDEX_OF_FIELD_ROW) { > I'd find it clearer if it was a named constant instead of the magic number Done http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@101 PS3, Line 101: DCHECK > DCHECK_GE Done http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@110 PS3, Line 110: else { > nit: This and the same @115 could be moved outside the if. Done http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@118 PS3, Line 118: // Hence, in the table metadata this is a top-level column, i.e. it is offsetted > Could you explain the values here? Added some comments about it. Yes, file_idx should remain the same because only the first number is offsetted with 'num_part_cols'. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231 PS3, Line 231: type.DebugString(), orc_type.toString(), filename_)); > Can you add a brief comment on the approach? It seems to be basically "if i Looking at the ORC file metadata it turned out that the ACID version is also written there, so using that from now on instead of this duck approach. Do you think it's worth to keep this for verifying the ACID schema? http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h@333 PS3, Line 333: IsTableFul > IsFullAcidTable? Renamed it to IsTableFullAcid() to make it similar to IsFileFullAcid() (though the latter was entirely removed since then) http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@94 PS3, Line 94: // TODO: IMPALA-8831 will enable all ALTER TABLE statements on transactional tables. > ? Added a comment about it. We need ALTER TABLE ADD PARTITION and ALTER TABLE SORT BY during data loading so in the new PS I only enable those. ALTER TABLE stmts are disabled because they fail when the table has column statistics (IMPALA-8831). The enabled ALTER TABLE statements should be safe in those regard. Hopefully there'll be a separate patch for IMPALA-8831 that does this for us so we can remove this from the final version of this change request. http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a297 PS3, Line 297: > Removing this mean that we support write operations on full ACID tables? Do ALTER TABLE and DROP TABLE calls this, but they are supported operations. http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@224 PS3, Line 224: r, table.getFullName() > There are now two ensureTableNotFullAcid methods, that only differ in the p
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5541/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Mar 2020 17:16:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#5). Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. IMPALA-9042: Milestone 1: properly scan files that has full ACID schema Full ACID row format looks like this: { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} } User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema but that would introduce several complexities and corner cases in column resolution. Also in query results the heading of the above user column would be "row.i". Star expansion should also be modified, etc. Because of that in the Catalog I create the exact opposite of the above schema: { "row__id": { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1 } "i": 1 } This way very little modification is needed in the frontend. And the hidden columns can be easily retrieved via 'SELECT row__id.*' when we need those for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. Also in the backend we need some extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the table schema path from the file schema path. Testing: I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files. Newly added Tests: * specific queries about hidden columns (full-acid-rowid.test) * SHOW CREATE TABLE (show-create-table-full-acid.test) * DESCRIBE [FORMATTED] TABLE (describe-path.test) * INSERT should be forbidden (acid-negative.test) * added tests for column masking ( ranger_column_masking_complex_types.test) Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/bin/generate-schema-statements.py M testdata/datasets/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test M testdata/workloads/functional-query/queries/QueryTest/acid.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/describe-path.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test A testdata/workloads/functional-query/queries/QueryTest/show-create-table-
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/bin/generate-schema-statements.py@319 PS5, Line 319: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@197 PS5, Line 197: . flake8: E501 line too long (91 > 90 characters) http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@283 PS5, Line 283: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@301 PS5, Line 301: n flake8: E129 visually indented line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 23 Mar 2020 14:58:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5566/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 23 Mar 2020 15:43:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h File be/src/exec/orc-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h@51 PS6, Line 51: bool is_table_full_acid_; These could be const I think, since they're only set in the constructor http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231 PS3, Line 231: "Type mismatch: table column $0 is map to column $1 in ORC file '$2'", > Looking at the ORC file metadata it turned out that the ACID version is als I think it would be good to validate that the schema is as expected, just so the rest of the code can reliably assume that the schema is sane. http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@323 PS6, Line 323: table.getMetaStoreTable().getParameters()); nit: indentation is a bit off http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@329 PS6, Line 329: if (isFullAcid && i == table.getNumClusteringCols() && nit: multi-line if should use parentheses. Maybe this would be better to restructure as an else if anyway... http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test File testdata/workloads/functional-query/queries/QueryTest/describe-path.test: http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143 PS5, Line 143: 'row__id','struct<\n operation:int comment ,\n originaltransaction:bigint comment ,\n bucket:int comment ,\n rowid:bigint comment ,\n currenttransaction:bigint comment \n>','' I'm not sure if making this visible is a good idea. What does hive do? -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 26 Mar 2020 23:49:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#7). Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. IMPALA-9042: Milestone 1: properly scan files that has full ACID schema Full ACID row format looks like this: { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} } User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema but that would introduce several complexities and corner cases in column resolution. Also in query results the heading of the above user column would be "row.i". Star expansion should also be modified, etc. Because of that in the Catalog I create the exact opposite of the above schema: { "row__id": { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1 } "i": 1 } This way very little modification is needed in the frontend. And the hidden columns can be easily retrieved via 'SELECT row__id.*' when we need those for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. Also in the backend we need some extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the table schema path from the file schema path. Testing: I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files. Newly added Tests: * specific queries about hidden columns (full-acid-rowid.test) * SHOW CREATE TABLE (show-create-table-full-acid.test) * DESCRIBE [FORMATTED] TABLE (describe-path.test) * INSERT should be forbidden (acid-negative.test) * added tests for column masking ( ranger_column_masking_complex_types.test) Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/bin/generate-schema-statements.py M testdata/datasets/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test M testdata/workloads/functional-query/queries/QueryTest/acid.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/describe-path.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test A testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test M test
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py@321 PS7, Line 321: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py@306 PS7, Line 306: n flake8: E129 visually indented line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:15:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py@321 PS8, Line 321: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@306 PS8, Line 306: n flake8: E129 visually indented line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:27:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#8). Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. IMPALA-9042: Milestone 1: properly scan files that has full ACID schema Full ACID row format looks like this: { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} } User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema but that would introduce several complexities and corner cases in column resolution. Also in query results the heading of the above user column would be "row.i". Star expansion should also be modified, etc. Because of that in the Catalog I create the exact opposite of the above schema: { "row__id": { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1 } "i": 1 } This way very little modification is needed in the frontend. And the hidden columns can be easily retrieved via 'SELECT row__id.*' when we need those for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. Also in the backend we need some extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the table schema path from the file schema path. Testing: I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files. Newly added Tests: * specific queries about hidden columns (full-acid-rowid.test) * SHOW CREATE TABLE (show-create-table-full-acid.test) * DESCRIBE [FORMATTED] TABLE (describe-path.test) * INSERT should be forbidden (acid-negative.test) * added tests for column masking ( ranger_column_masking_complex_types.test) Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/bin/generate-schema-statements.py M testdata/datasets/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test M testdata/workloads/functional-query/queries/QueryTest/acid.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/describe-path.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test A testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test M test
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: (7 comments) PS8 is a rebase. http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h File be/src/exec/orc-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h@51 PS6, Line 51: private: > These could be const I think, since they're only set in the constructor Done http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231 PS3, Line 231: return Status(Substitute( > I think it would be good to validate that the schema is as expected, just s Turned it into a validator function that returns a Status, calling it from HdfsOrcScanner::Open(). http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@323 PS6, Line 323: table.getMetaStoreTable().getParameters()); > nit: indentation is a bit off Done http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@329 PS6, Line 329: Preconditions.checkState(col.getName().equals("row__id")); > nit: multi-line if should use parentheses. Maybe this would be better to re Done, also converted the 'equals("row__id")' check to a Precondition. http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test File testdata/workloads/functional-query/queries/QueryTest/describe-path.test: http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143 PS5, Line 143: describe functional_orc_def.alltypes > I'm not sure if making this visible is a good idea. What does hive do? Yeah, I wasn't sure about this, but it was handy during development. Hive doesn't reveal the hidden fields in DESCRIBE, so I switched to hiding them too. However, when explicitly asked for it, currently I still allow describe on the row__id field, i.e.: DESCRIBE .row__id; +-++-+ | name| type | comment | +-++-+ | operation | int| | | originaltransaction | bigint | | | bucket | int| | | rowid | bigint | | | currenttransaction | bigint | | +-++-+ I think it's not wrong since we also allow the user to explicitly query the fields of the row__id column. Hive only allows DESCRIBE on tables. http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@197 PS5, Line 197: > flake8: E501 line too long (91 > 90 characters) Done http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@283 PS5, Line 283: > flake8: E261 at least two spaces before inline comment Done -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:29:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5620/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:57:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87 PS8, Line 87: DCHECK(ValidateFullAcidFileSchema().ok()); // Should have already been validated. It would be nice if this printed the validation error. Maybe we need a DCHECK_OK macro, like EXPECT_OK? http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test File testdata/workloads/functional-query/queries/QueryTest/describe-path.test: http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143 PS5, Line 143: describe functional_orc_def.alltypes > Yeah, I wasn't sure about this, but it was handy during development. That's fine, I think there's some similar behaviour with nested types if you describe nested fields that are not explicitly listed, e.g. you can refer to a nested item with the .item member or implicitly by omitting it. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5621/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 17:08:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87 PS8, Line 87: DCHECK(ValidateFullAcidFileSchema().ok()); // Should have already been validated. > It would be nice if this printed the validation error. Maybe we need a DCHE I like the idea, but I run into name clashes because we define KUDU_HEADERS_USE_SHORT_STATUS_MACROS in be/CMakeLists.txt. Therefore we already have a DCHECK_OK but it expects a kudu::Status. (plus it always evaluates the expression passed to the macro). To resolve this I opened: https://gerrit.cloudera.org/#/c/15596/ -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 30 Mar 2020 16:00:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: (6 comments) Thanks for adding so many tests! Left some comments for the tests. The changes for path resolution looks good to me. Haven't gone through the changes related to data load. I'll look into them tomorrow if still needs. http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG@7 PS8, Line 7: IMPALA-9042 Change this to IMPALA-9484? http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@194 PS8, Line 194: "hive.acid.version" nit: make this a constant? http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@197 PS8, Line 197: but file " : "is not nit: explain that it doesn't have metadata "hive.acid.version" = "2"? http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py@213 PS8, Line 213: base_table = "functional_orc_def.complextypestbl_non_transactional" Do we have test coverage on reading nested types from full-ACID partitioned table? http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py@207 PS8, Line 207: functional_orc_def I think we should get the db name by "functional" + vector.get_value('table_format').db_suffix() in case that the compression is not deflate, but snappy or others in exhaustive test. http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@181 PS8, Line 181: self.run_stmt_in_hive("insert into %s.%s select * from %s.%s" % (fuzz_db, : fuzz_table, src_db, src_table)) I'm not sure if this copies the data of complextypestbl correctly since I previously encountered a name resolution issue for struct fields: https://issues.apache.org/jira/browse/IMPALA-9410?focusedCommentId=17060613&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17060613 Could you verify it? If the issue still exists, we can copy data from the parquet table. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 31 Mar 2020 10:10:37 + Gerrit-HasComments: Yes