Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#5).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
......................................................................

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

  {
    "operation": 0,
    "originalTransaction": 1,
    "bucket": 536870912,
    "rowId": 0,
    "currentTransaction": 1,
    "row": {"i": 1}
  }

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema.

In the catalog we could mimic the file schema, but that would introduce
several complexities and corner cases in column resolution. Also, in
query results the heading of the above user column would be "row.i".
Star expansion would have to be modified as well, etc.

Because of that, in the catalog I create the exact opposite of the above
schema:

  {
    "row__id":
    {
      "operation": 0,
      "originalTransaction": 1,
      "bucket": 536870912,
      "rowId": 0,
      "currentTransaction": 1
    },
    "i": 1
  }

This way very little modification is needed in the frontend, and the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need them for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also, in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.
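
For readers following the schema discussion above, here is a minimal sketch of the path translation between the catalog (table) schema and the file schema. It is not the code in Path.getAbsolutePath() or OrcSchemaResolver::ResolveColumn(); the function names, the position-based path representation, and the field ordering (taken from the example row above) are assumptions made for illustration, and partition columns are ignored.

```python
# Hypothetical, simplified model of the two schemas described in this message.
# File schema:  [operation, originalTransaction, bucket, rowId,
#                currentTransaction, row{user columns...}]
# Table schema: [row__id{the five ACID fields}, user columns...]
ACID_FIELDS = ["operation", "originalTransaction", "bucket",
               "rowId", "currentTransaction"]
ROW_POS = len(ACID_FIELDS)  # position of the "row" struct in the file schema


def table_path_to_file_path(path):
    """Translate a table-schema position path into a file-schema path."""
    if not path:
        return []
    head, rest = path[0], list(path[1:])
    if head == 0:
        # row__id.<field> sits at the top level of the file.
        if not rest:
            raise ValueError("row__id has no single counterpart in the file")
        return rest
    # User columns follow row__id in the table and are nested under "row".
    return [ROW_POS, head - 1] + rest


def file_path_to_table_path(path):
    """Inverse translation: file-schema path back to a table-schema path."""
    if not path:
        return []
    head, rest = path[0], list(path[1:])
    if head < ROW_POS:
        # Top-level ACID field becomes a member of row__id.
        return [0, head] + rest
    if not rest:
        raise ValueError("'row' has no single counterpart in the table")
    # row.<column j> becomes top-level column j + 1 (row__id takes slot 0).
    return [rest[0] + 1] + rest[1:]
```

Under these assumptions, the user column "i" at table position 1 maps to file path [5, 0], and row__id.rowId (table path [0, 3]) maps to file path [3]; applying the second function to either result gives back the original table path.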

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking
   (ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test
M tests/authorization/test_ranger.py
M tests/metadata/test_show_create_table.py
M tests/query_test/test_acid.py
M tests/query_test/test_mt_dop.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
50 files changed, 1,411 insertions(+), 539 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/15395/5
--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>