[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19397 ) Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12163/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19397 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5 Gerrit-Change-Number: 19397 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 13 Jan 2023 11:45:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables
lipeng...@apache.org has posted comments on this change. ( http://gerrit.cloudera.org:8080/19397 ) Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables .. Patch Set 6: (3 comments) I fixed it so that migrated Iceberg tables use the Hive Catalog by default. This is a big patch; thanks for the comments! http://gerrit.cloudera.org:8080/#/c/19397/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19397/4//COMMIT_MSG@23 PS4, Line 23: s tables must follow > What do you mean by 'temporary table'? In Impala we don't support such tabl https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TemporaryTables I noticed that Hive supports this type of table, so I added this logic, which seems to have been omitted! http://gerrit.cloudera.org:8080/#/c/19397/4//COMMIT_MSG@30 PS4, Line 30: query > nit: query Done http://gerrit.cloudera.org:8080/#/c/19397/4/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java File fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java: http://gerrit.cloudera.org:8080/#/c/19397/4/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java@71 PS4, Line 71: > Would it be possible to use Iceberg's Catalogs API? It was a flaw in my design, and I have fixed it. We should migrate to the Hive Catalog by default. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5 Gerrit-Change-Number: 19397 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 13 Jan 2023 11:22:18 + Gerrit-HasComments: Yes
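The review settles on the Hive Catalog as the default for migrated tables, while the commit message shows 'iceberg.catalog' = 'hadoop.catalog' being passed through TBLPROPERTIES to pick a different one. A minimal sketch of that defaulting rule follows, assuming the property accepts Impala's usual values 'hive.catalog', 'hadoop.catalog', and 'hadoop.tables'. The class and enum names are hypothetical; this is not the code in MigrateTableUtil.java.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: choose the Iceberg catalog for a migrated table from
// the 'iceberg.catalog' table property, defaulting to the Hive Catalog.
final class MigrateCatalogChoice {
  // Hypothetical enum; the accepted property values are an assumption.
  enum TargetCatalog { HIVE_CATALOG, HADOOP_CATALOG, HADOOP_TABLES }

  static final String CATALOG_PROP = "iceberg.catalog";

  static TargetCatalog resolve(Map<String, String> tblProps) {
    String value = tblProps.get(CATALOG_PROP);
    if (value == null || value.isEmpty()) {
      // No explicit choice: migrate into the Hive Catalog.
      return TargetCatalog.HIVE_CATALOG;
    }
    switch (value.toLowerCase(Locale.ROOT)) {
      case "hive.catalog":
        return TargetCatalog.HIVE_CATALOG;
      case "hadoop.catalog":
        return TargetCatalog.HADOOP_CATALOG;
      case "hadoop.tables":
        return TargetCatalog.HADOOP_TABLES;
      default:
        throw new IllegalArgumentException(
            "Unsupported value for '" + CATALOG_PROP + "': " + value);
    }
  }

  public static void main(String[] args) {
    System.out.println(resolve(Map.of()));                               // HIVE_CATALOG
    System.out.println(resolve(Map.of(CATALOG_PROP, "hadoop.catalog"))); // HADOOP_CATALOG
  }
}
```

Rejecting unknown values, rather than silently falling back to the default, keeps a typo in TBLPROPERTIES from migrating a table into a catalog the user did not ask for.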
[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables
lipeng...@apache.org has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19397 ) Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables .. IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables This patch implements the migration of Hdfs tables to Iceberg tables. The target Iceberg tables inherit the location of the original Hdfs tables. For Hdfs tables with many partitions, the 'metadata.generator.threads' property can be used to increase the number of threads that build the Iceberg metadata from the data files of the Hdfs table. The migration is done with statements such as: - MIGRATE TABLE TO ICEBERG; - MIGRATE TABLE TO ICEBERG TBLPROPERTIES( 'iceberg.catalog' = 'hadoop.catalog'); - MIGRATE TABLE TO ICEBERG TBLPROPERTIES( 'metadata.generator.threads' = '10'); Hdfs tables must meet the following requirements: - they must be external tables - they must not be transactional tables - the InputFormat must be PARQUET, ORC, or AVRO Process of migration: - Child query 1: Ensure that the Hdfs table is a pure external table. - Child query 2: Rename the Hdfs table to a temporary table name. - Create an external Iceberg table through the Iceberg API, using the data files of the Hdfs table. - Child query 3: Create an Iceberg table (Hadoop Catalog) that inherits the Hdfs table location. - Child query 4: Drop the temporary Hdfs table.
Testing: - Add e2e tests - Add fe UTs Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5 --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/frontend.cc M be/src/service/frontend.h M common/thrift/Frontend.thrift M common/thrift/Types.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java A fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStringBuilder.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test M tests/query_test/test_iceberg.py 23 files changed, 1,005 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/19397/6 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5 Gerrit-Change-Number: 19397 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
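The table requirements and the child-query sequence in the commit message can be sketched as a pre-check followed by an ordered plan. This is a minimal illustration, not the patch's code: TableInfo is a hypothetical stand-in for the catalog's table metadata, and the SQL strings are placeholders. Reading "pure external table" as 'external.table.purge'='false' is an assumption, and the Iceberg-API step between child queries 2 and 3 is only marked with a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the MIGRATE TABLE pre-checks and child-query plan.
// TableInfo and all SQL text are illustrative, not Impala's implementation.
final class MigratePlanSketch {
  // Hypothetical, minimal view of the source Hdfs table.
  static final class TableInfo {
    final String name;
    final String location;
    final boolean external;
    final boolean transactional;
    final String fileFormat;

    TableInfo(String name, String location, boolean external,
        boolean transactional, String fileFormat) {
      this.name = name;
      this.location = location;
      this.external = external;
      this.transactional = transactional;
      this.fileFormat = fileFormat;
    }
  }

  static final Set<String> SUPPORTED_FORMATS = Set.of("PARQUET", "ORC", "AVRO");

  // Rejects tables that do not meet the requirements in the commit message.
  static void validate(TableInfo t) {
    if (!t.external) {
      throw new IllegalArgumentException(t.name + " is not an external table");
    }
    if (t.transactional) {
      throw new IllegalArgumentException(t.name + " is a transactional table");
    }
    if (!SUPPORTED_FORMATS.contains(t.fileFormat.toUpperCase(Locale.ROOT))) {
      throw new IllegalArgumentException("Unsupported file format: " + t.fileFormat);
    }
  }

  // Returns the four child queries in execution order.
  static List<String> childQueries(TableInfo t, String tmpName) {
    validate(t);
    List<String> queries = new ArrayList<>();
    // 1. Assumed meaning of "pure external": dropping it must not delete data.
    queries.add("ALTER TABLE " + t.name
        + " SET TBLPROPERTIES ('external.table.purge'='false')");
    // 2. Move the Hdfs table aside under a temporary name.
    queries.add("ALTER TABLE " + t.name + " RENAME TO " + tmpName);
    // (Here the Iceberg metadata would be built through the Iceberg API.)
    // 3. Create the Iceberg table on the original location.
    queries.add("CREATE EXTERNAL TABLE " + t.name
        + " STORED AS ICEBERG LOCATION '" + t.location + "'");
    // 4. Drop the renamed Hdfs table; with step 1, its data files stay in place.
    queries.add("DROP TABLE " + tmpName);
    return queries;
  }
}
```

Running the checks before building any child query means an unsupported table is rejected before it is renamed, so a failed pre-check leaves the catalog untouched.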
[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19397 ) Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12162/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5 Gerrit-Change-Number: 19397 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 13 Jan 2023 11:12:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables
lipeng...@apache.org has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19397 ) Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables .. IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables This patch implements the migration of Hdfs tables to Iceberg tables. The target Iceberg tables inherit the location of the original Hdfs tables. For Hdfs tables with many partitions, the 'metadata.generator.threads' property can be used to increase the number of threads that build the Iceberg metadata from the data files of the Hdfs table. The migration is done with statements such as: - MIGRATE TABLE TO ICEBERG; - MIGRATE TABLE TO ICEBERG TBLPROPERTIES( 'iceberg.catalog' = 'hadoop.catalog'); - MIGRATE TABLE TO ICEBERG TBLPROPERTIES( 'metadata.generator.threads' = '10'); Hdfs tables must meet the following requirements: - they must be external tables - they must not be transactional tables - the InputFormat must be PARQUET, ORC, or AVRO Process of migration: - Child query 1: Ensure that the Hdfs table is a pure external table. - Child query 2: Rename the Hdfs table to a temporary table name. - Create an external Iceberg table through the Iceberg API, using the data files of the Hdfs table. - Child query 3: Create an Iceberg table (Hadoop Catalog) that inherits the Hdfs table location. - Child query 4: Drop the temporary Hdfs table.
Testing: - Add e2e tests - Add fe UTs Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5 --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/frontend.cc M be/src/service/frontend.h M common/thrift/Frontend.thrift M common/thrift/Types.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java A fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStringBuilder.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test M tests/query_test/test_iceberg.py 23 files changed, 1,005 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/19397/5 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5 Gerrit-Change-Number: 19397 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
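For tables with many partitions, the commit message uses 'metadata.generator.threads' to parallelize building the Iceberg metadata from the data files. A minimal sketch of that idea with a fixed-size thread pool follows. The listFiles function is a hypothetical stand-in for the per-partition work the patch does (listing data files and turning them into Iceberg file entries), and the helper names and the caller-supplied default thread count are assumptions, not the patch's values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: process partitions in parallel with a pool sized by
// the 'metadata.generator.threads' table property.
final class ParallelMetadataSketch {
  static final String THREADS_PROP = "metadata.generator.threads";

  // Reads the thread count; defaultThreads is supplied by the caller.
  static int threadCount(Map<String, String> tblProps, int defaultThreads) {
    String v = tblProps.get(THREADS_PROP);
    if (v == null) {
      return defaultThreads;
    }
    int n = Integer.parseInt(v.trim());
    if (n < 1) {
      throw new IllegalArgumentException(THREADS_PROP + " must be >= 1");
    }
    return n;
  }

  // Applies listFiles to every partition on 'threads' workers and returns
  // all results, grouped in the original partition order.
  static <T> List<T> collect(List<String> partitions,
      Function<String, List<T>> listFiles, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<T>>> futures = new ArrayList<>();
      for (String p : partitions) {
        futures.add(pool.submit(() -> listFiles.apply(p)));
      }
      List<T> all = new ArrayList<>();
      for (Future<List<T>> f : futures) {
        all.addAll(f.get()); // waits for this partition; rethrows its failure
      }
      return all;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Reading the futures in submission order keeps the output deterministic no matter which worker finishes first, which makes the generated metadata easier to compare in tests.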