Hello lipeng...@apache.org, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20077

to look at the new patch set (#4).

Change subject: IMPALA-11013: Support 'MIGRATE TABLE' for external HDFS tables
......................................................................

IMPALA-11013: Support 'MIGRATE TABLE' for external HDFS tables

This patch implements the migration of HDFS tables to Iceberg tables.
The target Iceberg table inherits the location of the original HDFS
table. The HDFS table has to be an external table.

To migrate a Hive-format table stored in HDFS to an Iceberg table, use
the command:

ALTER TABLE [dbname.]table_name CONVERT TO ICEBERG [TBLPROPERTIES(...)];

Currently only 'iceberg.catalog' is allowed as a table property.

For example:
     - ALTER TABLE hive_table CONVERT TO ICEBERG;
     - ALTER TABLE hive_table CONVERT TO ICEBERG TBLPROPERTIES(
       'iceberg.catalog' = 'hadoop.catalog');
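
As a rough illustration of the syntax above, the following Python sketch
builds the statement text and rejects any table property other than
'iceberg.catalog'. The helper name build_convert_stmt and the constant
ALLOWED_TBLPROPERTIES are hypothetical and not part of this patch; the
sketch does no identifier or string escaping.

```python
# Hypothetical helper, not part of the patch: renders the statement text only.
ALLOWED_TBLPROPERTIES = {"iceberg.catalog"}


def build_convert_stmt(table_name, db_name=None, tblproperties=None):
    """Return an ALTER TABLE ... CONVERT TO ICEBERG statement as a string."""
    props = dict(tblproperties or {})
    # The commit message allows only 'iceberg.catalog' as a table property.
    unsupported = sorted(set(props) - ALLOWED_TBLPROPERTIES)
    if unsupported:
        raise ValueError(f"unsupported table properties: {unsupported}")
    full_name = f"{db_name}.{table_name}" if db_name else table_name
    stmt = f"ALTER TABLE {full_name} CONVERT TO ICEBERG"
    if props:
        rendered = ", ".join(
            f"'{key}'='{value}'" for key, value in sorted(props.items())
        )
        stmt += f" TBLPROPERTIES({rendered})"
    return stmt + ";"


print(build_convert_stmt("hive_table"))
print(build_convert_stmt("hive_table",
                         tblproperties={"iceberg.catalog": "hadoop.catalog"}))
```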

The HDFS table to be converted must meet the following requirements:
     - table is an external table
     - table is not a transactional table
     - InputFormat must be one of PARQUET, ORC, or AVRO
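
The requirements above amount to a simple precondition check. The Python
sketch below shows one way to express it; the TableInfo record and the
migration_blockers function are illustrative names assumed for this
example, not the classes used in the patch.

```python
from dataclasses import dataclass

# File formats listed as supported in the commit message.
SUPPORTED_FORMATS = {"PARQUET", "ORC", "AVRO"}


@dataclass
class TableInfo:
    """Hypothetical, minimal description of the source table."""
    is_external: bool
    is_transactional: bool
    file_format: str


def migration_blockers(table):
    """Return the reasons the table cannot be converted (empty if none)."""
    blockers = []
    if not table.is_external:
        blockers.append("table is not an external table")
    if table.is_transactional:
        blockers.append("table is a transactional table")
    if table.file_format.upper() not in SUPPORTED_FORMATS:
        blockers.append(f"unsupported file format: {table.file_format}")
    return blockers


print(migration_blockers(TableInfo(True, False, "PARQUET")))
print(migration_blockers(TableInfo(False, False, "TEXT")))
```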

This is an in-place migration, so the original data files of the HDFS
table are reused and not moved, copied, or re-created by this operation.
The new Iceberg table will have the 'external.table.purge' property set
to true after the migration.

The NUM_THREADS_FOR_TABLE_MIGRATION query option controls the maximum
number of threads used to execute the table conversion.
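
To illustrate what capping the thread count means in general, here is a
minimal Python sketch of a bounded worker pool. It is not how the patch
distributes its work, which this message does not describe; process_files
and handle_file are hypothetical names, and any special values the option
may accept are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def process_files(paths, handle_file, num_threads):
    """Apply handle_file to every path using at most num_threads workers.

    num_threads plays the role of the thread cap; results are returned in
    the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(handle_file, paths))


# Example: "process" three hypothetical file names with at most two threads.
print(process_files(["a.parq", "b.parq", "c.parq"], str.upper, 2))
```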

Process of migration:
 - Step 1: Set table properties,
           e.g. 'external.table.purge'=false on the HDFS table.
 - Step 2: Rename the HDFS table to a temporary table name using a name
           format of "<original_table_name>_tmp_<random_ID>".
 - Step 3: Refresh the renamed HDFS table.
 - Step 4: Create an external Iceberg table through the Iceberg API
           using the data of the HDFS table.
 - Step 5: For an Iceberg table in Hadoop Catalog, run a CREATE TABLE
           query to add the Iceberg table to HMS as well.
 - Step 6: For an Iceberg table in Hadoop Catalog, set the
           'external.table.purge' property to true in an ALTER TABLE
           query.
 - Step 7: Drop the temporary HDFS table.
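
The ordering of the steps above, including the two steps that apply only
to Hadoop Catalog, can be sketched in Python as follows. This only
produces a textual plan; temp_table_name and migration_steps are
hypothetical names, and the random ID format (eight hex characters) is an
assumption, since the message specifies only
"<original_table_name>_tmp_<random_ID>".

```python
import uuid


def temp_table_name(original_name):
    """Build a temporary name of the form <original_table_name>_tmp_<random_ID>."""
    # Assumption: the random ID is rendered as 8 hex characters.
    return f"{original_name}_tmp_{uuid.uuid4().hex[:8]}"


def migration_steps(table_name, in_hadoop_catalog):
    """Return the ordered migration steps as human-readable strings."""
    tmp = temp_table_name(table_name)
    steps = [
        f"set table properties on {table_name}, "
        "e.g. 'external.table.purge'=false",
        f"rename {table_name} to {tmp}",
        f"refresh {tmp}",
        f"create an external Iceberg table via the Iceberg API "
        f"from the data of {tmp}",
    ]
    if in_hadoop_catalog:
        # Steps 5 and 6 apply only to Iceberg tables in Hadoop Catalog.
        steps.append("run CREATE TABLE to add the Iceberg table to HMS")
        steps.append("run ALTER TABLE to set 'external.table.purge'=true")
    steps.append(f"drop {tmp}")
    return steps


for number, step in enumerate(migration_steps("hive_table", True), start=1):
    print(f"Step {number}: {step}")
```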

Testing:
 - Add e2e tests
 - Add FE UTs
 - Manually tested the runtime performance on an unpartitioned table
   with 10k data files. The runtime was around 10-13s.

Co-authored-by: lipenglin <lipeng...@apache.org>

Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStringBuilder.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test
M tests/authorization/test_ranger.py
M tests/query_test/test_iceberg.py
34 files changed, 1,357 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/20077/4
--
To view, visit http://gerrit.cloudera.org:8080/20077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80
Gerrit-Change-Number: 20077
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
