[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

2023-01-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19397 )

Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external 
Hdfs tables
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12163/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
Gerrit-Change-Number: 19397
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 13 Jan 2023 11:45:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

2023-01-13 Thread Anonymous Coward (Code Review)
lipeng...@apache.org has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19397 )

Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external 
Hdfs tables
..


Patch Set 6:

(3 comments)

I fixed it so that migrated Iceberg tables use the Hive Catalog by default.
This is a big patch, thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/19397/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19397/4//COMMIT_MSG@23
PS4, Line 23: s tables must follow
> What do you mean by 'temporary table'? In Impala we don't support such tabl
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TemporaryTables
I noticed that Hive supports this type of table, so I added this logic, though 
it seems it can be omitted.


http://gerrit.cloudera.org:8080/#/c/19397/4//COMMIT_MSG@30
PS4, Line 30: query
> nit: query
Done


http://gerrit.cloudera.org:8080/#/c/19397/4/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java
File fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java:

http://gerrit.cloudera.org:8080/#/c/19397/4/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java@71
PS4, Line 71:
> Would it be possible to use Iceberg's Catalogs API?
It was a flaw in my design, and I fixed it. We should migrate to Hive Catalog 
by default.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
Gerrit-Change-Number: 19397
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 13 Jan 2023 11:22:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

2023-01-13 Thread Anonymous Coward (Code Review)
lipeng...@apache.org has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/19397 )

Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external 
Hdfs tables
..

IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

This patch implements the migration from Hdfs tables to Iceberg tables.
The target Iceberg tables should inherit the location of the original
Hdfs tables. For Hdfs tables with lots of partitions, we can use the
'metadata.generator.threads' property to increase the number of threads
used to build the Iceberg metadata from the data files in the Hdfs
tables.
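
To illustrate why more threads can help when a table has many partitions,
here is a minimal Python sketch. It is not the patch's implementation (which
lives in Java under fe/), and the names `list_partition_files` and
`collect_data_files`, as well as the (path, file names) partition layout, are
assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def list_partition_files(partition):
    # Hypothetical stand-in for listing the data files of one partition.
    # A partition is modelled as a (path, file_names) pair.
    path, names = partition
    return [f"{path}/{name}" for name in names]


def collect_data_files(partitions, threads=1):
    # Fan the per-partition listing out over `threads` workers.
    # pool.map yields results in the same order as the input partitions.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_partition = pool.map(list_partition_files, partitions)
        return [f for files in per_partition for f in files]
```

With `threads=10`, up to ten partitions are listed at the same time, which
mirrors the intent of setting 'metadata.generator.threads' = '10'.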

The migration can be run with statements such as:
 - MIGRATE TABLE  TO ICEBERG;
 - MIGRATE TABLE  TO ICEBERG TBLPROPERTIES(
   'iceberg.catalog' = 'hadoop.catalog');
 - MIGRATE TABLE  TO ICEBERG TBLPROPERTIES(
   'metadata.generator.threads' = '10');
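
A test or script could assemble these statements as sketched below. This is
only an illustration of the syntax proposed in this patch set. The helper
name `build_migrate_stmt` and the table name used in the usage note are
hypothetical, and no escaping of names or values is attempted.

```python
def build_migrate_stmt(table_name, tblproperties=None):
    # Builds the statement text only; it does not talk to Impala.
    # table_name and tblproperties are supplied by the caller as-is.
    stmt = f"MIGRATE TABLE {table_name} TO ICEBERG"
    if tblproperties:
        props = ", ".join(f"'{k}' = '{v}'" for k, v in tblproperties.items())
        stmt += f" TBLPROPERTIES({props})"
    return stmt + ";"
```

For example, `build_migrate_stmt("db.t", {"metadata.generator.threads": "10"})`
produces the third form listed above for the made-up table `db.t`.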

Hdfs tables must meet the following requirements:
 - must be external tables
 - must not be transactional tables
 - InputFormat must be PARQUET, ORC, or AVRO
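
The three requirements above amount to a simple eligibility check. The
following Python sketch shows the idea. The function name, its boolean
parameters, and the error strings are assumptions for illustration and do not
reflect how the analysis in MigrateStmt.java is written.

```python
SUPPORTED_FORMATS = {"PARQUET", "ORC", "AVRO"}


def check_migratable(is_external, is_transactional, file_format):
    """Return the list of violated requirements (empty means eligible)."""
    problems = []
    if not is_external:
        problems.append("table must be external")
    if is_transactional:
        problems.append("table must not be transactional")
    if file_format.upper() not in SUPPORTED_FORMATS:
        problems.append("format must be PARQUET, ORC, or AVRO")
    return problems
```

Returning every violation at once, rather than failing on the first one,
is a design choice of this sketch so that a user sees all problems together.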

Process of migration:
 - Child query 1: Ensure that the Hdfs table is a pure external table.
 - Child query 2: Rename the Hdfs table to a temporary table name.
 - Create an external Iceberg table through the Iceberg API using the data of
   the Hdfs table.
 - Child query 3: Create an Iceberg table (Hadoop Catalog) that inherits the
   Hdfs table location.
 - Child query 4: Drop the temporary Hdfs table.
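
The ordering of the steps above can be written down as a small plan, sketched
here in Python. The step descriptions paraphrase the list, and the function
name and the `_tmp_migrate` suffix for the temporary table are made up for
this example. The actual child queries issued by the patch are not shown in
this message.

```python
def migration_plan(table_name, tmp_suffix="_tmp_migrate"):
    # tmp_suffix is a hypothetical naming scheme for the temporary table.
    tmp_name = table_name + tmp_suffix
    return [
        ("child query 1", f"ensure {table_name} is a pure external table"),
        ("child query 2", f"rename {table_name} to {tmp_name}"),
        ("iceberg api", f"create external Iceberg table from data of {tmp_name}"),
        ("child query 3", f"create Iceberg table {table_name} at the Hdfs table location"),
        ("child query 4", f"drop {tmp_name}"),
    ]
```

The rename in step 2 frees the original name so that step 3 can create the
Iceberg table under it, and the temporary table is dropped only at the end.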

Testing:
 - Add e2e tests
 - Add fe UTs

Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Frontend.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
A fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStringBuilder.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test
M tests/query_test/test_iceberg.py
23 files changed, 1,005 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/19397/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
Gerrit-Change-Number: 19397
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

2023-01-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19397 )

Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external 
Hdfs tables
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12162/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
Gerrit-Change-Number: 19397
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 13 Jan 2023 11:12:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

2023-01-13 Thread Anonymous Coward (Code Review)
lipeng...@apache.org has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/19397 )

Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external 
Hdfs tables
..

IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

This patch implements the migration from Hdfs tables to Iceberg tables.
The target Iceberg tables should inherit the location of the original
Hdfs tables. For Hdfs tables with lots of partitions, we can use the
'metadata.generator.threads' property to increase the number of threads
used to build the Iceberg metadata from the data files in the Hdfs
tables.

The migration can be run with statements such as:
 - MIGRATE TABLE  TO ICEBERG;
 - MIGRATE TABLE  TO ICEBERG TBLPROPERTIES(
   'iceberg.catalog' = 'hadoop.catalog');
 - MIGRATE TABLE  TO ICEBERG TBLPROPERTIES(
   'metadata.generator.threads' = '10');

Hdfs tables must meet the following requirements:
 - must be external tables
 - must not be transactional tables
 - InputFormat must be PARQUET, ORC, or AVRO

Process of migration:
 - Child query 1: Ensure that the Hdfs table is a pure external table.
 - Child query 2: Rename the Hdfs table to a temporary table name.
 - Create an external Iceberg table through the Iceberg API using the data of
   the Hdfs table.
 - Child query 3: Create an Iceberg table (Hadoop Catalog) that inherits the
   Hdfs table location.
 - Child query 4: Drop the temporary Hdfs table.

Testing:
 - Add e2e tests
 - Add fe UTs

Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Frontend.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
A fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStringBuilder.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test
M tests/query_test/test_iceberg.py
23 files changed, 1,005 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/19397/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
Gerrit-Change-Number: 19397
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy