Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19397 )

Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external 
Hdfs tables
......................................................................


Patch Set 6:

(14 comments)

Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG@19
PS6, Line 19: 'iceberg.catalog' = 'hadoop.catalog'
Seems like the hadoop catalog is not supported.


http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG@26
PS6, Line 26:  - InputFormat must be either PARQUET, ORC, or AVRO
Hive tables support different file formats in different partitions. Does the 
migration work in such scenarios?

It is not a problem if not, as it is an uncommon use case, and we can also deal 
with it in a follow-up jira.


http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG@33
PS6, Line 33: Hadoop Catalog
Hive Catalog is the default, right?


http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG@35
PS6, Line 35:  - Child query 4: Drop the temporary Hdfs table.
What happens if there is an error at any step? It would be nice if we could 
undo the steps that have been executed and restore the original table.

Probably it could be tested by extending ChildQueryExecutor with the ability to 
inject errors at the Nth query.


http://gerrit.cloudera.org:8080/#/c/19397/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/19397/6/be/src/service/client-request-state.cc@2130
PS6, Line 2130:     RuntimeProfile* set_hdfs_table_profile = 
RuntimeProfile::Create(
              :         &profile_pool_, "Set HDFS table query");
              :     child_profile->AddChild(set_hdfs_table_profile);
              :     
child_queries.emplace_back(params.set_hdfs_table_external_query, this,
              :         parent_server_, set_hdfs_table_profile, &profile_pool_);
nit: can we refactor these code fragments to a helper function?


http://gerrit.cloudera.org:8080/#/c/19397/6/be/src/service/client-request-state.cc@2143
PS6, Line 2143: Refresh temporary HDFS table query
This is not mentioned in the commit message.


http://gerrit.cloudera.org:8080/#/c/19397/6/be/src/service/client-request-state.cc@2168
PS6, Line 2168: params.__isset.create_iceberg_table_query
Could you please add a comment when this can be unset?
IIUC it is not set in case of HiveCatalog, because in that case Iceberg's 
HiveCatalog creates the table for us.


http://gerrit.cloudera.org:8080/#/c/19397/6/fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java
File fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java:

http://gerrit.cloudera.org:8080/#/c/19397/6/fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java@85
PS6, Line 85:     FeTable table = analyzer.getTable(tableName_, 
Privilege.OWNER);
Could you please add authorization tests (e.g. in test_ranger.py) for the case 
when someone without the necessary priviliges tries to migrate an Iceberg table?


http://gerrit.cloudera.org:8080/#/c/19397/6/fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java@118
PS6, Line 118: TRANSLATED_TO_EXTERNAL", "FALSE
Do we need to set this as well?


http://gerrit.cloudera.org:8080/#/c/19397/6/fe/src/main/java/org/apache/impala/analysis/MigrateStmt.java@139
PS6, Line 139: TBL_PROP_EXTERNAL_TABLE_PURGE, "false");
What happens if the original table's purge property was true?


http://gerrit.cloudera.org:8080/#/c/19397/6/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java
File fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java:

http://gerrit.cloudera.org:8080/#/c/19397/6/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java@156
PS6, Line 156:     return Math.min(max, Math.max((max + 7) / 8, 
Integer.parseInt(threadNum)));
Could you please add some comments about this formula, and why we are using 
this?


http://gerrit.cloudera.org:8080/#/c/19397/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test:

http://gerrit.cloudera.org:8080/#/c/19397/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test@26
PS6, Line 26: create table parquet_partitioned like alltypes stored as parquet;
Could you please add tests with more partition types? E.g. partition by STRING, 
DATE, DECIMAL, etc.


http://gerrit.cloudera.org:8080/#/c/19397/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test@27
PS6, Line 27: insert into parquet_partitioned partition(year, month) select * 
from alltypes;
Could you please also add tests with partition value being NULL?


http://gerrit.cloudera.org:8080/#/c/19397/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test@168
PS6, Line 168: ====
Can we have a test where the original table has external.table.purge=false?



--
To view, visit http://gerrit.cloudera.org:8080/19397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
Gerrit-Change-Number: 19397
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jan 2023 14:16:28 +0000
Gerrit-HasComments: Yes

Reply via email to