tomtongue commented on code in PR #8931:
URL: https://github.com/apache/iceberg/pull/8931#discussion_r1386791377
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java:
##########
@@ -108,6 +109,23 @@ public MigrateTableSparkAction backupTableName(String tableName) {
     return this;
   }
+  @Override
+  public MigrateTableSparkAction destCatalogName(String catalogName) {
+    CatalogManager catalogManager = spark().sessionState().catalogManager();
+
+    CatalogPlugin catalogPlugin;
+    if (catalogManager.isCatalogRegistered(catalogName)) {
+      catalogPlugin = catalogManager.catalog(catalogName);
+    } else {
+      LOG.warn(
+          "{} doesn't exist in SparkSession. " + "Fallback to current SparkSession catalog.",
+          catalogName);
+      catalogPlugin = catalogManager.currentCatalog();
+    }
+    this.destCatalog = checkDestinationCatalog(catalogPlugin);
Review Comment:
Thanks for the review, @singhpk234. Yes, as you said, the Iceberg
GlueCatalogImpl copies only "partial" metadata when it renames a table. So if
the source Spark/Hive table is partitioned, the migration fails and falls back
to the restore process, as follows:
```
23/11/06 09:54:03 INFO MigrateTableSparkAction: Generating Iceberg metadata for db.tbl in s3://bucket/path/tbl/metadata
23/11/06 09:54:03 WARN BaseCatalogToHiveConverter: Hive Exception type not found for AccessDeniedException
23/11/06 09:54:05 INFO ClientConfigurationFactory: Set initial getObject socket timeout to 2000 ms.
23/11/06 09:54:06 INFO CodeGenerator: Code generated in 230.388332 ms
23/11/06 09:54:06 INFO CodeGenerator: Code generated in 17.169875 ms
23/11/06 09:54:06 INFO CodeGenerator: Code generated in 18.598328 ms
23/11/06 09:54:07 ERROR MigrateTableSparkAction: Failed to perform the migration, aborting table creation and restoring the original table
23/11/06 09:54:07 INFO MigrateTableSparkAction: Restoring db.tbl from db.tbl_backup
23/11/06 09:54:08 INFO GlueCatalog: created rename destination table db.tbl
23/11/06 09:54:08 INFO GlueCatalog: Successfully dropped table db.tbl_backup from Glue
23/11/06 09:54:08 INFO GlueCatalog: Dropped table: db.tbl_backup
23/11/06 09:54:08 INFO GlueCatalog: Successfully renamed table from db.tbl_backup to garbagedb.iceberg_migrate_w_year_partition
Exception in thread "main" org.apache.iceberg.exceptions.ValidationException: Unable to get partition spec for table: `db`.`tbl_backup`
	at org.apache.iceberg.spark.SparkExceptionUtil.toUncheckedException(SparkExceptionUtil.java:55)
	at org.apache.iceberg.spark.SparkTableUtil.importSparkTable(SparkTableUtil.java:415)
	at org.apache.iceberg.spark.SparkTableUtil.importSparkTable(SparkTableUtil.java:460)
	...
Caused by: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Table tbl_backup is not a partitioned table
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:133)
	at org.apache.spark.sql.hive.HiveExternalCatalog.doListPartitions(HiveExternalCatalog.scala:1308)
	at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitions(HiveExternalCatalog.scala:1302)
	...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Table tbl_backup is not a partitioned table
	at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2676)
	at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2709)
	...
```
This error is caused by the partition information being lost in the renamed
table (db.tbl_backup).
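To make this failure mode concrete, here is a minimal, self-contained Java model of it. It is only a sketch under stated assumptions: `TableEntry`, `partialRename`, and `listPartitionKeys` are hypothetical names (not Glue, Hive, or Iceberg APIs), the `year` partition column is made up for illustration, and the partial copy is assumed to drop just the partition keys while keeping the location.

```java
import java.util.List;

public class PartialRenameSketch {

  // Hypothetical, simplified view of a metastore table entry; not a Glue or Hive API type.
  record TableEntry(String name, String location, List<String> partitionKeys) {}

  // Assumed "partial" rename: the destination keeps the location
  // but is created without the source's partition keys.
  static TableEntry partialRename(TableEntry source, String newName) {
    return new TableEntry(newName, source.location(), List.of());
  }

  // Mirrors the Hive check that produced "Table ... is not a partitioned table".
  static List<String> listPartitionKeys(TableEntry table) {
    if (table.partitionKeys().isEmpty()) {
      throw new IllegalStateException("Table " + table.name() + " is not a partitioned table");
    }
    return table.partitionKeys();
  }

  public static void main(String[] args) {
    // Hypothetical source table partitioned by "year".
    TableEntry source = new TableEntry("tbl", "s3://bucket/path/tbl", List.of("year"));
    // The migrate action first renames the source to a backup name.
    TableEntry backup = partialRename(source, "tbl_backup");
    try {
      // Importing from the backup needs its partitions, which are now gone.
      listPartitionKeys(backup);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints "Table tbl_backup is not a partitioned table"
    }
  }
}
```

The data files stay where they were; only the metastore's view of the table loses its partitioning, which is why the import step cannot find a partition spec for the backup table.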
As you know, the best way to lift this migrate restriction would be for
GlueHiveMetastoreClient to support rename.
Some users have already tried to migrate their tables into Iceberg on a custom
catalog such as Glue Catalog, but they cannot run the migration because of the
rename restriction. Let me look for a better way to resolve this issue. If there
isn't one, I think we need to ask for rename support in GlueHiveMetastoreClient.
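For comparison, a rename that keeps migrate working has to carry the partition keys (and, in a real metastore, every other table field) over to the destination. The sketch below models rename as "create the destination, then drop the source", which matches the order of the GlueCatalog log lines above. The in-memory map, `TableEntry`, `create`, `get`, and `rename` are hypothetical stand-ins, not the GlueHiveMetastoreClient API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FullRenameSketch {

  // Hypothetical, simplified table entry; a real Glue/Hive table carries many more fields.
  record TableEntry(String name, String location, List<String> partitionKeys) {}

  // Hypothetical in-memory stand-in for a metastore; not the GlueHiveMetastoreClient API.
  private final Map<String, TableEntry> tables = new HashMap<>();

  void create(TableEntry entry) {
    if (tables.putIfAbsent(entry.name(), entry) != null) {
      throw new IllegalStateException("Table already exists: " + entry.name());
    }
  }

  TableEntry get(String name) {
    TableEntry entry = tables.get(name);
    if (entry == null) {
      throw new IllegalStateException("Table not found: " + name);
    }
    return entry;
  }

  // Rename as "create destination, then drop source", copying every field
  // (including partition keys) instead of a partial subset. If create fails,
  // the source is left untouched.
  void rename(String from, String to) {
    TableEntry source = get(from);
    create(new TableEntry(to, source.location(), source.partitionKeys()));
    tables.remove(from);
  }

  public static void main(String[] args) {
    FullRenameSketch catalog = new FullRenameSketch();
    catalog.create(new TableEntry("db.tbl", "s3://bucket/path/tbl", List.of("year")));
    catalog.rename("db.tbl", "db.tbl_backup");
    System.out.println(catalog.get("db.tbl_backup").partitionKeys()); // prints [year]
  }
}
```

Note that create-then-drop is not atomic: a crash between the two steps leaves both entries behind, so a production rename would still need to handle that case.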