tomtongue opened a new pull request #3448: URL: https://github.com/apache/iceberg/pull/3448
### Changes

Make the error raised by `ALTER TABLE RENAME TO` against the Glue Data Catalog consistent with the `java.lang.UnsupportedOperationException` reported by Spark 3.1.1/Glue 3.0, by setting the input/output format and SerDe library in the create table operation.

### Current situation

Currently, when running [ALTER TABLE RENAME TO](https://spark.apache.org/docs/3.1.1/sql-ref-syntax-ddl-alter-table.html) through SparkSQL in Glue 3.0 (Spark 3.1.1) for an Iceberg table in the Glue Data Catalog, the following error message is shown in the logs.

```
// Example error:
Exception in User Class: org.apache.spark.sql.AnalysisException : org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table iceberg_1635860355. StorageDescriptor#InputFormat cannot be null for table: iceberg_1635860355 (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)
```

Script run in Glue:

```scala
import java.time.Instant

import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val sc: SparkContext = new SparkContext()
    val gc: GlueContext = new GlueContext(sc)
    val spark = gc.getSparkSession

    // CREATE ICEBERG TABLE
    val table = s"iceberg_${Instant.now.getEpochSecond}"
    val ddl = s"""
      CREATE TABLE glue_catalog.garbagedb.$table(id bigint, data string)
      USING iceberg
    """
    spark.sql(ddl)

    // ALTER TABLE RENAME TO by SparkSQL
    val renamedTable = table + "_rename"
    println("Running ALTER TABLE query.")
    // Query by SparkSQL, NOT an ALTER TABLE query with Iceberg
    spark.sql(s"ALTER TABLE garbagedb.$table RENAME TO garbagedb.$renamedTable")
  }
}
```

This error occurs because the Glue Data Catalog table has no input format: the input format is not added to the table when an Iceberg table is created by the SparkSQL DDL. Similar errors occur if the output format or the serdelib is missing in the Glue Data Catalog. These error messages are not the expected ones.

#### Expected result

If the table information is filled in correctly (for example, by a Glue Crawler), we get the following error message.
(As you know, the Glue Data Catalog currently doesn't support `ALTER TABLE RENAME TO` through SparkSQL. I understand that Iceberg can handle this query by dropping and re-creating the table.)

```
// Expected error message if a table in Glue Data Catalog has input/output format and serdelib.
Exception in User Class: org.apache.spark.sql.AnalysisException : java.lang.UnsupportedOperationException: Table rename is not supported
```

### After changes

After adding the input/output format and serdelib to the Iceberg table in the Glue Data Catalog, the error message is shown as follows:

```
Exception in User Class: org.apache.spark.sql.AnalysisException : java.lang.UnsupportedOperationException: Table rename is not supported
```

And the result of the GetTable API is here:

```json
{
  "Table": {
    "Name": "iceberg_1635861451",
    "DatabaseName": "db_name",
    "CreateTime": 1635861458.0,
    "UpdateTime": 1635861458.0,
    "Retention": 0,
    "StorageDescriptor": {
      "Columns": [
        {
          "Name": "id",
          "Type": "bigint",
          "Parameters": {
            "iceberg.field.id": "1",
            "iceberg.field.optional": "true",
            "iceberg.field.type.string": "bigint",
            "iceberg.field.type.typeid": "LONG",
            "iceberg.field.usage": "schema-column"
          }
        },
        {
          "Name": "data",
          "Type": "string",
          "Parameters": {
            "iceberg.field.id": "2",
            "iceberg.field.optional": "true",
            "iceberg.field.type.string": "string",
            "iceberg.field.type.typeid": "STRING",
            "iceberg.field.usage": "schema-column"
          }
        }
      ],
      "Location": "s3://bucket/db_name.db/iceberg_1635861451",
      "InputFormat": "org.apache.hadoop.mapred.FileInputFormat",
      "OutputFormat": "org.apache.hadoop.mapred.FileOutputFormat",
      "Compressed": false,
      "NumberOfBuckets": 0,
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
      },
      "SortColumns": [],
      "StoredAsSubDirectories": false
    },
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
      "metadata_location": "s3://bucket/db_name.db/iceberg_1635861451/metadata/00000-af1973c4-44f8-4a98-95a1-457b309a4f9d.metadata.json",
      "table_type": "ICEBERG"
    },
    "CreatedBy": "arn:aws:sts::account_id:assumed-role/role_name",
    "IsRegisteredWithLakeFormation": false,
    "CatalogId": "account_id",
    "IsRowFilteringEnabled": false
  }
}
```

#### Why were these input/output formats and serdelib selected?

The values for the input/output format and serdelib are taken from https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L404, because these values are used for the Glue Catalog. (If I have misunderstood, please correct me.)

Best regards,
Tom

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
