[GitHub] [iceberg] tomtongue opened a new pull request #3448: Make the error message of ALTER TABLE RENAME TO by SparkSQL for Glue …

GitBox Tue, 02 Nov 2021 07:28:19 -0700


tomtongue opened a new pull request #3448:
URL: https://github.com/apache/iceberg/pull/3448



   ### Changes
   Make the error message of ALTER TABLE RENAME TO for the Glue Data Catalog
   consistent with the `java.lang.UnsupportedOperationException` shown by
   Spark 3.1.1/Glue 3.0, by adding the input/output format and SerdeLib in the
   create table operation.
   
   
   ### Current situation
   Currently, running [ALTER TABLE RENAME TO](https://spark.apache.org/docs/3.1.1/sql-ref-syntax-ddl-alter-table.html)
   through SparkSQL in Glue 3.0 (Spark 3.1.1) against an Iceberg table in the
   Glue Data Catalog produces the following error message in the logs.
   
   ```
   // Example error:
   Exception in User Class: org.apache.spark.sql.AnalysisException : 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
iceberg_1635860355. StorageDescriptor#InputFormat cannot be null for table: 
iceberg_1635860355 (Service: null; Status Code: 0; Error Code: null; Request 
ID: null; Proxy: null)
   ```
   
   Running script in Glue:
   
   ```scala
   import java.time.Instant

   import com.amazonaws.services.glue.GlueContext
   import org.apache.spark.SparkContext

   object GlueApp {
       def main(sysArgs: Array[String]): Unit = {
           val sc: SparkContext = new SparkContext()
           val gc: GlueContext = new GlueContext(sc)
           val spark = gc.getSparkSession

           // CREATE ICEBERG TABLE
           val table = s"iceberg_${Instant.now.getEpochSecond}"
           val ddl = s"""
               CREATE TABLE glue_catalog.garbagedb.$table(id bigint, data string) USING iceberg
           """
           spark.sql(ddl)

           // ALTER TABLE RENAME TO by SparkSQL.
           // This is a plain SparkSQL query, NOT an ALTER TABLE query through Iceberg.
           val renamedTable = table + "_rename"
           println("Running ALTER TABLE query.")
           spark.sql(s"ALTER TABLE garbagedb.$table RENAME TO garbagedb.$renamedTable")
       }
   }
   ```
   
   This error occurs because the Iceberg table has no input format: the input
   format is not added to the Glue Data Catalog table when an Iceberg table is
   created by the SparkSQL DDL.
   
   Similar errors occur if the output format or serdelib is missing from the
   Glue Data Catalog table.
   
   These are not the expected error messages.
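   
   To make the condition above concrete, here is a minimal sketch that lists
   which of the three fields are unset on a table. `StorageInfo` and
   `StorageInfoCheck` are hypothetical names used only for illustration; they
   are simplified stand-ins, not classes from Glue, Hive, or Iceberg.
   
   ```scala
   // Hypothetical, simplified model of the three Glue table fields discussed above.
   // This is NOT the AWS SDK's StorageDescriptor class.
   final case class StorageInfo(
       inputFormat: Option[String],
       outputFormat: Option[String],
       serializationLibrary: Option[String])

   object StorageInfoCheck {
     // Returns the names of the fields that are unset or blank, in a fixed order.
     def missingFields(info: StorageInfo): List[String] =
       List(
         "InputFormat" -> info.inputFormat,
         "OutputFormat" -> info.outputFormat,
         "SerializationLibrary" -> info.serializationLibrary
       ).collect { case (name, value) if value.forall(_.trim.isEmpty) => name }
   }
   ```
   
   A descriptor with none of the three fields set reports all three names, while
   one with every field filled in returns an empty list.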
   
   #### Expected result
   If the table information is filled in correctly (for example, by a Glue
   Crawler), we get the following error message instead. (As you know, the Glue
   Data Catalog currently doesn't support ALTER TABLE RENAME TO by SparkSQL. I
   understand Iceberg can handle this query by dropping and re-creating the table.)
   
   ```
   // Expected error message if a table in Glue Data Catalog has input/output 
format and serdelib.
   Exception in User Class: org.apache.spark.sql.AnalysisException : 
java.lang.UnsupportedOperationException: Table rename is not supported
   ```
   
   ### After changes
   After adding input/output format and serdelib to the iceberg table in Glue 
Data Catalog, the error message is shown as follows:
   
   ```
   Exception in User Class: org.apache.spark.sql.AnalysisException : 
java.lang.UnsupportedOperationException: Table rename is not supported
   ```
   
   The result of the GetTable API for the table is as follows:
   
   ```json
   {
       "Table": {
           "Name": "iceberg_1635861451",
           "DatabaseName": "db_name",
           "CreateTime": 1635861458.0,
           "UpdateTime": 1635861458.0,
           "Retention": 0,
           "StorageDescriptor": {
               "Columns": [
                   {
                       "Name": "id",
                       "Type": "bigint",
                       "Parameters": {
                           "iceberg.field.id": "1",
                           "iceberg.field.optional": "true",
                           "iceberg.field.type.string": "bigint",
                           "iceberg.field.type.typeid": "LONG",
                           "iceberg.field.usage": "schema-column"
                       }
                   },
                   {
                       "Name": "data",
                       "Type": "string",
                       "Parameters": {
                           "iceberg.field.id": "2",
                           "iceberg.field.optional": "true",
                           "iceberg.field.type.string": "string",
                           "iceberg.field.type.typeid": "STRING",
                           "iceberg.field.usage": "schema-column"
                       }
                   }
               ],
               "Location": "s3://bucket/db_name.db/iceberg_1635861451",
               "InputFormat": "org.apache.hadoop.mapred.FileInputFormat",
               "OutputFormat": "org.apache.hadoop.mapred.FileOutputFormat",
               "Compressed": false,
               "NumberOfBuckets": 0,
               "SerdeInfo": {
                   "SerializationLibrary": 
"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
               },
               "SortColumns": [],
               "StoredAsSubDirectories": false
           },
           "TableType": "EXTERNAL_TABLE",
           "Parameters": {
               "metadata_location": 
"s3://bucket/db_name.db/iceberg_1635861451/metadata/00000-af1973c4-44f8-4a98-95a1-457b309a4f9d.metadata.json",
               "table_type": "ICEBERG"
           },
           "CreatedBy": "arn:aws:sts::account_id:assumed-role/role_name",
           "IsRegisteredWithLakeFormation": false,
           "CatalogId": "account_id",
           "IsRowFilteringEnabled": false
       }
   }
   ```
   
   #### Why are these input/output format and serdelib values selected?
   The values for the input/output format and serdelib are taken from 
https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L404.
   I chose them because, as far as I understand, the same values are used for
   the Glue Catalog. (Please correct me if I have misunderstood.)
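   
   As a rough illustration of how such defaults could be applied, the sketch
   below fills in the three values shown in the GetTable output above.
   `GlueStorageDefaults` and `withDefaults` are hypothetical names, and the
   choice to leave caller-supplied values untouched is an assumption of this
   sketch; the actual Java change in this PR may differ.
   
   ```scala
   // Hypothetical helper, for illustration only. The three string values are the
   // ones that appear in the GetTable output above.
   object GlueStorageDefaults {
     val InputFormat = "org.apache.hadoop.mapred.FileInputFormat"
     val OutputFormat = "org.apache.hadoop.mapred.FileOutputFormat"
     val SerializationLibrary = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

     // Adds the defaults for any key the caller left unset. Entries already
     // present in `descriptor` win, because the right-hand side of ++ takes priority.
     def withDefaults(descriptor: Map[String, String]): Map[String, String] =
       Map(
         "InputFormat" -> InputFormat,
         "OutputFormat" -> OutputFormat,
         "SerializationLibrary" -> SerializationLibrary
       ) ++ descriptor
   }
   ```
   
   Calling `GlueStorageDefaults.withDefaults(Map.empty)` yields a map containing
   all three keys, which is the state in which the expected
   `UnsupportedOperationException` message is shown.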
   
   Best regards,
   Tom
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


