[GitHub] [hudi] MathurCodes1 opened a new issue, #9096: [SUPPORT] Unable to alter column name for a Hudi table.

via GitHub Thu, 29 Jun 2023 11:19:00 -0700


MathurCodes1 opened a new issue, #9096:
URL: https://github.com/apache/hudi/issues/9096


   
   
   **Describe the problem you faced**
   I'm unable to alter the column name of Hudi table .
   spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO 
subidentifier") unbable to change the column name.
   
   A clear and concise description of the problem.
   
   I'm unable to alter the column name of Hudi table .
   spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO 
subidentifier") code is unable to change the column name.
   
   Getting the following error when trying to change the column using above 
code:
   **RENAME COLUMN is only supported with v2 tables**
   
   
   **To Reproduce**
   
   ```
   import com.amazonaws.services.glue.GlueContext
   import com.amazonaws.services.glue.util.{GlueArgParser, Job}
   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.spark.sql.functions._
   import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
   import org.apache.spark.{SparkConf, SparkContext}
   
   import scala.collection.JavaConverters._
   import scala.collection.mutable
   
   object ReportingJob {
   
     var spark: SparkSession = _
     var glueContext: GlueContext = _
   
     def main(inputParams: Array[String]): Unit = {
   
       val args: Map[String, String] = 
GlueArgParser.getResolvedOptions(inputParams, Seq("JOB_NAME").toArray)
       val sysArgs: mutable.Map[String, String] = 
scala.collection.mutable.Map(args.toSeq: _*)
      
       implicit val glueContext: GlueContext = init(sysArgs)
       implicit val spark: SparkSession = glueContext.getSparkSession
   
       import spark.implicits._
        
   val partitionColumnName: String = "id"
       val hudiTableName: String = "Customer"
       val preCombineKey: String = "id"
       val recordKey = "id"
       val basePath= "s3://aws-amazon-uk/customer/production/"
       
       
      val df= Seq((123,"1","seq1"),(124,"0","seq2")).toDF("id","subid","subseq")
       
         val hudiCommonOptions: Map[String, String] = Map(
           "hoodie.table.name" -> hudiTableName,
           "hoodie.datasource.write.keygenerator.class" -> 
"org.apache.hudi.keygen.ComplexKeyGenerator",
           "hoodie.datasource.write.precombine.field" -> preCombineKey,
           "hoodie.datasource.write.recordkey.field" -> recordKey,
           "hoodie.datasource.write.operation" -> "bulk_insert",
           //"hoodie.datasource.write.operation" -> "upsert",
           "hoodie.datasource.write.row.writer.enable" -> "true",
           "hoodie.datasource.write.reconcile.schema" -> "true",
           "hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
           "hoodie.datasource.write.hive_style_partitioning" -> "true",
           // "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
           //  "hoodie.upsert.shuffle.parallelism" -> "400",
           "hoodie.datasource.hive_sync.enable" -> "true",
           "hoodie.datasource.hive_sync.table" -> hudiTableName,
           "hoodie.datasource.hive_sync.database" -> "customer_db",
           "hoodie.datasource.hive_sync.partition_fields" -> 
partitionColumnName,
           "hoodie.datasource.hive_sync.partition_extractor_class" -> 
"org.apache.hudi.hive.MultiPartKeysValueExtractor",
           "hoodie.datasource.hive_sync.use_jdbc" -> "false",
           "hoodie.combine.before.upsert" -> "true",
           "hoodie.avro.schema.external.transformation" -> "true",
           "hoodie.schema.on.read.enable" -> "true",
           "hoodie.datasource.write.schema.allow.auto.evolution.column.drop" -> 
"true",
           "hoodie.index.type" -> "BLOOM",
           "spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
           DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
         )
   
   
    
         df.write.format("org.apache.hudi")
           .options(hudiCommonOptions)
           .mode(SaveMode.Overwrite)
           .save(basePath+hudiTableName)
                
                spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid 
TO subidentifier")
     commit()
     }
   
     def commit(): Unit = {
       Job.commit()
     }
   
   
     def init(sysArgs: mutable.Map[String, String]): GlueContext = {
   
       val conf = new SparkConf()
   
       conf.set("spark.serializer", 
"org.apache.spark.serializer.KryoSerializer")
       conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
       conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
       conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", 
"CORRECTED")
       conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", 
"CORRECTED")
       conf.set("spark.sql.avro.datetimeRebaseModeInRead", "CORRECTED")
       val sparkContext = new SparkContext(conf)
       glueContext = new GlueContext(sparkContext)
       Job.init(sysArgs("JOB_NAME"), glueContext, sysArgs.asJava)
       glueContext
   
     }
   }
   ```
   
   
   
   
   Steps to reproduce the behavior:
   
   1. I'm using AWS glue job to run the above job.
   2. In Dependent JARs path 
   hudi-spark3-bundle_2.12-0.12.1
   calcite-core-1.16.0
   libfb303-0.9.3
   3. Run the above code.
   
   
   **Expected behavior**
   
   spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO 
subidentifier") should be able to rename a column name. Could you suggest any 
other way to rename the Hudi column name.
   
   A clear and concise description of what you expected to happen.
   Change Column name of a hudi table
   
   **Environment Description**
   
   * Hudi version : 0.12.1
   
   * Spark version :3.3
   
   Glue Version : 4
   
   Jars used:
   hudi-spark3-bundle_2.12-0.12.1
   calcite-core-1.16.0
   libfb303-0.9.3
   
   * Storage (HDFS/S3/GCS..) :S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   **Exception in User Class: org.apache.spark.sql.AnalysisException : RENAME 
COLUMN is only supported with v2 tables.**
   at 
org.apache.spark.sql.errors.QueryCompilationErrors$.operationOnlySupportedWithV2TableError(QueryCompilationErrors.scala:506)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:94)
 ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
 ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:111)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:110)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:30)
 ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog.apply(ResolveSessionCatalog.scala:49)
 ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog.apply(ResolveSessionCatalog.scala:43)
 ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] MathurCodes1 opened a new issue, #9096: [SUPPORT] Unable to alter column name for a Hudi table.

Reply via email to