[I] Support YYYY-MM-DD partition format with hive [hudi]

via GitHub Sun, 30 Nov 2025 05:32:11 -0800


hudi-bot opened a new issue, #17283:
URL: https://github.com/apache/hudi/issues/17283


   Currently, a partition format such as YYYY-MM-DD fails when syncing with Hive. This Jira aims to add a fix so that such a format is supported.
   Steps to reproduce: the table created below uses a custom key generator that combines the simple and timestamp key generators. The timestamp key generator produces output in the format YYYY-MM-DD.
   
   
   {code:java}
   import org.apache.hudi.HoodieSparkUtils
   import org.apache.hudi.common.config.TypedProperties
   import org.apache.hudi.common.util.StringUtils
   import org.apache.hudi.exception.HoodieException
   import org.apache.hudi.functional.TestSparkSqlWithCustomKeyGenerator._
   import org.apache.hudi.testutils.HoodieClientTestUtils.createMetaClient
   import org.apache.hudi.util.SparkKeyGenUtils
   import org.apache.spark.sql.SaveMode
   import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
   import org.joda.time.DateTime
   import org.joda.time.format.DateTimeFormat
   import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
   import org.slf4j.LoggerFactory
   val df = spark.sql(
     s"""SELECT 1 as id, 'a1' as name, 1.6 as price, 1704121827 as ts, 'cat1' as segment
        | UNION
        | SELECT 2 as id, 'a2' as name, 10.8 as price, 1704121827 as ts, 'cat1' as segment
        | UNION
        | SELECT 3 as id, 'a3' as name, 30.0 as price, 1706800227 as ts, 'cat1' as segment
        | UNION
        | SELECT 4 as id, 'a4' as name, 103.4 as price, 1701443427 as ts, 'cat2' as segment
        | UNION
        | SELECT 5 as id, 'a5' as name, 1999.0 as price, 1704121827 as ts, 'cat2' as segment
        | UNION
        | SELECT 6 as id, 'a6' as name, 80.0 as price, 1704121827 as ts, 'cat3' as segment
        |""".stripMargin)

   df.write.format("hudi")
     .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
     .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomAvroKeyGenerator")
     .option("hoodie.datasource.write.partitionpath.field", "segment:simple,ts:timestamp")
     .option("hoodie.datasource.write.recordkey.field", "id")
     .option("hoodie.datasource.write.precombine.field", "name")
     .option("hoodie.table.name", "hudi_table_2")
     .option("hoodie.insert.shuffle.parallelism", "1")
     .option("hoodie.upsert.shuffle.parallelism", "1")
     .option("hoodie.bulkinsert.shuffle.parallelism", "1")
     .option("hoodie.keygen.timebased.timestamp.type", "SCALAR")
     .option("hoodie.keygen.timebased.output.dateformat", "yyyy-MM-DD")
     .option("hoodie.keygen.timebased.timestamp.scalar.time.unit", "seconds")
     .mode(SaveMode.Overwrite)
     .save("/user/hive/warehouse/hudi_table_2")
   
   // Sync with hive
   /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
     --jdbc-url jdbc:hive2://hiveserver:10000 \
     --user hive \
     --pass hive \
     --partitioned-by segment,ts \
     --base-path /user/hive/warehouse/hudi_table_2 \
     --database default \
     --table hudi_table_2 \
     --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
   {code}
   Error
   {code:java}
   2024-10-06 15:18:22,220 INFO  [main] ddl.JDBCExecutor (JDBCExecutor.java:runSQL(67)) - Executing SQL ALTER TABLE `default`.`hudi_table_2_ro` ADD IF NOT EXISTS   PARTITION (`segment`='cat1',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat1/2024-10-01'   PARTITION (`segment`='cat2',`ts`='2023-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2023-10-01'   PARTITION (`segment`='cat2',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2024-10-01'   PARTITION (`segment`='cat3',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat3/2024-10-01'
   2024-10-06 15:18:22,299 INFO  [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
   Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing hudi_table_2
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:180)
       at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: failed to sync the table hudi_table_2_ro
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
       at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:203)
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
       ... 1 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table hudi_table_2_ro
       at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:474)
       at org.apache.hudi.hive.HiveSyncTool.validateAndSyncPartitions(HiveSyncTool.java:321)
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:261)
       ... 3 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE `default`.`hudi_table_2_ro` ADD IF NOT EXISTS   PARTITION (`segment`='cat1',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat1/2024-10-01'   PARTITION (`segment`='cat2',`ts`='2023-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2023-10-01'   PARTITION (`segment`='cat2',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2024-10-01'   PARTITION (`segment`='cat3',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat3/2024-10-01'
       at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:70)
       at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.lambda$addPartitionsToTable$0(QueryBasedDDLExecutor.java:125)
       at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
       at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
       at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.addPartitionsToTable(QueryBasedDDLExecutor.java:125)
       at org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:118)
       at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:516)
       at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:470)
       ... 5 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column ts of type string as it cannot be converted to type int
       at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
       at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
       at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:313)
       at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
       at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:68)
       ... 12 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column ts of type string as it cannot be converted to type int
       at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
       at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
       at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
       at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
       at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
       at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
       at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
       at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
       at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
       at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
       at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Cannot add partition column ts of type string as it cannot be converted to type int
       at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.validatePartColumnType(BaseSemanticAnalyzer.java:1582)
       at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.validatePartSpec(BaseSemanticAnalyzer.java:1536)
       at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getValidatedPartSpec(DDLSemanticAnalyzer.java:2096)
       at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:2866)
       at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:285)
       at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
       at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
       at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
       at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
       ... 15 more
   {code}
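
The innermost `SemanticException` says that Hive treats `ts` as an `int` partition column, while the sync statement supplies a formatted date string for it. The Java sketch below is only an illustration of that mismatch and is not Hudi or Hive code: the class and method names are hypothetical, and the UTC zone and the `yyyy-MM-dd` / `yyyyMMdd` patterns are assumptions chosen for the example. It formats one of the epoch-second values from the reproduction and checks whether the resulting partition value can be read as an integer.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hudi: illustrates why a dashed date
// string cannot be used as the value of an int-typed partition column.
public class PartitionValueSketch {

    // Formats epoch seconds with the given pattern. UTC is an assumption.
    public static String formatPartition(long epochSeconds, String pattern) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    // Returns true if the partition value parses as a Java int.
    public static boolean convertibleToInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long ts = 1704121827L; // one of the ts values from the reproduction
        String dashed = formatPartition(ts, "yyyy-MM-dd");
        String compact = formatPartition(ts, "yyyyMMdd");
        System.out.println(dashed + " convertible to int: " + convertibleToInt(dashed));
        System.out.println(compact + " convertible to int: " + convertibleToInt(compact));
    }
}
```

The dashed value fails the integer check, which mirrors the "cannot be converted to type int" message above. Whether the fix should change the synced column type or the partition value is left to the Jira.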
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-8312
   - Type: Sub-task
   - Parent: https://issues.apache.org/jira/browse/HUDI-9113
   - Fix version(s):
     - 1.1.0


