hudi-bot opened a new issue, #17282:
URL: https://github.com/apache/hudi/issues/17282

   Currently, a timestamp output format containing slashes, such as YYYY/MM/DD, fails when syncing with Hive. This Jira aims to add a fix so that such a format is supported.
   Steps to reproduce: the table created below uses a custom key generator combining a simple key generator and a timestamp key generator, where the timestamp key generator produces output in the format YYYY/MM/DD.
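   The crux is that a slash-separated output format turns the single `ts` field into several directory levels. A minimal sketch (plain `java.time`, not Hudi's key-generator code; the class name is illustrative):
   {code:java}
   import java.time.Instant;
   import java.time.ZoneOffset;
   import java.time.format.DateTimeFormatter;

   public class SlashFormatSketch {
       public static void main(String[] args) {
           // Format an epoch-seconds value (the `ts` used in the repro below)
           // with a slash-separated pattern, as the timestamp keygen would.
           DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy/MM/dd")
                   .withZone(ZoneOffset.UTC);
           String tsPart = fmt.format(Instant.ofEpochSecond(1704121827L));

           System.out.println(tsPart);                   // 2024/01/01
           // One logical partition field, but three path segments on disk.
           System.out.println(tsPart.split("/").length); // 3
       }
   }
   {code}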
   {code:java}
   import org.apache.hudi.HoodieSparkUtils
   import org.apache.hudi.common.config.TypedProperties
   import org.apache.hudi.common.util.StringUtils
   import org.apache.hudi.exception.HoodieException
   import org.apache.hudi.functional.TestSparkSqlWithCustomKeyGenerator._
   import org.apache.hudi.testutils.HoodieClientTestUtils.createMetaClient
   import org.apache.hudi.util.SparkKeyGenUtils
   import org.apache.spark.sql.SaveMode
   import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
   import org.joda.time.DateTime
   import org.joda.time.format.DateTimeFormat
   import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
   import org.slf4j.LoggerFactory

   val df = spark.sql(
     s"""SELECT 1 as id, 'a1' as name, 1.6 as price, 1704121827 as ts, 'cat1' as segment
        | UNION
        | SELECT 2 as id, 'a2' as name, 10.8 as price, 1704121827 as ts, 'cat1' as segment
        | UNION
        | SELECT 3 as id, 'a3' as name, 30.0 as price, 1706800227 as ts, 'cat1' as segment
        | UNION
        | SELECT 4 as id, 'a4' as name, 103.4 as price, 1701443427 as ts, 'cat2' as segment
        | UNION
        | SELECT 5 as id, 'a5' as name, 1999.0 as price, 1704121827 as ts, 'cat2' as segment
        | UNION
        | SELECT 6 as id, 'a6' as name, 80.0 as price, 1704121827 as ts, 'cat3' as segment
        |""".stripMargin)

   df.write.format("hudi")
     .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
     .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomAvroKeyGenerator")
     .option("hoodie.datasource.write.partitionpath.field", "segment:simple,ts:timestamp")
     .option("hoodie.datasource.write.recordkey.field", "id")
     .option("hoodie.datasource.write.precombine.field", "name")
     .option("hoodie.table.name", "hudi_table_2")
     .option("hoodie.insert.shuffle.parallelism", "1")
     .option("hoodie.upsert.shuffle.parallelism", "1")
     .option("hoodie.bulkinsert.shuffle.parallelism", "1")
     .option("hoodie.keygen.timebased.timestamp.type", "SCALAR")
     .option("hoodie.keygen.timebased.output.dateformat", "yyyy/MM/DD")
     .option("hoodie.keygen.timebased.timestamp.scalar.time.unit", "seconds")
     .mode(SaveMode.Overwrite)
     .save("/user/hive/warehouse/hudi_table_2")
   
   // Sync with hive
   /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
     --jdbc-url jdbc:hive2://hiveserver:10000 \
     --user hive \
     --pass hive \
     --partitioned-by segment,ts \
     --base-path /user/hive/warehouse/hudi_table_2 \
     --database default \
     --table hudi_table_2 \
     --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
   {code}
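   With the options above, the writer lays each row out under a four-level directory: the `segment` value from the simple key generator, followed by the three segments the slash format produces from `ts`. A hypothetical sketch of the resulting layout (not Hudi's actual path-building code):
   {code:java}
   public class ExpectedLayoutSketch {
       public static void main(String[] args) {
           // Values taken from the repro data above.
           String segmentPart = "cat1";       // from the simple key generator
           String tsPart = "2024/01/01";      // from the timestamp key generator
           String partitionPath = segmentPart + "/" + tsPart;

           // Two configured partition fields, four directory levels.
           System.out.println(partitionPath); // cat1/2024/01/01
       }
   }
   {code}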
    
   
   Hive sync then fails while adding partitions:
   {code:java}
   2024-10-06 14:33:44,200 INFO  [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
   Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing hudi_table_2
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:180)
       at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: failed to sync the table hudi_table_2_ro
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
       at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:203)
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
       ... 1 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table hudi_table_2_ro
       at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:474)
       at org.apache.hudi.hive.HiveSyncTool.validateAndSyncPartitions(HiveSyncTool.java:321)
       at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:261)
       ... 3 more
   Caused by: java.lang.IllegalArgumentException: Partition key parts [segment, ts] does not match with partition values [cat1, 2024, 01, 01]. Check partition strategy.
       at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42)
       at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.getPartitionClause(QueryBasedDDLExecutor.java:191)
       at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.constructAddPartitions(QueryBasedDDLExecutor.java:164)
       at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.addPartitionsToTable(QueryBasedDDLExecutor.java:124)
       at org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:118)
       at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:516)
       at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:470)
       ... 5 more {code}
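   The root cause is visible in the `IllegalArgumentException` message: the extractor splits the relative partition path on `/`, so the slash-formatted timestamp yields four values against two configured keys. A hypothetical sketch of the failing comparison (not the actual Hudi validation code):
   {code:java}
   import java.util.Arrays;
   import java.util.List;

   public class PartitionMismatchSketch {
       public static void main(String[] args) {
           // Keys come from --partitioned-by; values come from splitting the
           // relative partition path, as MultiPartKeysValueExtractor does.
           List<String> partitionKeys = Arrays.asList("segment", "ts");
           String[] partitionValues = "cat1/2024/01/01".split("/");

           // 2 keys vs 4 values -> the checkArgument in the stack trace throws.
           System.out.println(partitionKeys.size());   // 2
           System.out.println(partitionValues.length); // 4
       }
   }
   {code}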
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-8311
   - Type: Sub-task
   - Parent: https://issues.apache.org/jira/browse/HUDI-9113
   - Fix version(s):
     - 1.1.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
