hudi-bot opened a new issue, #17282:
URL: https://github.com/apache/hudi/issues/17282
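Currently, a partition path format such as YYYY/MM/DD fails when syncing with Hive, because the slashes in the formatted timestamp yield more partition values than there are partition fields. This Jira aims to add a fix so that such a format is supported.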
Steps to reproduce: the table created below uses a custom key generator that combines a simple key generator (on `segment`) with a timestamp-based key generator (on `ts`). The timestamp key generator produces output in the format YYYY/MM/DD.
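A minimal sketch of the partition path this setup is expected to write, assuming the custom key generator joins the per-field values with `/` (non-Hive-style partitioning) and formats `ts` in UTC. It uses `java.time` rather than the Joda-Time formatter Hudi's timestamp key generator is assumed to use; the pattern letters here mean the same in both (`d` = day of month, `D` = day of year). `PartitionPathSketch` and `partitionPath` are hypothetical names.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object PartitionPathSketch {
  // Output pattern for the timestamp part. "dd" is day of month; "DD" would be
  // day of year, so 1 Feb 2024 would be rendered as "2024/02/32".
  private val tsFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC)

  // Hypothetical helper: simple part (segment) + "/" + timestamp part (ts in
  // epoch seconds, formatted in UTC), mirroring "segment:simple,ts:timestamp".
  def partitionPath(segment: String, tsSeconds: Long): String =
    segment + "/" + tsFormat.format(Instant.ofEpochSecond(tsSeconds))
}
```

For example, `partitionPath("cat1", 1704121827L)` returns `cat1/2024/01/01`: one value from the simple field plus three `/`-separated values from the timestamp field, for only two declared partition fields.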
{code:java}
// Run in spark-shell with the Hudi Spark bundle on the classpath.
import org.apache.spark.sql.SaveMode

// Six records across three segments; ts is an epoch timestamp in seconds.
val df = spark.sql(
  """SELECT 1 as id, 'a1' as name, 1.6 as price, 1704121827 as ts, 'cat1' as segment
    | UNION
    | SELECT 2 as id, 'a2' as name, 10.8 as price, 1704121827 as ts, 'cat1' as segment
    | UNION
    | SELECT 3 as id, 'a3' as name, 30.0 as price, 1706800227 as ts, 'cat1' as segment
    | UNION
    | SELECT 4 as id, 'a4' as name, 103.4 as price, 1701443427 as ts, 'cat2' as segment
    | UNION
    | SELECT 5 as id, 'a5' as name, 1999.0 as price, 1704121827 as ts, 'cat2' as segment
    | UNION
    | SELECT 6 as id, 'a6' as name, 80.0 as price, 1704121827 as ts, 'cat3' as segment
    |""".stripMargin)

// Partition path = <segment>/<ts as yyyy/MM/dd>, e.g. cat1/2024/01/01
df.write.format("hudi")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomAvroKeyGenerator")
  .option("hoodie.datasource.write.partitionpath.field", "segment:simple,ts:timestamp")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "name")
  .option("hoodie.table.name", "hudi_table_2")
  .option("hoodie.insert.shuffle.parallelism", "1")
  .option("hoodie.upsert.shuffle.parallelism", "1")
  .option("hoodie.bulkinsert.shuffle.parallelism", "1")
  .option("hoodie.keygen.timebased.timestamp.type", "SCALAR")
  // "dd" is day of month; "DD" would be day of year in Joda-Time patterns
  .option("hoodie.keygen.timebased.output.dateformat", "yyyy/MM/dd")
  .option("hoodie.keygen.timebased.timestamp.scalar.time.unit", "seconds")
  .mode(SaveMode.Overwrite)
  .save("/user/hive/warehouse/hudi_table_2")

// Sync with Hive (run from a shell)
/var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
  --jdbc-url jdbc:hive2://hiveserver:10000 \
  --user hive \
  --pass hive \
  --partitioned-by segment,ts \
  --base-path /user/hive/warehouse/hudi_table_2 \
  --database default \
  --table hudi_table_2 \
  --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
{code}
Hive sync then fails with the following error:
{code:java}
2024-10-06 14:33:44,200 INFO [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing hudi_table_2
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:180)
    at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: failed to sync the table hudi_table_2_ro
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:203)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
    ... 1 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table hudi_table_2_ro
    at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:474)
    at org.apache.hudi.hive.HiveSyncTool.validateAndSyncPartitions(HiveSyncTool.java:321)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:261)
    ... 3 more
Caused by: java.lang.IllegalArgumentException: Partition key parts [segment, ts] does not match with partition values [cat1, 2024, 01, 01]. Check partition strategy.
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42)
    at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.getPartitionClause(QueryBasedDDLExecutor.java:191)
    at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.constructAddPartitions(QueryBasedDDLExecutor.java:164)
    at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.addPartitionsToTable(QueryBasedDDLExecutor.java:124)
    at org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:118)
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:516)
    at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:470)
    ... 5 more
{code}
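The root cause in the trace can be reproduced in isolation. The sketch below assumes, as the error message suggests, that `MultiPartKeysValueExtractor` splits the relative partition path on `/` (keeping the part after `=` for Hive-style segments); it is not the extractor's actual code. It also sketches one hypothetical way a fix could regroup the values when the number of `/`-separated parts each field contributes is known. `PartitionValueSketch`, `extractValues`, `regroup` and `partsPerField` are illustrative names, not Hudi APIs, and this is not necessarily how the Jira will be resolved.

```scala
object PartitionValueSketch {
  // Assumed extractor behaviour: split on "/" and keep the value part of any
  // "key=value" segment.
  def extractValues(partitionPath: String): List[String] =
    partitionPath.split("/").toList.map { p =>
      val i = p.indexOf('=')
      if (i >= 0) p.substring(i + 1) else p
    }

  // Hypothetical regrouping: partsPerField(i) is the number of "/"-separated
  // parts contributed by partition field i (1 for segment, 3 for yyyy/MM/dd).
  def regroup(values: List[String], partsPerField: List[Int]): List[String] = {
    require(values.size == partsPerField.sum,
      s"expected ${partsPerField.sum} parts but got ${values.size}: $values")
    val (grouped, _) = partsPerField.foldLeft((List.empty[String], values)) {
      case ((acc, remaining), n) =>
        val (head, tail) = remaining.splitAt(n)
        (acc :+ head.mkString("/"), tail)
    }
    grouped
  }
}
```

For the path `cat1/2024/01/01`, `extractValues` yields `List(cat1, 2024, 01, 01)`, four values for the two fields `[segment, ts]`, which matches the mismatch reported above; `regroup(values, List(1, 3))` yields `List(cat1, 2024/01/01)`.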
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8311
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9113
- Fix version(s): 1.1.0