hudi-bot opened a new issue, #17283:
URL: https://github.com/apache/hudi/issues/17283
Currently, a timestamp partition value formatted like YYYY-MM-DD fails when
syncing with Hive. This Jira aims to add a fix so that such a format is
supported.
Steps to reproduce: The table created below uses a custom key generator that
combines the simple and timestamp key generators. The timestamp key generator
produces partition values in the format YYYY-MM-DD.
{code:java}
import org.apache.hudi.HoodieSparkUtils
import org.apache.hudi.common.config.TypedProperties
import org.apache.hudi.common.util.StringUtils
import org.apache.hudi.exception.HoodieException
import org.apache.hudi.functional.TestSparkSqlWithCustomKeyGenerator._
import org.apache.hudi.testutils.HoodieClientTestUtils.createMetaClient
import org.apache.hudi.util.SparkKeyGenUtils
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat
import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse,
assertTrue}
import org.slf4j.LoggerFactory
val df = spark.sql(
s"""SELECT 1 as id, 'a1' as name, 1.6 as price, 1704121827 as ts,
'cat1' as segment
| UNION
| SELECT 2 as id, 'a2' as name, 10.8 as price, 1704121827 as ts,
'cat1' as segment
| UNION
| SELECT 3 as id, 'a3' as name, 30.0 as price, 1706800227 as ts,
'cat1' as segment
| UNION
| SELECT 4 as id, 'a4' as name, 103.4 as price, 1701443427 as ts,
'cat2' as segment
| UNION
| SELECT 5 as id, 'a5' as name, 1999.0 as price, 1704121827 as ts,
'cat2' as segment
| UNION
| SELECT 6 as id, 'a6' as name, 80.0 as price, 1704121827 as ts,
'cat3' as segment
|""".stripMargin)
df.write.format("hudi")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomAvroKeyGenerator")
  .option("hoodie.datasource.write.partitionpath.field", "segment:simple,ts:timestamp")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "name")
  .option("hoodie.table.name", "hudi_table_2")
  .option("hoodie.insert.shuffle.parallelism", "1")
  .option("hoodie.upsert.shuffle.parallelism", "1")
  .option("hoodie.bulkinsert.shuffle.parallelism", "1")
  .option("hoodie.keygen.timebased.timestamp.type", "SCALAR")
  .option("hoodie.keygen.timebased.output.dateformat", "yyyy-MM-DD")
  .option("hoodie.keygen.timebased.timestamp.scalar.time.unit", "seconds")
  .mode(SaveMode.Overwrite)
  .save("/user/hive/warehouse/hudi_table_2")
// Sync with hive
/var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
--jdbc-url jdbc:hive2://hiveserver:10000 \
--user hive \
--pass hive \
--partitioned-by segment,ts \
--base-path /user/hive/warehouse/hudi_table_2 \
--database default \
--table hudi_table_2 \
--partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
{code}
Error
{code:java}
2024-10-06 15:18:22,220 INFO [main] ddl.JDBCExecutor
(JDBCExecutor.java:runSQL(67)) - Executing SQL ALTER TABLE
`default`.`hudi_table_2_ro` ADD IF NOT EXISTS PARTITION
(`segment`='cat1',`ts`='2024-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat1/2024-10-01' PARTITION
(`segment`='cat2',`ts`='2023-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat2/2023-10-01' PARTITION
(`segment`='cat2',`ts`='2024-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat2/2024-10-01' PARTITION
(`segment`='cat3',`ts`='2024-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat3/2024-10-01'
2024-10-06 15:18:22,299 INFO [main] hive.metastore
(HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore,
current connections: 0
Exception in thread "main" org.apache.hudi.exception.HoodieException: Got
runtime exception when hive syncing hudi_table_2
at
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:180)
at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: failed to sync the
table hudi_table_2_ro
at
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:203)
at
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
... 1 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync
partitions for table hudi_table_2_ro
at
org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:474)
at
org.apache.hudi.hive.HiveSyncTool.validateAndSyncPartitions(HiveSyncTool.java:321)
at
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:261)
... 3 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing
SQL ALTER TABLE `default`.`hudi_table_2_ro` ADD IF NOT EXISTS PARTITION
(`segment`='cat1',`ts`='2024-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat1/2024-10-01' PARTITION
(`segment`='cat2',`ts`='2023-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat2/2023-10-01' PARTITION
(`segment`='cat2',`ts`='2024-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat2/2024-10-01' PARTITION
(`segment`='cat3',`ts`='2024-10-01') LOCATION
'/user/hive/warehouse/hudi_table_2/cat3/2024-10-01'
at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:70)
at
org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.lambda$addPartitionsToTable$0(QueryBasedDDLExecutor.java:125)
at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at
org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.addPartitionsToTable(QueryBasedDDLExecutor.java:125)
at
org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:118)
at
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:516)
at
org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:470)
... 5 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while
compiling statement: FAILED: SemanticException [Error 10248]: Cannot add
partition column ts of type string as it cannot be converted to type int
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
at
org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:313)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:68)
... 12 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while
compiling statement: FAILED: SemanticException [Error 10248]: Cannot add
partition column ts of type string as it cannot be converted to type int
at
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
at
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
at
org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
at
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
at
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Cannot add
partition column ts of type string as it cannot be converted to type int
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.validatePartColumnType(BaseSemanticAnalyzer.java:1582)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.validatePartSpec(BaseSemanticAnalyzer.java:1536)
at
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getValidatedPartSpec(DDLSemanticAnalyzer.java:2096)
at
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:2866)
at
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:285)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
at
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
... 15 more {code}
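The SemanticException above suggests that Hive sees `ts` as an `int` partition column (the source data defines `ts` as epoch seconds), while the sync tool passes the formatted date string as the partition value. The following is a minimal sketch of that mismatch, not Hudi's or Hive's actual code: the class and method names are hypothetical, `java.time` stands in for the formatter used by the timestamp key generator, `Integer.parseInt` stands in for Hive's string-to-int conversion check, and the pattern `yyyy-MM-dd` and the UTC zone are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionValueSketch {

    // Hypothetical helper: formats epoch seconds as a date string in UTC,
    // roughly what a timestamp-based key generator does for a SCALAR input.
    public static String formatPartitionValue(long epochSeconds, String pattern) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    // Hypothetical stand-in for Hive's check that a partition value
    // can be converted to the declared int column type.
    public static boolean fitsIntColumn(String partitionValue) {
        try {
            Integer.parseInt(partitionValue);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long ts = 1704121827L; // one of the ts values from the reproduction data
        String value = formatPartitionValue(ts, "yyyy-MM-dd");
        // The raw epoch value fits an int column; the formatted date does not.
        System.out.println(ts + " fits int column: " + fitsIntColumn(Long.toString(ts)));
        System.out.println(value + " fits int column: " + fitsIntColumn(value));
    }
}
```

Under these assumptions, the raw epoch value passes the check but the formatted date string does not, which matches the "cannot be converted to type int" message in the trace.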
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8312
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9113
- Fix version(s):
  - 1.1.0