[ https://issues.apache.org/jira/browse/HUDI-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-7564:
----------------------------
    Fix Version/s: 1.0.0

> Fix HiveSync configuration inconsistencies
> ------------------------------------------
>
>                 Key: HUDI-7564
>                 URL: https://issues.apache.org/jira/browse/HUDI-7564
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0, 1.0.0
>
> *hoodie.datasource.hive_sync.support_timestamp* must be *false* so that
> *TIMESTAMP (MICROS)* columns are synced to HMS as *LONG* types.
>
> While this is not visible in the hive-console/spark-sql console via the
> {_}show-create-database{_}/{_}describe-table{_} commands, HMS stores the
> timestamp type as:
>
> {code:java}
> support_timestamp=false LONG
> support_timestamp=true  TIMESTAMP{code}
>
> When this is overridden to {*}true{*}, Trino/Presto queries fail with the
> error below, as they rely on the type information in HMS:
> {code:java}
> Caused by: io.prestosql.jdbc.$internal.client.FailureInfo$FailureException: Expected field to be long, actual timestamp(9) (field 0)
>     at io.trino.plugin.hive.GenericHiveRecordCursor.validateType(GenericHiveRecordCursor.java:569)
>     at io.trino.plugin.hive.GenericHiveRecordCursor.getLong(GenericHiveRecordCursor.java:274)
>     at io.trino.spi.connector.RecordPageSource.getNextPage(RecordPageSource.java:106)
>     at io.trino.plugin.hudi.HudiPageSource.getNextPage(HudiPageSource.java:120)
>     at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:299)
>     at io.trino.operator.Driver.processInternal(Driver.java:395)
>     at io.trino.operator.Driver.lambda$process$8(Driver.java:298)
>     at io.trino.operator.Driver.tryWithLock(Driver.java:694)
>     at io.trino.operator.Driver.process(Driver.java:290)
>     at io.trino.operator.Driver.processForDuration(Driver.java:261)
>     at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:911)
>     at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:188)
>     at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:569)
>     at io.trino.$gen.Trino_trino426_sql_hudi_di07_001____20240326_074936_2.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> 2024-04-02 17:32:21 (UTC+8) INFO - Clear session property for connection.
> 2024-04-02 17:32:21 (UTC+8) ERROR- Task Execution failed with CommonException: Query failed (#20240402_093220_06724_cg4jg): Expected field to be long, actual timestamp(9) (field 0) {code}
> To demonstrate via spark-sql that the applied default of support_timestamp
> is not {*}false{*}:
> {code:java}
> -- EXECUTE THESE QUERIES IN SPARK
> -- Create a table
> create table if not exists dev_hudi.timestamp_issue (
>   int_col bigint,
>   `timestamp_col` TIMESTAMP
> ) using hudi
> tblproperties (
>   type = 'mor',
>   primaryKey = 'int_col'
> );
>
> -- Perform an insert to trigger hive sync to create _ro and _rt tables
> insert into dev_hudi.timestamp_issue select
> 1 as int_col,
> to_timestamp('2023-01-01', 'yyyy-MM-dd') as timestamp_col;
>
> -- Execute a query to verify that data has been written
> select * from dev_hudi.timestamp_issue_rt;
>
> -- Set support_timestamp to its supposed default value (false)
> set hoodie.datasource.hive_sync.support_timestamp=false;
>
> -- Perform an insert again (will throw an error)
> insert into dev_hudi.timestamp_issue select
> 1 as int_col,
> to_timestamp('2023-01-01', 'yyyy-MM-dd') as timestamp_col;{code}
> The last insert query will throw the error below, showing that
> {*}support_timestamp{*}'s effective default value is {*}true{*}.
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing timestamp_issue
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:190)
>     at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
>     ... 64 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Could not convert field Type from TIMESTAMP to bigint for field timestamp_col
>     at org.apache.hudi.hive.util.HiveSchemaUtil.getSchemaDifference(HiveSchemaUtil.java:118)
>     at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:402)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:313)
>     at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:231)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:187)
>     ... 65 more {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
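As background for the LONG vs. TIMESTAMP distinction discussed in this issue: when a TIMESTAMP (MICROS) column is synced to HMS as LONG, the stored long is simply the epoch offset in microseconds. A minimal, hypothetical Java sketch of that mapping (the class and helper names here are illustrative, not Hudi code):

```java
import java.time.Instant;

public class TimestampMicros {
    // Convert epoch microseconds (the long representation stored in HMS
    // when support_timestamp=false) back into a java.time.Instant.
    static Instant fromMicros(long micros) {
        // floorDiv/floorMod keep pre-epoch (negative) values correct
        return Instant.ofEpochSecond(
                Math.floorDiv(micros, 1_000_000L),
                Math.floorMod(micros, 1_000_000L) * 1_000L);
    }

    public static void main(String[] args) {
        // 2023-01-01T00:00:00Z expressed as epoch microseconds,
        // matching the timestamp_col value in the reproduction above
        long micros = 1_672_531_200_000_000L;
        System.out.println(fromMicros(micros)); // prints 2023-01-01T00:00:00Z
    }
}
```

This is why a mismatch between what HiveSync writes to HMS (long) and what the engine expects (timestamp) surfaces only at query time in Trino/Presto, which trusts the HMS type rather than the Parquet logical type.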