hudi-bot opened a new issue, #15702:
URL: https://github.com/apache/hudi/issues/15702
When hoodie.datasource.hive_sync.table.strategy='ro' is set, we expect only
one table to be synchronized to Hive, without the _ro suffix.
But sometimes the table has already been created in Hive beforehand,
for example:
{code:java}
create table hive.test.HUDI_5584 (
id int,
ts int)
using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts',
hoodie.datasource.hive_sync.enable = 'true',
hoodie.datasource.hive_sync.table.strategy='ro'
) location '/tmp/HUDI_5584' {code}
Running show create table on it gives:
{code:java}
CREATE EXTERNAL TABLE `hudi_5584`(
`_hoodie_commit_time` string,
`_hoodie_commit_seqno` string,
`_hoodie_record_key` string,
`_hoodie_partition_path` string,
`_hoodie_file_name` string,
`id` int,
`ts` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'file:/tmp/HUDI_5584'
TBLPROPERTIES (
'hoodie.datasource.hive_sync.enable'='true',
'hoodie.datasource.hive_sync.table.strategy'='ro',
'preCombineField'='ts',
'primaryKey'='id',
'spark.sql.create.version'='3.3.1',
'spark.sql.sources.provider'='hudi',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='xx',
'transient_lastDdlTime'='1674108302',
'type'='mor') {code}
*The table looks like a realtime table.*
When we finish writing data and synchronize the ro table, the table
already exists, so SERDEPROPERTIES and OUTPUTFORMAT are not modified.
As a result, the table type does not match what is expected.
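
The mismatch can be checked from the `show create table` output alone. The following is a minimal diagnostic sketch, not Hudi's sync logic: it assumes the ro and realtime tables are told apart by their INPUTFORMAT class. The realtime class name appears in the output above; the read-optimized class name `org.apache.hudi.hadoop.HoodieParquetInputFormat` is not in this report and is taken from Hudi's Hadoop module. The helper names (`input_format_of`, `table_kind`, `matches_strategy`) are hypothetical.

```python
import re

# Input format class names (the RO name is an assumption not stated in this issue).
RO_INPUT_FORMAT = "org.apache.hudi.hadoop.HoodieParquetInputFormat"
RT_INPUT_FORMAT = "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"


def input_format_of(ddl):
    """Return the INPUTFORMAT class named in SHOW CREATE TABLE output, or None."""
    match = re.search(r"INPUTFORMAT\s+'([^']+)'", ddl, flags=re.IGNORECASE)
    return match.group(1) if match else None


def table_kind(ddl):
    """Classify a table as 'ro', 'rt', or 'unknown' from its input format."""
    fmt = input_format_of(ddl)
    if fmt == RO_INPUT_FORMAT:
        return "ro"
    if fmt == RT_INPUT_FORMAT:
        return "rt"
    return "unknown"


def matches_strategy(ddl, strategy):
    """True when the table's input format agrees with the requested strategy."""
    return table_kind(ddl) == strategy.lower()


# Excerpt of the SHOW CREATE TABLE output from this report.
ddl = """
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
"""

print(table_kind(ddl))              # rt
print(matches_strategy(ddl, "ro"))  # False
```

For the table in this report the check returns False: the strategy is 'ro', but the pre-created table still carries the realtime input format.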
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-5584
- Type: Bug
- Fix version(s): 1.1.0