[
https://issues.apache.org/jira/browse/HUDI-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Y Ethan Guo updated HUDI-7349:
------------------------------
Fix Version/s: 1.0.2
> Spark structured streaming didn't work after upgrade from hudi 0.11 to 0.13
> ---------------------------------------------------------------------------
>
> Key: HUDI-7349
> URL: https://issues.apache.org/jira/browse/HUDI-7349
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark, spark-sql
> Affects Versions: 0.13.0
> Environment: Environment Description
> Hudi version : 0.13.0
> Amazon EMR version : emr-6.11.1
> Spark version : 3.3.2
> Hive version : 3.1.3
>
> Hadoop version : 3.3.3
> Storage (HDFS/S3/GCS..): S3
> Cluster manager : yarn
> Reporter: Haitham Eltaweel
> Priority: Major
> Fix For: 1.0.2
>
>
> We have a Spark structured streaming job writing data in Hudi format. After we
> upgraded from Hudi 0.11.0 to Hudi 0.13.0, the streaming app stopped writing
> data to the existing Hudi table. The streaming app started successfully and
> triggered a listing job, but did not trigger any other job to compact, clean,
> write data, etc. There were no errors in the Spark UI or in the stdout/stderr
> logs. When running the streaming application to write to a new S3 location
> (Hudi table), everything works fine. We use append output mode and a
> 30-second trigger processing time.
> Here are the Hudi configurations used (some values redacted with xxx):
> 'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
> 'hoodie.datasource.write.keygenerator.class':
> 'org.apache.hudi.keygen.CustomKeyGenerator',
> 'hoodie.datasource.write.precombine.field': 'xxx',
> 'hoodie.datasource.write.partitionpath.field': 'xxx:SIMPLE',
> 'hoodie.embed.timeline.server': False,
> 'hoodie.index.type': 'BLOOM',
> 'hoodie.parquet.compression.codec': 'snappy',
> 'hoodie.clean.async': True,
> 'hoodie.clean.max.commits': 5,
> 'hoodie.parquet.max.file.size': 125829120,
> 'hoodie.parquet.small.file.limit': 104857600,
> 'hoodie.parquet.block.size': 125829120,
> 'hoodie.metadata.enable': True,
> 'hoodie.metadata.validate': True,
> 'hoodie.datasource.write.hive_style_partitioning': True,
> 'hoodie.datasource.hive_sync.support_timestamp': True,
> 'hoodie.datasource.hive_sync.jdbcurl': "xxx",
> 'hoodie.datasource.hive_sync.username': 'xxx',
> 'hoodie.datasource.hive_sync.password': 'xxx',
> 'hoodie.datasource.hive_sync.partition_fields': 'xxx',
> 'hoodie.datasource.hive_sync.enable': True,
> 'hoodie.datasource.hive_sync.partition_extractor_class':
> 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
> 'hoodie.avro.schema.external.transformation': True,
> 'hoodie.avro.schema.validate': True,
> 'hoodie.table.name': 'xxx',
> 'hoodie.datasource.write.table.name': 'xxx',
> 'hoodie.datasource.write.recordkey.field': 'xxx',
> 'hoodie.datasource.hive_sync.database': 'xxx',
> 'hoodie.datasource.hive_sync.table': 'xxx',
> 'hoodie.datasource.write.operation': 'upsert'
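For context, a minimal sketch of how options like these are typically wired into a PySpark structured streaming writer. The dict below reproduces a subset of the reported configs (redacted 'xxx' values are kept as placeholders); the `writeStream` portion assumes an active SparkSession, a streaming DataFrame `df`, and hypothetical S3 paths, so it is shown commented out rather than as the reporter's actual job.

```python
# A subset of the Hudi options from the report, assembled as a Python dict.
# Values shown as 'xxx' were redacted in the original report.
hudi_options = {
    'hoodie.table.name': 'xxx',
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.recordkey.field': 'xxx',
    'hoodie.datasource.write.precombine.field': 'xxx',
    'hoodie.datasource.write.partitionpath.field': 'xxx:SIMPLE',
    'hoodie.index.type': 'BLOOM',
    'hoodie.metadata.enable': True,
    'hoodie.clean.async': True,
}

# Applying them in a structured streaming write, matching the append output
# mode and 30-second trigger described in the report (requires a running
# Spark cluster; paths are placeholders, not from the report):
#
# (df.writeStream
#    .format('hudi')
#    .options(**hudi_options)
#    .outputMode('append')
#    .option('checkpointLocation', 's3://bucket/checkpoints/table')
#    .trigger(processingTime='30 seconds')
#    .start('s3://bucket/path/to/table'))

print(hudi_options['hoodie.datasource.write.table.type'])
```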
--
This message was sent by Atlassian Jira
(v8.20.10#820010)