[ 
https://issues.apache.org/jira/browse/HUDI-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-7349:
------------------------------
    Fix Version/s: 1.0.2

> Spark structured streaming didn't work after upgrade from Hudi 0.11 to 0.13
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-7349
>                 URL: https://issues.apache.org/jira/browse/HUDI-7349
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark, spark-sql
>    Affects Versions: 0.13.0
>         Environment: Environment Description
> Hudi version : 0.13.0
> Amazon EMR version : emr-6.11.1
> Spark version : 3.3.2
> Hive version : 3.1.3
> Hadoop version : 3.3.3
> Storage (HDFS/S3/GCS..): S3
> Cluster manager : yarn
>            Reporter: Haitham Eltaweel
>            Priority: Major
>             Fix For: 1.0.2
>
>
> We have a Spark structured streaming job writing data in Hudi format. After
> we upgraded from Hudi 0.11.0 to Hudi 0.13.0, the streaming app no longer
> writes data to the existing Hudi table. The streaming app starts
> successfully and triggers a listing job, but never triggers any other jobs
> to compact, clean, write data, etc. There are no errors in the Spark UI or
> in the stdout/stderr logs. When the streaming application writes to a new
> S3 location (a new Hudi table), everything works fine. We use the append
> output mode with a 30-second processing-time trigger.
> Here are the Hudi configurations used (some values masked with xxx):
> 'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
> 'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
> 'hoodie.datasource.write.precombine.field': 'xxx',
> 'hoodie.datasource.write.partitionpath.field': 'xxx:SIMPLE',
> 'hoodie.embed.timeline.server': False,
> 'hoodie.index.type': 'BLOOM',
> 'hoodie.parquet.compression.codec': 'snappy',
> 'hoodie.clean.async': True,
> 'hoodie.clean.max.commits': 5,
> 'hoodie.parquet.max.file.size': 125829120,
> 'hoodie.parquet.small.file.limit': 104857600,
> 'hoodie.parquet.block.size': 125829120,
> 'hoodie.metadata.enable': True,
> 'hoodie.metadata.validate': True,
> 'hoodie.datasource.write.hive_style_partitioning': True,
> 'hoodie.datasource.hive_sync.support_timestamp': True,
> 'hoodie.datasource.hive_sync.jdbcurl': 'xxx',
> 'hoodie.datasource.hive_sync.username': 'xxx',
> 'hoodie.datasource.hive_sync.password': 'xxx',
> 'hoodie.datasource.hive_sync.partition_fields': 'xxx',
> 'hoodie.datasource.hive_sync.enable': True,
> 'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
> 'hoodie.avro.schema.external.transformation': True,
> 'hoodie.avro.schema.validate': True,
> 'hoodie.table.name': 'xxx',
> 'hoodie.datasource.write.table.name': 'xxx',
> 'hoodie.datasource.write.recordkey.field': 'xxx',
> 'hoodie.datasource.hive_sync.database': 'xxx',
> 'hoodie.datasource.hive_sync.table': 'xxx',
> 'hoodie.datasource.write.operation': 'upsert'
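>
> For reference, a minimal sketch of how such a streaming write could be
> wired up in PySpark. The rate source, the record key and precombine field
> names, and the S3 paths below are illustrative placeholders, not taken
> from the actual job; the masked values stay as xxx:
>
> from pyspark.sql import SparkSession, functions as F
>
> spark = SparkSession.builder.appName("hudi-streaming-sketch").getOrCreate()
>
> # Placeholder input stream; the real job's source is not specified above.
> source_df = (spark.readStream.format("rate").load()
>              .withColumn("id", F.col("value"))
>              .withColumn("ts", F.col("timestamp").cast("string")))
>
> hudi_options = {
>     'hoodie.table.name': 'xxx',                        # masked, as above
>     'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
>     'hoodie.datasource.write.operation': 'upsert',
>     'hoodie.datasource.write.recordkey.field': 'id',   # placeholder field
>     'hoodie.datasource.write.precombine.field': 'ts',  # placeholder field
>     # ... plus the remaining options listed above
> }
>
> query = (source_df.writeStream
>          .format("hudi")
>          .options(**hudi_options)
>          .outputMode("append")                  # append output mode
>          .option("checkpointLocation",
>                  "s3://bucket/checkpoints/xxx") # placeholder path
>          .trigger(processingTime="30 seconds")  # 30-second trigger
>          .start("s3://bucket/hudi/table/xxx"))  # placeholder table path
>
> query.awaitTermination()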



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
