ChiehFu opened a new issue, #10914: URL: https://github.com/apache/hudi/issues/10914
**Describe the problem you faced**

Hi, my team wants to build Flink pipelines that generate financial reports and save the results into a Hudi COW table. The data sources for the reports consist of two types of data: snapshot data and incremental data. To produce a complete report we need to ingest both, so we are considering running two Flink jobs against the same Hudi table sequentially: a batch job that processes all snapshot data up to the current time, followed by a stream job that continuously processes new incremental data.

According to Hudi's documentation, the Flink writer stores index information for the records it has processed in Flink state and relies on that information to perform upserts correctly. Since the stream job has no access to the batch job's state, would Hudi in the stream job still be able to correctly upsert records that were previously ingested by the batch job? If not, do you have any recommendations on how to set up the Flink-Hudi workflow to meet our use case?

**To Reproduce**

**Environment Description**

* EMR emr-6.15.0, Flink 1.17.1, Hadoop 3.3.6, Hive 3.1.3, Zeppelin 0.10.1
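One direction worth checking against the Hudi docs for the version in use: the Flink writer has an index-bootstrap option that loads the index of records already present in the table into Flink state when a job starts, which is aimed at exactly this "fresh job, pre-existing table" situation. A minimal Flink SQL sketch of the stream job's table definition (the table name, schema, and path are hypothetical, and the `index.bootstrap.enabled` key should be verified against your Hudi release):

```sql
-- Hypothetical sketch, not a confirmed answer: a streaming upsert job
-- writing to a COW table produced earlier by a batch job.
CREATE TABLE financial_report (
  report_id STRING PRIMARY KEY NOT ENFORCED,
  amount    DECIMAL(18, 2),
  ts        TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 's3://my-bucket/financial_report',   -- hypothetical path
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'upsert',
  -- Ask Hudi to rebuild the Flink-state index from records already in
  -- the table on startup, so upserts can match rows the batch job wrote:
  'index.bootstrap.enabled' = 'true'
);
```

The trade-off to measure is startup cost: bootstrapping scans the existing table to populate state, so for large tables it is worth timing before committing to this design.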