GitHub user geserdugarov added a comment to the discussion: RLI support for Flink streaming
Sorry, @danny0405 - you're right, it's not possible to decouple the Flink checkpoint from the Hudi commit while still providing exactly-once semantics. When I tried to describe step by step what I meant, I realized the issue.

So for exactly-once semantics, it could look like the following with Flink state:

```text
incoming records >> buffer
  - if buffer is full >> append to existing log (or create new file, flush, and close)
  - on checkpoint >> write buffer to local Flink state >> Hudi commit >> continue filling buffers

  - on failure >> load buffers from local state >> continue filling buffers

  - if buffer is full >> append to existing log ...
```

The Hudi commit here is still coupled with the Flink checkpoint, but saving byte buffers to local state during the checkpoint should be fast and may improve performance. However, this idea goes off the main topic of this discussion.

GitHub link: https://github.com/apache/hudi/discussions/17452#discussioncomment-15206166
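To make the checkpoint-coupled flow in the comment above concrete, here is a minimal simulation in plain Python. It is only a sketch under stated assumptions and uses no Flink or Hudi API: the class `BufferedLogWriter` and its methods are hypothetical names, the "log" and "local state" are ordinary lists, and it assumes that log appends made after the last commit are rolled back on failure and that the source replays records received after the last checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class BufferedLogWriter:
    """Toy model of the buffer/checkpoint/commit flow (hypothetical, not Flink or Hudi API)."""
    capacity: int
    buffer: list = field(default_factory=list)  # in-memory records not yet in the log
    log: list = field(default_factory=list)     # records appended to the log file
    committed: int = 0                          # log length made visible by the last commit
    state: list = field(default_factory=list)   # buffer snapshot kept in "local Flink state"

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            # Buffer is full: append its contents to the existing log.
            self.log.extend(self.buffer)
            self.buffer.clear()

    def checkpoint(self):
        # Snapshot the unflushed buffer to state, then commit what is already in the log.
        self.state = list(self.buffer)
        self.committed = len(self.log)

    def recover(self):
        # Assumption: appends made after the last commit are rolled back on failure.
        del self.log[self.committed:]
        # Reload the buffer from the last snapshot and continue filling it.
        self.buffer = list(self.state)

    def visible(self):
        # Records readers can see: everything up to the last commit.
        return self.log[:self.committed]


w = BufferedLogWriter(capacity=3)
for r in [1, 2, 3, 4]:
    w.write(r)
w.checkpoint()          # record 4 is snapshotted in state; records 1..3 are committed

for r in [5, 6, 7]:
    w.write(r)
w.recover()             # failure: uncommitted appends are dropped, buffer restored to [4]

for r in [5, 6, 7]:     # assumed replay of records received after the checkpoint
    w.write(r)
w.checkpoint()
print(w.visible())      # [1, 2, 3, 4, 5, 6]
```

In this model each record becomes visible exactly once, and the commit still happens only inside `checkpoint()`, which matches the point that the two cannot be decoupled.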
