GitHub user geserdugarov added a comment to the discussion: RLI support for 
Flink streaming

Sorry, @danny0405, you're right: it's not possible to decouple the Flink 
checkpoint from the Hudi commit while still providing exactly-once semantics. 
I realized the issue once I tried to describe step by step what I meant.

So for exactly-once semantics, it could look like the following with Flink 
state:
```text
incoming records >> buffer
 - if buffer is full >> append to existing log (or create new file, flush, and close)
 - on checkpoint >> write buffer to local Flink state >> Hudi commit >> continue filling buffers
    - on failure >> load buffers from local state >> continue filling buffers
    - if buffer is full >> append to existing log ...
```
The Hudi commit here is still coupled with the Flink checkpoint, but writing 
the byte buffers to local state during the checkpoint should be fast and may 
improve performance. However, this idea is off the main topic of this 
discussion.
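
To make the flow above concrete, here is a minimal toy simulation in plain 
Python. It does not use any Flink or Hudi API: `BufferedWriter`, `on_record`, 
`on_checkpoint`, and `on_failure` are hypothetical names, `state` is just a 
stand-in for local Flink state, and `committed` stands in for the Hudi commit. 
It also assumes two things the outline does not spell out: after a failure the 
source replays every record received since the last checkpoint, and log 
appends made after the last commit are discarded.

```python
from dataclasses import dataclass, field


@dataclass
class BufferedWriter:
    """Toy model of the outlined flow; not a Flink or Hudi API."""
    capacity: int
    buffer: list = field(default_factory=list)  # records not yet written to a log
    log: list = field(default_factory=list)     # records appended to log files
    state: list = field(default_factory=list)   # stand-in for local Flink state
    committed: int = 0                          # log length covered by the last commit

    def on_record(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            # Buffer is full: append it to the log and start a fresh buffer.
            self.log.extend(self.buffer)
            self.buffer.clear()

    def on_checkpoint(self):
        # Snapshot the partial buffer to state, then commit what is in the log.
        self.state = list(self.buffer)
        self.committed = len(self.log)

    def on_failure(self):
        # Assumption: uncommitted log appends are dropped on recovery.
        del self.log[self.committed:]
        # Reload the buffer exactly as it was at the last checkpoint.
        self.buffer = list(self.state)


writer = BufferedWriter(capacity=3)
for record in [1, 2, 3, 4]:
    writer.on_record(record)
writer.on_checkpoint()      # buffer [4] goes to state, log [1, 2, 3] is committed
writer.on_record(5)
writer.on_record(6)         # buffer fills and is appended, but not yet committed
writer.on_failure()         # back to the checkpointed view
for record in [5, 6]:       # assumed replay of records since the checkpoint
    writer.on_record(record)
```

In this sketch the commit still happens only inside `on_checkpoint`, which is 
the coupling described above; the snapshot itself is a copy of a small 
in-memory list.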

GitHub link: 
https://github.com/apache/hudi/discussions/17452#discussioncomment-15206166
