Hi folks,

I did some initial investigation, and the problem seems twofold.

If no post-commit topology is used, we do not run into a problem where
we could lose data but since we do not clean up the state correctly,
we will hit this [1] when trying to stop the pipeline with a savepoint
after we have started it from a savepoint.
AFAICT all two-phase commit sinks are affected Kafka, File etc.

For sinks using the post-commit topology, the same applies.
Additionally, we might never do the commit from the post-commit
topology resulting in lost data.

Best,
Fabian

[1] 
https://github.com/apache/flink/blob/ed46cb2fd64f1cb306ae5b7654d2b4d64ab69f22/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CheckpointCommittableManagerImpl.java#L83

Reply via email to