Hi,
I am using FileIO with continuous watching of a folder for new files to process. The problem is that when Flink starts reading a 200MB file (around 3M elements) and a checkpoint starts at the same time, the checkpoint never finishes until the WHOLE file is processed.

Minimal example: https://github.com/seznam/beam/blob/simunek/failingCheckpoint/examples/java/src/main/java/org/apache/beam/examples/CheckpointFailingExample.java

My theory about what could be wrong, from my understanding: the CheckpointMark in this case starts from Create.ofProvider and is then propagated to the downstream operators, where it ends up (in the queue) behind all the splits, which means all splits have to be read before the operator can checkpoint successfully. The problem gets even bigger when there are more files, because then we have to wait for ALL files to be processed before the checkpoint can succeed.

1. Are my assumptions correct?
2. Is there some possibility to improve the behavior of SplittableDoFn (or the subsequent reading from BoundedSource) on Flink, so that the checkpoint barrier propagates better?

For now my fix is reading smaller files (30MB) one by one, but it's not very future proof.

Versions: Beam 2.17, Flink 1.9

Please correct my poor understanding of checkpointing with Beam and Flink; it would be wonderful if you have some advice on what to improve or where to look.
