Hi,
We are running a stateful Flink application with RocksDB as the state backend. 
Incremental checkpointing is enabled and checkpoints are written to S3.

  *   10 task managers each with 2 task slots
  *   Checkpoint interval 3 minutes
  *   Checkpointing mode – at-least-once processing
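
For reference, the setup above corresponds roughly to the following Flink 
configuration keys. This is only a sketch, collected in a plain Java map so it 
runs without a Flink dependency. The class name and the S3 path are placeholders 
(not our real values), and the `execution.checkpointing.*` keys assume a Flink 
version that supports them. The number of task managers depends on the 
deployment and is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlinkCheckpointSettings {
    // Placeholder path for illustration only; the real bucket is not given here.
    static final String CHECKPOINT_DIR = "s3://example-bucket/flink-checkpoints";

    // Builds the settings in the same order as the list above.
    static Map<String, String> build() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("taskmanager.numberOfTaskSlots", "2");            // 2 slots per task manager
        conf.put("state.backend", "rocksdb");                      // RocksDB state backend
        conf.put("state.backend.incremental", "true");             // incremental checkpoints
        conf.put("state.checkpoints.dir", CHECKPOINT_DIR);         // checkpoints on S3
        conf.put("execution.checkpointing.interval", "3min");      // checkpoint every 3 minutes
        conf.put("execution.checkpointing.mode", "AT_LEAST_ONCE"); // no barrier alignment
        return conf;
    }

    public static void main(String[] args) {
        // Print the settings in "key: value" form, one per line.
        build().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```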

After the app has been running for 2-3 days, an end-to-end checkpoint takes 
almost 2 minutes, even though the sync time is at most 2 sec and the async time 
is at most 15 sec. Initially, when the state is small, a checkpoint takes only 
10-15 sec. Because the checkpointing mode is at-least-once, the alignment 
duration is 0. We see a dip in processing while the checkpoint is running, and 
we couldn't find out what the actual issue is.
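
The size of the gap can be made explicit with a small calculation. This is only 
a sketch using the rounded figures from this mail (about 120 sec end to end, 
2 sec sync, 15 sec async, 180 sec checkpoint interval). The class and method 
names are made up for illustration and are not part of Flink.

```java
class CheckpointGap {
    // Time in a checkpoint that the reported sync and async phases do not explain.
    static long unaccountedSeconds(long endToEndSec, long syncSec, long asyncSec) {
        return endToEndSec - (syncSec + asyncSec);
    }

    // Share of the checkpoint interval during which a checkpoint is in flight.
    static double fractionOfInterval(long endToEndSec, long intervalSec) {
        return (double) endToEndSec / intervalSec;
    }

    public static void main(String[] args) {
        long endToEndSec = 120; // "almost 2 minutes", rounded up
        long syncSec = 2;       // reported maximum sync time
        long asyncSec = 15;     // reported maximum async time
        long intervalSec = 180; // 3-minute checkpoint interval

        System.out.println("Unaccounted seconds: "
                + unaccountedSeconds(endToEndSec, syncSec, asyncSec));
        System.out.println("Fraction of interval spent checkpointing: "
                + fractionOfInterval(endToEndSec, intervalSec));
    }
}
```

With these figures, roughly 103 of the 120 seconds are not covered by the sync 
and async phases, and a checkpoint is in flight for about two thirds of every 
3-minute interval.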

We also tried checkpointing to a remote HDFS cluster but observed similar behavior.

We have a couple of questions:

  *   When the sync time is at most 2 sec and the async time is at most 15 sec, 
why does the end-to-end checkpoint take almost 2 minutes?
  *   How can we reduce the checkpoint time?

Any help would be appreciated.


Thank you
Sandeep Kathula
