Hi devs,

I'm opening this thread to discuss FLIP-414: Support Retry Mechanism in
RocksDBStateDataTransfer[1].

Currently, there is no retry mechanism for downloading and uploading
RocksDB state files. Any transient instability of the remote filesystem
can cause a checkpoint to fail. By supporting a retry mechanism in
`RocksDBStateDataTransfer`, we can significantly reduce the checkpoint
failure rate during the asynchronous phase.

To make this retry mechanism configurable, this FLIP introduces two
options: `state.backend.rocksdb.checkpoint.transfer.retry.times` and
`state.backend.rocksdb.checkpoint.transfer.retry.interval`. By default, no
retry is performed, so the existing behavior is unchanged.
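
To make the intended semantics of the two options concrete, here is a
minimal sketch of a retry wrapper around a single file transfer. It is an
illustration only, not the implementation proposed in the FLIP: the class
`TransferRetrySketch`, the method `callWithRetry`, and the parameters
`retryTimes` and `retryInterval` are hypothetical names, and I am assuming
that only `IOException` is treated as retryable and that a retry count of 0
means "no retry".

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;

public class TransferRetrySketch {

    /**
     * Runs the transfer once and, on IOException, retries it up to
     * retryTimes more times, sleeping retryInterval between attempts.
     * With retryTimes == 0 the first failure is rethrown immediately.
     * (Hypothetical helper, not part of Flink.)
     */
    static <T> T callWithRetry(Callable<T> transfer, int retryTimes, Duration retryInterval)
            throws Exception {
        int retriesDone = 0;
        while (true) {
            try {
                return transfer.call();
            } catch (IOException e) {
                if (retriesDone >= retryTimes) {
                    throw e; // retries exhausted: surface the last failure
                }
                retriesDone++;
                Thread.sleep(retryInterval.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated upload that fails twice before succeeding.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("transient remote filesystem failure");
            }
            return "uploaded";
        }, 2, Duration.ofMillis(10));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With 2 retries allowed, the simulated upload above succeeds on its third
attempt. With 0 retries, the first `IOException` would propagate and fail
the transfer, which matches the default described above.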

Looking forward to your feedback, thanks.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-414%3A+Support+Retry+Mechanism+in+RocksDBStateDataTransfer

Best regards,
Xiangyu Feng
