dawidwys commented on a change in pull request #18092:
URL: https://github.com/apache/flink/pull/18092#discussion_r770443223
########## File path: docs/content/docs/ops/state/savepoints.md ##########
@@ -130,9 +130,23 @@ Unlike savepoints, checkpoints cannot generally be moved to a different location
 If you use `JobManagerCheckpointStorage`, metadata *and* savepoint state will be stored in the `_metadata` file, so don't be confused by the absence of additional data files.
-{{< hint warning >}}
-It is discouraged to move or delete the last savepoint of a running job, because this might interfere with failure-recovery. Savepoints have side-effects on exactly-once sinks, therefore
-to ensure exactly-once semantics, if there is no checkpoint after the last savepoint, the savepoint will be used for recovery.
+{{< hint warning >}}
+Starting from Flink 1.15 intermediate savepoints (savepoints other than
+created with [stop-with-savepoint](#stopping-a-job-with-savepoint)) are not used for recovery and do
+not commit any side effects.
+
+This has to be taken into consideration, especially when running multiple jobs in the same
+checkpointing timeline. It is possible in that solution that if the original job (after taking a
+savepoint) fails, then it will fall back to a checkpoint prior to the savepoint. However, if we now
+resume a job from the savepoint, then we might commit transactions that might’ve never happened
+because of falling back to a checkpoint before the savepoint (assuming non-determinism).

Review comment:
It does guarantee correctness. If you start a single job from a savepoint, its next checkpoint will also commit the transactions contained in the savepoint. The problem only arises if you keep the original job running and start a new one from the same savepoint: if the original job fails before its next checkpoint, it might recreate the data from those transactions.

Such savepoints are intended for cases where:
* you want to take a savepoint and verify it before stopping the original job
* you want to replicate the job into a separate zone/cluster/...
  (in that case you need to drop the transactional state)

We discussed dropping the sink's state from intermediate savepoints automatically, but decided it is better to keep it in the savepoint and, if needed, drop it on restore. If we dropped it while taking the savepoint, there would be no way back.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
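The two scenarios contrasted in the review comment (a single job resumed from an intermediate savepoint vs. a new job forked from the savepoint while the original keeps running and then fails) can be illustrated with a small simulation. This is a hypothetical model for illustration only: `Job`, `Snapshot`, `checkpoint()` and `intermediateSavepoint()` are invented names, not Flink APIs. It assumes a sink whose pending transaction is committed when a checkpoint completes, an intermediate savepoint that commits nothing (the Flink 1.15 behaviour described in the docs change), and a restored job that commits the snapshot's pending transaction on its next checkpoint. A per-attempt tag in each record stands in for non-deterministic output. Requires Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of a job writing to a two-phase-commit sink (NOT the Flink API):
 * checkpoints commit the pending transaction, intermediate savepoints commit nothing,
 * and a restored job commits the snapshot's pending transaction on its next checkpoint.
 */
public class IntermediateSavepointDemo {

    /** Input offset plus the contents of the not-yet-committed transaction. */
    record Snapshot(int offset, List<String> pendingTxn) {}

    static final class Job {
        private final String attemptTag;          // differs per run, standing in for non-determinism
        private final List<String> externalStore; // shared external system the sink writes to
        private int offset;
        private List<String> pendingTxn = new ArrayList<>();

        Job(String attemptTag, List<String> externalStore, Snapshot restoreFrom) {
            this.attemptTag = attemptTag;
            this.externalStore = externalStore;
            if (restoreFrom != null) {
                this.offset = restoreFrom.offset();
                this.pendingTxn = new ArrayList<>(restoreFrom.pendingTxn());
            }
        }

        /** Reads the next `count` inputs, writing one tagged record per input into the open transaction. */
        void process(int count) {
            for (int i = 0; i < count; i++) {
                pendingTxn.add(offset + "@" + attemptTag);
                offset++;
            }
        }

        /** Completed checkpoint: commits the pending transaction to the external store. */
        Snapshot checkpoint() {
            externalStore.addAll(pendingTxn);
            pendingTxn = new ArrayList<>();
            // Simplification: nothing is pending after the commit, so the snapshot carries no transaction.
            return new Snapshot(offset, List.of());
        }

        /** Intermediate savepoint: captures the pending transaction but does not commit it. */
        Snapshot intermediateSavepoint() {
            return new Snapshot(offset, List.copyOf(pendingTxn));
        }
    }

    /** Counts how many committed records exist per input offset. */
    static Map<Integer, Integer> countsPerInput(List<String> store) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String entry : store) {
            int input = Integer.parseInt(entry.substring(0, entry.indexOf('@')));
            counts.merge(input, 1, Integer::sum);
        }
        return counts;
    }

    /** The original job is cancelled after the savepoint; a single new job resumes from it. */
    static List<String> singleJobFromSavepoint() {
        List<String> store = new ArrayList<>();
        Job original = new Job("A1", store, null);
        original.process(2);                      // inputs 0, 1
        original.checkpoint();                    // commits 0@A1, 1@A1
        original.process(2);                      // inputs 2, 3 stay pending
        Snapshot savepoint = original.intermediateSavepoint();
        // The original job is cancelled here and never commits its pending transaction.
        Job resumed = new Job("B", store, savepoint);
        resumed.checkpoint();                     // commits 2@A1, 3@A1 exactly once
        return store;
    }

    /** The original keeps running, a second job starts from its savepoint, then the original fails. */
    static List<String> forkWhileOriginalRuns() {
        List<String> store = new ArrayList<>();
        Job original = new Job("A1", store, null);
        original.process(2);
        Snapshot lastCheckpoint = original.checkpoint();
        original.process(2);
        Snapshot savepoint = original.intermediateSavepoint();
        Job fork = new Job("B", store, savepoint);
        fork.checkpoint();                        // commits 2@A1, 3@A1 taken from the savepoint
        // The original fails before its next checkpoint and falls back to the earlier checkpoint.
        Job recovered = new Job("A2", store, lastCheckpoint);
        recovered.process(2);                     // re-processes inputs 2, 3 with different output
        recovered.checkpoint();                   // commits 2@A2, 3@A2
        return store;
    }

    public static void main(String[] args) {
        System.out.println("single job: " + countsPerInput(singleJobFromSavepoint()));
        System.out.println("fork + original failure: " + countsPerInput(forkWhileOriginalRuns()));
    }
}
```

Running `main` prints `single job: {0=1, 1=1, 2=1, 3=1}` and `fork + original failure: {0=1, 1=1, 2=2, 3=2}`. In the second case inputs 2 and 3 are committed once from the savepoint (by the forked job) and once more by the recovered original job, which is the situation the comment warns about.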