tillrohrmann commented on a change in pull request #17693:
URL: https://github.com/apache/flink/pull/17693#discussion_r745803446
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PendingCheckpoint.java
##########
@@ -336,6 +336,8 @@ public CompletedCheckpoint finalizeCheckpoint(
props,
finalizedLocation);
+ // Mark this pending checkpoint as disposed, but do NOT drop the state.
+ dispose(false, checkpointsCleaner, postCleanup, executor);
Review comment:
I think the problem is in how the `SchedulerBase` and the
`AdaptiveScheduler` shut down the `CheckpointsCleaner`. If we stop the
`CheckpointsCleaner` only after the `ExecutionGraph` has terminated and all
completed checkpoints have been released, then this problem should hopefully
go away.
The underlying problem is that the `ExecutionGraph` uses a service that
stops working in the middle of a method call (`finalizeCheckpoint`). This
should not happen, because such behavior is super hard to reason about.
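The suggested ordering can be sketched as a small, self-contained model. This is not Flink code: `SketchCleaner` and `SketchShutdown` are hypothetical stand-ins for the `CheckpointsCleaner` and the scheduler's shutdown path, and the `ExecutionGraph` termination is assumed to be observable as a plain `CompletableFuture<Void>`. The sketch only models two properties: the cleaner rejects work once closed, and shutdown closes it only after the graph has terminated and completed checkpoints have been released.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for a cleanup service; NOT Flink's CheckpointsCleaner API. */
final class SketchCleaner {
    private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();
    private boolean closed;
    private int pendingCleanups;

    /** Runs the action on the executor; throws if the cleaner has already been closed. */
    void cleanup(Runnable action, Executor executor) {
        synchronized (this) {
            if (closed) {
                throw new IllegalStateException("Cleaner is already closed.");
            }
            pendingCleanups++;
        }
        executor.execute(() -> {
            try {
                action.run();
            } finally {
                onCleanupDone();
            }
        });
    }

    private void onCleanupDone() {
        boolean completeNow;
        synchronized (this) {
            pendingCleanups--;
            completeNow = closed && pendingCleanups == 0;
        }
        if (completeNow) {
            closeFuture.complete(null); // complete outside the lock
        }
    }

    /** Stops accepting work; the returned future completes once in-flight cleanups finish. */
    CompletableFuture<Void> closeAsync() {
        boolean completeNow;
        synchronized (this) {
            closed = true;
            completeNow = pendingCleanups == 0;
        }
        if (completeNow) {
            closeFuture.complete(null);
        }
        return closeFuture;
    }
}

/** Hypothetical shutdown sequence following the ordering proposed in the review. */
final class SketchShutdown {
    private SketchShutdown() {}

    static CompletableFuture<Void> shutDown(
            CompletableFuture<Void> executionGraphTermination,
            Runnable releaseCompletedCheckpoints,
            SketchCleaner cleaner) {
        return executionGraphTermination
                // 1. Wait until the execution graph has reached a terminal state.
                // 2. Release completed checkpoints; this step may still use the cleaner.
                .thenRun(releaseCompletedCheckpoints)
                // 3. Only then stop the cleaner.
                .thenCompose(ignored -> cleaner.closeAsync());
    }
}
```

With this ordering, any cleanup triggered while releasing checkpoints still finds a working cleaner. Closing the cleaner first instead makes the `cleanup` call in step 2 throw, which models the "service stops working in the middle of a method" situation described above.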
--
This is an automated message from the Apache Git Service.