tillrohrmann commented on a change in pull request #14798: URL: https://github.com/apache/flink/pull/14798#discussion_r570084862
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/SchedulerBase.java
##########
@@ -522,6 +527,7 @@ protected ComponentMainThreadExecutor getMainThreadExecutor() {
     protected void failJob(Throwable cause) {
         incrementVersionsOfAllVertices();
         executionGraph.failJob(cause);
+        getTerminationFuture().thenRun(() -> archiveGlobalFailure(cause));

Review comment:
   I mean the case where a failure happens, the job goes into the `FAILING` state and tries to cancel the tasks, and then the user cancels the job because it is taking too long for them. The job then goes into the `CANCELING` state, which results in the `CANCELED` state once all tasks have terminated. I think you are right that we should still record the failure cause.
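   To make the scenario concrete, here is a minimal toy model of it. This is not Flink code: `FailureArchivingSketch`, its `State` enum, and all of its methods are hypothetical stand-ins, and the state machine is reduced to the transitions mentioned above. It only illustrates that a callback registered with `thenRun` on a termination future fires whichever terminal state the job ends up in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model (NOT Flink code) of a job that the user cancels while it is failing. */
public class FailureArchivingSketch {

    /** Hypothetical, simplified subset of job states. */
    enum State { RUNNING, FAILING, CANCELING, CANCELED, FAILED }

    private State state = State.RUNNING;
    private final CompletableFuture<State> terminationFuture = new CompletableFuture<>();
    private final List<Throwable> archivedFailures = new ArrayList<>();

    /** Same shape as the diff: archive the cause once the job has terminated. */
    void failJob(Throwable cause) {
        state = State.FAILING;
        terminationFuture.thenRun(() -> archivedFailures.add(cause));
    }

    /** User-triggered cancel, possibly while the job is still FAILING. */
    void cancel() {
        if (state == State.RUNNING || state == State.FAILING) {
            state = State.CANCELING;
        }
    }

    /** Invoked once all tasks have terminated; moves to the terminal state. */
    void allTasksTerminated() {
        if (state == State.CANCELING) {
            state = State.CANCELED;
        } else if (state == State.FAILING) {
            state = State.FAILED;
        } else {
            return; // nothing to terminate in this toy model
        }
        // Completing the future runs the callbacks registered in failJob.
        terminationFuture.complete(state);
    }

    State getState() {
        return state;
    }

    List<Throwable> getArchivedFailures() {
        return archivedFailures;
    }

    public static void main(String[] args) {
        FailureArchivingSketch job = new FailureArchivingSketch();
        job.failJob(new RuntimeException("task failed")); // RUNNING -> FAILING
        job.cancel();                                     // FAILING -> CANCELING
        job.allTasksTerminated();                         // CANCELING -> CANCELED
        System.out.println(job.getState() + ", archived failures: "
                + job.getArchivedFailures().size());
    }
}
```

   In this sketch the job ends up `CANCELED`, yet the failure cause is still archived, because the callback is tied to termination rather than to reaching a failed state.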