[ https://issues.apache.org/jira/browse/FLINK-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481432#comment-17481432 ]
Niklas Semmler commented on FLINK-25277:
----------------------------------------

[~chesnay] Yes, you are right. [~trohrmann] needed the shutdown hook for a different use case, so he had already included the code in dd6069fabf8a7ff65fbd9ff8dd7b0c47f492288f. When I saw this, I removed it from the commits above to avoid merge conflicts.

Also, I just want to stress that the shutdown code was really just the icing on the cake. All the signaling functionality was already implemented; it just was not called during shutdown.

> Introduce explicit shutdown signalling between TaskManager and JobManager
> --------------------------------------------------------------------------
>
>                 Key: FLINK-25277
>                 URL: https://issues.apache.org/jira/browse/FLINK-25277
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0, 1.14.0
>            Reporter: Niklas Semmler
>            Assignee: Niklas Semmler
>            Priority: Major
>              Labels: pull-request-available, reactive
>             Fix For: 1.15.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We need to introduce shutdown signalling between TaskManager and JobManager
> for fast & graceful shutdown in reactive scheduler mode.
>
> In Flink 1.14 and earlier versions, the JobManager tracks the availability of
> a TaskManager using a heartbeat. This heartbeat is by default configured with
> an interval of 10 seconds and a timeout of 50 seconds [1]. Hence, the
> shutdown of a TaskManager is recognized only after about 50-60 seconds. This
> works fine for the static scheduling mode, where a TaskManager only
> disappears as part of a cluster shutdown or a job failure. However, in the
> reactive scheduler mode (FLINK-10407), TaskManagers are regularly added to and
> removed from a running job. Here, the heartbeat mechanism incurs additional
> delays.
>
> To remove these delays, we add an explicit shutdown signal from the
> TaskManager to the JobManager.
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#heartbeat-timeout

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
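
To illustrate the difference described in the issue, here is a minimal toy model in Java. It is only a sketch and not Flink's actual code: the class TaskManagerTrackerSketch and its methods are hypothetical names, and the only value taken from the issue is the 50-second default heartbeat timeout. It shows that a silently stopped TaskManager is still considered alive until the timeout has elapsed since its last heartbeat, whereas an explicit shutdown signal removes it immediately.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the JobManager side; not a Flink class.
class TaskManagerTrackerSketch {
    private final long heartbeatTimeoutMs;
    // Time (in ms) of the last heartbeat received from each known TaskManager.
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    TaskManagerTrackerSketch(long heartbeatTimeoutMs) {
        this.heartbeatTimeoutMs = heartbeatTimeoutMs;
    }

    // Record a heartbeat from the given TaskManager at time nowMs.
    void receiveHeartbeat(String taskManagerId, long nowMs) {
        lastHeartbeatMs.put(taskManagerId, nowMs);
    }

    // Explicit shutdown signal: forget the TaskManager right away.
    void signalShutdown(String taskManagerId) {
        lastHeartbeatMs.remove(taskManagerId);
    }

    // A TaskManager counts as alive while it is known and its last
    // heartbeat is more recent than the timeout.
    boolean isConsideredAlive(String taskManagerId, long nowMs) {
        Long last = lastHeartbeatMs.get(taskManagerId);
        return last != null && nowMs - last < heartbeatTimeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 50_000L; // default timeout quoted in the issue
        TaskManagerTrackerSketch tracker = new TaskManagerTrackerSketch(timeoutMs);

        // Both TaskManagers send one heartbeat at t = 0 and then stop.
        tracker.receiveHeartbeat("tm-1", 0L);
        tracker.receiveHeartbeat("tm-2", 0L);

        // Only tm-2 announces its shutdown explicitly.
        tracker.signalShutdown("tm-2");

        for (long nowMs : new long[] {1_000L, 30_000L, 50_000L}) {
            System.out.println("t=" + nowMs + "ms"
                    + " tm-1 alive=" + tracker.isConsideredAlive("tm-1", nowMs)
                    + " tm-2 alive=" + tracker.isConsideredAlive("tm-2", nowMs));
        }
    }
}
```

In this model, tm-1 keeps occupying its place in the tracker until the timeout expires, which is the delay the issue wants to avoid in reactive mode, while tm-2 disappears as soon as the signal arrives.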