[ https://issues.apache.org/jira/browse/FLINK-26400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Niklas Semmler updated FLINK-26400:
-----------------------------------
    Description: 
FLINK-25277 introduces explicit signalling between a TaskManager and the JobManager when the TaskManager shuts down. This reduces the time it takes for a reactive cluster to down-scale & restart.

*Setup*
 # Add the following lines to your flink config to enable reactive mode:
{code}
taskmanager.host: localhost # a workaround
scheduler-mode: reactive
restart-strategy: fixeddelay
restart-strategy.fixed-delay.attempts: 100
{code}
 # Create a “usrlib” folder and place the TopSpeedWindowing jar into it
{code:bash}
$ mkdir usrlib
$ cp examples/streaming/TopSpeedWindowing.jar usrlib/
{code}
 # Start the job
{code:bash}
$ bin/standalone-job.sh start --main-class org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
{code}
 # Start three task managers
{code:bash}
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start
{code}
 # Wait for the job to stabilize. The log file should show that three tasks start for every operator.
{code}
GlobalWindows -> Sink: Print to Std. Out (3/3) (d10339d5755d07f3d9864ed1b2147af2) switched from INITIALIZING to RUNNING.
{code}

*Test*

Stop one taskmanager
{code:bash}
$ bin/taskmanager.sh stop
{code}
Success condition: You should see that the job cancels and re-runs after a few seconds. In the logs you should find a line with the text “The TaskExecutor is shutting down”.

*Teardown*

Stop all taskmanagers and the jobmanager:
{code:bash}
bin/standalone-job.sh stop
bin/taskmanager.sh stop-all
{code}

  was:
FLINK-25277 introduces explicit signalling between a TaskManager and the JobManager when the TaskManager shuts down. This reduces the time it takes for a reactive cluster to down-scale & restart.

*Setup*
 # Add the following line to your flink config to enable reactive mode:
{code:java}
taskmanager.host: localhost # a workaround
scheduler-mode: reactive
restart-strategy: fixeddelay
restart-strategy.fixed-delay.attempts: 100
{code}
 # Create a “usrlib” folder and place the TopSpeedWindowing jar into it
{code:java}
$ mkdir usrlib
$ cp examples/streaming/TopSpeedWindowing.jar usrlib/
{code}
 # Start the job
{code:java}
$ bin/standalone-job.sh start --main-class org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
{code}
 # Start three task managers
{code:java}
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start
{code}
 # Wait for the job to stabilize. The log file should show that three tasks start for every operator.
{code:java}
GlobalWindows -> Sink: Print to Std. Out (3/3) (d10339d5755d07f3d9864ed1b2147af2) switched from INITIALIZING to RUNNING.
{code}

*Test*

Stop one taskmanager

{{bin/taskmanager.sh stop}}

Success condition: You should see that the job cancels and re-runs after a few seconds. In the logs you should find a line with the text “The TaskExecutor is shutting down”.

*Teardown*

Stop all taskmanagers and the jobmanager:

{{bin/standalone-job.sh stop}}

{{bin/taskmanager.sh stop-all}}


> Release Testing: Explicit shutdown signalling from TaskManager to JobManager
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-26400
>                 URL: https://issues.apache.org/jira/browse/FLINK-26400
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Niklas Semmler
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.15.0
>
>
> FLINK-25277 introduces explicit signalling between a TaskManager and the
> JobManager when the TaskManager shuts down. This reduces the time it takes
> for a reactive cluster to down-scale & restart.
>
> *Setup*
>  # Add the following lines to your flink config to enable reactive mode:
> {code}
> taskmanager.host: localhost # a workaround
> scheduler-mode: reactive
> restart-strategy: fixeddelay
> restart-strategy.fixed-delay.attempts: 100
> {code}
>  # Create a “usrlib” folder and place the TopSpeedWindowing jar into it
> {code:bash}
> $ mkdir usrlib
> $ cp examples/streaming/TopSpeedWindowing.jar usrlib/
> {code}
>  # Start the job
> {code:bash}
> $ bin/standalone-job.sh start --main-class org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
> {code}
>  # Start three task managers
> {code:bash}
> $ bin/taskmanager.sh start
> $ bin/taskmanager.sh start
> $ bin/taskmanager.sh start
> {code}
>  # Wait for the job to stabilize. The log file should show that three tasks
> start for every operator.
> {code}
> GlobalWindows -> Sink: Print to Std. Out (3/3) (d10339d5755d07f3d9864ed1b2147af2) switched from INITIALIZING to RUNNING.
> {code}
>
> *Test*
>
> Stop one taskmanager
> {code:bash}
> $ bin/taskmanager.sh stop
> {code}
> Success condition: You should see that the job cancels and re-runs after a
> few seconds. In the logs you should find a line with the text “The
> TaskExecutor is shutting down”.
>
> *Teardown*
>
> Stop all taskmanagers and the jobmanager:
> {code:bash}
> bin/standalone-job.sh stop
> bin/taskmanager.sh stop-all
> {code}
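
The success condition in the Test step could also be checked from the shell instead of reading the logs by eye. The following is only a sketch: the function name check_shutdown_signal is hypothetical (it is not part of the Flink distribution), and it assumes the logs are written to the log/ directory of the distribution. It searches every file in that directory because the issue does not say which process logs the message.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Flink.
# Usage: check_shutdown_signal [LOG_DIR]
check_shutdown_signal() {
  # Assumption: Flink writes its logs to ./log unless configured otherwise.
  local log_dir="${1:-log}"
  # Text taken from the success condition in the issue description.
  local needle="The TaskExecutor is shutting down"

  if [ ! -d "$log_dir" ]; then
    echo "log directory not found: $log_dir" >&2
    return 2
  fi

  # -r: search recursively, -F: treat the needle as a fixed string,
  # -l: print only the names of the files that contain it.
  if grep -rFl -- "$needle" "$log_dir"; then
    echo "PASS: shutdown message found"
    return 0
  fi

  echo "FAIL: shutdown message not found in $log_dir" >&2
  return 1
}
```

After stopping one taskmanager, running check_shutdown_signal log from the Flink directory lists the log files that contain the message and returns 0, or returns 1 if no file contains it. It does not verify that the job actually restarted; that still needs the "switched from INITIALIZING to RUNNING" lines.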