[ https://issues.apache.org/jira/browse/FLINK-32883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755319#comment-17755319 ]
Yangze Guo commented on FLINK-32883:
------------------------------------

Hi, [~laughingman7743]. IIUC, your requirement should be fulfilled by the [redundant taskmanager|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#slotmanager-redundant-taskmanager-num]. Also, [~xiangyu0xf] and I are working on FLINK-15959, in which Flink should support retaining a minimum amount of resources even if the cluster is idle. I think these two features should cover the functionality of the standby TM you mentioned in this ticket. WDYT?

> Support for standby task managers
> ---------------------------------
>
>                 Key: FLINK-32883
>                 URL: https://issues.apache.org/jira/browse/FLINK-32883
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.6.0
>            Reporter: Tomoyuki NAKAMURA
>            Priority: Major
>
> [https://docs.ververica.com/user_guide/application_operations/deployments/scaling.html#run-with-standby-taskmanager]
> I would like to be able to support standby task managers, because on K8s
> pods are often evicted or deleted due to node failure or autoscaling.
> With the current implementation, it is not possible to set up a standby task
> manager, and jobs cannot run until all task managers are up and running. If a
> standby task manager could be supported, jobs could continue to run without
> downtime using the standby task manager, even if a task manager is
> unexpectedly deleted.
> [https://github.com/apache/flink-kubernetes-operator/blob/release-1.6.0/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java#L370-L380|https://github.com/apache/flink-kubernetes-operator/blob/release-1.6.0/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java#L370-L380]
> If the job manager's number of replicas is set, the job's parallelism setting
> is ignored. It should be possible to support a standby task manager by
> automatically setting parallelism to replicas * task slots only if the job's
> parallelism is not set (i.e. 0), and using the job's value if parallelism is set.
> If this change looks good, I will send a PR on GitHub.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
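
The parallelism rule proposed in the quoted description can be sketched roughly as follows. This is only an illustration of the rule as worded in the ticket, not the operator's actual code: the class StandbyParallelism and the methods resolveParallelism and spareSlots are hypothetical names, and treating any value <= 0 as "not set" is an assumption based on the "(i.e. 0)" remark.

```java
// Hypothetical sketch of the rule proposed in FLINK-32883.
// None of these names exist in flink-kubernetes-operator.
final class StandbyParallelism {

    // Returns the parallelism the job would run with under the proposal.
    // Assumption: jobParallelism <= 0 means "not set".
    static int resolveParallelism(int replicas, int taskSlots, int jobParallelism) {
        if (replicas <= 0 || taskSlots <= 0) {
            throw new IllegalArgumentException("replicas and taskSlots must be positive");
        }
        if (jobParallelism > 0) {
            // Explicit job parallelism wins; leftover slots act as standby capacity.
            return jobParallelism;
        }
        // Parallelism not set: derive it from replicas * task slots.
        return replicas * taskSlots;
    }

    // Slots left idle after scheduling, i.e. the standby capacity.
    // May be negative if the job asks for more slots than the replicas provide.
    static int spareSlots(int replicas, int taskSlots, int jobParallelism) {
        return replicas * taskSlots - resolveParallelism(replicas, taskSlots, jobParallelism);
    }

    public static void main(String[] args) {
        // 3 replicas with 2 slots each, job parallelism 4.
        System.out.println(resolveParallelism(3, 2, 4));
        System.out.println(spareSlots(3, 2, 4));
    }
}
```

With 3 replicas of 2 slots each and a job parallelism of 4, the job keeps parallelism 4 and 2 slots stay free, which is one task manager's worth of standby capacity. With parallelism left at 0, all 6 slots are used and nothing is held in reserve.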