[ https://issues.apache.org/jira/browse/FLINK-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403761#comment-17403761 ]
zlzhang0122 commented on FLINK-23849:
-------------------------------------

[~trohrmann] OK, I see. Maybe the community is more concerned about other things. But IMO the automatic recovery strategy can't guarantee end-to-end exactly-once if the downstream system isn't transactional or idempotent. Supporting reactions to node updates such as decommissioning would also bring YARN to functional parity with k8s taints, and it's useful for gracefully restarting a streaming job.

> Support react to the node decommissioning change state on yarn and do graceful restart
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-23849
>                 URL: https://issues.apache.org/jira/browse/FLINK-23849
>             Project: Flink
>          Issue Type: New Feature
>          Components: Deployment / YARN
>    Affects Versions: 1.12.2, 1.13.1, 1.13.2
>            Reporter: zlzhang0122
>            Priority: Major
>             Fix For: 1.15.0
>
>
> Currently we ignore node updates in YarnContainerEventHandler.onNodesUpdated, but sometimes we want to evict the running Flink process from one node and gracefully restart it on another node for some unexpected reason, such as the physical machine needing to be recycled or the cloud computing cluster needing to be migrated. Thus, we can react to a node changing to the decommissioning state, call the stopWithSavepoint function, and then restart the job.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
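The reaction proposed in the issue description could look roughly like the sketch below. It is not Flink's or YARN's actual code: `NodeState` and `NodeUpdate` are simplified stand-ins for the node reports YARN passes to `onNodesUpdated`, and `JobControl`, `DecommissionReactor`, and `stopWithSavepointAndRestart` are hypothetical names introduced only for illustration. The sketch assumes the handler knows which hosts currently run the job's containers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DecommissionSketch {

    // Simplified stand-in for the node states YARN reports (only the ones used here).
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    // Simplified stand-in for one entry of the list passed to onNodesUpdated.
    static final class NodeUpdate {
        final String host;
        final NodeState state;

        NodeUpdate(String host, NodeState state) {
            this.host = host;
            this.state = state;
        }
    }

    // Hypothetical hook: whatever takes the savepoint and resubmits the job.
    interface JobControl {
        void stopWithSavepointAndRestart(String reason);
    }

    // Hypothetical handler that reacts to node updates instead of ignoring them.
    static final class DecommissionReactor {
        private final Set<String> hostsInUse;
        private final JobControl jobControl;
        private boolean restartRequested;

        DecommissionReactor(Set<String> hostsInUse, JobControl jobControl) {
            this.hostsInUse = hostsInUse;
            this.jobControl = jobControl;
        }

        void onNodesUpdated(List<NodeUpdate> updatedNodes) {
            if (restartRequested) {
                return; // a graceful restart is already under way
            }
            List<String> affected = new ArrayList<>();
            for (NodeUpdate node : updatedNodes) {
                // Only nodes that are being drained and that host our containers matter.
                if (node.state == NodeState.DECOMMISSIONING && hostsInUse.contains(node.host)) {
                    affected.add(node.host);
                }
            }
            if (!affected.isEmpty()) {
                restartRequested = true;
                jobControl.stopWithSavepointAndRestart("decommissioning nodes: " + affected);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> hosts = new HashSet<>(Arrays.asList("host-a", "host-b"));
        DecommissionReactor reactor =
                new DecommissionReactor(hosts, reason -> System.out.println("restart requested, " + reason));
        reactor.onNodesUpdated(Arrays.asList(
                new NodeUpdate("host-a", NodeState.DECOMMISSIONING),
                new NodeUpdate("host-c", NodeState.RUNNING)));
    }
}
```

The `restartRequested` flag is a design choice of this sketch: several decommissioning reports arriving in a row should trigger only one savepoint and restart.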