[ 
https://issues.apache.org/jira/browse/FLINK-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403761#comment-17403761
 ] 

zlzhang0122 commented on FLINK-23849:
-------------------------------------

[~trohrmann] OK, I see; maybe the community has more pressing concerns. 
But IMO the automatic recovery strategy can't guarantee end-to-end exactly-once 
semantics if the downstream system doesn't support transactions or idempotent 
writes. Supporting reactions to node updates such as decommissioning would also 
bring YARN to functional parity with Kubernetes taints, and it is useful for 
gracefully restarting streaming jobs.

> Support react to the node decommissioning change state on yarn and do 
> graceful restart
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-23849
>                 URL: https://issues.apache.org/jira/browse/FLINK-23849
>             Project: Flink
>          Issue Type: New Feature
>          Components: Deployment / YARN
>    Affects Versions: 1.12.2, 1.13.1, 1.13.2
>            Reporter: zlzhang0122
>            Priority: Major
>             Fix For: 1.15.0
>
>
> Currently we ignore node updates in 
> YarnContainerEventHandler.onNodesUpdated, but sometimes we want to evict the 
> running Flink processes from one node and gracefully restart them on another 
> node for some unexpected reason, such as the physical machine needing to be 
> recycled or the cloud computing cluster needing to be migrated. Thus, we 
> could react to a node's change to the decommissioning state, call the 
> stopWithSavepoint function, and then restart the job.
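
The reaction described in the issue could be sketched roughly as follows. This is 
only an illustration under stated assumptions, not Flink's or YARN's actual code: 
NodeState and NodeReport below are simplified hypothetical stand-ins for the YARN 
record types of the same names, and DecommissionReaction.hostsToEvacuate is a 
made-up helper. A real handler would receive the reports in onNodesUpdated and 
trigger stopWithSavepoint for the jobs on the returned hosts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for YARN's node state enum. */
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

/** Hypothetical, simplified stand-in for a YARN node report. */
final class NodeReport {
    final String host;
    final NodeState state;

    NodeReport(String host, NodeState state) {
        this.host = host;
        this.state = state;
    }
}

/** Hypothetical helper: decides which hosts need a graceful restart. */
final class DecommissionReaction {

    /**
     * Returns the hosts that have entered the decommissioning state and
     * currently run at least one of our containers. Nodes we do not use,
     * and nodes in any other state, are ignored.
     */
    static List<String> hostsToEvacuate(List<NodeReport> updatedNodes, Set<String> hostsInUse) {
        List<String> result = new ArrayList<>();
        for (NodeReport report : updatedNodes) {
            if (report.state == NodeState.DECOMMISSIONING && hostsInUse.contains(report.host)) {
                // Assumption: the caller would stop the affected job with a
                // savepoint here and resubmit it on a healthy node.
                result.add(report.host);
            }
        }
        return result;
    }
}
```

Filtering on the hosts in use keeps the job from restarting when an unrelated node 
in the cluster is decommissioned.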



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
