[ 
https://issues.apache.org/jira/browse/FLINK-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932720#comment-16932720
 ] 

Stephan Ewen commented on FLINK-14112:
--------------------------------------

From the RM / JM side (leader contender), can we treat the disappearance of a
ZNode as "loss of leadership"? A leader contender should recreate the node
when applying for leader status.
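A minimal sketch of that policy ("revoke on deletion, then re-apply"). It assumes a hypothetical single-threaded in-memory store (`FakeStore`) in place of ZooKeeper and a hypothetical `ReapplyingContender` class. It is not Flink's leader election service, only an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for ZooKeeper (single-threaded): paths map to payloads,
// and deletion listeners fire once, similar to one-shot ZooKeeper watches.
final class FakeStore {
    private final Map<String, byte[]> nodes = new HashMap<>();
    private final Map<String, List<Runnable>> deleteListeners = new HashMap<>();

    // Creates the node; returns false if it already exists (like NodeExists).
    boolean create(String path, byte[] data) {
        if (nodes.containsKey(path)) return false;
        nodes.put(path, data);
        return true;
    }

    boolean exists(String path) { return nodes.containsKey(path); }

    void onDelete(String path, Runnable listener) {
        deleteListeners.computeIfAbsent(path, p -> new ArrayList<>()).add(listener);
    }

    // Deletes the prefix and everything below it (e.g. wiping "/flink"), then fires listeners.
    void deleteRecursively(String prefix) {
        List<String> removed = new ArrayList<>();
        for (Iterator<String> it = nodes.keySet().iterator(); it.hasNext(); ) {
            String path = it.next();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                it.remove();
                removed.add(path);
            }
        }
        for (String path : removed) {
            List<Runnable> listeners = deleteListeners.remove(path);
            if (listeners != null) listeners.forEach(Runnable::run);
        }
    }
}

// Hypothetical contender: losing its lock node revokes leadership, then it re-applies.
final class ReapplyingContender {
    private final FakeStore store;
    private final String lockPath;
    private final byte[] payload;
    private boolean leader;
    private int grants; // how many times leadership was (re)acquired

    ReapplyingContender(FakeStore store, String lockPath, byte[] payload) {
        this.store = store;
        this.lockPath = lockPath;
        this.payload = payload;
    }

    void apply() {
        if (store.create(lockPath, payload)) {
            leader = true;
            grants++;
            // Our own lock node disappearing means we are no longer leader.
            store.onDelete(lockPath, this::onLockNodeDeleted);
        } else {
            // Someone else holds the lock: retry once it is deleted.
            store.onDelete(lockPath, this::apply);
        }
    }

    private void onLockNodeDeleted() {
        leader = false; // revoke first, so no leader-only work continues
        apply();        // then re-apply, which recreates the node
    }

    boolean isLeader() { return leader; }

    int grants() { return grants; }
}
```

Against a real ZooKeeper ensemble, recreating the node after the whole root is wiped also requires recreating the missing parent nodes, since a create under a non-existent parent fails with NoNode.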

On the TM side, I am not sure what the issue is. Is it just missing exception
handling / null handling?
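If the TM-side problem is indeed missing null handling, a listener on the leader node could treat a deleted node (no data) as "leader unknown" instead of failing. The sketch uses a hypothetical `LeaderView` class and a plain UTF-8 address payload; neither reflects Flink's actual leader retrieval service or node format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical TM-side view of the current leader, updated from node-data callbacks.
final class LeaderView {
    private Optional<String> leaderAddress = Optional.empty();
    private int disconnects; // how often a known leader was dropped

    // Called with the node's bytes, or null when the node has been deleted.
    void onNodeData(byte[] data) {
        if (data == null || data.length == 0) {
            // Node gone (e.g. "/flink" wiped): do not dereference null; forget the
            // current leader, drop the connection, and wait for a new leader to appear.
            if (leaderAddress.isPresent()) disconnects++;
            leaderAddress = Optional.empty();
            return;
        }
        leaderAddress = Optional.of(new String(data, StandardCharsets.UTF_8));
    }

    Optional<String> leader() { return leaderAddress; }

    int disconnects() { return disconnects; }
}
```

Repeated "node deleted" callbacks are idempotent here: only the first one counts as a disconnect.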

Side note: I remember that in the past we also had some complexity from the
fact that the "leader lock" is one ZNode, while the "leader address" and
"leader fencing token" are stored in different ZNodes. We were thinking of
putting the leader fencing token and the address as payload into the leader
lock ZNode to simplify things.
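One way to carry both values in the lock ZNode is a single binary payload. The sketch assumes the fencing token is a `UUID` and uses a hypothetical `LeaderPayload` layout (the address via `writeUTF`, followed by the token's two longs); it is not Flink's actual serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.UUID;

// Hypothetical combined payload for the leader lock node: address + fencing token.
final class LeaderPayload {
    final String address;
    final UUID fencingToken;

    LeaderPayload(String address, UUID fencingToken) {
        this.address = address;
        this.fencingToken = fencingToken;
    }

    byte[] encode() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(address);                               // length-prefixed string
            out.writeLong(fencingToken.getMostSignificantBits());
            out.writeLong(fencingToken.getLeastSignificantBits());
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    static LeaderPayload decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String address = in.readUTF();
            long msb = in.readLong();
            long lsb = in.readLong();
            return new LeaderPayload(address, new UUID(msb, lsb));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because both values live in one node, a reader obtains them in a single read, so it cannot pair one leader's address with another leader's token, and deleting the lock node removes both together.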


> Removing zookeeper state should cause the task manager and job managers to 
> restart
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-14112
>                 URL: https://issues.apache.org/jira/browse/FLINK-14112
>             Project: Flink
>          Issue Type: Wish
>          Components: Runtime / Coordination
>    Affects Versions: 1.8.1, 1.9.0
>            Reporter: Aaron Levin
>            Priority: Minor
>
> Suppose you have a Flink application running on a cluster with the following
> configuration:
> {noformat}
> high-availability.zookeeper.path.root: /flink
> {noformat}
> Now suppose you delete all the znodes within {{/flink}}. I experienced the 
> following:
>  * massive amount of logging
>  * application did not restart
>  * task manager did not crash or restart
>  * job manager did not crash or restart
> From this state I had to restart all the task managers and all the job 
> managers in order for the flink application to recover.
> It would be desirable for the Task Managers and Job Managers to crash if the 
> znode is not available (though perhaps you all have thought about this more 
> deeply than I!)
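The reporter's preferred behavior, failing fast when the HA root disappears so that an external supervisor restarts the process, could look roughly like this. `HaRootGuard`, the existence check and the handler are hypothetical; in a real process the handler would log and terminate, e.g. via `System.exit` with a non-zero code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical fail-fast guard: escalate to a fatal error if the HA root node is gone.
final class HaRootGuard {
    private final BooleanSupplier rootExists;  // e.g. wraps an exists("/flink") call
    private final Runnable fatalErrorHandler;  // real process: log, then exit non-zero
    private boolean tripped;

    HaRootGuard(BooleanSupplier rootExists, Runnable fatalErrorHandler) {
        this.rootExists = rootExists;
        this.fatalErrorHandler = fatalErrorHandler;
    }

    // Returns true while the root exists; otherwise fires the handler (only once).
    boolean check() {
        if (rootExists.getAsBoolean()) return true;
        if (!tripped) {
            tripped = true;
            fatalErrorHandler.run();
        }
        return false;
    }
}
```

A caller would invoke `check()` periodically, for example from a `ScheduledExecutorService`, and rely on the cluster manager to restart the crashed process.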



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
