[jira] [Commented] (FLINK-14112) Removing zookeeper state should cause the task manager and job managers to restart

TisonKun (Jira) Thu, 19 Sep 2019 01:52:46 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933184#comment-16933184
 ]


TisonKun commented on FLINK-14112:
----------------------------------

Thanks for your insights [~trohrmann]. Sounds reasonable to me. For now I'm 
fine with handling a {{null}} value in the leader listener.

One more thing. There is an edge case in which leader election is affected 
if znodes are deleted out of band. If there is only one contender (which is 
the case in the YARN scenario) and the leader latch is deleted, nobody is 
notified of this event and the contender still considers itself the leader. 
As an implementation detail, {{ZooKeeperLeaderElectionService}} is a 
{{NodeCacheListener}}, so if the leader info node gets deleted it will try to 
re-create the znode. It is therefore strange that the TM cannot recover its 
connection to the RM. [~aaronlevin] did you see a log message indicating a 
successful reconnect?
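
To make the two points above concrete, here is a minimal sketch of the 
behaviour I would expect. It is a self-contained simulation, not Flink or 
Curator code: {{FakeZNode}}, {{SketchElectionService}} and 
{{SketchLeaderListener}} are hypothetical stand-ins for the leader info znode, 
the election service and the leader listener, and the address passed in is 
made up. It only shows a contender that still believes it is the leader 
re-creating the deleted node, and a listener that treats missing data as 
"no leader known" instead of failing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class LeaderInfoSketch {

    /** Hypothetical in-memory stand-in for one znode; null data means "node absent". */
    static final class FakeZNode {
        interface Listener { void nodeChanged(); }

        private byte[] data;
        private final List<Listener> listeners = new ArrayList<>();

        void addListener(Listener l) { listeners.add(l); }
        byte[] getData() { return data; }
        void setData(byte[] newData) { data = newData; notifyListeners(); }
        void delete() { data = null; notifyListeners(); }

        private void notifyListeners() {
            // Iterate over a copy so that listeners may safely trigger further changes.
            for (Listener l : new ArrayList<>(listeners)) { l.nodeChanged(); }
        }
    }

    /** Assumed behaviour: while holding leadership, re-create the leader info node if it vanishes. */
    static final class SketchElectionService implements FakeZNode.Listener {
        private final FakeZNode leaderInfoNode;
        private final String leaderAddress;
        private boolean hasLeadership;

        SketchElectionService(FakeZNode node, String address) {
            this.leaderInfoNode = node;
            this.leaderAddress = address;
            node.addListener(this);
        }

        void grantLeadership() { hasLeadership = true; writeLeaderInfo(); }

        @Override public void nodeChanged() {
            if (hasLeadership && leaderInfoNode.getData() == null) { writeLeaderInfo(); }
        }

        private void writeLeaderInfo() {
            leaderInfoNode.setData(leaderAddress.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Leader listener that maps a missing node (null data) to an empty Optional. */
    static final class SketchLeaderListener implements FakeZNode.Listener {
        private final FakeZNode leaderInfoNode;
        final List<Optional<String>> observed = new ArrayList<>();

        SketchLeaderListener(FakeZNode node) {
            this.leaderInfoNode = node;
            node.addListener(this);
        }

        @Override public void nodeChanged() {
            byte[] data = leaderInfoNode.getData();
            observed.add(data == null
                    ? Optional.empty()
                    : Optional.of(new String(data, StandardCharsets.UTF_8)));
        }
    }

    /** Grants leadership, deletes the node out of band, and returns what the listener saw. */
    static List<Optional<String>> runScenario(String address) {
        FakeZNode node = new FakeZNode();
        SketchLeaderListener listener = new SketchLeaderListener(node);
        SketchElectionService election = new SketchElectionService(node, address);
        election.grantLeadership();
        node.delete(); // simulates someone removing the znode by hand
        return listener.observed;
    }

    public static void main(String[] args) {
        for (Optional<String> leader : runScenario("rm-host:6123")) {
            System.out.println(leader.orElse("<no leader>"));
        }
    }
}
```

In this sketch the listener sees the address, then no leader, then the same 
address again, which is the sequence I would expect a TM to be able to 
recover from.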

> Removing zookeeper state should cause the task manager and job managers to 
> restart
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-14112
>                 URL: https://issues.apache.org/jira/browse/FLINK-14112
>             Project: Flink
>          Issue Type: Wish
>          Components: Runtime / Coordination
>    Affects Versions: 1.8.1, 1.9.0
>            Reporter: Aaron Levin
>            Priority: Minor
>
> Suppose you have a flink application running on a cluster with the following 
> configuration:
> {noformat}
> high-availability.zookeeper.path.root: /flink
> {noformat}
> Now suppose you delete all the znodes within {{/flink}}. I experienced the 
> following:
>  * massive amount of logging
>  * application did not restart
>  * task manager did not crash or restart
>  * job manager did not crash or restart
> From this state I had to restart all the task managers and all the job 
> managers in order for the flink application to recover.
> It would be desirable for the Task Managers and Job Managers to crash if the 
> znode is not available (though perhaps you all have thought about this more 
> deeply than I!)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
