[ https://issues.apache.org/jira/browse/FLINK-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932660#comment-16932660 ]
Aaron Levin commented on FLINK-14112:
-------------------------------------

[~Tison] [~till.rohrmann] thanks for responding so quickly! I agree that it's unlikely that someone is going to delete the znodes in {{/flink}}, but I figured that in the rare case it does happen it might be nice to hard fail. If you decide to {{WONTFIX}}, I understand! :)

[~Tison] there were a lot of lines like the following in the logs:

{noformat}
[2019-08-21 21:38:48.549762] 2019-08-21 21:38:48,549 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Closing TaskExecutor connection 62be23badd5a51b757c221cc750881cb because: ResourceManager leader changed to new address null
{noformat}

and

{noformat}
[2019-08-21 21:39:11.008135] 2019-08-21 21:39:11,007 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: qa-flinkjobmanager--087a757cd7fe67436.northwest.stripe.io/10.100.46.70:6123
{noformat}

The second example is logged at {{WARN}}, so we could potentially just configure the logger not to emit it. Anyway, I think most of the log chunder is likely due to the way {{null}} is handled in this code.

> Removing zookeeper state should cause the task manager and job managers to restart
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-14112
>                 URL: https://issues.apache.org/jira/browse/FLINK-14112
>             Project: Flink
>          Issue Type: Wish
>          Components: Runtime / Coordination
>    Affects Versions: 1.8.1, 1.9.0
>            Reporter: Aaron Levin
>            Priority: Minor
>
> Suppose you have a Flink application running on a cluster with the following configuration:
> {noformat}
> high-availability.zookeeper.path.root: /flink
> {noformat}
> Now suppose you delete all the znodes within {{/flink}}. I experienced the following:
> * massive amount of logging
> * application did not restart
> * task manager did not crash or restart
> * job manager did not crash or restart
> From this state I had to restart all the task managers and all the job managers in order for the Flink application to recover.
> It would be desirable for the Task Managers and Job Managers to crash if the znode is not available (though perhaps you all have thought about this more deeply than I have!)
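For illustration, here is a minimal sketch of the kind of fail-fast behaviour the description above wishes for: a plain ZooKeeper watch on the HA root znode that exits the process when the node is deleted, so that an external supervisor can restart it. This is not Flink's actual leader-retrieval code; the class name, the {{zk-quorum:2181}} connect string, and the exit-on-delete policy are assumptions made for the sketch, and only the {{/flink}} path comes from the issue.

{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class HaRootZnodeGuard {

    public static void main(String[] args) throws Exception {
        // Path taken from the issue's high-availability.zookeeper.path.root setting.
        final String haRoot = "/flink";
        // Hypothetical ZooKeeper quorum address; 60s session timeout; no-op default watcher.
        final ZooKeeper zk = new ZooKeeper("zk-quorum:2181", 60_000, event -> { });

        final Watcher deletionWatcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeDeleted) {
                    // Fail fast instead of looping on "leader changed to new address null";
                    // an external supervisor (systemd, Kubernetes, ...) restarts the process.
                    System.err.println("HA root znode " + haRoot + " was deleted; exiting.");
                    System.exit(1);
                } else {
                    // ZooKeeper watches are one-shot, so re-register after any other event.
                    rearm(zk, haRoot, this);
                }
            }
        };

        rearm(zk, haRoot, deletionWatcher);
        Thread.currentThread().join();
    }

    private static void rearm(ZooKeeper zk, String path, Watcher watcher) {
        try {
            // exists() registers a watch that fires on creation, deletion, and data changes.
            zk.exists(path, watcher);
        } catch (Exception e) {
            throw new RuntimeException("Could not re-register watch on " + path, e);
        }
    }
}
{code}

In a real deployment the deletion event would presumably be routed into the process's fatal error handling rather than calling {{System.exit}} directly, but the effect on the wish above is the same: the process dies and gets restarted instead of spinning with a {{null}} leader address.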
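Separately, regarding the remark above that the connection-refused messages are only {{WARN}} and could be silenced by configuration: a possible stopgap, assuming the default log4j 1.x {{conf/log4j.properties}} setup that Flink 1.8/1.9 ships with (the choice of {{ERROR}} as the threshold is just an example), would be a per-logger override:

{noformat}
# Hypothetical override in conf/log4j.properties: hide the repeated
# "Remote connection to [null] failed" WARNs from Akka's Netty transport.
log4j.logger.akka.remote.transport.netty.NettyTransport=ERROR
{noformat}

This only hides the symptom, of course; it does not address the underlying {{null}} handling that produces the churn in the first place.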