[ https://issues.apache.org/jira/browse/FLINK-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lamber-ken closed FLINK-13189.
------------------------------
      Resolution: Duplicate
   Fix Version/s:     (was: 1.9.0)
    Release Note: duplicate to FLINK-10052

> Fix the impact of zookeeper network disconnect temporarily on flink long running jobs
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-13189
>                 URL: https://issues.apache.org/jira/browse/FLINK-13189
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.8.1
>            Reporter: lamber-ken
>            Assignee: lamber-ken
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Issue detail info*
> We deploy Flink streaming jobs on a Hadoop cluster in per-job mode and use
> ZooKeeper as the HighAvailabilityService. We found that a Flink job restarts
> when the network between the JobManager and ZooKeeper is temporarily
> disconnected.
> We analyzed this problem in depth. The Flink JobManager uses Curator's
> `LeaderLatch` to maintain leadership. When the network disconnects, the
> `LeaderLatch` immediately sets leadership to false. We think this is too
> aggressive, because many long-running Flink jobs restart after a brief
> network glitch.
>
> *Fix this issue*
> According to the official Curator website, this issue was fixed in
> Curator 3.x.x, but we cannot simply change the Flink Curator version (2.12.0)
> to 3.x.x because of ZooKeeper compatibility: Curator 2.x.x supports
> ZooKeeper 3.4.x and ZooKeeper 3.5.0, while Curator 3.x.x is compatible only
> with ZooKeeper 3.5.x.
> Based on these considerations, we updated `LeaderLatch` in the
> flink-shaded-curator module.
>
> *Other*
> Any suggestions are welcome, thanks.
>
> *Useful links*
> [https://curator.apache.org/zk-compatibility.html]
> [https://cwiki.apache.org/confluence/display/CURATOR/Releases]
> [http://curator.apache.org/curator-recipes/leader-latch.html]
>

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
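
To illustrate the behaviour described in the issue, here is a minimal, dependency-free Java sketch. It is not the patch from the pull request and does not call Curator. The `ConnectionState` enum only mirrors the names of Curator's connection states, and `ErrorPolicy`, `SimulatedLatch`, `STRICT`, `SESSION_ONLY`, and `replay` are hypothetical names introduced for this example. It assumes the latch already holds leadership and replays a short network blip under two policies: one that revokes leadership on any disconnect (the behaviour the reporter observed), and one that revokes it only when the session is lost.

```java
public class LeaderLatchSketch {
    // Local stand-in that mirrors the names of Curator's connection states.
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical policy: decides which states should revoke leadership.
    interface ErrorPolicy {
        boolean isErrorState(ConnectionState state);
    }

    // Any disconnect revokes leadership (the behaviour reported in the issue).
    static final ErrorPolicy STRICT =
            s -> s == ConnectionState.SUSPENDED || s == ConnectionState.LOST;

    // Only a lost session revokes leadership; a brief suspension is tolerated.
    static final ErrorPolicy SESSION_ONLY = s -> s == ConnectionState.LOST;

    // Simplified latch: tracks leadership and counts how often it was revoked.
    static final class SimulatedLatch {
        private final ErrorPolicy policy;
        private boolean hasLeadership = true; // assumption: already elected
        private int revocations = 0;

        SimulatedLatch(ErrorPolicy policy) {
            this.policy = policy;
        }

        void stateChanged(ConnectionState newState) {
            if (policy.isErrorState(newState)) {
                if (hasLeadership) {
                    hasLeadership = false;
                    revocations++; // in Flink, each revocation would trigger a job restart
                }
            } else if (newState == ConnectionState.RECONNECTED && !hasLeadership) {
                // A real latch would rejoin the election; the sketch simply regains leadership.
                hasLeadership = true;
            }
        }

        int revocations() {
            return revocations;
        }
    }

    // Feeds a sequence of connection events to a fresh latch and returns the revocation count.
    static int replay(ErrorPolicy policy, ConnectionState... events) {
        SimulatedLatch latch = new SimulatedLatch(policy);
        for (ConnectionState event : events) {
            latch.stateChanged(event);
        }
        return latch.revocations();
    }

    public static void main(String[] args) {
        ConnectionState[] blip = {ConnectionState.SUSPENDED, ConnectionState.RECONNECTED};
        System.out.println("strict revocations: " + replay(STRICT, blip));             // 1
        System.out.println("session-only revocations: " + replay(SESSION_ONLY, blip)); // 0
    }
}
```

Under the strict policy a single SUSPENDED event is enough to drop leadership, whereas the session-only policy rides out the blip and only reacts to LOST. Whether tolerating a suspension is safe depends on the session timeout and on how the rest of the system fences a stale leader, which this sketch does not model.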