[ https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264314#comment-14264314 ]

Rakesh R commented on ZOOKEEPER-1865:
-------------------------------------

bq. Are there plans to require Java 7 or later for Zookeeper in the near future?
[~ecarter], since ZOOKEEPER-1963 is in, we can go ahead with this.

BTW, could you tell me the reason for (self.initLimit - self.syncLimit)? Also, is there a chance that self.syncLimit > self.initLimit, so that the expression evaluates to a negative integer?

> Fix retry logic in Learner.connectToLeader()
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1865
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Thawan Kooburat
>            Assignee: Edward Carter
>             Fix For: 3.5.1
>
>         Attachments: ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod ensembles.
> Here is the description of the event.
> Before the old leader went down, it was able to send out a notification message.
> So 3 out of 5 servers (including the old leader) elected the old leader to be the
> new leader for the next epoch. While the old leader was being rebooted, the 2 other
> machines were trying to connect to it. So the quorum couldn't
> form until those 2 machines gave up and moved to the next round of leader
> election.
> This is because Learner.connectToLeader() uses simple retry logic. The
> contract for this method is that it should never spend longer than initLimit
> trying to connect to the leader. In our outage, each sock.connect() call
> probably blocked for initLimit, and it is called 5 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
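
For illustration only, here is a minimal sketch of the contract stated in the issue description: the total time spent connecting stays within one budget, instead of each sock.connect() getting the full timeout on every try. This is not the code in ZOOKEEPER-1865.patch or in Learner. The class BoundedConnect, its methods, and all parameters (budgetMs, perAttemptCapMs, maxAttempts, pauseMs) are hypothetical names chosen for the example. It also clamps the budget at zero, which is one possible way to handle the negative-value case raised in the comment.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not ZooKeeper code: retries a connect while keeping
// the total time spent under a single overall budget.
final class BoundedConnect {

    // Milliseconds left until the deadline, clamped to [0, Integer.MAX_VALUE].
    static int remainingMillis(long deadlineNanos, long nowNanos) {
        long remaining = (deadlineNanos - nowNanos) / 1_000_000L;
        return (int) Math.max(0L, Math.min((long) Integer.MAX_VALUE, remaining));
    }

    // Timeout for one attempt: never more than what is left of the budget,
    // never more than the per-attempt cap, and never negative.
    static int attemptTimeoutMillis(int remainingMs, int perAttemptCapMs) {
        return Math.max(0, Math.min(remainingMs, perAttemptCapMs));
    }

    static Socket connectWithinBudget(InetSocketAddress addr, int budgetMs,
                                      int perAttemptCapMs, int maxAttempts,
                                      int pauseMs)
            throws IOException, InterruptedException {
        // A negative budget puts the deadline in the past, so no attempt is made.
        long deadline = System.nanoTime() + budgetMs * 1_000_000L;
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int timeout = attemptTimeoutMillis(
                    remainingMillis(deadline, System.nanoTime()), perAttemptCapMs);
            if (timeout <= 0) {
                // Budget exhausted. Socket.connect() treats a timeout of 0 as
                // "wait forever", so 0 must never be passed through.
                break;
            }
            Socket sock = new Socket();
            try {
                sock.connect(addr, timeout);
                return sock;
            } catch (IOException e) {
                last = e;
                try {
                    sock.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if closing a failed socket fails.
                }
            }
            // Pause before the next try, but never past the deadline.
            int pause = Math.min(pauseMs, remainingMillis(deadline, System.nanoTime()));
            if (pause > 0) {
                Thread.sleep(pause);
            }
        }
        throw new IOException("could not connect within " + budgetMs + " ms", last);
    }
}
```

With this shape, a blocked connect can use at most the time left in the budget, so five attempts cannot add up to five times the limit. Whether a budget that is already zero or negative should fail immediately, as it does here, or fall back to some minimum is a design choice the sketch does not settle.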