[ https://issues.apache.org/jira/browse/ZOOKEEPER-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941321#comment-13941321 ]
Michi Mutsuzaki commented on ZOOKEEPER-1870:
--------------------------------------------

[~shralex], it looks like the problem is in FastLeaderElection. WorkerReceiver.run() doesn't get out of the while loop after calling self.getElectionAlg().shutdown(), so node 1 becomes the leader when it shouldn't. Should we put a break after self.getElectionAlg().shutdown() so that the rest of the logic doesn't get executed when restarting the leader election?

> flakey test in StandaloneDisabledTest.startSingleServerTest
> -----------------------------------------------------------
>
>                 Key: ZOOKEEPER-1870
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1870
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: 3.5.0
>            Reporter: Patrick Hunt
>            Assignee: Helen Hastings
>            Priority: Critical
>         Attachments: ZOOKEEPER-1870.patch, test.log
>
>
> I'm seeing lots of the following failure. Seems like a flakey test (passes every so often).
> {noformat}
> junit.framework.AssertionFailedError: client could not connect to reestablished quorum: giving up after 30+ seconds.
>     at org.apache.zookeeper.test.ReconfigTest.testNormalOperation(ReconfigTest.java:143)
>     at org.apache.zookeeper.server.quorum.StandaloneDisabledTest.startSingleServerTest(StandaloneDisabledTest.java:75)
>     at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}
> I've found 3 problems:
> 1. QuorumCnxManager.Listener.run() leaks the socket depending on when the shutdown flag gets set.
> 2. QuorumCnxManager.halt() doesn't wait for the listener to terminate.
> 3. QuorumPeer.shuttingDownLE flag doesn't get reset when restarting the leader election.
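
To illustrate the break suggested in the comment above, here is a minimal, self-contained sketch of a receiver loop that either keeps running or exits once the election has been shut down. This is not the FastLeaderElection code: the class ReceiverLoopSketch, the Election stand-in, the drain() method, and the "reconfig" / "vote-*" message strings are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReceiverLoopSketch {

    /** Hypothetical stand-in for an election algorithm; not ZooKeeper's API. */
    static class Election {
        boolean stopped = false;

        void shutdown() {
            stopped = true;
        }
    }

    /**
     * Walks through the inbox in order. A "reconfig" message shuts the
     * election down. Every message that reaches the end of the loop body is
     * recorded, which stands in for "the rest of the logic" in the receiver.
     *
     * @param breakAfterShutdown if true, leave the loop right after shutdown()
     * @return the messages that the rest of the loop body acted on
     */
    static List<String> drain(List<String> inbox, boolean breakAfterShutdown) {
        Election election = new Election();
        List<String> processed = new ArrayList<>();
        for (String msg : inbox) {
            if ("reconfig".equals(msg)) {
                election.shutdown();
                if (breakAfterShutdown) {
                    break; // nothing below runs once the election is stopped
                }
            }
            // Without the break, this still runs after shutdown().
            processed.add(msg);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> inbox = Arrays.asList("vote-a", "reconfig", "vote-b");
        System.out.println("without break: " + drain(inbox, false));
        System.out.println("with break:    " + drain(inbox, true));
    }
}
```

In the sketch, the version without the break keeps acting on messages with a stopped election, which is the kind of behavior the comment suspects in WorkerReceiver.run(). Whether a plain break is the right fix in the real loop depends on what else that loop is responsible for.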