[ https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334720#comment-15334720 ]
Michael Han commented on ZOOKEEPER-2080:
----------------------------------------
I've spent the last couple of days trying to reproduce the failure cases and
analyzing the logs from failed runs. After mining enough data, I am fairly
confident that the culprit behind the sporadic failure of this test case is
FastLeaderElection.shutdown, which never returns in the failed cases. What
happens looks like this:
* Server 3 joins ensemble and starts looking for a leader.
* The connections between server 3 and servers 2/1/0 are broken for some reason
(unclear to me, but it happens in both failed and successful runs).
* Server 3 restarts leader election (this also happens in both failed and
successful runs).
* The first step when restarting leader election is to shut down the old FLE,
and in the failed runs server 3 hangs there (while joining the listener
thread). From this point on, server 3 is left in a bad state and never recovers
(increasing the timeout would not help).
This also aligns with some of the observations previously pointed out by Alex
and Akihiro. Fixing ZOOKEEPER-2246 might fix this as well, so I assigned that
issue to myself. I'm working on a patch now (it might require getting
ZOOKEEPER-900 done first; we will see).
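To illustrate how a shutdown can hang "while joining the listener thread", here is a minimal Java sketch of one common pattern, assuming the listener is blocked in ServerSocket.accept(). The class and method names (ListenerShutdownSketch, Listener, shutdown) are hypothetical; this is not the FastLeaderElection or QuorumCnxManager code, and it is only an illustration of the pattern, not a confirmed root cause for this test.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch, not ZooKeeper code: a listener thread blocked in
// ServerSocket.accept() is not woken up by Thread.interrupt() (platform thread),
// so a shutdown that only interrupts and then calls join() with no timeout can
// wait forever. Closing the socket first makes accept() throw and the thread exit.
public class ListenerShutdownSketch {

    static final class Listener extends Thread {
        private final ServerSocket ss;

        Listener(ServerSocket ss) {
            this.ss = ss;
            setDaemon(true); // never keep the JVM alive because of this sketch
        }

        @Override
        public void run() {
            try {
                while (!isInterrupted()) {
                    ss.accept().close(); // blocks until a client connects or ss is closed
                }
            } catch (IOException e) {
                // accept() throws SocketException once the socket is closed: exit.
            }
        }
    }

    /** Returns true if the listener thread exited within timeoutMs. */
    static boolean shutdown(boolean closeSocketFirst, long timeoutMs) throws Exception {
        ServerSocket ss = new ServerSocket(0); // ephemeral port
        Listener listener = new Listener(ss);
        listener.start();
        Thread.sleep(100); // give the listener time to block in accept()

        listener.interrupt();
        if (closeSocketFirst) {
            ss.close(); // unblocks accept()
        }
        // Bounded join so the sketch itself cannot hang; a plain join() here
        // would block forever in the interrupt-only case.
        listener.join(timeoutMs);
        boolean exited = !listener.isAlive();

        ss.close(); // cleanup; also releases a listener still stuck in accept()
        return exited;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("interrupt only:   exited=" + shutdown(false, 500));
        System.out.println("close, then join: exited=" + shutdown(true, 500));
    }
}
```

Running main reports whether the listener exited in each case. With a platform thread, interrupt() alone typically leaves the listener stuck in accept(), while closing the socket before joining lets it exit promptly.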
> ReconfigRecoveryTest fails intermittently
> -----------------------------------------
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Michael Han
> Attachments: jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z,
> repro-20150816.log
>
>
> I got the following test failure on a MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
> at org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
> at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)