[
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475408#comment-15475408
]
Michael Han commented on ZOOKEEPER-2080:
----------------------------------------
Thanks for feedback, [~fpj]!
bq. Is it the case that the {{ synchronized(self) }} block only needs to cover
up to {{self.processReconfig}}
I think from the way the code was implemented, the block is to cover not just
the processReconfig, but also the {{if (!rqv.equals(curQV)) {}} following, and
the entire block has to be a single atomic operation.
bq. If so, perhaps we should end the block there and process the following
outside.
Are you suggesting something like
[this|https://github.com/hanm/zookeeper/blob/60bcf70b83605e0fd574a95fd00a32e7e763f6f7/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L292]?
Here I make the sync block only covers processReconfig. However there are a
couple of reconfig tests failed deterministically because of such a change. I
haven't figured out why those tests failed, but it could be the entire code
block (processReconfig + the rest of what's the current sync covered currently)
has to be covered such that the processReconfig and this has to be a
transaction together:
{code}
if (!rqv.equals(curQV)) {
LOG.info("restarting leader election");
self.shuttingDownLE = true;
self.getElectionAlg().shutdown();
break;
}
{code}
So making the sync block only covers up to processReconfig probably would not
work ... (I need to dig deeper into the reconfig logic - probably do some data
flow analysis to figure out the dependencies:) - for now I can't prove on paper
it's safe (or not) as for where the sync block ends (I am using existing tests
as the verifier).
> ReconfigRecoveryTest fails intermittently
> -----------------------------------------
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Michael Han
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch,
> ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch,
> jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, repro-20150816.log,
> threaddump.log
>
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
> FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
> at
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
> at
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)