Looking at the log Patrick sent: https://builds.apache.org/job/ZooKeeper-trunk/2386/testReport/junit/org.apache.zookeeper.server.quorum/ReconfigRecoveryTest/testCurrentObserverIsParticipantInNewConfig/
This is really interesting. I see that server 2 boots and then the following happens:

2014-07-25 17:12:01,159 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@682] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2014-07-25 17:12:01,161 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@682] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LEADING (my state)0 (n.config version)
2014-07-25 17:12:01,163 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@293] - 2 Received version: 200000000 my version: 0

This just doesn't make sense. First, I start it with config 100000000, so it can't have "0 (n.config version)". Second, no one has even established the first config, so who's sending it version 200000000? This is the first time it appears in the log.

Is it possible that multiple tests are running at the same time and interfering with each other? I'm not sure how to debug this without access to the machine and without adding more debug messages.

Alex

On Fri, Jul 25, 2014 at 11:28 AM, Alexander Shraer <[email protected]> wrote:

> Hi,
>
> Hongchao, could you please clarify what you propose ?
>
> If I understand correctly we have two main options - ZK-1989 that disables
> reconfig and keeps the old static config and a simpler fix which is to
> check for client ports on boot and shut down the server with an error if no
> ports were found.
>
> Patrick, I'll take a look on the log. Michi and I didn't succeed to get
> access to the build machine, which makes it very difficult to debug...
>
> Thanks,
> Alex
>
>
> On Fri, Jul 25, 2014 at 10:53 AM, Hongchao Deng <[email protected]>
> wrote:
>
>> ZK-1989 gets pretty complicated if it needs to support the full backward
>> compatibility.
>>
>> My plan is divide a small task out: simply keep the old config and make it
>> work. There could be unexpected cases when users of old config tried to use
>> reconfig.
>>
>> Is it okay for the first alpha release?
>>
>>
>> On Fri, Jul 25, 2014 at 10:46 AM, Patrick Hunt <[email protected]> wrote:
>>
>> > 1974 has been committed (kudos folks!), along with a few other patches
>> > that were ready to go.
>> >
>> > Hongchao, how is 1989 coming?
>> >
>> > Patrick
>> >
>> >
>> > On Thu, Jul 24, 2014 at 4:50 PM, Patrick Hunt <[email protected]> wrote:
>> > > Can someone take a look at this issue? The windows c client build is
>> > > failing for a while now, would be great to fix this for 3.5.0...
>> > >
>> > > ZOOKEEPER-1974 winvs2008 jenkins job failing with "unresolved external symbol"
>> > >
>> > > Patrick
>> > >
>> > > On Thu, Jul 24, 2014 at 10:19 AM, Raúl Gutiérrez Segalés
>> > > <[email protected]> wrote:
>> > >> On 24 July 2014 09:47, Patrick Hunt <[email protected]> wrote:
>> > >>
>> > >>> We've identified the issues with 1987, it would be good if folks could
>> > >>> take a look.
>> > >>
>> > >>
>> > >> Great - thanks Patrick. Added some comments to the patch.
>> > >>
>> > >>
>> > >>> Nothing looks unsolvable, but we should tweak things a
>> > >>> bit before 3.5.0, esp given the current upgrade experience. The new
>> > >>> docs will help a lot - see
>> > >>> https://issues.apache.org/jira/browse/ZOOKEEPER-1660 which we need to
>> > >>> review and commit.
>> > >>>
>> > >>
>> > >> Reading the docs, will follow-up with comments.
>> > >>
>> > >>
>> > >> -rgs
>> > >>
>>
>>
>> --
>> *- Hongchao Deng*
>> *Software Engineer, CCE*
>>
>
>
