Thanks, Oskar. It's good to hear you have found a work-around. These cases are tricky to root-cause without a repro, but at least we now have a description of your scenario, so with some effort we could try to recreate it.
Thanks,
Dag

On 27.03.2013 11:05, Oskar Zinger wrote:
> Here is an update...
>
> I started using the stopMaster / stopSlave URL connection attributes
> before shutting down Derby in replication mode, plus a 1000 ms
> sleep time, and everything seems to be working reliably now.
>
> Not sure what is going on here, but there is a work-around.
>
> I also tried to modify Derby code to bypass all of the
> NullPointerExceptions (one after another), but on the next start-up I
> could no longer start replication.
>
> Thanks,
> Oskar
>
> ------------------------------------------------------------------------
> *From:* Oskar Zinger <oska...@yahoo.com>
> *To:* Derby Discussion <derby-user@db.apache.org>
> *Sent:* Monday, February 25, 2013 2:42 PM
> *Subject:* Re: NullPointerException when Shuting Down Derby
>
> Hi Dag,
>
> Imagine a two-server system, S1 and S2. One is the "designated" primary
> (S1) and the other is the secondary (S2). Here is a scenario, and the
> sequence of events:
>
> Note: The designated primary will always take control back as the primary
> server in the cluster
>
> 1. (S1) primary starts and starts Derby - right now it's a stand-alone server
> 2. (S2) secondary starts and starts Derby - now it will set up
>    replication, will execute startSlave in a new thread, and execute
>    startMaster
> 3. (S1) now designated primary gets shut down or crashes
> 4. (S2) the secondary server detects this, assumes the role of primary
>    and stops Derby (shutdown of entire Derby engine - including all
>    databases - NOT using stopMaster / stopSlave), starts Derby as the new
>    master (primary)
> 5. (S1) now designated primary comes back and wants to take control
>    back as the primary - that's where the problem happens - we call it
>    failback, a couple of things happen:
>    -- (S1) starts first as a secondary of the cluster - it needs to
>       resync configuration and database, now (S2) Derby is Primary, (S1)
>       Derby is Secondary
>    -- (S1) now sends message to switch roles, (S1) Derby is going
>       to shut down (NullPointerException) and restart, (S2) is going to
>       shut down and restart (cannot set up replication because of NPE on S1)
>
> Basically, it works the same way as in Step 4, and no NPE. And the
> strangest thing is - this is only happening on a 1-processor system, it's
> not possible to reproduce on a 2-processor system.
>
> Thanks,
> Oskar
>
> ------------------------------------------------------------------------
> *From:* Dag Wanvik <dag.wan...@oracle.com>
> *To:* Derby Discussion <derby-user@db.apache.org>
> *Sent:* Sunday, February 24, 2013 10:42 PM
> *Subject:* Re: NullPointerException when Shuting Down Derby
>
>
> On 31.01.2013 06:13, Oskar Zinger wrote:
>> This is only happening in a specific scenario when a host application
>> server fails back, so what it does is stop a service that manages
>> the Derby network server, and restart it.
>
> So, is this an attempt to shut down the ex-slave (now the failed-over
> master) after the old master has been (re)started? It would perhaps be
> helpful if you can explain the replication scenario in some detail,
> since replication contains much code specific to replication.
>
> Thanks,
> Dag
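
For anyone landing on this thread later, the work-around Oskar describes (stop replication via the stopMaster / stopSlave connection attribute, pause, then shut down the engine) could look roughly like the sketch below. This is only an illustration, not code from the thread: the class name, method names, the database name "mydb", and the placement of the pause between the two steps are all assumptions. The URL attributes themselves (stopMaster=true, stopSlave=true, shutdown=true) are standard Derby connection URL attributes; stopMaster is issued on the master side and stopSlave on the slave side.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; names and structure are illustrative only.
final class ReplicationShutdownSketch {

    /** Builds the URL that stops replication for this node's role. */
    static String stopReplicationUrl(String dbName, boolean isMaster) {
        return "jdbc:derby:" + dbName
                + (isMaster ? ";stopMaster=true" : ";stopSlave=true");
    }

    /** Builds the URL that shuts down the whole Derby engine. */
    static String engineShutdownUrl() {
        return "jdbc:derby:;shutdown=true";
    }

    /** Opens a connection with the given URL and logs any SQLException. */
    static void attempt(String url) {
        try {
            DriverManager.getConnection(url).close();
        } catch (SQLException e) {
            // Derby reports a successful engine shutdown as an SQLException
            // (SQLState XJ015), so an exception is not necessarily a failure.
            System.out.println(url + " -> SQLState " + e.getSQLState());
        }
    }

    /**
     * Stops replication first, waits, then shuts down the engine.
     * Assumption: the pause sits between the two steps; the thread only
     * says that a 1000 ms sleep was added.
     */
    static void shutDown(String dbName, boolean isMaster, long pauseMillis)
            throws InterruptedException {
        attempt(stopReplicationUrl(dbName, isMaster));
        Thread.sleep(pauseMillis);
        attempt(engineShutdownUrl());
    }
}
```

On the master this would be called as ReplicationShutdownSketch.shutDown("mydb", true, 1000), and on the slave with false as the second argument. If authentication is enabled, user and password attributes would also need to be appended to the URLs.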