Thanks, Oskar. It's good to hear you have found a work-around. These
cases are tricky to root-cause unless we have a repro, but at least now
we have a description of your scenario, so with some effort we could try
to recreate it.
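
For anyone who finds this thread later, the work-around described below
could be sketched roughly as follows. This is only an illustration, not
code from Oskar's application: the class and method names are
hypothetical, the embedded-driver URL form and the lack of user/password
attributes are assumptions, and placing the 1000 ms sleep between
stopping replication and the engine shutdown is a guess, since the
original message does not say exactly where the sleep goes.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper for illustration only; not part of Derby.
class ReplicationShutdown {

    // Builds an embedded-driver URL with one boolean attribute set,
    // e.g. url("mydb", "stopMaster") gives "jdbc:derby:mydb;stopMaster=true".
    // An empty dbName gives the engine-wide form "jdbc:derby:;shutdown=true".
    static String url(String dbName, String attribute) {
        return "jdbc:derby:" + dbName + ";" + attribute + "=true";
    }

    // Makes one connection attempt and logs the outcome. Derby signals a
    // completed shutdown by throwing an SQLException, so an exception is
    // logged here rather than treated as fatal.
    static void attempt(String url) {
        try {
            DriverManager.getConnection(url).close();
            System.err.println(url + " -> connected");
        } catch (SQLException e) {
            System.err.println(url + " -> " + e.getSQLState() + ": " + e.getMessage());
        }
    }

    // Stops replication for one database first, waits, then shuts down
    // the whole engine. The sleep placement is an assumption.
    static void stopReplicationThenShutdown(String dbName, boolean isMaster)
            throws InterruptedException {
        attempt(url(dbName, isMaster ? "stopMaster" : "stopSlave"));
        Thread.sleep(1000);
        attempt(url("", "shutdown"));
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: ReplicationShutdown <dbName> <master|slave>");
            return;
        }
        stopReplicationThenShutdown(args[0], args[1].equals("master"));
    }
}
```

The stopMaster request would be issued on the master side and stopSlave
on the slave side; the sketch only logs whatever SQLState comes back
instead of assuming a particular one.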

Thanks,
Dag

On 27.03.2013 11:05, Oskar Zinger wrote:
> Here is an update...
>
> I started using the stopMaster / stopSlave URL connection attributes
> before shutting down Derby in replication mode, along with a 1000 ms
> sleep, and everything seems to be working reliably now.
>
> Not sure what is going on here, but there is a work-around.
>
> I also tried to modify Derby code to bypass all of the
> NullPointerExceptions (one after another), but on the next start-up I
> could no longer start replication.
>
> Thanks,
> Oskar
>
> ------------------------------------------------------------------------
> *From:* Oskar Zinger <oska...@yahoo.com>
> *To:* Derby Discussion <derby-user@db.apache.org>
> *Sent:* Monday, February 25, 2013 2:42 PM
> *Subject:* Re: NullPointerException when Shuting Down Derby
>
> Hi Dag,
>
> Imagine a two server system, S1 and S2. One is "designated" primary
> (S1) and another is secondary (S2). Here is a scenario, and the
> sequence of events:
>
> Note: Designated primary will always take control back as a primary
> server in the cluster
>
> 1. (S1) primary starts and starts Derby - right now it's a stand-alone server
> 2. (S2) secondary starts and starts Derby - now it will set up
> replication, will execute startSlave in a new thread, and execute
> startMaster
> 3. (S1) now designated primary gets shut down or crashes
> 4. (S2) the secondary server detects this, assumes the role of primary
> and stops Derby (shutdown of entire Derby engine - including all
> databases - NOT using stopMaster / stopSlave), starts Derby as the new
> master (primary)
> 5. (S1) now designated primary comes back and wants to take control
> back as the primary - that's where the problem happens - we call it
> failback, a couple of things happen:
>       -- (S1) starts first as a secondary of the cluster - it needs to
> resync configuration and database, now (S2) Derby is Primary, (S1)
> Derby is Secondary
>       -- (S1) now sends message to switch roles, (S1) Derby is going
> to shut down (NullPointerException) and restart, (S2) is going to
> shut down and restart (cannot set up replication because of NPE on S1)
>
> Basically, it works the same way as in Step 4, where there is no NPE.
> And the strangest thing is - this is only happening on a 1-processor
> system; it's not possible to reproduce on a 2-processor system.
>
> Thanks,
> Oskar
>
> ------------------------------------------------------------------------
> *From:* Dag Wanvik <dag.wan...@oracle.com>
> *To:* Derby Discussion <derby-user@db.apache.org>
> *Sent:* Sunday, February 24, 2013 10:42 PM
> *Subject:* Re: NullPointerException when Shuting Down Derby
>
>
> On 31.01.2013 06:13, Oskar Zinger wrote:
>> This is only happening in a specific scenario, when a host application
>> server fails back: it stops the service that manages the
>> Derby network server and restarts it.
>
> So, is this an attempt to shut down the ex-slave (now the failed over
> master) after the old master has been (re)started? It would perhaps be
> helpful if you can explain the replication scenario in some detail,
> since replication contains much code specific to replication.
>
> Thanks,
> Dag
>
>
>
