Hi Dag,

Imagine a two-server system, S1 and S2. One is the "designated" primary (S1) and
the other is the secondary (S2). Here is a scenario and its sequence of events:

Note: The designated primary will always take control back as the primary server
in the cluster.


1. (S1) The primary starts and starts Derby - at this point it is a stand-alone
server
2. (S2) The secondary starts and starts Derby - it now sets up replication: it
executes startSlave in a new thread, and executes startMaster
3. (S1) The designated primary gets shut down or crashes
4. (S2) The secondary server detects this, assumes the role of primary, stops
Derby (a shutdown of the entire Derby engine, including all databases, NOT
using stopMaster / stopSlave), and starts Derby as the new master (primary)
5. (S1) The designated primary comes back and wants to take control back as the
primary. This is where the problem happens. We call it failback, and a couple
of things happen:
      -- (S1) first starts as a secondary of the cluster, because it needs to
resync configuration and database; now (S2) Derby is primary and (S1) Derby is
secondary
      -- (S1) then sends a message to switch roles: (S1) Derby is going to shut
down (NullPointerException) and restart, and (S2) is going to shut down and
restart (it cannot set up replication because of the NPE on S1)
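
For reference, the operations named in the steps above (startSlave,
startMaster, stopMaster / stopSlave, and the full engine shutdown) correspond
to Derby connection URL attributes. The sketch below only builds the URL
strings and does not connect to anything. The database name, host name and
port are placeholders, and the class and method names are illustrative; they
are not taken from the application described here.

```java
/**
 * Builds Derby connection URLs for the replication and shutdown operations
 * mentioned in the scenario. All database names, hosts and ports passed in
 * are placeholders chosen by the caller (hypothetical helper class).
 */
final class ReplicationUrls {
    private static final String PREFIX = "jdbc:derby:";

    private ReplicationUrls() {}

    // Step 2, slave side: put the database into slave mode. A connection
    // attempt with this URL normally waits for the master to connect, which
    // is presumably why the scenario runs it in a separate thread.
    static String startSlave(String db, String slaveHost, int slavePort) {
        return PREFIX + db + ";startSlave=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // Step 2, master side: start shipping the log to the slave.
    static String startMaster(String db, String slaveHost, int slavePort) {
        return PREFIX + db + ";startMaster=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // Orderly ways to end replication (NOT what steps 4 and 5 use).
    static String stopMaster(String db) {
        return PREFIX + db + ";stopMaster=true";
    }

    static String stopSlave(String db) {
        return PREFIX + db + ";stopSlave=true";
    }

    // Steps 4 and 5: shut down the entire Derby engine, including all
    // databases. No database name is given in this URL.
    static String shutdownEngine() {
        return PREFIX + ";shutdown=true";
    }
}
```

Each string would be passed to DriverManager.getConnection. Note that a
successful engine shutdown is reported by Derby as an SQLException (SQLState
XJ015), so callers usually catch it and treat it as the normal outcome.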

Basically, the shutdown works the same way as in step 4, where there is no NPE.
The strangest thing is that this only happens on a 1-processor system; it is
not possible to reproduce on a 2-processor system.

Thanks,
Oskar



________________________________
 From: Dag Wanvik <dag.wan...@oracle.com>
To: Derby Discussion <derby-user@db.apache.org> 
Sent: Sunday, February 24, 2013 10:42 PM
Subject: Re: NullPointerException when Shuting Down Derby
 



On 31.01.2013 06:13, Oskar Zinger wrote:

> This is only happening in a specific scenario when a host application server
> fails back: it stops the service that manages the Derby network server and
> restarts it.
>
So, is this an attempt to shut down the ex-slave (now the failed-over
master) after the old master has been (re)started? It would perhaps be
helpful if you could explain the replication scenario in some detail,
since replication contains much code specific to replication.

Thanks,
Dag
