Re: NullPointerException when Shutting Down Derby

2013-03-28 Thread Dag Wanvik
Thanks, Oskar. It's good to hear you have found a work-around. These
cases are tricky to root-cause unless we have a repro, but at least we
now have a description of your scenario, so with some effort we could
try to recreate it.

Thanks,
Dag

On 27.03.2013 11:05, Oskar Zinger wrote:
 Here is an update...

 I started using stopMaster / stopSlave - URL connection attributes
 before shutting down Derby in replication mode and also a 1000 ms
 sleep time, and everything seems to be working reliably now.

 Not sure what is going on here, but there is a work-around.

 I also tried to modify Derby code to bypass all of the
 NullPointerExceptions (one after another), but on the next start-up I
 could no longer start replication.

 Thanks,
 Oskar

 






Re: NullPointerException when Shutting Down Derby

2013-03-26 Thread Oskar Zinger
Here is an update...

I started using the stopMaster / stopSlave URL connection attributes before 
shutting down Derby in replication mode, along with a 1000 ms sleep, and 
everything seems to be working reliably now.
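
A minimal Java sketch of this work-around, for illustration only. stopMaster, stopSlave, shutdown and deregister are Derby connection URL attributes; the class and method names, the database name passed in, and the placement of the sleep between the replication stop and the engine shutdown are assumptions, since the message does not spell them out. The sketch records the SQLState of each request rather than assuming how Derby reports the outcome.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

/** Sketch: stop replication, pause, then shut down the whole Derby engine. */
class ReplicationAwareShutdown {

    // Hypothetical helper: builds a Derby URL carrying one boolean attribute.
    static String urlWith(String dbName, String attribute) {
        return "jdbc:derby:" + dbName + ";" + attribute + "=true";
    }

    // Issues the connection request and returns the SQLState of any
    // SQLException, or null if a connection was returned and closed.
    static String attempt(String url) {
        try {
            DriverManager.getConnection(url).close();
            return null;
        } catch (SQLException e) {
            return e.getSQLState();
        }
    }

    // Assumed ordering: stop replication, sleep, then engine shutdown.
    static void shutdown(String dbName, boolean isMaster, long pauseMillis)
            throws InterruptedException {
        String stopUrl = urlWith(dbName, isMaster ? "stopMaster" : "stopSlave");
        System.out.println(stopUrl + " -> SQLState " + attempt(stopUrl));

        Thread.sleep(pauseMillis); // 1000 ms in the report above

        String engineUrl = "jdbc:derby:;shutdown=true;deregister=false";
        System.out.println(engineUrl + " -> SQLState " + attempt(engineUrl));
    }
}
```

A caller on the master side would use something like ReplicationAwareShutdown.shutdown("myDB", true, 1000), where "myDB" is a placeholder database name.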

Not sure what is going on here, but there is a work-around.

I also tried to modify Derby code to bypass all of the NullPointerExceptions 
(one after another), but on the next start-up I could no longer start 
replication.

Thanks,
Oskar




Re: NullPointerException when Shutting Down Derby

2013-02-25 Thread Oskar Zinger
Hi Dag,

Imagine a two-server system, S1 and S2. One is the designated primary (S1) and 
the other is the secondary (S2). Here is the scenario and the sequence of events:

Note: The designated primary will always take control back as the primary server in 
the cluster.


1. (S1) the primary starts and starts Derby - right now it's a stand-alone server
2. (S2) the secondary starts and starts Derby - now it will set up replication, will 
execute startSlave in a new thread, and execute startMaster
3. (S1) the designated primary now gets shut down or crashes
4. (S2) the secondary server detects this, assumes the role of primary, and 
stops Derby (shutdown of the entire Derby engine - including all databases - NOT 
using stopMaster / stopSlave), then starts Derby as the new master (primary)
5. (S1) the designated primary now comes back and wants to take control back as the 
primary - that's where the problem happens. We call it failback, and a couple of 
things happen:
  -- (S1) starts first as a secondary of the cluster - it needs to resync 
configuration and database; now (S2) Derby is Primary, (S1) Derby is Secondary
  -- (S1) now sends a message to switch roles; (S1) Derby is going to 
shut down (NullPointerException) and restart, (S2) is going to shut down and 
restart (cannot set up replication because of the NPE on S1)

Basically, it works the same way as in Step 4, where there is no NPE. And the strangest 
thing is that this only happens on a 1-processor system; it's not possible to 
reproduce on a 2-processor system.
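
Step 2 of the scenario above (startSlave in a new thread, then startMaster) could be sketched in Java roughly as follows. startSlave, startMaster, slaveHost and slavePort are Derby connection URL attributes; the class name, method names, thread name, and any host, port or database values passed in are hypothetical, and the message does not say which server issues which request.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Sketch: build the replication start URLs and run startSlave off-thread. */
class ReplicationStart {

    static String startSlaveUrl(String dbName, String slaveHost, int slavePort) {
        return "jdbc:derby:" + dbName + ";startSlave=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    static String startMasterUrl(String dbName, String slaveHost, int slavePort) {
        return "jdbc:derby:" + dbName + ";startMaster=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // Runs the startSlave request in its own thread, as described in step 2,
    // so the caller can go on to issue startMaster.
    static Thread startSlaveInNewThread(final String url) {
        Thread t = new Thread(() -> {
            try (Connection c = DriverManager.getConnection(url)) {
                System.out.println("startSlave returned a connection");
            } catch (SQLException e) {
                // Only log the SQLState; its meaning is not assumed here.
                System.out.println("startSlave ended with SQLState " + e.getSQLState());
            }
        }, "derby-start-slave");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```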

Thanks,
Oskar





Re: NullPointerException when Shutting Down Derby

2013-02-24 Thread Dag Wanvik

On 31.01.2013 06:13, Oskar Zinger wrote:
 This is only happening in a specific scenario, when a host application
 server fails back: it stops the service that manages the Derby network
 server and restarts it.

So, is this an attempt to shut down the ex-slave (now the failed-over
master) after the old master has been (re)started? It would perhaps be
helpful if you could explain the replication scenario in some detail,
since replication has a lot of code specific to it.

Thanks,
Dag


 There is no second shutdown, because if that were happening I would see
 duplicate debug messages telling me that the code is about to execute
 shutdown.

 This is done in a master/slave Derby replication environment. It does
 not always happen during shutdown of Derby.

 Thanks,
 Oskar

 
 *From:* Katherine Marsden kmarsdende...@sbcglobal.net
 *To:* Derby Discussion derby-user@db.apache.org
 *Sent:* Wednesday, January 30, 2013 2:16 PM
 *Subject:* Re: NullPointerException when Shutting Down Derby

 On 1/30/2013 8:38 AM, Oskar Zinger wrote:
 Hi Kathey,

 Here is what happens...

 Yes, this is a multi-threaded system. No, only one shutdown request
 is happening. I have not tried with deregister=true.

 The new finding is...

 This NullPointerException only happens on a single-CPU (or
 single-core) system. When the system and kernel are upgraded to a
 2-CPU / 2-core system, the NullPointerException no longer happens. The
 same has been tried on a 4-CPU / 4-core system, and it does not happen
 there either. So, strangely, this only happens on a single-processor
 system.

 I feel that the
 org.apache.derby.impl.store.raw.xact.XactFactory.add() method should
 check ttab for null.

 It seems to me that there is something happening that we don't
 understand. Although I would certainly accept a patch to put a null
 check here, my guess is that there might be another NPE that you
 will hit down the line if you fix that one, so it is worth taking a
 little time to understand what is going on and to try to get a
 stand-alone reproduction if there is a bug. It has the feel of two
 shutdowns at once, or maybe some post-commit action still happening
 during the shutdown.

 Can you describe what is going on when this happens and try to make a
 reproduction you can post?
 I'd say go ahead and file a Jira issue and put the information there,
 and attach logs and such, for help from the community as you debug. It
 would also be helpful to use debug jars so we get line numbers in the
 stack trace.
  
 Best

 Kathey




 Regards,
 Oskar

 
 *From:* Katherine Marsden kmarsdende...@sbcglobal.net
 *To:* Derby Discussion derby-user@db.apache.org
 *Cc:* Oskar Zinger oska...@yahoo.com
 *Sent:* Tuesday, January 29, 2013 12:22 AM
 *Subject:* Re: NullPointerException when Shutting Down Derby

 On 1/28/2013 3:04 PM, Oskar Zinger wrote:
 I just upgraded to 10.8.3 and I am still running into the same
 NullPointerException (NPE).

 This is actually different, it comes from:
 org.apache.derby.impl.store.raw.xact.XactFactory.add(Unknown Source)


 Hi Oskar,

 Thank you for upgrading to the latest. That always makes things
 easier to debug.
 I guess the next step is to understand how you get into this state.
 Is there a stand-alone reproduction that you can post in Jira?
 Is there something in the log prior to this NPE that might give us an
 indication of what was going on when you got the NPE?

 Is your program multi-threaded? Is there possibly more than one
 thread shutting down at once? Do you have the same problem if you
 use deregister=true?
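
One way an application could rule out concurrent shutdown requests on its own side is a run-once guard. This is a generic Java sketch with hypothetical names, not Derby code and not something taken from the reporter's application; the Runnable stands in for whatever issues the actual shutdown request.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: lets only the first caller run the shutdown action. */
class ShutdownOnce {
    private final AtomicBoolean requested = new AtomicBoolean(false);
    private final Runnable action;

    ShutdownOnce(Runnable action) {
        this.action = action;
    }

    // Returns true only for the single caller that actually ran the action;
    // later or concurrent callers return false without running it.
    boolean request() {
        if (!requested.compareAndSet(false, true)) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Logging the return value of request() from every thread that can trigger a shutdown would also show directly whether a second request was ever attempted.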


 Best

 Kathey


 Thanks,
 Oskar

 
 *From:* Katherine Marsden kmarsdende...@sbcglobal.net
 *To:* Derby Discussion derby-user@db.apache.org
 *Cc:* Oskar Zinger oska...@yahoo.com
 *Sent:* Monday, January 28, 2013 4:28 PM
 *Subject:* Re: NullPointerException when Shutting Down Derby

 On 1/28/2013 12:52 PM, Oskar Zinger wrote:
 When I do the following it sometimes throws a NullPointerException:

 DriverManager.getConnection("jdbc:derby:;shutdown=true;deregister=false");

 Here is the exception stack trace:

 Caused by: java.lang.NullPointerException
 at org.apache.derby.impl.store.raw.xact.XactFactory.add(Unknown Source)
 at org.apache.derby.impl.store.raw.xact.XactFactory.pushTransactionContext(Unknown Source)
 at org.apache.derby.impl.store.raw.xact.XactFactory.startInternalTransaction(Unknown Source)
 at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(Unknown Source

Re: NullPointerException when Shutting Down Derby

2013-02-22 Thread Katherine Marsden

On 1/30/2013 2:56 PM, Katherine Marsden wrote:

On 1/30/2013 12:13 PM, Oskar Zinger wrote:
This is only happening in a specific scenario, when a host application 
server fails back: it stops the service that manages the Derby network 
server and restarts it.

There is no second shutdown, because if that were happening I would 
see duplicate debug messages telling me that the code is about to 
execute shutdown.

This is done in a master/slave Derby replication environment. It does 
not always happen during shutdown of Derby.

Thanks, Oskar, for the information. I am not at all familiar with the 
replication environment, so I can't say what to expect or whether 
adding the null check would help. I would suggest you go ahead and 
file an issue in Jira, and then perhaps some folks on the dev list 
with knowledge of replication will be able to provide input.


If you can reproduce with a debug build, that would be helpful and 
would provide line numbers. Derby is easy to build if you want to try out 
the null check in your environment.



There is a property you can put in derby.properties:
derby.stream.error.logBootTrace=true
that might be helpful in debugging this issue especially if you are 
using custom classloaders.
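
As an alternative to editing derby.properties, the same property can be set as a JVM system property before the Derby engine is loaded. A small Java sketch, assuming the application controls its own start-up; the class and method names are hypothetical.

```java
/** Sketch: enable Derby boot tracing via a JVM system property. */
class BootTraceConfig {
    static final String BOOT_TRACE = "derby.stream.error.logBootTrace";

    // Must run before the Derby engine boots for the setting to be picked up.
    static void enableBootTrace() {
        System.setProperty(BOOT_TRACE, "true");
    }
}
```

Passing -Dderby.stream.error.logBootTrace=true on the java command line has the same effect without a code change.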





Re: NullPointerException when Shutting Down Derby

2013-01-28 Thread Katherine Marsden

On 1/28/2013 12:52 PM, Oskar Zinger wrote:

When I do the following it sometimes throws a NullPointerException:

DriverManager.getConnection("jdbc:derby:;shutdown=true;deregister=false");
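
For context, a successful engine-wide Derby shutdown is itself reported through an SQLException with SQLState XJ015, so callers usually separate that expected case from real failures such as the NPE-caused exception in this report. A minimal Java sketch of that check; the class and method names are hypothetical.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

/** Sketch: engine shutdown that treats SQLState XJ015 as success. */
class EngineShutdown {
    static final String SHUTDOWN_URL = "jdbc:derby:;shutdown=true;deregister=false";

    // XJ015 is the SQLState Derby uses for a normal system shutdown.
    static boolean isNormalEngineShutdown(SQLException e) {
        return "XJ015".equals(e.getSQLState());
    }

    // Rethrows anything other than the expected shutdown exception, so a
    // failure with an underlying cause is not silently swallowed.
    static void shutdownEngine() throws SQLException {
        try {
            DriverManager.getConnection(SHUTDOWN_URL);
        } catch (SQLException e) {
            if (!isNormalEngineShutdown(e)) {
                throw e;
            }
        }
    }
}
```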

Here is the exception stack trace:

Caused by: java.lang.NullPointerException
        at org.apache.derby.impl.store.raw.xact.XactFactory.add(Unknown Source)
        at org.apache.derby.impl.store.raw.xact.XactFactory.pushTransactionContext(Unknown Source)
        at org.apache.derby.impl.store.raw.xact.XactFactory.startInternalTransaction(Unknown Source)
        at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(Unknown Source)
        at org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore.stop(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.stop(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.shutdown(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.shutdown(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.shutdown(Unknown Source)
        at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
        at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:399)
        at java.sql.DriverManager.getConnection(DriverManager.java:350)

Does anyone know why this is happening?

I am using Derby 10.8.2.3

Thanks,
Oskar Zinger


https://issues.apache.org/jira/browse/DERBY-5916 was backported to 10.8 
with revision 1395186 and looks similar.

I suggest you pick up 10.8.3, which has that fix.



Re: NullPointerException when Shutting Down Derby

2013-01-28 Thread Oskar Zinger
I just upgraded to 10.8.3 and I am still running into the same 
NullPointerException (NPE).

This is actually a different issue; it comes from:
org.apache.derby.impl.store.raw.xact.XactFactory.add(Unknown Source)

Thanks,
Oskar



