Alex Rudyy created QPID-6997:
--------------------------------

             Summary: [Java Broker, BDBStore] HA JE Transaction commit might 
end up in NullPointerException when commit is invoked when majority is lost
                 Key: QPID-6997
                 URL: https://issues.apache.org/jira/browse/QPID-6997
             Project: Qpid
          Issue Type: Bug
          Components: Java Broker
    Affects Versions: qpid-java-6.0
            Reporter: Alex Rudyy


On running some tests for BDB HA  with default settings 
(ReplicaAckPolicy=SIMPLE_MAJORITY) involving publishing messages in parallel 
from multiple connections and disconnecting replica nodes in the middle of 
publishing transactions, in some rare circumstances the publishing transaction 
JE commit failed with NPE thrown from JE code as below:
{noformat}
########################################################################
#
# Unhandled Exception java.lang.NullPointerException in Thread 
IO-/127.0.0.1:35408
#
# Exiting
#
########################################################################
java.lang.NullPointerException
 at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:130)
 at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1130)
 at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:191)
 at com.sleepycat.je.txn.Txn.commit(Txn.java:756)
 at com.sleepycat.je.Transaction.doCommit(Transaction.java:609)
 at com.sleepycat.je.Transaction.commit(Transaction.java:410)
 at 
org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commitAsync(ReplicatedEnvironmentFacade.java:302)
 at 
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.commitTranAsyncImpl(AbstractBDBMessageStore.java:794)
 at 
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$1200(AbstractBDBMessageStore.java:74)
 at 
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1364)
 at 
org.apache.qpid.server.txn.LocalTransaction.commitAsync(LocalTransaction.java:399)
 at org.apache.qpid.server.protocol.v0_8.AMQChannel.commit(AMQChannel.java:1220)
 at 
org.apache.qpid.server.protocol.v0_8.AMQChannel.receiveTxCommit(AMQChannel.java:3622)
 at org.apache.qpid.codec.ServerDecoder.processMethod(ServerDecoder.java:228)
 at org.apache.qpid.codec.AMQDecoder.processFrame(AMQDecoder.java:191)
 at 
org.apache.qpid.server.protocol.v0_8.BrokerDecoder.doProcessFrame(BrokerDecoder.java:114)
 at 
org.apache.qpid.server.protocol.v0_8.BrokerDecoder.access$000(BrokerDecoder.java:36)
 at 
org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:78)
 at 
org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:74)
 at java.security.AccessController.doPrivileged(Native Method)
 at 
org.apache.qpid.server.protocol.v0_8.BrokerDecoder.processFrame(BrokerDecoder.java:73)
 at org.apache.qpid.codec.AMQDecoder.processInput(AMQDecoder.java:173)
 at org.apache.qpid.codec.AMQDecoder.decode(AMQDecoder.java:114)
 at org.apache.qpid.codec.ServerDecoder.decodeBuffer(ServerDecoder.java:43)
 at 
org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:266)
 at 
org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:258)
 at java.security.AccessController.doPrivileged(Native Method)
 at 
org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8.received(AMQPConnection_0_8.java:257)
 at 
org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:142)
 at 
org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:547)
 at 
org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
 at 
org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:446)
 at 
org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:253)
 at 
org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:108)
 at 
org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:499)
 at 
org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337)
 at 
org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:86)
 at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:457)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
{noformat}

It seems that majority was lost in the middle of transaction. Qpid HA logic 
detected that and restarted the environment, however, the transaction was not 
aborted.

On closing of the environment the following has been reported:
{noformat}
Problem: 3 locks left
---- LSN: 0x0/0x2d80----
 ThinLockAddr:1794979774 Owner:1919034342 
-50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
---- LSN: 0x0/0x1ed8----
 ThinLockAddr:374788562 Owner:1919034342 
-50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
---- LSN: 0x0/0x20db----
 ThinLockAddr:501824883 Owner:1919034342 
-50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
[BROKER-1] o.a.q.t.u.SpawnedBrokerHolder
Problem: 1 txns left
txnId = -50 txnName = 1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn
Local Cache Usage = 240
Cache Layout: Allocation of resources in the cache.
 adminBytes=0
 cacheTotalBytes=1,622,051
 dataAdminBytes=0
 dataBytes=0
 lockBytes=240
 sharedCacheTotalBytes=0
WARN  [Environment-test:nodetestTransferMasterWhilstTransactionsInFlight10001] 
o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Ignoring an exception whilst closing 
environment
com.sleepycat.je.EnvironmentFailureException: (JE 5.0.104) 
nodetestTransferMasterWhilstTransactionsInFlight10001(1):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testTransferMasterWhilstTransactionsInFlight-1-8967653656689695718/test/config
 Problem closing handle 
nodetestTransferMasterWhilstTransactionsInFlight10001(1) UNEXPECTED_EXCEPTION: 
Unexpected internal Exception, may have side effects. Environment is invalid 
and must be closed.
        at 
com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:709)
 ~[je-5.0.104.jar:5.0.104]
        at 
org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.closeEnvironmentOnRestart(ReplicatedEnvironmentFacade.java:1277)
 [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
        at 
org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.restartEnvironment(ReplicatedEnvironmentFacade.java:1253)
 [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
        at 
org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.access$500(ReplicatedEnvironmentFacade.java:92)
 [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
        at 
org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$3.run(ReplicatedEnvironmentFacade.java:481)
 [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_80]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.lang.IllegalStateException: Unclosed Database: MESSAGE_CONTENT
Unclosed Database: MESSAGE_CONTENT
Databases left open: 2
There is 1 existing transaction opened against the Environment.
Aborting open transactions ...
aborting <Transaction id="-46">
        at com.sleepycat.je.Environment.close(Environment.java:390) 
~[je-5.0.104.jar:5.0.104]
        at 
com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:702)
 ~[je-5.0.104.jar:5.0.104]
        ... 7 common frames omitted
{noformat}

It appears we need to abort all transactions whilst restarting the environment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to