Alex Rudyy created QPID-7078:
--------------------------------
Summary: [Java Broker,HA] BDB HA VHN in master role designated as
primary can sporadically transit into unknown role after losing second replica
node
Key: QPID-7078
URL: https://issues.apache.org/jira/browse/QPID-7078
Project: Qpid
Issue Type: Bug
Components: Java Broker
Affects Versions: qpid-java-6.0, 0.32, qpid-java-6.0.1, qpid-java-6.1
Reporter: Alex Rudyy
Attachments:
TEST-org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped.txt
A failure of test TwoNodeTest#testDesignatedPrimaryContinuesAfterSecondaryStopped
revealed unexpected behaviour in BDB JE: a master node designated as
primary suddenly transitions into the UNKNOWN role after the second (replica)
node is shut down.
The test failed as below:
{noformat}
testDesignatedPrimaryContinuesAfterSecondaryStopped(org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest) Time elapsed: 7.236 sec <<< ERROR!
javax.jms.JMSException: Error registering consumer: org.apache.qpid.QpidException: Fail-over exception interrupted basic consume.
    at org.apache.qpid.client.AMQSession.registerConsumer(AMQSession.java:3093)
    at org.apache.qpid.client.AMQSession.access$400(AMQSession.java:94)
    at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2094)
    at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2069)
    at org.apache.qpid.client.AMQConnectionDelegate_8_0.executeRetrySupport(AMQConnectionDelegate_8_0.java:416)
    at org.apache.qpid.client.AMQConnection.executeRetrySupport(AMQConnection.java:737)
    at org.apache.qpid.client.failover.FailoverRetrySupport.execute(FailoverRetrySupport.java:90)
    at org.apache.qpid.client.AMQSession.createConsumerImpl(AMQSession.java:2067)
    at org.apache.qpid.client.AMQSession.createConsumer(AMQSession.java:989)
    at org.apache.qpid.client.AMQConnection.retrieveVirtualHostPropertiesIfNecessary(AMQConnection.java:809)
    at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:796)
    at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:771)
    at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:765)
    at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:88)
    at org.apache.qpid.test.utils.QpidBrokerTestCase.assertProducingConsuming(QpidBrokerTestCase.java:1256)
    at org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped(TwoNodeTest.java:108)
Caused by: org.apache.qpid.client.failover.FailoverException: Failing over about to start
    at org.apache.qpid.client.AMQProtocolHandler.notifyFailoverStarting(AMQProtocolHandler.java:434)
    at org.apache.qpid.client.AMQProtocolHandler$1.run(AMQProtocolHandler.java:287)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
On the broker side, the transition into the UNKNOWN state occurred as below:
{noformat}
10:15:44,279 B-10000 DEBUG [Group-Change-Learner:test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.DatabasePinger Ping transaction completed
10:15:44,279 B-10000 DEBUG [IO-/127.0.0.1:58662] o.a.q.s.p.v.BrokerDecoder Frame handled in 1344 ms.
10:15:44,279 B-10000 INFO [MASTER nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001(1)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade The node 'test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001' state is UNKNOWN
10:15:44,279 B-10000 DEBUG [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Received BDB event, new BDB state UNKNOWN Facade state : OPEN
10:15:44,279 B-10000 INFO [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.v.b.BDBHAVirtualHostNodeImpl Received BDB event indicating transition from state MASTER to UNKNOWN for nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001
10:15:44,280 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]']
10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.m.AbstractConfiguredObject Closing BDBHAVirtualHostImpl : test
2016-02-17 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.v.AbstractVirtualHost Closing connection registry :1 connections.
10:15:44,282 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]'] performed successfully with result: null
10:15:44,283 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on '/127.0.0.1:58662(guest)']
10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.m.AbstractConfiguredObject Closing AMQPConnection_0_8 : [1] 127.0.0.1:58662
10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on '/127.0.0.1:58662(guest)'] performed successfully with result: null
{noformat}
The transition into the UNKNOWN state should not happen, because the MASTER
node is designated as primary. The exhibited behaviour suggests a bug in BDB JE.
It is unclear whether the JE Environment can recover from this unexpected flip
into the UNKNOWN state. If JE can recover, then on the next transition into
MASTER the VHN should recover the VH, and connected applications can continue
as usual. If JE cannot recover, then the BDB HA VHN will not recover
automatically from this condition, as we do not restart the environment on
MasterUnknownException. Operator intervention would then be required to restart
the BDB HA VHN.
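
The two outcomes described above can be illustrated with a small, self-contained
sketch. It is not the broker's code and does not use the JE API: the class
VhnRoleSketch, its Role enum and the onRoleChange method are hypothetical names
introduced only for illustration. It assumes the node closes its virtual host
whenever it leaves the MASTER role (as the broker log shows) and reopens it on
the next transition into MASTER.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the VHN reaction to role changes; not Qpid or JE code.
public class VhnRoleSketch {

    // Hypothetical mirror of the roles named in the broker log above.
    enum Role { MASTER, REPLICA, UNKNOWN }

    private Role role = Role.UNKNOWN;
    private boolean virtualHostOpen = false;
    private final List<String> actions = new ArrayList<>();

    // Reacts to a role change the way the log suggests the VHN does:
    // leaving MASTER closes the virtual host, entering MASTER opens it.
    void onRoleChange(Role newRole) {
        Role oldRole = role;
        role = newRole;
        if (newRole == Role.MASTER && !virtualHostOpen) {
            virtualHostOpen = true;
            actions.add("open virtual host (" + oldRole + " -> " + newRole + ")");
        } else if (newRole != Role.MASTER && virtualHostOpen) {
            virtualHostOpen = false;
            actions.add("close virtual host (" + oldRole + " -> " + newRole + ")");
        }
    }

    boolean isVirtualHostOpen() {
        return virtualHostOpen;
    }

    List<String> actions() {
        return actions;
    }

    public static void main(String[] args) {
        VhnRoleSketch node = new VhnRoleSketch();
        node.onRoleChange(Role.MASTER);   // normal start as designated primary
        node.onRoleChange(Role.UNKNOWN);  // the unexpected flip reported here
        node.onRoleChange(Role.MASTER);   // happens only if JE recovers by itself
        node.actions().forEach(System.out::println);
    }
}
```

If the final transition back into MASTER never arrives, the virtual host in this
model stays closed, which corresponds to the case where an operator has to
restart the BDB HA VHN.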