Alex Rudyy created QPID-7078:
--------------------------------

             Summary: [Java Broker,HA] BDB HA VHN in master role designated as 
primary can sporadically transit into unknown role after losing second replica 
node
                 Key: QPID-7078
                 URL: https://issues.apache.org/jira/browse/QPID-7078
             Project: Qpid
          Issue Type: Bug
          Components: Java Broker
    Affects Versions: qpid-java-6.0, 0.32, qpid-java-6.0.1, qpid-java-6.1
            Reporter: Alex Rudyy
         Attachments: 
TEST-org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped.txt

A failure of the test TwoNodeTest#testDesignatedPrimaryContinuesAfterSecondaryStopped 
revealed unexpected BDB JE behaviour: a master node designated as primary can 
suddenly transition into the UNKNOWN role after the second (replica) node is 
shut down.

The test failed as below:
{noformat}
testDesignatedPrimaryContinuesAfterSecondaryStopped(org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest)  Time elapsed: 7.236 sec  <<< ERROR!
javax.jms.JMSException: Error registering consumer: org.apache.qpid.QpidException: Fail-over exception interrupted basic consume.
        at org.apache.qpid.client.AMQSession.registerConsumer(AMQSession.java:3093)
        at org.apache.qpid.client.AMQSession.access$400(AMQSession.java:94)
        at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2094)
        at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2069)
        at org.apache.qpid.client.AMQConnectionDelegate_8_0.executeRetrySupport(AMQConnectionDelegate_8_0.java:416)
        at org.apache.qpid.client.AMQConnection.executeRetrySupport(AMQConnection.java:737)
        at org.apache.qpid.client.failover.FailoverRetrySupport.execute(FailoverRetrySupport.java:90)
        at org.apache.qpid.client.AMQSession.createConsumerImpl(AMQSession.java:2067)
        at org.apache.qpid.client.AMQSession.createConsumer(AMQSession.java:989)
        at org.apache.qpid.client.AMQConnection.retrieveVirtualHostPropertiesIfNecessary(AMQConnection.java:809)
        at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:796)
        at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:771)
        at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:765)
        at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:88)
        at org.apache.qpid.test.utils.QpidBrokerTestCase.assertProducingConsuming(QpidBrokerTestCase.java:1256)
        at org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped(TwoNodeTest.java:108)
Caused by: org.apache.qpid.client.failover.FailoverException: Failing over about to start
        at org.apache.qpid.client.AMQProtocolHandler.notifyFailoverStarting(AMQProtocolHandler.java:434)
        at org.apache.qpid.client.AMQProtocolHandler$1.run(AMQProtocolHandler.java:287)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

On the broker side, the transition into the UNKNOWN state occurred as below:
{noformat}
10:15:44,279 B-10000 DEBUG [Group-Change-Learner:test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.DatabasePinger Ping transaction completed
10:15:44,279 B-10000 DEBUG [IO-/127.0.0.1:58662] o.a.q.s.p.v.BrokerDecoder Frame handled in 1344 ms.
10:15:44,279 B-10000 INFO  [MASTER nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001(1)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade The node 'test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001' state is UNKNOWN
10:15:44,279 B-10000 DEBUG [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Received BDB event, new BDB state UNKNOWN Facade state : OPEN
10:15:44,279 B-10000 INFO  [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.v.b.BDBHAVirtualHostNodeImpl Received BDB event indicating transition from state MASTER to UNKNOWN for nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001
10:15:44,280 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]']
10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.m.AbstractConfiguredObject Closing BDBHAVirtualHostImpl : test
2016-02-17 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.v.AbstractVirtualHost Closing connection registry :1 connections.
10:15:44,282 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]'] performed successfully with result: null
10:15:44,283 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on '/127.0.0.1:58662(guest)']
10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.m.AbstractConfiguredObject Closing AMQPConnection_0_8 : [1] 127.0.0.1:58662
10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on '/127.0.0.1:58662(guest)'] performed successfully with result: null
{noformat}

The transition into the UNKNOWN state should not happen, because the MASTER 
node is designated as primary. The exhibited behaviour points to a bug in BDB JE.

It is unclear whether the JE Environment can recover from this unexpected flip 
into the UNKNOWN state. If JE can recover, then on the next transition into 
MASTER the VHN should recover the VH, and connected applications can continue 
as usual. If JE cannot recover, the BDB HA VHN will not recover automatically 
from this condition, because we do not restart the environment on 
MasterUnknownException. Operator intervention would then be required to restart 
the BDB HA VHN.
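
The two possible outcomes described above can be illustrated with a small, 
self-contained model. This is only a sketch for reasoning about the state 
transitions: RoleTracker, Role, onStateChange and awaitingRecovery are 
hypothetical names, not part of the Qpid broker or of the BDB JE API, and the 
model does not claim to reflect how either of them behaves internally.

```java
/**
 * Hypothetical model of the role transitions discussed in this issue.
 * Not Qpid or BDB JE code; all names here are illustrative assumptions.
 */
class RoleTracker {
    enum Role { MASTER, REPLICA, UNKNOWN }

    private final boolean designatedPrimary;
    private Role current = Role.UNKNOWN;
    // Set when a designated primary drops from MASTER to UNKNOWN,
    // cleared once the node is seen as MASTER again.
    private boolean awaitingRecovery = false;

    RoleTracker(boolean designatedPrimary) {
        this.designatedPrimary = designatedPrimary;
    }

    /**
     * Records a role change.
     * Returns true when the change is the unexpected one from this report:
     * MASTER -> UNKNOWN on a node designated as primary.
     */
    boolean onStateChange(Role next) {
        boolean unexpected = designatedPrimary
                && current == Role.MASTER
                && next == Role.UNKNOWN;
        if (unexpected) {
            awaitingRecovery = true;
        } else if (next == Role.MASTER) {
            awaitingRecovery = false;
        }
        current = next;
        return unexpected;
    }

    /** True while the node has not returned to MASTER after an unexpected drop. */
    boolean awaitingRecovery() {
        return awaitingRecovery;
    }
}
```

In this model, if a later MASTER event arrives, awaitingRecovery() becomes 
false again (the "JE can recover" case). If no such event ever arrives, the 
flag stays set, which corresponds to the case where an operator would have to 
restart the BDB HA VHN.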



