[ https://issues.apache.org/jira/browse/QPID-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942681#comment-16942681 ]
ASF subversion and git services commented on QPID-8366:
-------------------------------------------------------

Commit c3d0590b7687c19958da8ef963531104f801b904 in qpid-broker-j's branch refs/heads/7.1.x from Alex Rudyy
[ https://gitbox.apache.org/repos/asf?p=qpid-broker-j.git;h=c3d0590 ]

QPID-8366: [Broker-J] Handle ConnectionScopeRuntimeException on execution of HouseKeepingTaks

(cherry picked from commit 98261ad92020c11784a3be2ab890cbabddec5fbc)


> [Broker-J] The loss of BDB HA majority on invocation of house keeping operations can crash the broker
> -----------------------------------------------------------------------------------------------------
>
>                 Key: QPID-8366
>                 URL: https://issues.apache.org/jira/browse/QPID-8366
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-7.1.0, qpid-java-broker-7.0.4, qpid-java-broker-7.0.5, qpid-java-broker-7.0.6, qpid-java-broker-7.0.7, qpid-java-broker-7.1.1, qpid-java-broker-7.1.2, qpid-java-broker-7.0.8, qpid-java-broker-7.1.3, qpid-java-broker-7.1.4
>            Reporter: Alex Rudyy
>            Assignee: Alex Rudyy
>            Priority: Major
>             Fix For: qpid-java-broker-8.0.0, qpid-java-broker-7.0.9, qpid-java-broker-7.1.5
>
>
> A {{ConnectionScopedRuntimeException}} thrown from the {{VirtualHost}} {{House Keeping}} thread while invoking {{MessageStore}} operations such as {{checkMessageStatus}} can crash the broker. An example stack trace for such an exception (from Qpid Broker version 7.0.6) is provided below:
> {noformat}
> 2019-09-27 07:53:38,168 ERROR [virtualhost-test-pool-1] (o.a.q.s.Main) - Uncaught exception, shutting down.
> org.apache.qpid.server.util.ConnectionScopedRuntimeException: Required number of nodes not reachable
>     at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.handleDatabaseException(ReplicatedEnvironmentFacade.java:495)
>     at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:332)
>     at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.removeMessage(AbstractBDBMessageStore.java:288)
>     at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.remove(AbstractBDBMessageStore.java:1090)
>     at org.apache.qpid.server.message.AbstractServerMessageImpl.decrementReference(AbstractServerMessageImpl.java:118)
>     at org.apache.qpid.server.message.AbstractServerMessageImpl.access$500(AbstractServerMessageImpl.java:37)
>     at org.apache.qpid.server.message.AbstractServerMessageImpl$Reference.release(AbstractServerMessageImpl.java:309)
>     at org.apache.qpid.server.queue.QueueEntryImpl.dispose(QueueEntryImpl.java:557)
>     at org.apache.qpid.server.queue.QueueEntryImpl.delete(QueueEntryImpl.java:572)
>     at org.apache.qpid.server.queue.AbstractQueue$11.postCommit(AbstractQueue.java:1729)
>     at org.apache.qpid.server.txn.AutoCommitTransaction.dequeue(AutoCommitTransaction.java:92)
>     at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1722)
>     at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1717)
>     at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1761)
>     at org.apache.qpid.server.queue.AbstractQueue.checkMessageStatus(AbstractQueue.java:2165)
>     at org.apache.qpid.server.virtualhost.AbstractVirtualHost$VirtualHostHouseKeepingTask.execute(AbstractVirtualHost.java:1965)
>     at org.apache.qpid.server.virtualhost.HouseKeepingTask$1.run(HouseKeepingTask.java:56)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.apache.qpid.server.virtualhost.HouseKeepingTask.run(HouseKeepingTask.java:51)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: com.sleepycat.je.rep.InsufficientAcksException: (JE 7.4.5) Transaction: -3459038252 VLSN: 10,380,435,448, initiated at: 07:53:20. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 2. Missing replica acks: 2. Timeout: 15000ms. FeederState=acc3_2(3)[MASTER]
> Current feeds:
>  acc3_1: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
>  acc3: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
>     at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205)
>     at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189)
>     at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426)
>     at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385)
>     at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:228)
>     at com.sleepycat.je.txn.Txn.commit(Txn.java:772)
>     at com.sleepycat.je.Transaction.doCommit(Transaction.java:621)
>     at com.sleepycat.je.Transaction.commit(Transaction.java:401)
>     at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:328)
>     ... 25 common frames omitted
> {noformat}
> The stack trace above was produced when a BDB HA {{VirtualHost}} was deleting an expired message and its BDB HA group lost the majority while the {{VirtualHost}} was committing the BDB HA transaction for the deletion. The majority loss is reported to the caller as a {{ConnectionScopedRuntimeException}}. It seems we need to catch and handle {{ConnectionScopedRuntimeException}} in House Keeping operations.
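
For illustration only, the handling suggested in the description could look roughly like the sketch below. It is not the code from the commit referenced above. The class HouseKeepingSketch, its runGuarded method and the nested exception type are hypothetical stand-ins (the nested type only mimics org.apache.qpid.server.util.ConnectionScopedRuntimeException so that the example compiles on its own). The idea is that a house keeping pass which fails with a connection-scoped exception is logged and skipped rather than escaping to the uncaught-exception path that shuts the broker down.

```java
public class HouseKeepingSketch {

    /**
     * Hypothetical stand-in for
     * org.apache.qpid.server.util.ConnectionScopedRuntimeException,
     * declared here only to keep the sketch self-contained.
     */
    static class ConnectionScopedRuntimeException extends RuntimeException {
        ConnectionScopedRuntimeException(String message) {
            super(message);
        }
    }

    /**
     * Runs one house keeping pass (hypothetical helper).
     * Returns true if the pass completed, false if it was abandoned because
     * of a connection-scoped failure such as a lost BDB HA majority.
     * Any other exception is deliberately left to propagate.
     */
    static boolean runGuarded(Runnable houseKeepingPass) {
        try {
            houseKeepingPass.run();
            return true;
        } catch (ConnectionScopedRuntimeException e) {
            // Assumption: skipping this pass is acceptable because the task
            // is periodic and will be attempted again on its next run.
            System.err.println("House keeping pass skipped: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // A pass that succeeds.
        boolean first = runGuarded(() -> { });

        // A pass that fails the way the reported stack trace does.
        boolean second = runGuarded(() -> {
            throw new ConnectionScopedRuntimeException(
                    "Required number of nodes not reachable");
        });

        System.out.println("first completed: " + first);
        System.out.println("second completed: " + second);
    }
}
```

Only the connection-scoped type is caught in this sketch, on the assumption that other failures should still surface as before.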