[ https://issues.apache.org/jira/browse/QPID-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210495#comment-17210495 ]
ASF subversion and git services commented on QPID-8470: ------------------------------------------------------- Commit 1b1989118c14431165f70ca70cc9829a03715dca in qpid-broker-j's branch refs/heads/7.1.x from aw924 [ https://gitbox.apache.org/repos/asf?p=qpid-broker-j.git;h=1b19891 ] QPID-8470: Added retrying logic for storing metadata > Deadlock / latch timeout in BDB database > ----------------------------------------- > > Key: QPID-8470 > URL: https://issues.apache.org/jira/browse/QPID-8470 > Project: Qpid > Issue Type: Bug > Components: Broker-J > Affects Versions: qpid-java-broker-8.0.1 > Reporter: Daniil Kirilyuk > Assignee: Alex Rudyy > Priority: Minor > Attachments: JE-thread-dump.txt > > > There was deadlock (or [latch > timeout|[https://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#ENV_LATCH_TIMEOUT]) > in BDB database or latch was hold for more then timeout. Then was database > environment closed, database transactions cancelled and broker initiated > shutdown. > {noformat} > 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55550] (c.s.j.d.EnvironmentImpl) > - default Thread dump follows for latch timeout: BIN2605191 currentThread: > Thread[IO-/127.0.0.1:55550,5,main] currentTime: 1597790561955 exclusiveOwner: > Thread[IO-/127.0.0.1:55522,5,main] > 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55580] (c.s.j.d.EnvironmentImpl) > - default Thread dump follows for latch timeout: BIN2605060 currentThread: > Thread[IO-/127.0.0.1:55580,5,main] currentTime: 1597790561955 exclusiveOwner: > Thread[IO-/127.0.0.1:55584,5,main] > 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55544] (c.s.j.d.EnvironmentImpl) > - default Thread dump follows for latch timeout: BIN2605128 currentThread: > Thread[IO-/127.0.0.1:55544,5,main] currentTime: 1597790561955 exclusiveOwner: > Thread[IO-/127.0.0.1:55548,5,main] > .... > 2020-08-19 00:42:42,174 ERROR [IO-/127.0.0.1:55538] > (o.a.q.s.u.ServerScopedRuntimeException) - Error reading message metadata for > message with id 52671357: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages > Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] > currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] > UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. > Environment is invalid and must be closed. > org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: > com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) > /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: > Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: > Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected > internal state, unable to continue. Environment is invalid and must be closed. > at > com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:443) > at > com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:211) > .... > at > org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496) > at > org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270) > at > org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134) > at > org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575) > at > org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366) > at > org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97) > ... 5 more > Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE > 7.4.5) Environment must be closed, caused by: > com.sleepycat.je.EnvironmentFailureException: Environment invalid because of > previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch > timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] > currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] > UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. > Environment is invalid and must be closed. > at > com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228) > at > com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766) > at > com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:3869) > at com.sleepycat.je.Cursor.checkEnv(Cursor.java:5394) > at com.sleepycat.je.Cursor.close(Cursor.java:540) > at com.sleepycat.je.Database.get(Database.java:1344) > ... 41 more > [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: > Environment invalid because of previous exception: (JE 7.4.5) > /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: > Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: > Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected > internal state, unable to continue. Environment is invalid and must be > closed.] > End of stringified Stacktrace > ... > 2020-08-19 00:42:42,201 ERROR [IO-/127.0.0.1:55498] (o.a.q.s.Main) - Uncaught > exception, shutting down. > org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: > org.apache.qpid.server.util.ServerScopedRuntimeException: > org.apache.qpid.server.store.StoreException: Error reading message metadata > for message with id 52671021: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages > Latch timeout. BIN2605177 currentThread: Thread[IO-/127.0.0.1:55498,5,main] > currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55554,5,main] > UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. > Environment is invalid and must be closed. > at > org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:582) > at java.security.AccessController.doPrivileged(Native Method) > {noformat} > BDB thread dump attached. > Altogether there were 16 threads handling BDB latch timeout (in stacktrace > there was a call of com.sleepycat.je.latch.LatchSupport.handleTimeout() > method). > 2 of them – when sending messages: > {noformat} > java.lang.Thread.dumpThreads(Native Method) > java.lang.Thread.getAllStackTraces(Thread.java:1610) > com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:566) > com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209) > com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63) > com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107) > com.sleepycat.je.tree.IN.latchShared(IN.java:560) > com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372) > com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817) > com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748) > com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666) > com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637) > com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337) > com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221) > com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962) > com.sleepycat.je.Cursor.putNotify(Cursor.java:2800) > com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647) > com.sleepycat.je.Cursor.putInternal(Cursor.java:2478) > com.sleepycat.je.Cursor.putInternal(Cursor.java:830) > com.sleepycat.je.Database.put(Database.java:1574) > com.sleepycat.je.Database.put(Database.java:1627) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.storeMetaData(AbstractBDBMessageStore.java:591) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$600(AbstractBDBMessageStore.java:77) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.store(AbstractBDBMessageStore.java:1165) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction$1.run(AbstractBDBMessageStore.java:1344) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.doPreCommitActions(AbstractBDBMessageStore.java:1378) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1400) > org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294) > org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124) > org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117) > org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveDelivery(StandardReceivingLinkEndpoint.java:280) > org.apache.qpid.server.protocol.v1_0.AbstractReceivingLinkEndpoint.receiveTransfer(AbstractReceivingLinkEndpoint.java:165) > org.apache.qpid.server.protocol.v1_0.Session_1_0.receiveTransfer(Session_1_0.java:637) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receiveTransfer(AMQPConnection_1_0Impl.java:792) > org.apache.qpid.server.protocol.v1_0.type.transport.Transfer.invoke(Transfer.java:295) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.received(AMQPConnection_1_0Impl.java:515) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receive$0(AMQPConnection_1_0Impl.java:467) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$146/229440539.run(Unknown > Source) > java.security.AccessController.doPrivileged(Native Method) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receive(AMQPConnection_1_0Impl.java:461) > org.apache.qpid.server.protocol.v1_0.framing.FrameHandler.parse(FrameHandler.java:224) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1312) > org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576) > org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown > Source) > java.security.AccessController.doPrivileged(Native Method) > org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571) > org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135) > org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611) > org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58) > org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496) > org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270) > org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134) > org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575) > org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366) > org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97) > org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464) > org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown > Source) > java.lang.Thread.run(Thread.java:748) > {noformat} > 14 of them – when receiving messages: > {noformat} > com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:557) > com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209) > com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63) > com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107) > com.sleepycat.je.tree.IN.latchShared(IN.java:560) > com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372) > com.sleepycat.je.tree.Tree.search(Tree.java:2275) > com.sleepycat.je.tree.Tree.search(Tree.java:2172) > com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951) > com.sleepycat.je.Cursor.searchExact(Cursor.java:4195) > com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056) > com.sleepycat.je.Cursor.search(Cursor.java:3858) > com.sleepycat.je.Cursor.getInternal(Cursor.java:1282) > com.sleepycat.je.Database.get(Database.java:1342) > com.sleepycat.je.Database.get(Database.java:1401) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:299) > org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1077) > org.apache.qpid.server.message.AbstractServerMessageImpl.isPersistent(AbstractServerMessageImpl.java:178) > org.apache.qpid.server.queue.QueueEntryImpl.populateInstanceProperties(QueueEntryImpl.java:155) > org.apache.qpid.server.queue.QueueEntryImpl.<init>(QueueEntryImpl.java:147) > org.apache.qpid.server.queue.OrderedQueueEntry.<init>(OrderedQueueEntry.java:46) > org.apache.qpid.server.queue.StandardQueueEntry.<init>(StandardQueueEntry.java:37) > org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:48) > org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:26) > org.apache.qpid.server.queue.OrderedQueueEntryList.add(OrderedQueueEntryList.java:92) > org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1356) > org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1289) > org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135) > org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309) > org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84) > org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588) > org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1358) > org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$147/1374598514.accept(Unknown > Source) > java.lang.Iterable.forEach(Iterable.java:75) > org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1358) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1347) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$152/950450513.run(Unknown > Source) > java.security.AccessController.doPrivileged(Native Method) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1345) > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1326) > org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576) > org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown > Source) > java.security.AccessController.doPrivileged(Native Method) > org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571) > org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135) > org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611) > org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58) > org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496) > org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270) > org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134) > org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575) > org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366) > org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97) > org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464) > org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown > Source) > {noformat} > Before the thread dump in log were present massive “QUE-1005 : Dropped” > messages (ca. 73.000) and many error messages “Connection closed with error : > amqp:resource-limit-exceeded - local-idle-timeout expired” (191). > I’m not sure, that AbstractServerMessageImpl.isPersistent() call can cause a > deadlock, because get() operation is invoked with LockMode.READ_UNCOMMITTED > parameter. According to BDB documentation READ_UNCOMMITTED mode should be > non-blocking. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org