[ 
https://issues.apache.org/jira/browse/QPID-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208612#comment-17208612
 ] 

Alex Rudyy commented on QPID-8470:
----------------------------------

Hi [~daniel.kirilyuk],
Thanks for reporting the issue. I merged you PR into the master branch. Though, 
I would like to request more details on reproduction of the issue.
It seems the issue is reported for a first time. At least, I cannot remember 
any  message store failures manifesting in similar way.

Based on the thread dump, It looks like you are running some sort of 
performance tests. How often the issue happens for you ? Is it possible to 
reproduce it? If yes, could you please share the details how to reproduce it?

I would like to understand what exactly causes the latch timeout. 

Thanks,
Alex

> Deadlock  / latch timeout in BDB database
> -----------------------------------------
>
>                 Key: QPID-8470
>                 URL: https://issues.apache.org/jira/browse/QPID-8470
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.1
>            Reporter: Daniil Kirilyuk
>            Priority: Minor
>         Attachments: JE-thread-dump.txt
>
>
> There was deadlock (or [latch 
> timeout|[https://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#ENV_LATCH_TIMEOUT])
>  in BDB database or latch was hold for more then timeout. Then was database 
> environment closed, database transactions cancelled and broker initiated 
> shutdown.
> {noformat}
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55550] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605191 currentThread: 
> Thread[IO-/127.0.0.1:55550,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55522,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55580] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605060 currentThread: 
> Thread[IO-/127.0.0.1:55580,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55584,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55544] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605128 currentThread: 
> Thread[IO-/127.0.0.1:55544,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55548,5,main]
> ....
> 2020-08-19 00:42:42,174 ERROR [IO-/127.0.0.1:55538] 
> (o.a.q.s.u.ServerScopedRuntimeException) - Error reading message metadata for 
> message with id 52671357: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages 
> Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException:
>  com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) 
> /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: 
> Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected 
> internal state, unable to continue. Environment is invalid and must be closed.
>         at 
> com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:443)
>         at 
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:211)
> ....
>                 at 
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
>                 at 
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
>                 at 
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
>                 ... 5 more
>                 Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE 
> 7.4.5) Environment must be closed, caused by: 
> com.sleepycat.je.EnvironmentFailureException: Environment invalid because of 
> previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch 
> timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
>                         at 
> com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
>                         at 
> com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
>                         at 
> com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:3869)
>                         at com.sleepycat.je.Cursor.checkEnv(Cursor.java:5394)
>                         at com.sleepycat.je.Cursor.close(Cursor.java:540)
>                         at com.sleepycat.je.Database.get(Database.java:1344)
>                         ... 41 more
>         [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: 
> Environment invalid because of previous exception: (JE 7.4.5) 
> /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: 
> Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected 
> internal state, unable to continue. Environment is invalid and must be 
> closed.]
> End of stringified Stacktrace
> ...
> 2020-08-19 00:42:42,201 ERROR [IO-/127.0.0.1:55498] (o.a.q.s.Main) - Uncaught 
> exception, shutting down.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException:
>  org.apache.qpid.server.util.ServerScopedRuntimeException: 
> org.apache.qpid.server.store.StoreException: Error reading message metadata 
> for message with id 52671021: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages 
> Latch timeout. BIN2605177 currentThread: Thread[IO-/127.0.0.1:55498,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55554,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
>         at 
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:582)
>         at java.security.AccessController.doPrivileged(Native Method)
> {noformat}
>  BDB thread dump attached.
> Altogether there were 16 threads handling BDB latch timeout (in stacktrace 
> there was a call of com.sleepycat.je.latch.LatchSupport.handleTimeout() 
> method).
> 2 of them – when sending messages: 
> {noformat}
> java.lang.Thread.dumpThreads(Native Method)
> java.lang.Thread.getAllStackTraces(Thread.java:1610)
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:566)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666)
> com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637)
> com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337)
> com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221)
> com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962)
> com.sleepycat.je.Cursor.putNotify(Cursor.java:2800)
> com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:2478)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:830)
> com.sleepycat.je.Database.put(Database.java:1574)
> com.sleepycat.je.Database.put(Database.java:1627)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.storeMetaData(AbstractBDBMessageStore.java:591)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$600(AbstractBDBMessageStore.java:77)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.store(AbstractBDBMessageStore.java:1165)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction$1.run(AbstractBDBMessageStore.java:1344)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.doPreCommitActions(AbstractBDBMessageStore.java:1378)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1400)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
> org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
> org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveDelivery(StandardReceivingLinkEndpoint.java:280)
> org.apache.qpid.server.protocol.v1_0.AbstractReceivingLinkEndpoint.receiveTransfer(AbstractReceivingLinkEndpoint.java:165)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receiveTransfer(Session_1_0.java:637)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receiveTransfer(AMQPConnection_1_0Impl.java:792)
> org.apache.qpid.server.protocol.v1_0.type.transport.Transfer.invoke(Transfer.java:295)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.received(AMQPConnection_1_0Impl.java:515)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receive$0(AMQPConnection_1_0Impl.java:467)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$146/229440539.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receive(AMQPConnection_1_0Impl.java:461)
> org.apache.qpid.server.protocol.v1_0.framing.FrameHandler.parse(FrameHandler.java:224)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1312)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown
>  Source)
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> 14 of them – when receiving messages:
> {noformat}
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:557)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.search(Tree.java:2275)
> com.sleepycat.je.tree.Tree.search(Tree.java:2172)
> com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)
> com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)
> com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)
> com.sleepycat.je.Cursor.search(Cursor.java:3858)
> com.sleepycat.je.Cursor.getInternal(Cursor.java:1282)
> com.sleepycat.je.Database.get(Database.java:1342)
> com.sleepycat.je.Database.get(Database.java:1401)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:299)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1077)
> org.apache.qpid.server.message.AbstractServerMessageImpl.isPersistent(AbstractServerMessageImpl.java:178)
> org.apache.qpid.server.queue.QueueEntryImpl.populateInstanceProperties(QueueEntryImpl.java:155)
> org.apache.qpid.server.queue.QueueEntryImpl.<init>(QueueEntryImpl.java:147)
> org.apache.qpid.server.queue.OrderedQueueEntry.<init>(OrderedQueueEntry.java:46)
> org.apache.qpid.server.queue.StandardQueueEntry.<init>(StandardQueueEntry.java:37)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:48)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:26)
> org.apache.qpid.server.queue.OrderedQueueEntryList.add(OrderedQueueEntryList.java:92)
> org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1356)
> org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1289)
> org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
> org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$147/1374598514.accept(Unknown
>  Source)
> java.lang.Iterable.forEach(Iterable.java:75)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1347)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$152/950450513.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1345)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1326)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown
>  Source)
> {noformat}
> Before the thread dump in log were present massive “QUE-1005 : Dropped” 
> messages (ca. 73.000) and many error messages “Connection closed with error : 
> amqp:resource-limit-exceeded - local-idle-timeout expired” (191).
> I’m not sure, that AbstractServerMessageImpl.isPersistent() call can cause a 
> deadlock, because get() operation is invoked with LockMode.READ_UNCOMMITTED 
> parameter. According to BDB documentation READ_UNCOMMITTED mode should be 
> non-blocking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to