[ 
https://issues.apache.org/jira/browse/QPID-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210495#comment-17210495
 ] 

ASF subversion and git services commented on QPID-8470:
-------------------------------------------------------

Commit 1b1989118c14431165f70ca70cc9829a03715dca in qpid-broker-j's branch 
refs/heads/7.1.x from aw924
[ https://gitbox.apache.org/repos/asf?p=qpid-broker-j.git;h=1b19891 ]

QPID-8470: Added retrying logic for storing metadata


> Deadlock  / latch timeout in BDB database
> -----------------------------------------
>
>                 Key: QPID-8470
>                 URL: https://issues.apache.org/jira/browse/QPID-8470
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.1
>            Reporter: Daniil Kirilyuk
>            Assignee: Alex Rudyy
>            Priority: Minor
>         Attachments: JE-thread-dump.txt
>
>
> There was deadlock (or [latch 
> timeout|[https://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#ENV_LATCH_TIMEOUT])
>  in BDB database or latch was hold for more then timeout. Then was database 
> environment closed, database transactions cancelled and broker initiated 
> shutdown.
> {noformat}
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55550] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605191 currentThread: 
> Thread[IO-/127.0.0.1:55550,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55522,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55580] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605060 currentThread: 
> Thread[IO-/127.0.0.1:55580,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55584,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55544] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605128 currentThread: 
> Thread[IO-/127.0.0.1:55544,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55548,5,main]
> ....
> 2020-08-19 00:42:42,174 ERROR [IO-/127.0.0.1:55538] 
> (o.a.q.s.u.ServerScopedRuntimeException) - Error reading message metadata for 
> message with id 52671357: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages 
> Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException:
>  com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) 
> /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: 
> Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected 
> internal state, unable to continue. Environment is invalid and must be closed.
>         at 
> com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:443)
>         at 
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:211)
> ....
>                 at 
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
>                 at 
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
>                 at 
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
>                 ... 5 more
>                 Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE 
> 7.4.5) Environment must be closed, caused by: 
> com.sleepycat.je.EnvironmentFailureException: Environment invalid because of 
> previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch 
> timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
>                         at 
> com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
>                         at 
> com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
>                         at 
> com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:3869)
>                         at com.sleepycat.je.Cursor.checkEnv(Cursor.java:5394)
>                         at com.sleepycat.je.Cursor.close(Cursor.java:540)
>                         at com.sleepycat.je.Database.get(Database.java:1344)
>                         ... 41 more
>         [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: 
> Environment invalid because of previous exception: (JE 7.4.5) 
> /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: 
> Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected 
> internal state, unable to continue. Environment is invalid and must be 
> closed.]
> End of stringified Stacktrace
> ...
> 2020-08-19 00:42:42,201 ERROR [IO-/127.0.0.1:55498] (o.a.q.s.Main) - Uncaught 
> exception, shutting down.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException:
>  org.apache.qpid.server.util.ServerScopedRuntimeException: 
> org.apache.qpid.server.store.StoreException: Error reading message metadata 
> for message with id 52671021: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages 
> Latch timeout. BIN2605177 currentThread: Thread[IO-/127.0.0.1:55498,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55554,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
>         at 
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:582)
>         at java.security.AccessController.doPrivileged(Native Method)
> {noformat}
>  BDB thread dump attached.
> Altogether there were 16 threads handling BDB latch timeout (in stacktrace 
> there was a call of com.sleepycat.je.latch.LatchSupport.handleTimeout() 
> method).
> 2 of them – when sending messages: 
> {noformat}
> java.lang.Thread.dumpThreads(Native Method)
> java.lang.Thread.getAllStackTraces(Thread.java:1610)
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:566)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666)
> com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637)
> com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337)
> com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221)
> com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962)
> com.sleepycat.je.Cursor.putNotify(Cursor.java:2800)
> com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:2478)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:830)
> com.sleepycat.je.Database.put(Database.java:1574)
> com.sleepycat.je.Database.put(Database.java:1627)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.storeMetaData(AbstractBDBMessageStore.java:591)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$600(AbstractBDBMessageStore.java:77)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.store(AbstractBDBMessageStore.java:1165)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction$1.run(AbstractBDBMessageStore.java:1344)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.doPreCommitActions(AbstractBDBMessageStore.java:1378)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1400)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
> org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
> org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveDelivery(StandardReceivingLinkEndpoint.java:280)
> org.apache.qpid.server.protocol.v1_0.AbstractReceivingLinkEndpoint.receiveTransfer(AbstractReceivingLinkEndpoint.java:165)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receiveTransfer(Session_1_0.java:637)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receiveTransfer(AMQPConnection_1_0Impl.java:792)
> org.apache.qpid.server.protocol.v1_0.type.transport.Transfer.invoke(Transfer.java:295)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.received(AMQPConnection_1_0Impl.java:515)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receive$0(AMQPConnection_1_0Impl.java:467)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$146/229440539.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receive(AMQPConnection_1_0Impl.java:461)
> org.apache.qpid.server.protocol.v1_0.framing.FrameHandler.parse(FrameHandler.java:224)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1312)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown
>  Source)
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> 14 of them – when receiving messages:
> {noformat}
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:557)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.search(Tree.java:2275)
> com.sleepycat.je.tree.Tree.search(Tree.java:2172)
> com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)
> com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)
> com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)
> com.sleepycat.je.Cursor.search(Cursor.java:3858)
> com.sleepycat.je.Cursor.getInternal(Cursor.java:1282)
> com.sleepycat.je.Database.get(Database.java:1342)
> com.sleepycat.je.Database.get(Database.java:1401)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:299)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1077)
> org.apache.qpid.server.message.AbstractServerMessageImpl.isPersistent(AbstractServerMessageImpl.java:178)
> org.apache.qpid.server.queue.QueueEntryImpl.populateInstanceProperties(QueueEntryImpl.java:155)
> org.apache.qpid.server.queue.QueueEntryImpl.<init>(QueueEntryImpl.java:147)
> org.apache.qpid.server.queue.OrderedQueueEntry.<init>(OrderedQueueEntry.java:46)
> org.apache.qpid.server.queue.StandardQueueEntry.<init>(StandardQueueEntry.java:37)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:48)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:26)
> org.apache.qpid.server.queue.OrderedQueueEntryList.add(OrderedQueueEntryList.java:92)
> org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1356)
> org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1289)
> org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
> org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$147/1374598514.accept(Unknown
>  Source)
> java.lang.Iterable.forEach(Iterable.java:75)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1347)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$152/950450513.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1345)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1326)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown
>  Source)
> {noformat}
> Before the thread dump in log were present massive “QUE-1005 : Dropped” 
> messages (ca. 73.000) and many error messages “Connection closed with error : 
> amqp:resource-limit-exceeded - local-idle-timeout expired” (191).
> I’m not sure, that AbstractServerMessageImpl.isPersistent() call can cause a 
> deadlock, because get() operation is invoked with LockMode.READ_UNCOMMITTED 
> parameter. According to BDB documentation READ_UNCOMMITTED mode should be 
> non-blocking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to