[ 
https://issues.apache.org/jira/browse/QPID-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202173#comment-17202173
 ] 

ASF GitHub Bot commented on QPID-8470:
--------------------------------------

dakirily opened a new pull request #61:
URL: https://github.com/apache/qpid-broker-j/pull/61


   Added retrying logic for storing message metadata.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Deadlock  / latch timeout in BDB database
> -----------------------------------------
>
>                 Key: QPID-8470
>                 URL: https://issues.apache.org/jira/browse/QPID-8470
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.1
>            Reporter: Daniil Kirilyuk
>            Priority: Minor
>         Attachments: JE-thread-dump.txt
>
>
> There was deadlock (or [latch 
> timeout|[https://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#ENV_LATCH_TIMEOUT])
>  in BDB database or latch was hold for more then timeout. Then was database 
> environment closed, database transactions cancelled and broker initiated 
> shutdown.
> {noformat}
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55550] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605191 currentThread: 
> Thread[IO-/127.0.0.1:55550,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55522,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55580] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605060 currentThread: 
> Thread[IO-/127.0.0.1:55580,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55584,5,main]
> 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55544] (c.s.j.d.EnvironmentImpl) 
> - default Thread dump follows for latch timeout: BIN2605128 currentThread: 
> Thread[IO-/127.0.0.1:55544,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55548,5,main]
> ....
> 2020-08-19 00:42:42,174 ERROR [IO-/127.0.0.1:55538] 
> (o.a.q.s.u.ServerScopedRuntimeException) - Error reading message metadata for 
> message with id 52671357: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages 
> Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException:
>  com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) 
> /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: 
> Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected 
> internal state, unable to continue. Environment is invalid and must be closed.
>         at 
> com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:443)
>         at 
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:211)
> ....
>                 at 
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
>                 at 
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
>                 at 
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
>                 at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
>                 ... 5 more
>                 Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE 
> 7.4.5) Environment must be closed, caused by: 
> com.sleepycat.je.EnvironmentFailureException: Environment invalid because of 
> previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch 
> timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
>                         at 
> com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
>                         at 
> com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
>                         at 
> com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:3869)
>                         at com.sleepycat.je.Cursor.checkEnv(Cursor.java:5394)
>                         at com.sleepycat.je.Cursor.close(Cursor.java:540)
>                         at com.sleepycat.je.Database.get(Database.java:1344)
>                         ... 41 more
>         [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: 
> Environment invalid because of previous exception: (JE 7.4.5) 
> /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: 
> Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: 
> Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected 
> internal state, unable to continue. Environment is invalid and must be 
> closed.]
> End of stringified Stacktrace
> ...
> 2020-08-19 00:42:42,201 ERROR [IO-/127.0.0.1:55498] (o.a.q.s.Main) - Uncaught 
> exception, shutting down.
> org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException:
>  org.apache.qpid.server.util.ServerScopedRuntimeException: 
> org.apache.qpid.server.store.StoreException: Error reading message metadata 
> for message with id 52671021: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages 
> Latch timeout. BIN2605177 currentThread: Thread[IO-/127.0.0.1:55498,5,main] 
> currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55554,5,main] 
> UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. 
> Environment is invalid and must be closed.
>         at 
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:582)
>         at java.security.AccessController.doPrivileged(Native Method)
> {noformat}
>  BDB thread dump attached.
> Altogether there were 16 threads handling BDB latch timeout (in stacktrace 
> there was a call of com.sleepycat.je.latch.LatchSupport.handleTimeout() 
> method).
> 2 of them – when sending messages: 
> {noformat}
> java.lang.Thread.dumpThreads(Native Method)
> java.lang.Thread.getAllStackTraces(Thread.java:1610)
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:566)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748)
> com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666)
> com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637)
> com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337)
> com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221)
> com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962)
> com.sleepycat.je.Cursor.putNotify(Cursor.java:2800)
> com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:2478)
> com.sleepycat.je.Cursor.putInternal(Cursor.java:830)
> com.sleepycat.je.Database.put(Database.java:1574)
> com.sleepycat.je.Database.put(Database.java:1627)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.storeMetaData(AbstractBDBMessageStore.java:591)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$600(AbstractBDBMessageStore.java:77)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.store(AbstractBDBMessageStore.java:1165)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction$1.run(AbstractBDBMessageStore.java:1344)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.doPreCommitActions(AbstractBDBMessageStore.java:1378)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1400)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
> org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
> org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveDelivery(StandardReceivingLinkEndpoint.java:280)
> org.apache.qpid.server.protocol.v1_0.AbstractReceivingLinkEndpoint.receiveTransfer(AbstractReceivingLinkEndpoint.java:165)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receiveTransfer(Session_1_0.java:637)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receiveTransfer(AMQPConnection_1_0Impl.java:792)
> org.apache.qpid.server.protocol.v1_0.type.transport.Transfer.invoke(Transfer.java:295)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.received(AMQPConnection_1_0Impl.java:515)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receive$0(AMQPConnection_1_0Impl.java:467)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$146/229440539.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receive(AMQPConnection_1_0Impl.java:461)
> org.apache.qpid.server.protocol.v1_0.framing.FrameHandler.parse(FrameHandler.java:224)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1312)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown
>  Source)
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> 14 of them – when receiving messages:
> {noformat}
> com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:557)
> com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209)
> com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63)
> com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)
> com.sleepycat.je.tree.IN.latchShared(IN.java:560)
> com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)
> com.sleepycat.je.tree.Tree.search(Tree.java:2275)
> com.sleepycat.je.tree.Tree.search(Tree.java:2172)
> com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)
> com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)
> com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)
> com.sleepycat.je.Cursor.search(Cursor.java:3858)
> com.sleepycat.je.Cursor.getInternal(Cursor.java:1282)
> com.sleepycat.je.Database.get(Database.java:1342)
> com.sleepycat.je.Database.get(Database.java:1401)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:299)
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1077)
> org.apache.qpid.server.message.AbstractServerMessageImpl.isPersistent(AbstractServerMessageImpl.java:178)
> org.apache.qpid.server.queue.QueueEntryImpl.populateInstanceProperties(QueueEntryImpl.java:155)
> org.apache.qpid.server.queue.QueueEntryImpl.<init>(QueueEntryImpl.java:147)
> org.apache.qpid.server.queue.OrderedQueueEntry.<init>(OrderedQueueEntry.java:46)
> org.apache.qpid.server.queue.StandardQueueEntry.<init>(StandardQueueEntry.java:37)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:48)
> org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:26)
> org.apache.qpid.server.queue.OrderedQueueEntryList.add(OrderedQueueEntryList.java:92)
> org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1356)
> org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1289)
> org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
> org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$147/1374598514.accept(Unknown
>  Source)
> java.lang.Iterable.forEach(Iterable.java:75)
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1358)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1347)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$152/950450513.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1345)
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1326)
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown
>  Source)
> java.security.AccessController.doPrivileged(Native Method)
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown
>  Source)
> {noformat}
> Before the thread dump in log were present massive “QUE-1005 : Dropped” 
> messages (ca. 73.000) and many error messages “Connection closed with error : 
> amqp:resource-limit-exceeded - local-idle-timeout expired” (191).
> I’m not sure, that AbstractServerMessageImpl.isPersistent() call can cause a 
> deadlock, because get() operation is invoked with LockMode.READ_UNCOMMITTED 
> parameter. According to BDB documentation READ_UNCOMMITTED mode should be 
> non-blocking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to