[ https://issues.apache.org/jira/browse/ARTEMIS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422949#comment-17422949 ]
Ekta commented on ARTEMIS-3505:
-------------------------------

Hello Justin,

The previous ticket related to this same issue was NOT opened by me; it was opened by one of our team members, Mahendra Sonawale. Below is the related ticket, and I have also copied YOUR comment from that ticket and pasted it here. See the very last line. Thanks.

https://issues.apache.org/jira/browse/ARTEMIS-3355

This looks like a "soft" deadlock which was properly caught by the "[critical analyzer|https://activemq.apache.org/components/artemis/documentation/latest/critical-analysis.html]." The critical analyzer is a kind of safeguard to catch nasty issues like this and shut down the broker so that it can be restarted and restored to proper working order, rather than just sitting there in a deadlocked state, potentially forever, while clients are unable to perform their work.

{{Thread-38676}} was blocked and triggered the failure because it is in a section of code which is deemed "critical" to broker performance (i.e. the TimedBuffer, which flushes data to disk):
{noformat}
"Thread-38676 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7@5f5effb0)" Id=243147 BLOCKED on org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage@3a185aa5 owned by "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@41b13f3d)" Id=125
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.ensureMessageDataScanned(AMQPMessage.java:572)
    - blocked on org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage@3a185aa5
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.getExpiration(AMQPMessage.java:962)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessagePersister.encode(AMQPLargeMessagePersister.java:97)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessagePersister.encode(AMQPLargeMessagePersister.java:32)
    at org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.encode(JournalAddRecord.java:72)
    at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.addBytes(TimedBuffer.java:321)
    - locked org.apache.activemq.artemis.core.io.buffer.TimedBuffer@40639fab
    at org.apache.activemq.artemis.core.io.AbstractSequentialFile.write(AbstractSequentialFile.java:231)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2937)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:92)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:850)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$39/2124562732.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@556802c0
{noformat}

{{Thread-38676}} was blocked waiting on {{Thread-16}}:
{noformat}
"Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@41b13f3d)" Id=125 WAITING on java.util.concurrent.CountDownLatch$Sync@2f662f7b
    at sun.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.CountDownLatch$Sync@2f662f7b
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1155)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:989)
    at org.apache.activemq.artemis.core.replication.ReplicatedJournal.appendDeleteRecord(ReplicatedJournal.java:233)
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:359)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1620)
    - locked org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage@3a185aa5
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1562)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:1191)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:1063)
    at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:2172)
    - locked org.apache.activemq.artemis.core.server.impl.ServerSessionImpl@18f48d32
    at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.send(ServerSessionImpl.java:1812)
    - locked org.apache.activemq.artemis.core.server.impl.ServerSessionImpl@18f48d32
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPSessionCallback.inSessionSend(AMQPSessionCallback.java:563)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPSessionCallback.lambda$serverSend$2(AMQPSessionCallback.java:522)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPSessionCallback$$Lambda$275/60269086.run(Unknown Source)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$39/2124562732.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@4fd856a6
{noformat}

However, {{Thread-16}} itself is waiting _indefinitely_ in {{java.util.concurrent.CountDownLatch#await()}}.
Unfortunately, this call will never return because it's waiting for a thread to run that is itself blocked by {{Thread-38676}}, since they're both being run by the same ordered executor. This issue has already been resolved by the commit for ARTEMIS-3327. It will be available in 2.18.0.

> Activemq Broker Keeps Crashing
> ------------------------------
>
>                 Key: ARTEMIS-3505
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3505
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.18.0
>         Environment: DEV/UAT/PROD
>            Reporter: Ekta
>            Priority: Critical
>         Attachments: samplebroker.xml, threadDump.txt
>
>
> Hello All,
>
> We have noticed the same problem which we reported earlier with 2.17 and were told that it would be fixed in the 2.18 version. We have recently moved all our environments to the 2.18 version and can see the problem still exists across all of our environments.
>
> We have the below architecture with respect to our ActiveMQ master/slave setup:
> {noformat}
> producer/consumer --> Apache QPID (1.14) --> Artemis 2.18 (master/slave)
> {noformat}
> Basically, we see our master and slave brokers going down abruptly with the below log. I have also attached the thread dump for analysis to see if anyone can spot anything; we can see it has to do with some concurrent deadlocks. Please go through the attached logs and suggest any feedback.
>
> The log that is causing the issue is highlighted below. As soon as the broker prints this, it prints that the Critical Analyzer detected slow paths on the broker *and therefore* AMQ224079: The process for the virtual machine will be killed.
> {noformat}
> 2021-09-29 10:37:43,327 WARN [org.apache.activemq.artemis.utils.critical.CriticalMeasure] Component org.apache.activemq.artemis.core.io.buffer.TimedBuffer is expired on path 4
> {noformat}
> It has been happening quite frequently now and we need to get to the bottom of this.
>
> Appreciate everyone's effort on this.
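For illustration only, the self-deadlock described in the quoted comment can be sketched with plain JDK classes. This is not Artemis code; the class and method names are hypothetical, and a JDK single-threaded executor stands in for Artemis's {{OrderedExecutor}} (both run queued tasks one at a time, in order). A real {{await()}} here would hang forever, so the sketch uses a timed await to terminate:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the ordered-executor self-deadlock: a task blocks
// on a latch that can only be counted down by a task queued BEHIND it on
// the same single-threaded executor, so the latch never opens.
public class OrderedExecutorDeadlock {

    // Returns whether the follow-up task managed to run before the timeout.
    static boolean followUpTaskRan() throws Exception {
        // Stands in for Artemis's OrderedExecutor: one task at a time, in order.
        ExecutorService ordered = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch latch = new CountDownLatch(1);
            Future<Boolean> outcome = ordered.submit(() -> {
                // Schedule follow-up work on the SAME executor...
                ordered.execute(latch::countDown);
                // ...then block waiting for it. It is queued behind this task,
                // so it can never run while we wait here. A plain await()
                // would never return; the timeout makes the sketch terminate.
                return latch.await(2, TimeUnit.SECONDS);
            });
            return outcome.get();
        } finally {
            ordered.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("follow-up task ran: " + followUpTaskRan()); // prints "follow-up task ran: false"
    }
}
```

The fix pattern (as in ARTEMIS-3327) is generally to avoid blocking an ordered executor's thread on work that must be dispatched through the same executor.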
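As an aside, the critical analyzer behaviour mentioned in the quoted comment is configurable in {{broker.xml}}. A minimal sketch of the relevant settings (the values shown are illustrative, not a recommendation for this deployment):
{noformat}
<core xmlns="urn:activemq:core">
   <!-- enable or disable the critical analyzer -->
   <critical-analyzer>true</critical-analyzer>
   <!-- how long a critical component may appear stalled before action is taken (ms) -->
   <critical-analyzer-timeout>120000</critical-analyzer-timeout>
   <!-- how often the analyzer checks components (ms) -->
   <critical-analyzer-check-period>60000</critical-analyzer-check-period>
   <!-- what to do on expiry: HALT, SHUTDOWN or LOG -->
   <critical-analyzer-policy>HALT</critical-analyzer-policy>
</core>
{noformat}
Note that relaxing these settings only hides the symptom; the underlying deadlock still needs the ARTEMIS-3327 fix.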
[^threadDump.txt]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)