[
https://issues.apache.org/jira/browse/QPID-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321117#comment-17321117
]
Daniil Kirilyuk commented on QPID-8514:
---------------------------------------
[~orudyy],
There is also another scenario possible: client could connect to a ring queue
and consume messages without acknowledging them. For JMS client that could be
achievable using Session.CLIENT_ACKNOWLEDGE mode without calling
session.commit() after receiving a message. Such situation will lead to the
queue overflow and possibly to broker crash as well. Messages remain in
acquired state after misbehaved client is stopped or killed and therefore
cannot be deleted e.g. via broker web GUI.
> High CPU usage and stucked connections
> --------------------------------------
>
> Key: QPID-8514
> URL: https://issues.apache.org/jira/browse/QPID-8514
> Project: Qpid
> Issue Type: Bug
> Components: Broker-J
> Affects Versions: qpid-java-broker-8.0.4
> Reporter: Daniil Kirilyuk
> Priority: Minor
> Attachments: QPID-8514 - broker-log.zip,
> RingPolicyConcurrencyTest.java, threaddump.zip
>
>
> There were around 600 connections in state CLOSE__WAIT:
> {noformat}
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l
> 38
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l
> 625
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep
> -V TIME_WAIT | wc -l
> 7
> {noformat}
> CPU usage raises up to 90% and higher.
> Connections couldn't be closed by webgui. Broker log contains message below,
> but connections were not closed.
> {noformat}
> 2021-03-26 15:12:28,474 INFO [Broker-Config] (q.m.c.model_delete) -
> [mng:3hpU+miq(admin@/127.0.0.1:62743)]
> [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close
> initiated by operator
> {noformat}
> Thread dumps shows around 49 threads with same stacktrace related to ring
> queue
> {noformat}
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674)
> at
> org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316)
> at
> org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913)
> at
> org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863)
> at
> org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102)
> at
> org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44)
> at
> org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41)
> at
> org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382)
> at
> org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290)
> at
> org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
> at
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
> at
> org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
> at
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
> at
> org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343)
> at
> org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown
> Source)
> at java.lang.Iterable.forEach(Iterable.java:75)
> at
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343)
> at
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349)
> at
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown
> Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347)
> at
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328)
> at
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> at
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown
> Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> at
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> at
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> at
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> at
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> at
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> at
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> at
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> at
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> at
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> at
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> at
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown
> Source)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Also ring queue behaves strangely: queue "{{DeadLetterQueue"}} has set
> maximum message depth to {{1}}, but in there were 55 message with
> {{Acquired}} state.
> First message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.381 (00:00
> UTC)}}.
> Last message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.683 (00:00
> UTC)}}.
> These times correlates to timed out drain request at 16:52:00,410, because
> default drain request is set 60 seconds:
> {noformat}
> ^[[0m^[[0m2021-03-25 16:52:00,408 INFO [org.apache.qpid.jms.JmsSession]
> (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A
> JMS MessageConsumer has been closed: JmsConsumerInfo: {
> ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination =
> broadcast.XX.PRISMA_LIVERiskLimitTrigger }
> ^[[0m^[[33m2021-03-25 16:52:00,410 WARN
> [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel
> (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger])
> Setup of JMS message listener invoker failed for destination
> 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote
> did not respond to a drain request in time
> ^[[0m^[[33m2021-03-25 16:52:01,414 WARN
> [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel
> (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger])
> Setup of JMS message listener invoker failed for destination
> 'broadcast.XX.PRISMA_LIVERiskLimitTrigger'
> {noformat}
> After the problem was triggered broker started to reject client connections:
> {noformat}
> 2021-03-25 17:55:43,716 INFO [Broker-Config] (q.m.c.open) -
> [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination :
> amqp(127.0.0.1:22101) : Protocol Version : 1.0
> 2021-03-25 17:55:43,745 INFO [IO-/127.0.0.1:56012] (q.m.c.open) -
> [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination :
> amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID :
> ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client
> Product : QpidJMS
> 2021-03-25 17:55:43,747 INFO [IO-/127.0.0.1:56012] (q.m.a.denied) -
> [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access
> Virtualhost NAME=default,VIRTUALHOST_NAME=default
> 2021-03-25 17:55:43,755 INFO [Broker-Config] (q.m.c.dropped_connection) -
> [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped
> {noformat}
> Issue seems to be quite rare: same test ran in a loop over several days and
> this issue was not triggered.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]