Ram Mantripragada created QPID-8681:
---------------------------------------
Summary: Addressing lock contention in Sorted Queues under high
load by optimizing property fetching
Key: QPID-8681
URL: https://issues.apache.org/jira/browse/QPID-8681
Project: Qpid
Issue Type: Improvement
Components: Broker-J
Affects Versions: qpid-java-broker-7.0.9
Reporter: Ram Mantripragada
Attachments: image-2025-01-27-22-41-16-668.png,
image-2025-01-27-22-42-13-739.png
Apache Qpid Broker J provides Sorted Queues, which allow users to implement a
message re-enqueue with delay feature. This feature is crucial in scenarios
where messages dequeued from regular queues cannot be processed immediately due
to certain preconditions (e.g., available resources, concurrency limits). These
messages are re-enqueued to Sorted Queues, where they are sorted based on their
delay expiry time. Periodic jobs then check these Sorted Queues for expired
messages and move them back to the regular queues for processing.
Under high load conditions (e.g., a re-enqueue rate of ~7,000 messages per
second), we observed that the broker experiences contention issues, causing it
to stop responding to REST API calls (which time out if REST timeouts are set
on the client side otherwise just hung around). These APIs are used to
periodically fetch queue statistics.
*Analysis:*
By analyzing Java Flight Recorder (JFR) data, we identified the root cause of
the contention:
# REST API calls to retrieve queue depths required querying some specific
predefined properties from the broker.
# However, for each requested property, the broker was inadvertently fetching
all 26 properties, resulting in repeated and excessive lock acquisition
attempts on the Sorted Queue data structure.
# Among the requested properties, the *oldestMessageAge* property (used for
delay queues) significantly contributed to the contention by increasing the
number of lock requests.
# On the application side, querying for the *oldestMessageAge* property is
unnecessary when dealing with delay queues, so avoiding this query will further
mitigate the contention issue.
*Stack Trace:*
The following stack trace illustrates the contention observed during the issue:
{code:java}
at
org.apache.qpid.server.queue.SortedQueueEntryList.next(SortedQueueEntryList.java:292)
at
org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.atTail(SortedQueueEntryList.java:686)
at
org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.advance(SortedQueueEntryList.java:698)
at
org.apache.qpid.server.queue.SortedQueueEntryList.getOldestEntry(SortedQueueEntryList.java:350)
at
org.apache.qpid.server.queue.AbstractQueue.getOldestMessageArrivalTime(AbstractQueue.java:1518)
at
org.apache.qpid.server.queue.AbstractQueue.getOldestMessageAge(AbstractQueue.java:1546)
at jdk.internal.reflect.GeneratedMethodAccessor167.invoke(Unknown
Source:-1) at
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566) at
org.apache.qpid.server.model.ConfiguredObjectMethodAttributeOrStatistic.getValue(ConfiguredObjectMethodAttributeOrStatistic.java:68)
at
org.apache.qpid.server.model.ConfiguredObjectMethodStatistic.getValue(ConfiguredObjectMethodStatistic.java:26)
at
org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3181)
at
org.apache.qpid.server.queue.SortedQueueImplWithAccessChecking.getStatistics(SortedQueueImplWithAccessChecking.java:42)
at
org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3168)
at
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.getValue(ConfiguredObjectExpressionFactory.java:312)
at
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:285)
at
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:272)
at
org.apache.qpid.server.filter.ComparisonExpression.evaluate(ComparisonExpression.java:388)
at
org.apache.qpid.server.filter.ComparisonExpression.matches(ComparisonExpression.java:580)
at
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.filterObjects(ConfiguredObjectQuery.java:210)
at
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.<init>(ConfiguredObjectQuery.java:86)
at
org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.performQuery(QueryServlet.java:93)
at
org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.doGet(QueryServlet.java:56)
at
org.apache.qpid.server.management.plugin.servlet.rest.AbstractServlet.doGet(AbstractServlet.java:128)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) {code}
* Java Flight Recorder (JFR) data
!image-2025-01-27-22-41-16-668.png!
*Steps to Reproduce:*
# Configure a Qpid Broker with Sorted Queues.
# Enable REST APIs for queue statistics retrieval.
# Simulate high load by re-enqueuing ~7,000 messages per second to the Sorted
Queues.
# Monitor broker performance and REST API response times.
*Expected Behavior:* The broker should handle high re-enqueue rates without
contention issues, and REST API calls for queue statistics should not time out.
*Actual Behavior:* Under high load conditions, the broker experiences
contention on the Sorted Queue data structure, leading to REST API timeouts.
*Proposed Solution:*
To address this issue, we propose optimizing the property fetching mechanism in
the Qpid Broker in ConfiguredObjectExpressionFactory.java:
# Modify the broker code (ConfiguredObjectExpressionFactory) to retrieve only
the specifically requested properties rather than all 26 properties.
!image-2025-01-27-22-42-13-739.png!
# Avoid querying the oldestMessageAge property on the application side when
dealing with delay queues, as it is not required for processing.
# These optimizations will reduce the load on the Sorted Queue's locking
mechanism and prevent redundant data fetches, thereby improving the broker's
responsiveness under high load conditions.
We have tested the proposed changes in our environment and observed significant
performance improvements, including reduced lock contention and faster REST API
responses under high load.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]