[
https://issues.apache.org/jira/browse/QPID-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robbie Gemmell updated QPID-8681:
---------------------------------
Fix Version/s: qpid-java-broker-9.2.1
> Addressing lock contention in Sorted Queues under high load by optimizing
> property fetching
> -------------------------------------------------------------------------------------------
>
> Key: QPID-8681
> URL: https://issues.apache.org/jira/browse/QPID-8681
> Project: Qpid
> Issue Type: Improvement
> Components: Broker-J
> Affects Versions: qpid-java-broker-7.0.9
> Reporter: Ram Mantripragada
> Assignee: Robert Godfrey
> Priority: Critical
> Labels: contention, performance
> Fix For: qpid-java-broker-9.2.1
>
> Attachments: image-2025-01-27-22-41-16-668.png,
> image-2025-01-27-22-42-13-739.png
>
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> *Summary:*
> Apache Qpid Broker J provides Sorted Queues, which allow users to implement a
> message re-enqueue with delay feature. This feature is crucial in scenarios
> where messages dequeued from regular queues cannot be processed immediately
> due to certain preconditions (e.g., available resources, concurrency limits).
> These messages are re-enqueued to Sorted Queues, where they are sorted based
> on their delay expiry time. Periodic jobs then check these Sorted Queues for
> expired messages and move them back to the regular queues for processing.
> Under high load conditions (e.g., a re-enqueue rate of ~7,000 messages per
> second), we observed that the broker experiences contention issues, causing
> it to stop responding to REST API calls (which time out if REST timeouts are
> set on the client side otherwise just hung around). These APIs are used to
> periodically fetch queue statistics.
> *Analysis:*
> By analyzing Java Flight Recorder (JFR) data, we identified the root cause of
> the contention:
> # REST API calls to retrieve queue depths required querying some specific
> predefined properties from the broker.
> # However, for each requested property, the broker was inadvertently
> fetching all 26 properties, resulting in repeated and excessive lock
> acquisition attempts on the Sorted Queue data structure.
> # Among the requested properties, the *oldestMessageAge* property (used for
> delay queues) significantly contributed to the contention by increasing the
> number of lock requests.
> # On the application side, querying for the *oldestMessageAge* property is
> unnecessary when dealing with delay queues, so avoiding this query will
> further mitigate the contention issue.
> *Stack Trace:*
> The following stack trace illustrates the contention observed during the
> issue:
> {code:java}
> at
> org.apache.qpid.server.queue.SortedQueueEntryList.next(SortedQueueEntryList.java:292)
> at
> org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.atTail(SortedQueueEntryList.java:686)
> at
> org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.advance(SortedQueueEntryList.java:698)
> at
> org.apache.qpid.server.queue.SortedQueueEntryList.getOldestEntry(SortedQueueEntryList.java:350)
> at
> org.apache.qpid.server.queue.AbstractQueue.getOldestMessageArrivalTime(AbstractQueue.java:1518)
> at
> org.apache.qpid.server.queue.AbstractQueue.getOldestMessageAge(AbstractQueue.java:1546)
> at jdk.internal.reflect.GeneratedMethodAccessor167.invoke(Unknown
> Source:-1) at
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:566) at
> org.apache.qpid.server.model.ConfiguredObjectMethodAttributeOrStatistic.getValue(ConfiguredObjectMethodAttributeOrStatistic.java:68)
> at
> org.apache.qpid.server.model.ConfiguredObjectMethodStatistic.getValue(ConfiguredObjectMethodStatistic.java:26)
> at
> org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3181)
> at
> org.apache.qpid.server.queue.SortedQueueImplWithAccessChecking.getStatistics(SortedQueueImplWithAccessChecking.java:42)
> at
> org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3168)
> at
> org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.getValue(ConfiguredObjectExpressionFactory.java:312)
> at
> org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:285)
> at
> org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:272)
> at
> org.apache.qpid.server.filter.ComparisonExpression.evaluate(ComparisonExpression.java:388)
> at
> org.apache.qpid.server.filter.ComparisonExpression.matches(ComparisonExpression.java:580)
> at
> org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.filterObjects(ConfiguredObjectQuery.java:210)
> at
> org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.<init>(ConfiguredObjectQuery.java:86)
> at
> org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.performQuery(QueryServlet.java:93)
> at
> org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.doGet(QueryServlet.java:56)
> at
> org.apache.qpid.server.management.plugin.servlet.rest.AbstractServlet.doGet(AbstractServlet.java:128)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) {code}
> * Java Flight Recorder (JFR) data
> !image-2025-01-27-22-41-16-668.png!
> *Steps to Reproduce:*
> # Configure a Qpid Broker with Sorted Queues.
> # Enable REST APIs for queue statistics retrieval.
> # Simulate high load by re-enqueuing ~7,000 messages per second to the
> Sorted Queues.
> # Monitor broker performance and REST API response times.
> *Expected Behavior:* The broker should handle high re-enqueue rates without
> contention issues, and REST API calls for queue statistics should not time
> out.
> *Actual Behavior:* Under high load conditions, the broker experiences
> contention on the Sorted Queue data structure, leading to REST API timeouts.
> *Proposed Solution:*
> To address this issue, we propose optimizing the property fetching mechanism
> in the Qpid Broker in ConfiguredObjectExpressionFactory.java:
> # Modify the broker code (ConfiguredObjectExpressionFactory) to retrieve
> only the specifically requested properties rather than all 26 properties.
> !image-2025-01-27-22-42-13-739.png!
> # Avoid querying the oldestMessageAge property on the application side when
> dealing with delay queues, as it is not required for processing.
> # These optimizations will reduce the load on the Sorted Queue's locking
> mechanism and prevent redundant data fetches, thereby improving the broker's
> responsiveness under high load conditions.
> We have tested the proposed changes in our environment and observed
> significant performance improvements, including reduced lock contention and
> faster REST API responses under high load.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]