Ram Mantripragada created QPID-8681:
---------------------------------------

             Summary: Addressing lock contention in Sorted Queues under high 
load by optimizing property fetching
                 Key: QPID-8681
                 URL: https://issues.apache.org/jira/browse/QPID-8681
             Project: Qpid
          Issue Type: Improvement
          Components: Broker-J
    Affects Versions: qpid-java-broker-7.0.9
            Reporter: Ram Mantripragada
         Attachments: image-2025-01-27-22-41-16-668.png, 
image-2025-01-27-22-42-13-739.png

Apache Qpid Broker J provides Sorted Queues, which allow users to implement a 
message re-enqueue with delay feature. This feature is crucial in scenarios 
where messages dequeued from regular queues cannot be processed immediately due 
to certain preconditions (e.g., available resources, concurrency limits). These 
messages are re-enqueued to Sorted Queues, where they are sorted based on their 
delay expiry time. Periodic jobs then check these Sorted Queues for expired 
messages and move them back to the regular queues for processing.

Under high load conditions (e.g., a re-enqueue rate of ~7,000 messages per 
second), we observed that the broker experiences contention issues, causing it 
to stop responding to REST API calls (which time out if REST timeouts are set 
on the client side otherwise just hung around). These APIs are used to 
periodically fetch queue statistics.

*Analysis:*

By analyzing Java Flight Recorder (JFR) data, we identified the root cause of 
the contention:
 # REST API calls to retrieve queue depths required querying some specific 
predefined properties from the broker.
 # However, for each requested property, the broker was inadvertently fetching 
all 26 properties, resulting in repeated and excessive lock acquisition 
attempts on the Sorted Queue data structure.
 # Among the requested properties, the *oldestMessageAge* property (used for 
delay queues) significantly contributed to the contention by increasing the 
number of lock requests.
 # On the application side, querying for the *oldestMessageAge* property is 
unnecessary when dealing with delay queues, so avoiding this query will further 
mitigate the contention issue.

*Stack Trace:*

The following stack trace illustrates the contention observed during the issue:

 

 
{code:java}
at 
org.apache.qpid.server.queue.SortedQueueEntryList.next(SortedQueueEntryList.java:292)
    at 
org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.atTail(SortedQueueEntryList.java:686)
    at 
org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.advance(SortedQueueEntryList.java:698)
    at 
org.apache.qpid.server.queue.SortedQueueEntryList.getOldestEntry(SortedQueueEntryList.java:350)
    at 
org.apache.qpid.server.queue.AbstractQueue.getOldestMessageArrivalTime(AbstractQueue.java:1518)
    at 
org.apache.qpid.server.queue.AbstractQueue.getOldestMessageAge(AbstractQueue.java:1546)
    at jdk.internal.reflect.GeneratedMethodAccessor167.invoke(Unknown 
Source:-1)    at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:566)    at 
org.apache.qpid.server.model.ConfiguredObjectMethodAttributeOrStatistic.getValue(ConfiguredObjectMethodAttributeOrStatistic.java:68)
    at 
org.apache.qpid.server.model.ConfiguredObjectMethodStatistic.getValue(ConfiguredObjectMethodStatistic.java:26)
    at 
org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3181)
    at 
org.apache.qpid.server.queue.SortedQueueImplWithAccessChecking.getStatistics(SortedQueueImplWithAccessChecking.java:42)
    at 
org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3168)
    at 
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.getValue(ConfiguredObjectExpressionFactory.java:312)
    at 
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:285)
    at 
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:272)
    at 
org.apache.qpid.server.filter.ComparisonExpression.evaluate(ComparisonExpression.java:388)
    at 
org.apache.qpid.server.filter.ComparisonExpression.matches(ComparisonExpression.java:580)
    at 
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.filterObjects(ConfiguredObjectQuery.java:210)
    at 
org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.<init>(ConfiguredObjectQuery.java:86)
    at 
org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.performQuery(QueryServlet.java:93)
    at 
org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.doGet(QueryServlet.java:56)
    at 
org.apache.qpid.server.management.plugin.servlet.rest.AbstractServlet.doGet(AbstractServlet.java:128)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)    at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) {code}
 * Java Flight Recorder (JFR) data 

!image-2025-01-27-22-41-16-668.png!

*Steps to Reproduce:*
 # Configure a Qpid Broker with Sorted Queues.
 # Enable REST APIs for queue statistics retrieval.
 # Simulate high load by re-enqueuing ~7,000 messages per second to the Sorted 
Queues.
 # Monitor broker performance and REST API response times.

*Expected Behavior:* The broker should handle high re-enqueue rates without 
contention issues, and REST API calls for queue statistics should not time out.

*Actual Behavior:* Under high load conditions, the broker experiences 
contention on the Sorted Queue data structure, leading to REST API timeouts.

*Proposed Solution:* 

To address this issue, we propose optimizing the property fetching mechanism in 
the Qpid Broker in ConfiguredObjectExpressionFactory.java:
 # Modify the broker code (ConfiguredObjectExpressionFactory) to retrieve only 
the specifically requested properties rather than all 26 properties. 
!image-2025-01-27-22-42-13-739.png!
 # Avoid querying the oldestMessageAge property on the application side when 
dealing with delay queues, as it is not required for processing.
 # These optimizations will reduce the load on the Sorted Queue's locking 
mechanism and prevent redundant data fetches, thereby improving the broker's 
responsiveness under high load conditions.

We have tested the proposed changes in our environment and observed significant 
performance improvements, including reduced lock contention and faster REST API 
responses under high load.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to