[ https://issues.apache.org/jira/browse/AMQ-7340?focusedWorklogId=686195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686195 ]
ASF GitHub Bot logged work on AMQ-7340:
---------------------------------------

Author: ASF GitHub Bot
Created on: 25/Nov/21 03:14
Start Date: 25/Nov/21 03:14
Worklog Time Spent: 10m
Work Description: lucastetreault opened a new pull request #728:
URL: https://github.com/apache/activemq/pull/728

We recently encountered a performance degradation similar to the one described here: https://issues.apache.org/jira/browse/AMQ-7340

A specific example we encountered involved around 20k messages scheduled very quickly, where each message's delay was 100ms longer than that of the previous message. Example:

message 1 > delay time 100ms
message 2 > delay time 200ms
message 3 > delay time 300ms
...
message 10,000 > delay time 1000000ms (1000s)
...
message 20,000 > delay time 2000000ms (2000s)

If delivered on schedule, all these messages should be moved to the queue within ~33 minutes, but we observed that it took nearly 3 hours for all the messages to be moved to the queue.

After diving deep on the issue, it seems the main loop that processes scheduled messages does so by traversing the B+ Tree index from the root to find the leftmost leaf node, which contains the messages with the earliest executionTime. This traversal does a disk read at each branch and unmarshals the raw data before moving to the next branch, and the result is not cached for future reads as far as I can tell. The scheduled jobs in that leaf are processed, then we repeat the traversal to find the execution time of the next batch of jobs to calculate how long the loop should sleep for. So for every loop, we find the leftmost node twice, at the start and end of the loop. It seems like this loop can take a long time, and we end up falling behind and not being able to catch up since we're processing one node at a time.
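
To make the cost of handling one leaf per pass concrete, here is a toy model of the loop described above, with a sequential walk over sibling leaves for contrast. It is only a sketch under stated assumptions: `LeafWalkSketch`, `Leaf`, `buildLeaves` and both counting methods are hypothetical names, the leaf capacity of 100 is arbitrary, and none of this is ActiveMQ's scheduler or index code. Disk reads and unmarshalling are not modelled, only the number of passes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical names, not ActiveMQ's scheduler or B+ tree index. */
final class LeafWalkSketch {

    /** A leaf holding job execution times (ms) in ascending order, linked to its right sibling. */
    static final class Leaf {
        final List<Long> executionTimes = new ArrayList<>();
        Leaf next;
    }

    /** Builds a chain of leaves with jobs due at stepMs, 2*stepMs, ..., count*stepMs. */
    static Leaf buildLeaves(int count, long stepMs, int leafCapacity) {
        Leaf first = new Leaf();
        Leaf current = first;
        for (int i = 1; i <= count; i++) {
            if (current.executionTimes.size() == leafCapacity) {
                current.next = new Leaf();
                current = current.next;
            }
            current.executionTimes.add(i * stepMs);
        }
        return first;
    }

    /** Passes needed to catch up when each pass handles only the leftmost leaf. */
    static int passesOneLeafPerPass(Leaf leftmost, long now) {
        int passes = 0;
        Leaf leaf = leftmost;
        while (leaf != null && !leaf.executionTimes.isEmpty()
                && leaf.executionTimes.get(0) <= now) {
            passes++; // one pass fires the due jobs of a single leaf
            long last = leaf.executionTimes.get(leaf.executionTimes.size() - 1);
            if (last > now) {
                break; // the rest of this leaf is still in the future
            }
            leaf = leaf.next; // leaf drained; the next pass starts from the root again
        }
        return passes;
    }

    /** Jobs fired by one pass that keeps following sibling links while jobs are due. */
    static int firedInOneSequentialPass(Leaf leftmost, long now) {
        int fired = 0;
        for (Leaf leaf = leftmost; leaf != null; leaf = leaf.next) {
            for (long t : leaf.executionTimes) {
                if (t > now) {
                    return fired; // the first future job ends the pass
                }
                fired++;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // 20,000 jobs spaced 100ms apart, as in the example; capacity 100 is an assumption.
        Leaf leftmost = buildLeaves(20_000, 100L, 100);
        long now = 2_000_000L; // every job is already overdue
        int passes = passesOneLeafPerPass(leftmost, now);
        // Per the description, each such pass descends from the root twice.
        System.out.println("one leaf per pass: " + passes + " passes, "
                + (2 * passes) + " root-to-leaf descents");
        System.out.println("sequential walk: 1 pass fires "
                + firedInOneSequentialPass(leftmost, now) + " jobs");
    }
}
```

With these assumed numbers the backlog spans 200 leaves, so the one-leaf-per-pass loop needs 200 passes (400 root-to-leaf descents) to catch up, while a single sequential pass reaches all 20,000 overdue jobs. The real ratio depends on how many jobs fit in a leaf, which this model does not try to estimate.
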

This change still does the traversal from the root node to the leftmost node at the start of the loop, but once it finds that leftmost node it will iterate over the leaf nodes sequentially until it finds a job with an execution time greater than the current time. This means that it will always "catch up" on every iteration of the loop. The index is locked for the duration of the iteration, so there's probably some risk that scheduled messages can't make it in, but in practice it doesn't seem like an issue.

I ran 100 connections against a broker scheduling messages as fast as they could with 100ms delays as in the example above, and I was able to schedule 120k messages within a couple of seconds without any issues.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@activemq.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 686195)
Time Spent: 20m (was: 10m)

> Scheduled messages performance degrade
> --------------------------------------
>
> Key: AMQ-7340
> URL: https://issues.apache.org/jira/browse/AMQ-7340
> Project: ActiveMQ
> Issue Type: Bug
> Environment: ActiveMQ broker has been started in a docker container,
> with (most likely) sufficient allocation of resources.
> Reporter: Daynews
> Assignee: Matt Pavlovich
> Priority: Minor
> Attachments: ScheduleActiveMQ.zip
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I have sent lot of scheduled messages with 10ms delay between each to see if
> the broker can cope with high load of scheduled messages. Sending delayed
> messages to the queue works fine, however I get a problem when those messages
> need to be put to the main queue when next schedule time is reached.
> The rate of putting scheduled messages to the main queue drops drastically at
> around 1500-3000 messages. I tried to search for a potential cause of why this
> happens, but was not able to identify anything. Even after restarting the
> broker or cleaning the main queue, the rate of putting scheduled messages
> stays at ~0.5s, leaving many scheduled messages behind.
> Does anyone know a potential cause for this problem? Is this a performance
> bottleneck, insufficient resources, or a badly configured RabitMQ (I've used
> default settings)?
> Thanks for the support.
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)