[
https://issues.apache.org/jira/browse/AMQ-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christopher L. Shannon updated AMQ-9721:
----------------------------------------
Description:
While testing various expiration scenarios on subscriptions, I noticed that
when there is a large backlog of non-persistent messages, CPU usage would
spike in some cases while removing messages. This only happened when the
non-persistent messages were held in the pending disk list rather than in
memory.
After investigation, it turns out there are a couple of problems that need to
be resolved.
# FilePendingMessageCursor flushes messages to disk when memory is full. The
disk cursor uses a linked list as an index, which has O(n) performance for
random access. This is normally not an issue because the broker appends
messages to the front/end and then pages them in from the front for dispatch,
expire, discard, etc. However, in a couple of spots the broker is incorrectly
calling remove twice on the cursor for the same message (this happens on
discard for topic subs and during expiration for durable subs). Usually when
remove is called it finds the message quickly, as it was recently dispatched
or paged in. The second call, however, iterates over the entire linked list
for no reason and finds nothing. This causes a huge performance problem if
there's a large backlog (i.e. millions of messages).
# The second problem is specific to durable subscriptions. The durable sub
cursor contains a list of cursors for its topic stores and also for its
non-persistent cursor. The problem is that when the expiration task runs and
calls the remove method, the call is delegated to all cursors, including the
non-persistent one. So even though the message will never be found in that
cursor, a huge non-persistent backlog makes the search consume a lot of CPU.
This can be fixed by checking the persistent type and only calling the right
prefetches for remove, similar to how the add method works and how queue
subscriptions work.
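
Both problems can be illustrated with a minimal, self-contained sketch. This is
not ActiveMQ code: ListCursor and DurableCursor are hypothetical stand-ins for
a linked-list-backed pending cursor and a durable sub cursor, and the
nodesVisited counter exists only to make the cost of each scan visible. The
routed remove assumes the caller knows whether the message is persistent.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class CursorRemovalSketch {

    /** Hypothetical stand-in for a pending cursor indexed by a linked list. */
    static class ListCursor {
        private final List<String> ids = new LinkedList<>();
        long nodesVisited = 0; // entries inspected by remove() so far

        void addLast(String id) {
            ids.add(id);
        }

        // Linear scan from the head: O(n) when the id is absent.
        boolean remove(String id) {
            Iterator<String> it = ids.iterator();
            while (it.hasNext()) {
                nodesVisited++;
                if (it.next().equals(id)) {
                    it.remove();
                    return true;
                }
            }
            return false;
        }
    }

    /** Hypothetical durable sub cursor with one cursor per message type. */
    static class DurableCursor {
        final ListCursor persistent = new ListCursor();
        final ListCursor nonPersistent = new ListCursor();

        // Problem 2: the remove is delegated to every cursor.
        boolean removeFromAll(String id) {
            boolean inPersistent = persistent.remove(id);
            boolean inNonPersistent = nonPersistent.remove(id);
            return inPersistent || inNonPersistent;
        }

        // Proposed shape of the fix: route by persistence type only.
        boolean removeRouted(String id, boolean isPersistent) {
            return (isPersistent ? persistent : nonPersistent).remove(id);
        }
    }

    // Problem 1: entries scanned by a duplicate remove of the head message.
    static long secondRemoveCost(int backlog) {
        ListCursor cursor = new ListCursor();
        for (int i = 0; i < backlog; i++) {
            cursor.addLast("m" + i);
        }
        cursor.remove("m0"); // first call finds the message at the head
        long before = cursor.nodesVisited;
        cursor.remove("m0"); // duplicate call walks the whole remaining list
        return cursor.nodesVisited - before;
    }

    public static void main(String[] args) {
        System.out.println("scanned by duplicate remove: "
                + secondRemoveCost(100_000));

        DurableCursor durable = new DurableCursor();
        for (int i = 0; i < 100_000; i++) {
            durable.nonPersistent.addLast("np" + i);
        }
        durable.persistent.addLast("p1");
        durable.removeRouted("p1", true);
        System.out.println("non-persistent scanned (routed): "
                + durable.nonPersistent.nodesVisited);

        durable.persistent.addLast("p2");
        durable.removeFromAll("p2");
        System.out.println("non-persistent scanned (delegated to all): "
                + durable.nonPersistent.nodesVisited);
    }
}
```

In this sketch the duplicate remove scans all 99999 remaining entries, the
routed remove never touches the non-persistent backlog, and the delegated
remove scans all 100000 non-persistent entries without finding anything.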
> Fix performance issues with message removal from topic and durable
> subscriptions
> --------------------------------------------------------------------------------
>
> Key: AMQ-9721
> URL: https://issues.apache.org/jira/browse/AMQ-9721
> Project: ActiveMQ Classic
> Issue Type: Bug
> Affects Versions: 6.1.6, 5.18.7
> Reporter: Christopher L. Shannon
> Assignee: Christopher L. Shannon
> Priority: Major
> Fix For: 6.2.0, 5.19.1, 6.1.7