[ 
https://issues.apache.org/jira/browse/AMQ-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher L. Shannon updated AMQ-9721:
----------------------------------------
    Description: 
While testing various scenarios with expiration on subscriptions I noticed that 
when there is a large backlog of non-persistent messages, CPU usage would 
spike in some cases while removing messages. This only happened when the 
non-persistent messages were held in the pending disk list and not in memory.

After investigation, it turns out there are a couple of problems that need to 
be resolved.

# FilePendingMessageCursor flushes messages to disk when memory is full. The 
disk cursor uses a linked list as an index, which has O(n) performance for 
random access. This is normally not an issue because the broker appends 
messages to the front/end and then pages them in from the front for dispatch, 
expire, discard, etc. However, in a couple of spots the broker is incorrectly 
calling remove twice on the cursor for the same message (this happens on 
discard for topic subs and during expiration for durable subs). The first 
remove call usually finds the message quickly because it was recently 
dispatched or paged in, but the second call iterates over the entire linked 
list for no reason and finds nothing. This causes a huge performance problem 
if there's a large backlog (i.e. millions of messages).
# The second problem is specific to durable subscriptions. The durable sub 
cursor contains a list of cursors for its topic stores and also for its 
non-persistent cursor. When the expiration task runs and calls the remove 
method, the call is delegated to all cursors, including the non-persistent 
one. So even though the search will never find the message in that cursor, a 
huge non-persistent backlog causes a lot of CPU usage for the search. This 
can be fixed by checking whether the message is persistent and only calling 
remove on the matching prefetches, similar to how the add method works and 
how queue subscriptions work.
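
To illustrate both problems, here is a minimal sketch in Java. It is not the 
broker code: CursorSketch and DurableCursorSketch are hypothetical stand-ins 
for a linked-list cursor index and the durable sub cursor, and the 
nodesVisited counter exists only to make the cost of each remove call visible.

```java
import java.util.Iterator;
import java.util.LinkedList;

/** Hypothetical stand-in for a cursor indexed by a linked list (not ActiveMQ code). */
final class CursorSketch {
    private final LinkedList<String> index = new LinkedList<>();
    long nodesVisited = 0; // total list nodes examined by remove()

    void addLast(String messageId) {
        index.addLast(messageId);
    }

    /** Linear scan from the head; a miss has to walk the whole list. */
    boolean remove(String messageId) {
        Iterator<String> it = index.iterator();
        while (it.hasNext()) {
            nodesVisited++;
            if (it.next().equals(messageId)) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical stand-in for a durable sub cursor with two delegate cursors. */
final class DurableCursorSketch {
    final CursorSketch persistent = new CursorSketch();
    final CursorSketch nonPersistent = new CursorSketch();

    /** Problem 2: remove is delegated to every cursor, whatever the message type. */
    boolean removeFromAll(String messageId) {
        boolean inPersistent = persistent.remove(messageId);
        boolean inNonPersistent = nonPersistent.remove(messageId);
        return inPersistent || inNonPersistent;
    }

    /** Suggested direction: pick the cursor by persistence type, as add does. */
    boolean removeRouted(String messageId, boolean isPersistent) {
        return (isPersistent ? persistent : nonPersistent).remove(messageId);
    }
}

final class RemovalDemo {
    public static void main(String[] args) {
        // Problem 1: a redundant second remove scans the entire backlog.
        CursorSketch cursor = new CursorSketch();
        for (int i = 0; i < 100_000; i++) {
            cursor.addLast("np-" + i);
        }
        cursor.remove("np-0"); // found at the head, visits 1 node
        long firstCall = cursor.nodesVisited;
        cursor.remove("np-0"); // already gone, visits every remaining node
        long secondCall = cursor.nodesVisited - firstCall;
        System.out.println("first remove: " + firstCall + ", second remove: " + secondCall);

        // Problem 2: expiring a persistent message should not touch the
        // non-persistent backlog at all.
        DurableCursorSketch durable = new DurableCursorSketch();
        durable.persistent.addLast("p-1");
        for (int i = 0; i < 100_000; i++) {
            durable.nonPersistent.addLast("np-" + i);
        }
        durable.removeRouted("p-1", true);
        System.out.println("non-persistent nodes visited: " + durable.nonPersistent.nodesVisited);
    }
}
```

In this sketch the first remove is cheap because the message sits near the 
head, while the repeated call and the unrouted call both pay for a full scan 
of the backlog without ever finding anything.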


> Fix performance issues with message removal from topic and durable 
> subscriptions
> --------------------------------------------------------------------------------
>
>                 Key: AMQ-9721
>                 URL: https://issues.apache.org/jira/browse/AMQ-9721
>             Project: ActiveMQ Classic
>          Issue Type: Bug
>    Affects Versions: 6.1.6, 5.18.7
>            Reporter: Christopher L. Shannon
>            Assignee: Christopher L. Shannon
>            Priority: Major
>             Fix For: 6.2.0, 5.19.1, 6.1.7
>




