wy96f opened a new pull request #2730: ARTEMIS-2399 Fix performance degradation when there are a lot of subscribers
URL: https://github.com/apache/activemq-artemis/pull/2730

We noticed a significant drop in performance when the broker entered page mode with multiple subscribers.

### **Environment**:

broker 2.9.0

cpu: 4 cores, memory: 8G, disk: ssd 500G

broker.xml:

```xml
<thread-pool-max-size>60</thread-pool-max-size>
<address-setting match="#">
   <max-size-bytes>51Mb</max-size-bytes>
   <page-size-bytes>50Mb</page-size-bytes>
   <page-max-cache-size>1</page-max-cache-size>
   <address-full-policy>PAGE</address-full-policy>
</address-setting>
<message-expiry-scan-period>-1</message-expiry-scan-period>
```

### **Test steps**:

We created a topic with 100 queues bound to it, then ran our GrinderRunner test in our internal test cluster with 500 threads producing messages (200-500 bytes each) and 560 consumer threads, each subscribing to a randomly chosen queue. The test showed poor performance: 13000 msg/s sent and 5000 msg/s received.

producer tps and latency: (chart omitted)

consumer tps and latency: (chart omitted)

### **Analysis**:

There were two root causes:

1. Usually 1000 messages are delivered at once, sometimes fewer if consumers are busy. depage() is then called to refill the queue with the number of messages delivered in the last round. Reading those messages requires reading the whole page, which is acceptable if the page is in the soft-value cache. But with multiple subscribers at different cursor positions, each reads its own page file, puts it into the cache, and evicts the other pages. Later, when a subscriber needs its page again, it misses the cache and has to read the whole page back from disk, which takes a long time and puts pressure on the disk.

2. For multiple subscribers to the same address, a single executor is responsible for delivery, which means only one queue is delivering at any given moment. A queue may therefore be stalled for a long time.
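The eviction pattern in point 1 can be illustrated with a minimal simulation (hypothetical names, not Artemis code): a page cache bounded to one entry, as with `page-max-cache-size=1` above, read by two cursors sitting on different pages. Each read evicts the other cursor's page, so every access becomes a full page read from disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch (not Artemis code): an LRU page cache bounded to
// page-max-cache-size entries, accessed by cursors on different pages.
public class PageCacheThrash {
    static int misses = 0; // counts full page reads from "disk"

    // LRU cache holding at most maxPages decoded pages
    static Map<Integer, String> newCache(int maxPages) {
        return new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> e) {
                return size() > maxPages;
            }
        };
    }

    static String readPage(Map<Integer, String> cache, int pageId) {
        return cache.computeIfAbsent(pageId, id -> {
            misses++; // cache miss: the whole page is read from disk
            return "page-" + id;
        });
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = newCache(1); // page-max-cache-size = 1
        // Two subscribers alternate between different pages: each read
        // evicts the other's page, so all 20 reads go to disk.
        for (int i = 0; i < 10; i++) {
            readPage(cache, 5); // cursor A
            readPage(cache, 9); // cursor B
        }
        System.out.println("misses=" + misses); // prints misses=20
    }
}
```

With a larger cache, or with subscribers positioned on the same page, the same loop would be almost entirely cache hits.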
   We keep queueMemorySize messages in memory, and when these are finally delivered after a long delay, we probably need to query each message and read the page file again.

### **Solution**:

1. We add a new cache, PageIndexCache, which stores each message number and its position in the page file. The cache is built the first time we read a page and put it into the soft-value cache. If the page is later evicted from the soft-value cache, we use PageIndexCache to read individual messages, so we no longer have to read the whole page just to fetch a few messages.

2. In most cases, one depage round is followed by at most MAX_SCHEDULED_RUNNERS delivery rounds, so we only need to read MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS messages. This reduces the chance of re-querying a reference after garbage collection has removed it, while still keeping enough messages in memory to deliver each round.

Performance with this PR:

producer tps and latency: (chart omitted)

consumer tps and latency: (chart omitted)
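The PageIndexCache idea in point 1 can be sketched as follows. This is a simplified illustration with a hypothetical length-prefixed page layout, not the actual Artemis file format: the first full scan of the page records each message's file offset, and later single-message reads seek directly to that offset instead of re-reading the whole page.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the PageIndexCache idea (hypothetical layout, not
// Artemis code): messages are stored length-prefixed in a page file; one
// full scan records each message's offset, so a later read of a single
// message is a seek plus a small read instead of a whole-page read.
public class PageIndexCacheSketch {
    final Path pageFile;
    final Map<Integer, Long> offsets = new HashMap<>(); // msg number -> file offset

    PageIndexCacheSketch(Path pageFile) { this.pageFile = pageFile; }

    // First (and only) full scan of the page: builds the index.
    void buildIndex() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(pageFile.toFile(), "r")) {
            int msgNumber = 0;
            while (raf.getFilePointer() < raf.length()) {
                offsets.put(msgNumber++, raf.getFilePointer());
                int len = raf.readInt();              // length prefix
                raf.seek(raf.getFilePointer() + len); // skip the body
            }
        }
    }

    // Reads one message by seeking directly to its recorded offset.
    byte[] readMessage(int msgNumber) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(pageFile.toFile(), "r")) {
            raf.seek(offsets.get(msgNumber));
            byte[] body = new byte[raf.readInt()];
            raf.readFully(body);
            return body;
        }
    }

    public static void main(String[] args) throws IOException {
        Path page = Files.createTempFile("page", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(page.toFile(), "rw")) {
            for (String m : new String[] {"msg-0", "msg-1", "msg-2"}) {
                byte[] b = m.getBytes();
                raf.writeInt(b.length);
                raf.write(b);
            }
        }
        PageIndexCacheSketch cache = new PageIndexCacheSketch(page);
        cache.buildIndex();
        System.out.println(new String(cache.readMessage(1))); // prints msg-1
        Files.delete(page);
    }
}
```

The index itself is small (one number and one offset per message), which is why it can be kept even after the decoded page is evicted from the soft-value cache.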
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
