[jira] [Updated] (ARTEMIS-2678) Incomplete records for pages under high load

Ansgar J. Sachs (Jira) Tue, 24 Mar 2020 00:17:09 -0700


     [ 
https://issues.apache.org/jira/browse/ARTEMIS-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ansgar J. Sachs updated ARTEMIS-2678:
-------------------------------------
    Description: 
{quote}As a developer, I expect paging to be resource-saving and resilient to 
high load{quote}

h3. Current Situation

During a performance test that ran for multiple hours at a throughput of 
~25,000 messages per second, some consumers were too slow to consume and 
messages piled up on the broker. For this reason, the broker started to page 
the messages of the growing queues. When we reduced the load on the broker, 
some queues stopped consuming, and the broker logged the following:
{code}
AMQ222033: Page file 000000007.page had incomplete records at position 
39,795,399 at record number 13,952

target message no.16146 not found from start offset 46032883 and start message 
number 16146: java.lang.RuntimeException: target message no.16146 not found 
from start offset 46032883 and start message number 16146
{code}

It wasn't possible to recover from this state except by deleting the paging 
directory.

h3. Expected Situation

I would expect that the paging mechanism is resilient to any errors.

h3. Scenario Setup

Master configuration:
{code:xml}
<ha-policy>
  <shared-store>
    <master>
      <failover-on-shutdown>true</failover-on-shutdown>
    </master>
  </shared-store>
</ha-policy>
<!-- ... -->
<address-setting match="#">
  <max-size-bytes>256Mb</max-size-bytes>
  <page-size-bytes>64Mb</page-size-bytes>
  <message-counter-history-day-limit>10</message-counter-history-day-limit>
  <address-full-policy>PAGE</address-full-policy>
</address-setting>
{code}

An extract of the monitoring of the Performance Test is attached to this issue.

h3. Workaround

For now we have disabled paging entirely and use only the heap. The heap is 
exhausted at 5 million messages, which in our use case is still better than 
losing any of them.
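
For illustration only: the report does not include the changed settings, so 
the following is a hedged sketch of one way to keep an address from paging, 
assuming it is acceptable to block producers once the address is full. BLOCK 
is a standard address-full-policy value; the 256Mb limit is simply carried 
over from the configuration above and would need to be sized to the available 
heap.
{code:xml}
<!-- Sketch, not the reporter's actual configuration -->
<address-setting match="#">
  <!-- Assumed limit; size it to the heap the broker really has -->
  <max-size-bytes>256Mb</max-size-bytes>
  <!-- BLOCK stops producers at the limit instead of writing page files -->
  <address-full-policy>BLOCK</address-full-policy>
</address-setting>
{code}
With this policy messages stay in memory, so the trade-off is producer 
back-pressure rather than disk usage.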


  was:
{quote}As a developer, I expect paging to be resource-saving and resilient to 
high load{quote}

h3. Current Situation

During a performance test that ran for multiple hours at a throughput of 
~25,000 messages per second, some consumers were too slow to consumed and 
messages piled up on the broker. For this reason, the broker started to page 
the messages of the growing queues. When we reduced the load on the broker, 
some queues stopped consuming, and the broker logged the following:
{code}
AMQ222033: Page file 000000007.page had incomplete records at position 
39,795,399 at record number 13,952

target message no.16146 not found from start offset 46032883 and start message 
number 16146: java.lang.RuntimeException: target message no.16146 not found 
from start offset 46032883 and start message number 16146
{code}

It wasn't possible to recover from this state except by deleting the paging 
directory.

h3. Expected Situation

I would expect that the paging mechanism is resilient to any errors.

h3. Scenario Setup

Master configuration:
{code:xml}
<ha-policy>
  <shared-store>
    <master>
      <failover-on-shutdown>true</failover-on-shutdown>
    </master>
  </shared-store>
</ha-policy>
<!-- ... -->
<address-setting match="#">
  <max-size-bytes>256Mb</max-size-bytes>
  <page-size-bytes>64Mb</page-size-bytes>
  <message-counter-history-day-limit>10</message-counter-history-day-limit>
  <address-full-policy>PAGE</address-full-policy>
</address-setting>
{code}

An extract of the monitoring of the Performance Test is attached to this issue.

h3. Workaround

For now we have disabled paging entirely and use only the heap. The heap is 
exhausted at 5 million messages, which in our use case is still better than 
losing any of them.



> Incomplete records for pages under high load
> --------------------------------------------
>
>                 Key: ARTEMIS-2678
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2678
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>         Environment: Linux
>            Reporter: Ansgar J. Sachs
>            Priority: Major
>         Attachments: Bildschirmfoto 2020-03-23 um 17.35.41.png
>
>
> {quote}As a developer, I expect paging to be resource-saving and resilient 
> to high load{quote}
> h3. Current Situation
> During a performance test that ran for multiple hours at a throughput of 
> ~25,000 messages per second, some consumers were too slow to consume and 
> messages piled up on the broker. For this reason, the broker started to page 
> the messages of the growing queues. When we reduced the load on the broker, 
> some queues stopped consuming, and the broker logged the following:
> {code}
> AMQ222033: Page file 000000007.page had incomplete records at position 
> 39,795,399 at record number 13,952
> target message no.16146 not found from start offset 46032883 and start 
> message number 16146: java.lang.RuntimeException: target message no.16146 not 
> found from start offset 46032883 and start message number 16146
> {code}
> It wasn't possible to recover from this state except by deleting the paging 
> directory.
> h3. Expected Situation
> I would expect that the paging mechanism is resilient to any errors.
> h3. Scenario Setup
> Master configuration:
> {code:xml}
> <ha-policy>
>   <shared-store>
>     <master>
>       <failover-on-shutdown>true</failover-on-shutdown>
>     </master>
>   </shared-store>
> </ha-policy>
> <!-- ... -->
> <address-setting match="#">
>   <max-size-bytes>256Mb</max-size-bytes>
>   <page-size-bytes>64Mb</page-size-bytes>
>   <message-counter-history-day-limit>10</message-counter-history-day-limit>
>   <address-full-policy>PAGE</address-full-policy>
> </address-setting>
> {code}
> An extract of the monitoring of the Performance Test is attached to this 
> issue.
> h3. Workaround
> For now we have disabled paging entirely and use only the heap. The heap is 
> exhausted at 5 million messages, which in our use case is still better than 
> losing any of them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
