[ 
https://issues.apache.org/jira/browse/ARTEMIS-4797?focusedWorklogId=922588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-922588
 ]

ASF GitHub Bot logged work on ARTEMIS-4797:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jun/24 14:00
            Start Date: 07/Jun/24 14:00
    Worklog Time Spent: 10m 
      Work Description: gemmellr commented on PR #4960:
URL: 
https://github.com/apache/activemq-artemis/pull/4960#issuecomment-2154908286

   Your Jira and PR comments are aimed at Acceptor behaviour, and you note on 
the Jira that you are using OpenWire clients... however, the changed bits are 
also used by the Artemis Core client. I'm not familiar with this code, but it 
seems to me the change could result in the connectionDestroyed listener 
callback being called for clients when it currently wouldn't be, after the 
connectionException listener callback may already have been called, 
potentially causing a double failover or other weirdness?




Issue Time Tracking
-------------------

    Worklog Id:     (was: 922588)
    Time Spent: 0.5h  (was: 20m)

> Failover connection references are not always cleaned up in NettyAcceptor, 
> leaking memory
> -----------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-4797
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4797
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: OpenWire
>            Reporter: Josh Byster
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I'm still trying to parse through exactly what conditions this occurs in, 
> since I'm able to reproduce it in a very specific production setup but not in 
> an isolated environment locally.
> For context, we have custom slow consumer detection that closes connection 
> IDs with slow consumers. These connections are connected via failover 
> transport using client ActiveMQ Classic 5.16.4 (OpenWire). This seems to be 
> specific to Netty.
> It appears this specific order of events causes the connection to not get 
> cleaned up and retained indefinitely on the broker. With frequent kicking of 
> connections, this ends up causing the broker to eventually OOM.
> 1. Connection is created, {{ActiveMQServerChannelHandler}} is created as well
> 2. {{ActiveMQServerChannelHandler#createConnection}} is called, {{active}} 
> flag is set {{true}}.
> 3. A few minutes go by, then we call 
> {{ActiveMQServerControl#closeConnectionWithID}} with the connection ID.
> 4. {{ActiveMQChannelHandler#exceptionCaught}} gets called—*this is the key 
> point that causes issues*. The connection is cleaned up if and only if this 
> is *not* called. The root cause of the exception is 
> {{AbstractChannel.close(ChannelPromise)}}, however the comment above it says 
> this is normal for failover.
> 5. The {{active}} flag is set to {{false}}.
> 6. {{ActiveMQChannelHandler#channelInactive}} gets called, but does *not* 
> call {{listener.connectionDestroyed}} since the {{active}} flag is false.
> 7. The connection is never removed from the {{connections}} map in 
> {{NettyAcceptor}}, causing a leak and eventual OOM of the broker if it 
> happens frequently enough.
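
The seven-step sequence above can be sketched as a minimal model of the flag
interaction. This is a hypothetical simplification for illustration only, not
the actual Artemis source: the class name, fields, and the boolean standing in
for listener.connectionDestroyed are all invented here; the real handler reacts
to Netty channel events rather than direct method calls.

```java
// Hypothetical sketch of the lifecycle described in steps 1-7 above.
public class ChannelHandlerSketch {
    boolean active;
    boolean destroyed; // stands in for listener.connectionDestroyed firing

    // Step 2: connection created, active flag set true.
    void createConnection() {
        active = true;
    }

    // Steps 4-5: an exception during the close clears the active flag
    // without removing the connection from the acceptor's map.
    void exceptionCaught() {
        active = false;
    }

    // Step 6: channelInactive only signals destruction while still active,
    // so when exceptionCaught ran first, the callback is skipped and the
    // connections map entry is never removed (step 7).
    void channelInactive() {
        if (active) {
            active = false;
            destroyed = true;
        }
    }

    public static void main(String[] args) {
        ChannelHandlerSketch h = new ChannelHandlerSketch();
        h.createConnection();
        h.exceptionCaught();   // exception arrives before channelInactive...
        h.channelInactive();   // ...so the destroy callback never fires
        System.out.println("destroyed=" + h.destroyed); // prints destroyed=false
    }
}
```

Under this simplified model, the leak reduces to the ordering: if
exceptionCaught clears the flag before channelInactive runs, nothing ever
removes the connection.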



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
