[ https://issues.apache.org/jira/browse/ARTEMIS-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Meierhofer updated ARTEMIS-2870:
---------------------------------------
    Attachment: connection_nonexistent.png

> CORE connection failure sometimes doesn't cleanup sessions
> ----------------------------------------------------------
>
>                 Key: ARTEMIS-2870
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2870
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.10.1, 2.14.0, 2.15.0
>            Reporter: Markus Meierhofer
>            Priority: Blocker
>         Attachments: artemis.log, broker.xml, connection_nonexistent.png, consumer_list_for_one_queue.png, duplicated consumers.png, multiple_consumers_per_queue.png, session_with_connection_id.png, three consumers per queue.png
>
>
> h3. Summary
> Since upgrading our deployed Artemis instances from version 2.6.4 to 2.10.1, we have noticed that a connection failure sometimes doesn't include the cleanup of its connected sessions, leaving "zombie" consumers and producers on queues.
>
> h3. The issue
> Our Artemis clients connect to the broker through the provided JMS abstraction, using the default connection TTL of 60 seconds. We use both JMS topics and JMS queues.
> As most of our clients are mobile and connected over WiFi, connection losses may occur frequently, depending on the quality of the network. When a client has been disconnected for 60 seconds, the broker usually closes the connection and cleans up all the sessions attached to it. The mobile clients then reconnect once they are back online. What we have noticed is that after many connection failures, messages may be sent twice to the mobile clients. When analyzing the problem on the broker console, we found that there were two consumers on each of the queues a given mobile client usually consumes from. One of them belonged to the mobile client's new connection, which is fine.
> The other consumer belonged to a session whose connection had already failed and been closed at that time. When analyzing the logs, we saw that each of these connections had a "Connection failure to ... has been detected" line, but no subsequent "clearing up resources for session ..." lines.
>
> h3. Instance of the issue
>
> The broken session is "7a9292cb-xxx" in the picture. In the logs you can see that the connection failure was detected, but the session was never cleared by the broker (note the timestamps).
> !duplicated consumers.png!
> {code:java}
> [WARN 2020-07-27 14:33:29,794 Thread-13 org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to /10.255.0.2:54812 has been detected: syscall:read(..) failed: Connection reset by peer [code=GENERIC_EXCEPTION]
> [WARN 2020-07-29 09:31:30,828 Thread-20 org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to /10.255.0.2:55994 has been detected: AMQ229014: Did not receive data from /10.255.0.2:55994 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
> {code}
>
> Attached you can find the full [^artemis.log] and our [^broker.xml].

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
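The report spots zombie sessions by finding AMQ212037 "Connection failure to ... has been detected" lines that are never followed by a "clearing up resources for session ..." line. Below is a minimal Python sketch of that check. It assumes:

- each log entry sits on a single line, using the `[LEVEL timestamp thread logger]: message` prefix seen in the excerpt (the email wraps these lines);
- the cleanup message contains the substring quoted in the report.

The function names and the 5-second pairing window are hypothetical choices for illustration, not anything Artemis provides. The cleanup message names a session rather than a remote address, so failures and cleanups are paired by time only. When several connections fail at nearly the same moment, one cleanup line can mask a missing one.

```python
import re
from datetime import datetime, timedelta

# Prefix layout "[LEVEL yyyy-MM-dd HH:mm:ss,SSS thread logger]: message", assumed
# from the two excerpt lines; adjust it to match your own logging pattern.
LINE_RE = re.compile(
    r"^\[(?P<level>\w+) (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) [^\]]*\]: (?P<msg>.*)$"
)
# The AMQ212037 failure message as it appears in the excerpt.
FAILURE_RE = re.compile(
    r"AMQ212037: Connection failure to (?P<addr>\S+) has been detected: .*\[code=(?P<code>\w+)\]"
)
# Assumed substring of the per-session cleanup message; the report only quotes
# it as "clearing up resources for session ...".
CLEANUP_MARKER = "clearing up resources for session"


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return ts, m["msg"]


def failures_without_cleanup(lines, window=timedelta(seconds=5)):
    """List (timestamp, address, code) for every AMQ212037 failure that has no
    cleanup line within `window` after it (time-based heuristic only)."""
    failures, cleanups = [], []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip continuation lines, stack traces, etc.
        ts, msg = parsed
        fail = FAILURE_RE.search(msg)
        if fail:
            failures.append((ts, fail["addr"], fail["code"]))
        elif CLEANUP_MARKER in msg:
            cleanups.append(ts)
    # A failure counts as "cleaned up" if any cleanup line falls inside its window.
    return [
        (ts, addr, code)
        for ts, addr, code in failures
        if not any(ts <= c <= ts + window for c in cleanups)
    ]
```

Because the function accepts any iterable of lines, an open file object works directly, e.g. `failures_without_cleanup(open("artemis.log", encoding="utf-8"))`. Each returned entry is a candidate connection whose sessions may still be registered on the broker. Cross-check candidates against the consumer list in the console.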