[ https://issues.apache.org/jira/browse/ARTEMIS-4476?focusedWorklogId=893144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-893144 ]
ASF GitHub Bot logged work on ARTEMIS-4476:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Nov/23 11:24
            Start Date: 30/Nov/23 11:24
    Worklog Time Spent: 10m
      Work Description: gtully commented on code in PR #4694:
URL: https://github.com/apache/activemq-artemis/pull/4694#discussion_r1410534274


##########
artemis-protocols/artemis-openwire-protocol/src/main/java/org/apache/activemq/artemis/core/protocol/openwire/OpenWireConnection.java:
##########
@@ -761,7 +761,11 @@ public void fail(ActiveMQException me, String message) {
       final ThresholdActor<Command> localVisibleActor = openWireActor;
       if (localVisibleActor != null) {
-         localVisibleActor.shutdown(() -> doFail(me, message));
+         localVisibleActor.requestShutdown();
+      }
+
+      if (executor != null) {
+         executor.execute(() -> doFail(me, message));

Review Comment:
   I don't follow. The point is to terminate processing of commands and execute doFail as the last/next task.
   The only call to fail should be from the netty socket handler that sees a socket error, remote close, etc. It is the transport initiating a close on a socket error.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 893144)
    Time Spent: 5h 20m  (was: 5h 10m)

> Connection Failure Race Conditions in AMQP and Core
> ---------------------------------------------------
>
>                 Key: ARTEMIS-4476
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4476
>             Project: ActiveMQ Artemis
>          Issue Type: Task
>            Reporter: Clebert Suconic
>            Assignee: Clebert Suconic
>            Priority: Major
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Failure detection can race with the processing of client packets (or frames,
> in the case of AMQP).
> This is because Netty detects the failure and removes the connection objects
> while the packets are still being processed.
> I was not able to reproduce this particular issue, but I have seen a case
> in a memory dump where the consumer was created while the connection was
> already dropped, leaving the consumer isolated without any communication with
> clients.
> I can see how that particular case could happen because of these races.
> I am adding tests that exercise connection failure under stress, and with
> them I was able to reproduce other issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
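
For readers following the review thread, here is a minimal sketch of the behaviour the review comment describes: once failure is signalled, commands that are still queued are dropped and the failure handler runs as the next and last task on the same ordered executor. `SerialCommandActor` and `FailAsLastTask` are hypothetical names invented for this illustration. This is not the Artemis `ThresholdActor` implementation and makes no claim about its real semantics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

final class FailAsLastTask {

   /** Hypothetical stand-in for an actor; NOT the Artemis ThresholdActor API. */
   static final class SerialCommandActor<T> {
      private final ExecutorService executor; // assumed single-threaded, so tasks run in order
      private final Consumer<T> listener;
      private final AtomicBoolean shutdownRequested = new AtomicBoolean();

      SerialCommandActor(ExecutorService executor, Consumer<T> listener) {
         this.executor = executor;
         this.listener = listener;
      }

      /** Queue a command; it is skipped if shutdown was requested before it runs. */
      void act(T command) {
         executor.execute(() -> {
            if (!shutdownRequested.get()) {
               listener.accept(command);
            }
         });
      }

      /** Stop processing queued commands and run lastTask after them, exactly once. */
      void shutdown(Runnable lastTask) {
         if (shutdownRequested.compareAndSet(false, true)) {
            executor.execute(lastTask);
         }
      }
   }

   static List<String> demo() {
      List<String> log = Collections.synchronizedList(new ArrayList<>());
      ExecutorService executor = Executors.newSingleThreadExecutor();
      SerialCommandActor<String> actor =
         new SerialCommandActor<>(executor, c -> log.add("command:" + c));
      CountDownLatch gate = new CountDownLatch(1);
      try {
         actor.act("processed");
         executor.submit(() -> { }).get(); // wait until "processed" has been handled

         // Hold the executor thread so the next commands stay queued.
         executor.execute(() -> {
            try {
               gate.await();
            } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
            }
         });
         actor.act("queued-1");
         actor.act("queued-2");

         // Simulates the transport reporting a socket error (the doFail step).
         actor.shutdown(() -> log.add("doFail"));

         gate.countDown();
         executor.shutdown();
         executor.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException | ExecutionException e) {
         throw new IllegalStateException(e);
      }
      return new ArrayList<>(log);
   }

   public static void main(String[] args) {
      System.out.println(demo()); // prints [command:processed, doFail]
   }
}
```

In this sketch the two commands queued behind the gate are never delivered to the listener, and the failure task runs after them on the same thread, so it cannot overlap with command processing.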