[ 
https://issues.apache.org/jira/browse/ARTEMIS-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807973#comment-17807973
 ] 

ASF subversion and git services commented on ARTEMIS-4571:
----------------------------------------------------------

Commit 1197898232dff8f724cd8dd6a22c1bb88e58b572 in activemq-artemis's branch 
refs/heads/main from Justin Bertram
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=1197898232 ]

ARTEMIS-4571 race condition w/TTL impacting in-vm connections

There is a race condition between ConnectionEntry.ttl and
FailureCheckAndFlushThread whereby an in-vm connection may get closed
inadvertently due to a TTL timeout. This is because ConnectionEntry.ttl
is initialized to 60000 and then later set to -1 upon the initial Ping.
If this update lands between the two reads of ttl in
FailureCheckAndFlushThread (the "ttl != -1" guard and the expiry
comparison) then the connection will be closed.

The fix ensures that the ConnectionEntry.ttl is set to -1 for in-vm
connections from the start. It also eliminates the possibility of the
race in FailureCheckAndFlushThread.

This fix is based on static analysis of the code. The timing window is
just too small to construct a reliable test. The failure has only been
seen in the wild a handful of times.
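
For illustration only, here is a minimal sketch of one way to close this kind
of window: read ttl once into a local variable so that the guard and the
expiry comparison always see the same value. This is not the code from the
commit. The Entry class, the isExpired method, and the class name
TtlSnapshotCheck are hypothetical stand-ins; only the field names ttl and
lastCheck and the value -1 come from the issue text.

```java
public class TtlSnapshotCheck {

    // Hypothetical stand-in for ConnectionEntry; only the two fields
    // mentioned in the issue are modelled.
    public static final class Entry {
        public volatile long ttl;
        public volatile long lastCheck;

        public Entry(long ttl, long lastCheck) {
            this.ttl = ttl;
            this.lastCheck = lastCheck;
        }
    }

    // Assumed helper, not an Artemis API: decides whether an entry has timed out.
    public static boolean isExpired(Entry entry, long now) {
        // Single read of the shared field. A concurrent update to -1 can no
        // longer slip in between the guard and the comparison.
        final long ttl = entry.ttl;
        return ttl != -1 && now >= entry.lastCheck + ttl;
    }

    public static void main(String[] args) {
        Entry inVm = new Entry(-1, 0);        // in-vm entry created with -1 from the start
        Entry remote = new Entry(60000, 0);   // entry with the default 60000 ms TTL

        System.out.println(isExpired(inVm, 1_000_000L));   // prints "false"
        System.out.println(isExpired(remote, 59_999L));    // prints "false"
        System.out.println(isExpired(remote, 60_000L));    // prints "true"
    }
}
```

With a single read, an entry whose ttl flips to -1 during the check is judged
either entirely against the old value or skipped entirely, never against a mix
of the two.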


> Race condition w/TTL impacting in-vm connections
> ------------------------------------------------
>
>                 Key: ARTEMIS-4571
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4571
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Justin Bertram
>            Assignee: Justin Bertram
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following WARN can occur due to a race condition between the 
> initialization of 
> {{org.apache.activemq.artemis.spi.core.protocol.ConnectionEntry}} and the 
> periodic check by {{RemotingServiceImpl$FailureCheckAndFlushThread}}:
> {noformat}
> AMQ212037: Connection failure to invm:0 has been detected: AMQ229014: Did not 
> receive data from invm:0 within the -1ms connection TTL. The connection will 
> now be closed. [code=CONNECTION_TIMEDOUT]{noformat}
> Also, the following ERROR message can happen at the same time:
> {noformat}
> ActiveMQNotConnectedException[errorType=NOT_CONNECTED message=AMQ219010: 
> Connection is destroyed]{noformat}
> Internally, the 
> {{org.apache.activemq.artemis.spi.core.protocol.ConnectionEntry}} is subject 
> to the following race condition:
> # {{ConnectionEntry}} is initialized with the default 
> {{ActiveMQClient.DEFAULT_CONNECTION_TTL}} (60000) at 
> {{CoreProtocolManager#createConnectionEntry()}}
> # {{RemotingServiceImpl$FailureCheckAndFlushThread}} evaluates {{if 
> (entry.ttl != -1)}} as {{true}}.
> # A {{Ping}} is sent. Then {{ttl}} is updated to 
> {{ActiveMQClient.DEFAULT_CONNECTION_TTL_INVM}} (-1).
> # {{RemotingServiceImpl$FailureCheckAndFlushThread}} checks {{if (now >= 
> entry.lastCheck + entry.ttl)}}. Since {{ttl}} has been updated to {{-1}} the 
> check passes (= expired) and the connection will be added to {{toRemove}}.
> # The WARN and ERROR occur.
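
The interleaving in the steps above can be replayed deterministically on a
single thread. The sketch below is illustrative and is not Artemis code: the
Entry class, the replay method, and the class name TtlRaceReplay are
hypothetical, and the two constants are local copies of the values quoted in
the issue (60000 and -1). The Ping update from step 3 is placed by hand
between the guard from step 2 and the comparison from step 4.

```java
public class TtlRaceReplay {

    // Local copies of the values quoted in the issue description.
    static final long DEFAULT_CONNECTION_TTL = 60000;
    static final long DEFAULT_CONNECTION_TTL_INVM = -1;

    // Hypothetical stand-in for ConnectionEntry.
    public static final class Entry {
        public volatile long ttl;
        public volatile long lastCheck;

        public Entry(long ttl, long lastCheck) {
            this.ttl = ttl;
            this.lastCheck = lastCheck;
        }
    }

    // Replays steps 1-4 in the unlucky order and reports whether the
    // connection would be added to toRemove.
    public static boolean replay(long now, long lastCheck) {
        Entry entry = new Entry(DEFAULT_CONNECTION_TTL, lastCheck); // step 1
        boolean markedForRemoval = false;

        if (entry.ttl != -1) {                                      // step 2: guard sees 60000
            entry.ttl = DEFAULT_CONNECTION_TTL_INVM;                // step 3: Ping sets ttl to -1
            if (now >= entry.lastCheck + entry.ttl) {               // step 4: compares against -1
                markedForRemoval = true;
            }
        }
        return markedForRemoval;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        // Even a connection checked at this very instant is treated as expired,
        // because now >= now - 1 always holds.
        System.out.println(replay(now, now)); // prints "true"
    }
}
```

Because lastCheck is never later than now, the comparison in step 4 is true for
every entry once ttl reads as -1, which matches the "-1ms connection TTL" shown
in the WARN message.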



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
