[ https://issues.apache.org/jira/browse/AMQ-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Bish resolved AMQ-3993.
-------------------------------
Resolution: Fixed
Assignee: Timothy Bish
This bit should be fixed by AMQ-4159 and AMQ-4160.
> NetworkBridge sometimes stops trying to reconnect after connection is lost
> --------------------------------------------------------------------------
>
> Key: AMQ-3993
> URL: https://issues.apache.org/jira/browse/AMQ-3993
> Project: ActiveMQ
> Issue Type: Bug
> Affects Versions: 5.6.0
> Environment: using static:// networkConnector (i.e.
> SimpleDiscoveryAgent)
> Reporter: Ron Koerner
> Assignee: Timothy Bish
> Fix For: 5.8.0
>
> Attachments: reconnect-problem-annotated.txt
>
>
> After losing the connection because the peer shut down, the broker tries to
> rebuild the connection once, fails again, and then stops trying.
> While this also happens with a standard setup, it seems to happen much more
> often with a certain type of firewall which always accepts a connection, but
> closes it if the real destination cannot be reached.
> This can be simulated by using a "socat" forwarder between the two brokers.
> The problem seems to lie in the following sequence of events, which involves
> a race condition, the use of {{event.failed}} in
> {{SimpleDiscoveryAgent.serviceFailed}}, and the use of {{bridges}} in
> {{DiscoveryNetworkConnector}}:
> # connection "failure" due to ShutdownInfo
> #- event.failed=true
> #- bridge is unregistered
> # start establishing a new connection
> #- event.failed=false
> #- bridge is not yet registered
> # second connection failure of the old connection due to EOF
> #- not blocked, since event.failed==false
> #- event.failed=true
> #- bridge would be unregistered, but currently there is none
> #- wait one second (continued below)
> # new connection is started
> #- bridge is registered
> # receive multiple connection failures of the new connection
> #- all blocked, since event.failed=true
> # continue after one second, try to establish a new connection
> #- blocked, since bridge is already registered
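The sequence above can be replayed deterministically with a small model. This is only a sketch under stated assumptions, not the ActiveMQ code: the class `ReconnectRaceReplay`, its fields, and its methods are hypothetical stand-ins for `event.failed` and the `bridges` registry, reduced to a single remote URI. Resetting the flag before checking the registry is also an assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-URI model of the state described in the report.
class ReconnectRaceReplay {
    boolean failed = false;          // stands in for event.failed
    boolean bridgeRegistered = true; // stands in for an entry in bridges
    final List<String> log = new ArrayList<>();

    // A failure report is dropped while the shared flag is already set.
    boolean serviceFailed(String source) {
        if (failed) {
            log.add(source + ": failure blocked, failed flag already set");
            return false;
        }
        failed = true;
        bridgeRegistered = false; // unregister the bridge, if there is one
        log.add(source + ": failure accepted, retry scheduled");
        return true;
    }

    // A reconnect attempt resets the flag, then gives up if a bridge exists.
    boolean startConnect() {
        failed = false;
        if (bridgeRegistered) {
            log.add("reconnect: blocked, bridge already registered");
            return false;
        }
        log.add("reconnect: started");
        return true;
    }

    void registerBridge() {
        bridgeRegistered = true;
        log.add("new bridge registered");
    }

    // Replays the six steps from the report in the problematic order.
    List<String> replay() {
        serviceFailed("old connection (ShutdownInfo)"); // step 1
        startConnect();                                  // step 2
        serviceFailed("old connection (EOF)");           // step 3
        registerBridge();                                // step 4
        serviceFailed("new connection");                 // step 5
        serviceFailed("new connection");                 // step 5, repeated
        startConnect();                                  // step 6
        return log;
    }

    public static void main(String[] args) {
        for (String line : new ReconnectRaceReplay().replay()) {
            System.out.println(line);
        }
    }
}
```

In this model both failure reports from the new connection are dropped and the delayed retry is refused, so nothing remains that would trigger another reconnect.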
> To fix this problem a NetworkBridge should probably not be allowed to call
> {{SimpleDiscoveryAgent.serviceFailed}} more than once, since {{event.failed}}
> cannot keep track of multiple connections at one time.
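The suggestion above, that a bridge should report failure at most once, can be sketched with a per-bridge guard. This is a minimal illustration and not necessarily how AMQ-4159 or AMQ-4160 addressed the issue; `OnceOnlyFailureGuard` and `reportFailure` are hypothetical names, and the `Runnable` stands in for whatever call would notify the discovery agent.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-bridge guard: each bridge instance owns one of these.
class OnceOnlyFailureGuard {
    private final AtomicBoolean reported = new AtomicBoolean(false);
    private final Runnable onFailure; // stand-in for notifying the discovery agent

    OnceOnlyFailureGuard(Runnable onFailure) {
        this.onFailure = onFailure;
    }

    // Runs the callback for the first caller only; later calls are ignored.
    boolean reportFailure() {
        if (reported.compareAndSet(false, true)) {
            onFailure.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger notifications = new AtomicInteger();
        OnceOnlyFailureGuard oldBridge =
                new OnceOnlyFailureGuard(notifications::incrementAndGet);

        oldBridge.reportFailure(); // e.g. ShutdownInfo: forwarded
        oldBridge.reportFailure(); // e.g. later EOF: ignored
        System.out.println("notifications = " + notifications.get());
    }
}
```

Because the guard belongs to one bridge instance, a late EOF on the old connection cannot be mistaken for a failure of the new one, and a replacement bridge gets a fresh guard of its own.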
> The chain of events holds a lot of race conditions. If the second failure of
> the old connection occurs before the new connection is started (which seems
> to be the case most of the time) or the new connection's bridge is registered
> before the EOF occurs, the problem does not manifest.
> Attached is a log excerpt with my comments about the state of event.failed
> and bridges.