[ 
https://issues.apache.org/jira/browse/ARTEMIS-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Domenico Francesco Bruscino updated ARTEMIS-2854:
-------------------------------------------------
    Fix Version/s:     (was: 2.15.0)
                   2.16.0

> Non-durable subscribers may stop receiving after failover
> ---------------------------------------------------------
>
>                 Key: ARTEMIS-2854
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2854
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.14.0
>            Reporter: Howard Gao
>            Assignee: Howard Gao
>            Priority: Major
>             Fix For: 2.16.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In a cluster scenario where non-durable subscribers fail over to a backup
> while another live node is forwarding messages to it, there is a chance that
> the live node keeps the old remote binding for the subscribers. Messages
> routed to those old remote bindings will result in "binding not found".
> For example, suppose there are two live-backup pairs in the cluster:
> Live1/backup1 and Live2/backup2. A non-durable subscriber connects to Live1,
> and messages are sent to Live2 and then redistributed to the subscriber on
> Live1.
> Now Live1 crashes and backup1 becomes live. The subscriber fails over to
> backup1.
> In the meantime, Live2 reconnects to backup1 too. During this process Live2
> did not successfully remove the old remote binding for the subscriber, so it
> still points to the old temp queue's id (which is gone along with Live1, as
> it is a temp queue).
> So after failover the messages are still routed to the old queue, which is
> no longer there. The subscriber will sit idle without receiving new messages
> from it.
> The code concerned is this:
> https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/cluster/impl/ClusterConnectionImpl.java#L1239
> The code doesn't handle the case where the old remote binding is still in
> the map and its key (clusterName) is the same as that of the new remote
> binding (which references a new temp queue) recreated on failover.
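
To make the key collision described above concrete, here is a minimal sketch in plain Java. It is not the Artemis code: StaleBindingSketch, RemoteBinding, addKeepingOld and addReplacingStale are hypothetical names invented for illustration, and the map only models the idea of remote bindings keyed by clusterName. It shows how an add that leaves an existing entry alone keeps routing to the old temp queue id, while an add that replaces the entry picks up the new one.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model only; none of these names come from Artemis.
class StaleBindingSketch {

    // Hypothetical stand-in for a remote binding: a key plus the id of the
    // remote (temp) queue that messages would be routed to.
    static final class RemoteBinding {
        final String clusterName;
        final long remoteQueueId;

        RemoteBinding(String clusterName, long remoteQueueId) {
            this.clusterName = clusterName;
            this.remoteQueueId = remoteQueueId;
        }
    }

    // Models the problem: if an entry with the same clusterName is still in
    // the map, the new binding is ignored and the stale one stays.
    static void addKeepingOld(Map<String, RemoteBinding> bindings, RemoteBinding b) {
        bindings.putIfAbsent(b.clusterName, b);
    }

    // One possible defensive variant (an assumption, not the actual fix):
    // a binding with the same clusterName replaces whatever was there.
    static void addReplacingStale(Map<String, RemoteBinding> bindings, RemoteBinding b) {
        bindings.put(b.clusterName, b);
    }

    public static void main(String[] args) {
        RemoteBinding beforeFailover = new RemoteBinding("sub-1", 100L); // temp queue on Live1
        RemoteBinding afterFailover = new RemoteBinding("sub-1", 200L);  // temp queue on backup1

        Map<String, RemoteBinding> naive = new HashMap<>();
        addKeepingOld(naive, beforeFailover);
        addKeepingOld(naive, afterFailover);
        System.out.println("keeping old: routes to queue " + naive.get("sub-1").remoteQueueId);

        Map<String, RemoteBinding> replacing = new HashMap<>();
        addReplacingStale(replacing, beforeFailover);
        addReplacingStale(replacing, afterFailover);
        System.out.println("replacing:   routes to queue " + replacing.get("sub-1").remoteQueueId);
    }
}
```

In the first map the lookup still yields the queue id that existed before failover, which corresponds to the idle subscriber in the report. Whether the real fix replaces the entry, removes it first, or does something else is not stated in this message.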



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
