[jira] Created: (QPID-2992) Cluster failing to resurrect durable static route depending on order of shutdown

Mark Moseley (JIRA) Fri, 07 Jan 2011 15:46:09 -0800

Cluster failing to resurrect durable static route depending on order of shutdown
--------------------------------------------------------------------------------


                 Key: QPID-2992
                 URL: https://issues.apache.org/jira/browse/QPID-2992
             Project: Qpid
          Issue Type: Bug
          Components: C++ Broker, C++ Clustering
    Affects Versions: 0.8
         Environment: Debian Linux Squeeze, 32-bit, kernel 2.6.36.2, Dell Poweredge 1950s. Corosync==1.3.0, Openais==1.1.4
            Reporter: Mark Moseley


I've got a 2-node qpid test cluster at each of 2 datacenters, which are 
federated together with a single durable static route between each. Qpid is 
version 0.8. Corosync and openais are stock Squeeze (1.2.1-3 and 1.1.2-2, 
respectively). OS is Squeeze, 32-bit, on Dell Poweredge 1950s, kernel 2.6.36. 
The static route is durable and is set up over SSL (but I can replicate as well 
with non-SSL). I've tried to normalize the hostnames below to make things 
clearer; hopefully I didn't mess anything up.

Given two clusters, cluster A (consisting of hosts A1 and A2) and cluster B 
(with B1 and B2), I've got a static exchange route from A1 to B1, as well as 
another from B1 to A1. Federation is working correctly, so I can send a message 
on A2 and have it successfully retrieved on B2. The exchange local to cluster A 
is walmyex1; the local exchange for B is bosmyex1.

If I shut down cluster B in this order: B2, then B1, and start it back up with 
B1, then B2, the static route fails to get recreated. That is, on A1/A2, 
looking at the bindings, exchange 'bosmyex1' does not get re-bound to cluster 
B; the only output for it in "qpid-config exchanges --bindings" is:

<snip>
Exchange 'bosmyex1' (direct)
</snip>

If however I shut the cluster down in this order: B1, then B2, and start B2, 
then B1, the static route gets re-bound. The output then is:

<snip>
Exchange 'bosmyex1' (direct)
    bind [unix.boston.cust] => bridge_queue_1_8870523d-2286-408e-b5b5-50d53db2fa61
</snip>

and I can message over the federated link with no further modification. Prior 
to a few minutes ago, I was seeing this with the Squeeze stock openais==1.1.2 
and corosync==1.2.1. In debugging this, I've upgraded both to the latest 
versions with no change.
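
For anyone reproducing this, the difference between the two outputs above can be checked mechanically rather than by eye. The following is a minimal sketch, not part of Qpid: has_binding is a hypothetical helper name, and it assumes the output format is exactly as quoted above (an "Exchange '<name>' (<type>)" header followed by indented "bind [<key>] => <queue>" lines).

```sh
#!/bin/sh
# has_binding EXCHANGE KEY
# Hypothetical helper: reads "qpid-config exchanges --bindings" output on
# stdin and succeeds (exit 0) only if EXCHANGE has a binding with key KEY.
has_binding() {
    awk -v ex="$1" -v key="$2" -v q="'" '
        $1 == "Exchange" {
            # Header line: remember which exchange the following
            # bind lines belong to, with the surrounding quotes removed.
            current = $2
            gsub(q, "", current)
            next
        }
        # A bind line for the exchange we care about, with the wanted key.
        current == ex && $1 == "bind" && $2 == ("[" key "]") { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

Assuming the same connection options as in the steps below, it could be used as: qpid-config exchanges --bindings | has_binding bosmyex1 unix.boston.cust && echo "route is bound"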

I can replicate this every time I try. These are just test clusters, so I don't 
have any other activity going on on them, or any other exchanges/queues. My 
steps:

On all boxes in cluster A and B:
* Kill the qpidd if it's running and delete all existing store files, i.e. 
contents of /var/lib/qpid/

On host A1 in cluster A (I'm leaving out the -a user/t...@host stuff):
* Start up qpid
* qpid-config add exchange direct bosmyex1 --durable
* qpid-config add exchange direct walmyex1 --durable
* qpid-config add queue walmyq1 --durable
* qpid-config bind walmyex1 walmyq1 unix.waltham.cust

On host B1 in cluster B:
* qpid-config add exchange direct bosmyex1 --durable
* qpid-config add exchange direct walmyex1 --durable
* qpid-config add queue bosmyq1 --durable
* qpid-config bind bosmyex1 bosmyq1 unix.boston.cust

On cluster A:
* Start other member of cluster, A2
* qpid-route route add amqps://user/p...@hosta1:5671 amqps://user/p...@hostb1:5671 walmyex1 unix.waltham.cust -d

On cluster B:
* Start other member of cluster, B2
* qpid-route route add amqps://user/p...@hostb1:5671 amqps://user/p...@hosta1:5671 bosmyex1 unix.boston.cust -d

On either cluster:
* Check "qpid-config exchanges --bindings" to make sure bindings are correct 
for remote exchanges
* To see correct behaviour, stop the cluster in the order B1->B2 (or A1->A2), 
start it back up in the reverse order (for cluster B: B2, then B1), and check 
bindings.
* To see broken behaviour, stop the cluster in the order B2->B1 (or A2->A1), 
start it back up in the reverse order (for cluster B: B1, then B2), and check 
bindings.
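
The "check bindings" step can be made less error-prone by saving the output of "qpid-config exchanges --bindings" before the shutdown and again after the restart, then listing what disappeared. The sketch below is only an illustration under assumptions: binding_pairs and lost_bindings are hypothetical helper names, the file names are placeholders, and the parser assumes the output format quoted earlier in this report (binding keys without spaces).

```sh
#!/bin/sh
# binding_pairs: reads "qpid-config exchanges --bindings" output on stdin
# and prints one sorted "exchange<TAB>key" line per binding.
binding_pairs() {
    awk -v q="'" '
        $1 == "Exchange" { current = $2; gsub(q, "", current); next }
        $1 == "bind" {
            key = $2
            gsub(/^\[|\]$/, "", key)    # strip the square brackets
            print current "\t" key
        }
    ' | sort
}

# lost_bindings BEFORE_FILE AFTER_FILE
# Prints the bindings present in BEFORE_FILE but missing from AFTER_FILE.
lost_bindings() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || { rm -f "$before_sorted"; return 1; }
    binding_pairs < "$1" > "$before_sorted"
    binding_pairs < "$2" > "$after_sorted"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

For example, run "qpid-config exchanges --bindings > before.txt" on A1 before stopping cluster B, "qpid-config exchanges --bindings > after.txt" once cluster B is back up, and then "lost_bindings before.txt after.txt". Comparing only exchange and key (not the bridge queue name) is a deliberate choice, since it is not established here whether the bridge queue name stays the same across a restart.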

This is a test cluster, so I'm free to do anything with it, debugging-wise, 
that would be useful. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
