[ https://issues.apache.org/jira/browse/QPID-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
michael j. goulish updated QPID-2993:
-------------------------------------

    Fix Version/s: 0.11

> Federated source-local links crash remotely federated cluster member on local cluster startup
> ---------------------------------------------------------------------------------------------
>
>                 Key: QPID-2993
>                 URL: https://issues.apache.org/jira/browse/QPID-2993
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker, C++ Clustering
>    Affects Versions: 0.8
>         Environment: Debian Linux Squeeze, 32-bit, kernel 2.6.36.2, Dell Poweredge 1950s. Corosync==1.3.0, Openais==1.1.4
>            Reporter: Mark Moseley
>            Assignee: michael j. goulish
>             Fix For: 0.11
>
>         Attachments: 2993_bug.sh, cluster-fed-src.sh
>
>
> This is related to JIRA 2992 that I opened, but this is for source-local
> routes. Given the same setup as in JIRA 2992 but using source-local routes
> (and obviously with the exchanges switched accordingly in the qpid-route
> statements), i.e. cluster A and cluster B with the routes between A1<->B1,
> when cluster B shuts down in the order B2->B1 and starts back up, the static
> routes are not correctly re-bound on cluster A's side. However if cluster B
> is shut down in the order B1->B2 and started back up, the route is correctly
> created and works.
> However in the non-functioning case (B2->B1, or A2->A1), there is an
> additional side-effect: on node A2, qpidd crashes with the following error
> (cluster A is called 'walclust', B is 'bosclust'):
>
> 2011-01-07 18:57:35 error Channel exception: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
> 2011-01-07 18:57:35 critical cluster(102.0.0.0:13650 READY/error) local error 2030 did not occur on member 101.0.0.0:9920: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
> 2011-01-07 18:57:35 critical Error delivering frames: local error did not occur on all cluster members : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
> 2011-01-07 18:57:35 notice cluster(102.0.0.0:13650 LEFT/error) leaving cluster walclust
> 2011-01-07 18:57:35 notice Shut down
>
> This happens on both sides of the cluster, so it's not limited to one or the
> other. This crash does *not* occur in the A1->A2/B1->B2 test (i.e. the test
> where the route is re-bound correctly). I can cause this to reoccur pretty
> much every time. I've been resetting the cluster completely to a new state
> between each test. Occasionally in the B2->B1 test, A1 will also crash with
> the same error (and vice versa for A2->A1 for node B1), though most of the
> time, it's A2/B2 that crashes.
>
> I was getting this same behaviour prior to upgrading corosync/openais as
> well. Previously I was using the stock Squeeze versions of corosync==1.2.1
> and openais==1.1.2. The results are the same with corosync==1.3.0 and
> openais==1.1.4.
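
Since the report says the crash can be reproduced on almost every run, it may help to check for its log signature automatically between test iterations. The following is a minimal sketch, not part of the attached scripts: the function name `find_cluster_crash`, the two regular expressions, and the rule "a critical 'did not occur on member' line followed by a LEFT/error notice from the same member" are all assumptions derived only from the five log lines quoted in the report. Other qpidd versions or log configurations may format these lines differently.

```python
import re

# Assumed layout, taken from the excerpt in this report:
# "<date> <time> <level> <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")
# Member id and state as they appear in "cluster(<member> <STATE>)".
CLUSTER_RE = re.compile(r"cluster\((\S+) ([A-Z]+(?:/[a-z]+)?)\)")


def find_cluster_crash(lines):
    """Return details of the crash signature, or None if it is absent.

    The signature (an assumption based on the quoted log) is a critical
    "did not occur on member" line and a later LEFT/error notice, both
    logged by the same cluster member.
    """
    mismatch = None
    left_member = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        timestamp, level, message = m.groups()
        c = CLUSTER_RE.search(message)
        if c is None:
            continue
        if level == "critical" and "did not occur on member" in message:
            mismatch = {"time": timestamp, "member": c.group(1),
                        "message": message}
        elif level == "notice" and c.group(2) == "LEFT/error":
            left_member = c.group(1)
    if mismatch is not None and left_member == mismatch["member"]:
        return mismatch
    return None


# The log lines quoted in the report, used here as sample input.
SAMPLE = [
    "2011-01-07 18:57:35 error Channel exception: not-attached: Channel 1 "
    "is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)",
    "2011-01-07 18:57:35 critical cluster(102.0.0.0:13650 READY/error) "
    "local error 2030 did not occur on member 101.0.0.0:9920: not-attached: "
    "Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)",
    "2011-01-07 18:57:35 critical Error delivering frames: local error did "
    "not occur on all cluster members : not-attached: Channel 1 is not "
    "attached (qpid/amqp_0_10/SessionHandler.cpp:39) "
    "(qpid/cluster/ErrorCheck.cpp:89)",
    "2011-01-07 18:57:35 notice cluster(102.0.0.0:13650 LEFT/error) leaving "
    "cluster walclust",
    "2011-01-07 18:57:35 notice Shut down",
]

if __name__ == "__main__":
    hit = find_cluster_crash(SAMPLE)
    print(hit["member"] if hit else "no crash signature found")
```

In a test loop, the same function could be fed the lines of each node's qpidd log after cluster B is restarted, so that a B2->B1 run that takes down A2 is flagged without reading the logs by hand.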