[jira] Assigned: (QPID-2993) Federated source-local links crash remotely federated cluster member on local cluster startup

2011-01-27 Thread michael j. goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/QPID-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael j. goulish reassigned QPID-2993:


Assignee: michael j. goulish  (was: Alan Conway)

 Federated source-local links crash remotely federated cluster member on local 
 cluster startup
 -

 Key: QPID-2993
 URL: https://issues.apache.org/jira/browse/QPID-2993
 Project: Qpid
  Issue Type: Bug
  Components: C++ Broker, C++ Clustering
Affects Versions: 0.8
 Environment: Debian Linux Squeeze, 32-bit, kernel 2.6.36.2, Dell 
 PowerEdge 1950s. Corosync==1.3.0, Openais==1.1.4
Reporter: Mark Moseley
Assignee: michael j. goulish
 Attachments: cluster-fed-src.sh


 This is related to JIRA 2992, which I also opened, but this issue concerns 
 source-local routes. The setup is the same as in JIRA 2992 but uses 
 source-local routes (with the exchanges switched accordingly in the 
 qpid-route statements), i.e. cluster A and cluster B with the routes between 
 A1-B1. When cluster B shuts down in the order B2-B1 and starts back up, the 
 static routes are not correctly re-bound on cluster A's side. If cluster B 
 is instead shut down in the order B1-B2 and started back up, the route is 
 correctly created and works. In the non-functioning case (B2-B1, or A2-A1), 
 there is an additional side effect: on node A2, qpidd crashes with the 
 following error (cluster A is called 'walclust', B is 'bosclust'):
 2011-01-07 18:57:35 error Channel exception: not-attached: Channel 1 is not 
 attached (qpid/amqp_0_10/SessionHandler.cpp:39)
 2011-01-07 18:57:35 critical cluster(102.0.0.0:13650 READY/error) local error 
 2030 did not occur on member 101.0.0.0:9920: not-attached: Channel 1 is not 
 attached (qpid/amqp_0_10/SessionHandler.cpp:39)
 2011-01-07 18:57:35 critical Error delivering frames: local error did not 
 occur on all cluster members : not-attached: Channel 1 is not attached 
 (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
 2011-01-07 18:57:35 notice cluster(102.0.0.0:13650 LEFT/error) leaving 
 cluster walclust
 2011-01-07 18:57:35 notice Shut down
 This happens on both sides, so it is not limited to one cluster or the 
 other. The crash does *not* occur in the A1-A2/B1-B2 test (i.e. the test 
 where the route is re-bound correctly). I can reproduce it nearly every 
 time, and I have been resetting the cluster completely to a new state 
 between tests. Occasionally in the B2-B1 test, A1 also crashes with the 
 same error (and vice versa for node B1 in the A2-A1 test), though most of 
 the time it is A2/B2 that crashes.
 I was getting the same behaviour before upgrading corosync/openais. 
 Previously I was using the stock Squeeze versions, corosync==1.2.1 and 
 openais==1.1.2. The results are the same with corosync==1.3.0 and 
 openais==1.1.4.
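
 When repeating the test across several nodes, it can help to pull the 
 cluster error lines out of each qpidd log automatically. The following is 
 only a rough sketch in Python, not part of the attached script: it assumes 
 each log entry is on a single unwrapped line in the 
 "date time level message" layout quoted above, and the function name 
 find_cluster_errors and both regular expressions are illustrative guesses 
 based on that excerpt alone.

```python
import re

# Assumed layout of a qpidd log line, taken from the excerpt in this report:
# "<YYYY-MM-DD> <HH:MM:SS> <level> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[a-z]+) (?P<msg>.*)$"
)

# Cluster member tag as it appears in the excerpt,
# e.g. "cluster(102.0.0.0:13650 LEFT/error)".
MEMBER_RE = re.compile(r"cluster\((?P<member>\S+) (?P<state>[A-Z]+)/error\)")


def find_cluster_errors(lines):
    """Return (timestamp, level, member, state) for each line that
    carries a cluster(<member> <STATE>/error) tag."""
    hits = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skip lines that do not follow the assumed layout
        tag = MEMBER_RE.search(parsed.group("msg"))
        if tag:
            hits.append(
                (
                    parsed.group("ts"),
                    parsed.group("level"),
                    tag.group("member"),
                    tag.group("state"),
                )
            )
    return hits


# The log lines from this report, with the mail wrapping removed.
sample = [
    "2011-01-07 18:57:35 error Channel exception: not-attached: "
    "Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)",
    "2011-01-07 18:57:35 critical cluster(102.0.0.0:13650 READY/error) "
    "local error 2030 did not occur on member 101.0.0.0:9920: not-attached: "
    "Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)",
    "2011-01-07 18:57:35 notice cluster(102.0.0.0:13650 LEFT/error) "
    "leaving cluster walclust",
    "2011-01-07 18:57:35 notice Shut down",
]

errors = find_cluster_errors(sample)
for ts, level, member, state in errors:
    print(ts, level, member, state)
```

 Run against the logs of A1, A2, B1 and B2 after each shutdown/startup 
 order, this would show which member entered the error state and when, 
 which makes it easier to compare the B2-B1 and B1-B2 runs.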

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
Apache Qpid - AMQP Messaging Implementation
Project:  http://qpid.apache.org
Use/Interact: mailto:dev-subscr...@qpid.apache.org



[jira] Assigned: (QPID-2993) Federated source-local links crash remotely federated cluster member on local cluster startup

2011-01-10 Thread Alan Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/QPID-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Conway reassigned QPID-2993:
-

Assignee: Alan Conway

 Federated source-local links crash remotely federated cluster member on local 
 cluster startup
 -

 Key: QPID-2993
 URL: https://issues.apache.org/jira/browse/QPID-2993
 Project: Qpid
  Issue Type: Bug
  Components: C++ Broker, C++ Clustering
Affects Versions: 0.8
 Environment: Debian Linux Squeeze, 32-bit, kernel 2.6.36.2, Dell 
 PowerEdge 1950s. Corosync==1.3.0, Openais==1.1.4
Reporter: Mark Moseley
Assignee: Alan Conway

