[ 
https://issues.apache.org/jira/browse/QPID-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995336#comment-12995336
 ] 

michael j. goulish commented on QPID-2993:
------------------------------------------

With the latest code tree (r1071018) I do indeed find something interesting.
I cannot get the broker to actually crash, but with the attached script
(2993_bug.sh) running on a fast 8-processor system, I saw broker A2 fail to
start up in 2 out of 10 instances, logging the following errors:

    2011-02-16 09:34:23 error Channel exception: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
    2011-02-16 09:34:23 critical cluster(20.0.100.36:11855 READY/error) local error 1498 did not occur on member 20.0.100.36:11714: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
    2011-02-16 09:34:23 critical Error delivering frames: local error did not occur on all cluster members : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
    2011-02-16 09:34:23 notice cluster(20.0.100.36:11855 LEFT/error) leaving cluster A
    2011-02-16 09:34:23 notice Shut down
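
When repeating a test like this over many runs, it can help to detect the failure signature in the broker logs mechanically. The following is only a rough sketch and is not part of the attached script: the function name and the regular expression are my own assumptions, derived solely from the log lines shown above, and it assumes each log entry sits on a single line in the actual log file.

```python
import re

# Pattern for the "local error N did not occur on member ..." line.
# Hypothetical: built only from the log excerpt above, not from any
# documented qpidd log format.
CLUSTER_ERROR = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) critical "
    r"cluster\((?P<local>\S+) (?P<state>[^)]+)\) "
    r"local error (?P<error_id>\d+) did not occur on member "
    r"(?P<member>\S+?): (?P<reason>.*)$"
)


def find_cluster_error_exits(log_text):
    """Return one dict per cluster error-check failure line in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = CLUSTER_ERROR.match(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits
```

Calling `find_cluster_error_exits()` on the contents of each broker's log and checking for a non-empty result gives a quick per-run pass/fail, along with the local broker address, the error number, and the member on which the error did not occur.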




> Federated source-local links crash remotely federated cluster member on local 
> cluster startup
> ---------------------------------------------------------------------------------------------
>
>                 Key: QPID-2993
>                 URL: https://issues.apache.org/jira/browse/QPID-2993
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker, C++ Clustering
>    Affects Versions: 0.8
>         Environment: Debian Linux Squeeze, 32-bit, kernel 2.6.36.2, Dell 
> Poweredge 1950s. Corosync==1.3.0, Openais==1.1.4
>            Reporter: Mark Moseley
>            Assignee: michael j. goulish
>         Attachments: cluster-fed-src.sh
>
>
> This is related to JIRA 2992 that I opened, but this is for source-local
> routes. Given the same setup as in JIRA 2992 but using source-local routes
> (and obviously with the exchanges switched accordingly in the qpid-route
> statements), i.e. cluster A and cluster B with the routes between A1<->B1,
> when cluster B shuts down in the order B2->B1 and starts back up, the static
> routes are not correctly re-bound on cluster A's side. If cluster B is
> instead shut down in the order B1->B2 and started back up, the route is
> correctly created and works. In the non-functioning case (B2->B1, or
> A2->A1), there is an additional side effect: on node A2, qpidd crashes with
> the following error (cluster A is called 'walclust', B is 'bosclust'):
> 2011-01-07 18:57:35 error Channel exception: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
> 2011-01-07 18:57:35 critical cluster(102.0.0.0:13650 READY/error) local error 2030 did not occur on member 101.0.0.0:9920: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
> 2011-01-07 18:57:35 critical Error delivering frames: local error did not occur on all cluster members : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
> 2011-01-07 18:57:35 notice cluster(102.0.0.0:13650 LEFT/error) leaving cluster walclust
> 2011-01-07 18:57:35 notice Shut down
> This happens on both sides of the cluster, so it's not limited to one or the 
> other. This crash does *not* occur in the A1->A2/B1->B2 test (i.e. the test 
> where the route is re-bound correctly). I can cause this to reoccur pretty 
> much every time. I've been resetting the cluster completely to a new state 
> between each test. Occasionally in the B2->B1 test, A1 will also crash with 
> the same error (and vice versa for A2->A1 for node B1), though most of the 
> time, it's A2/B2 that crashes.
> I was getting this same behaviour prior to upgrading corosync/openais as
> well. Previously I was using the stock Squeeze versions, corosync==1.2.1
> and openais==1.1.2. The results are the same with corosync==1.3.0 and
> openais==1.1.4.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscr...@qpid.apache.org
