[jira] [Commented] (QPID-3286) cluster node went down

Alan Conway (JIRA) Wed, 01 Jun 2011 07:31:17 -0700

    [ 
https://issues.apache.org/jira/browse/QPID-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042195#comment-13042195
 ]


Alan Conway commented on QPID-3286:
-----------------------------------

The problem here is that you are overflowing your journal. The journal is not 
identical on every node in a cluster, so if one node overflows and the other 
does not, the node that overflowed shuts down. It no longer has a faithful 
record of all the messages sent, so it is better for it to shut down and let 
clients fail over to the good broker.
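
To illustrate the rule described above, here is a minimal Python sketch. It is 
a simplified model, not the actual Qpid cluster code: the function name and the 
dictionary layout are made up for illustration, and the assumption that nothing 
shuts down when every member hits the same error is inferred from the "did not 
occur on all cluster members" wording in the log.

```python
def members_to_shut_down(error_seen):
    """Simplified model of the cluster consistency rule (not Qpid's real code).

    error_seen maps a member id to True if that member hit the error locally.
    Members that hit an error the others did not see are out of step with the
    rest of the cluster, so they are the ones that should leave.
    """
    failed = {member for member, hit in error_seen.items() if hit}
    # Assumption: if every member failed the same way (or none did), the
    # members still agree with each other, so no one has to leave.
    if failed and len(failed) < len(error_seen):
        return failed
    return set()


if __name__ == "__main__":
    # The situation in the reported log: .138 overflowed, .139 did not.
    print(members_to_shut_down({"192.168.1.138": True, "192.168.1.139": False}))
```

In the reported case only 192.168.1.138 overflowed its journal, so in this 
model it is the only member that leaves the cluster.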

You should look at the throughput of your producers and consumers. If the 
consumers are not at least as fast (on average) as the producers, queue 
depth will increase without limit. You might also increase the capacity of the 
journal so that it can handle the peak message load.
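
The throughput argument above can be sketched with a small simulation. This is 
plain Python, not Qpid code: the function name, the rates, and the capacity are 
hypothetical values chosen for illustration, and the journal is modelled simply 
as a maximum number of messages.

```python
def steps_until_overflow(produce_rate, consume_rate, capacity, max_steps=1_000_000):
    """Return the first time step at which queue depth exceeds capacity.

    Rates are messages per time step and capacity is a message count (all
    hypothetical units). Returns None if the queue does not overflow within
    max_steps.
    """
    depth = 0
    for step in range(1, max_steps + 1):
        depth += produce_rate                  # producers enqueue
        depth = max(0, depth - consume_rate)   # consumers dequeue what they can
        if depth > capacity:
            return step
    return None


if __name__ == "__main__":
    # Consumers slightly slower than producers: depth grows by 10 per step.
    print(steps_until_overflow(produce_rate=100, consume_rate=90, capacity=1000))
    # Consumers keep up: the queue never overflows within the step limit.
    print(steps_until_overflow(produce_rate=100, consume_rate=100, capacity=1000,
                               max_steps=10_000))
```

With consumers only slightly slower than producers, a larger capacity delays 
the overflow but does not prevent it. Extra capacity helps only with temporary 
peaks.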

> cluster node went down
> ----------------------
>
>                 Key: QPID-3286
>                 URL: https://issues.apache.org/jira/browse/QPID-3286
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Clustering
>    Affects Versions: 0.10
>         Environment: Two node persistent cluster using openais. Both nodes 
> are CentOS 5.5.
>            Reporter: sujith paily
>            Assignee: Alan Conway
>            Priority: Critical
>              Labels: adminis, newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I have configured the qpid 0.10 C++ broker as a 2-node persistent cluster. It 
> worked without any issue for a few hours, or sometimes one or two days, but 
> one node then went down with the following error.
> ---------------------------------------
> 2011-05-30 12:55:28 warning Journal "OPC_MESSAGE_QUEUE": Enqueue capacity 
> threshold exceeded on queue "OPC_MESSAGE_QUEUE".
> 2011-05-30 12:55:28 error Unexpected exception: Enqueue capacity threshold 
> exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
> 2011-05-30 12:55:28 error Connection 192.168.1.138:5672-192.168.1.10:58839 
> closed by error: Enqueue capacity threshold exceeded on queue 
> "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)(501)
> 2011-05-30 12:55:28 critical cluster(192.168.1.138:6321 READY/error) local 
> error 11545 did not occur on member 192.168.1.139:25161: Enqueue capacity 
> threshold exceeded on queue "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587)
> 2011-05-30 12:55:28 critical Error delivering frames: local error did not 
> occur on all cluster members : Enqueue capacity threshold exceeded on queue 
> "OPC_MESSAGE_QUEUE". (JournalImpl.cpp:587) (qpid/cluster/ErrorCheck.cpp:89)
> 2011-05-30 12:55:28 notice cluster(192.168.1.138:6321 LEFT/error) leaving 
> cluster QCLUSTER
> 2011-05-30 12:55:28 notice Shut down
> --------------------------------------
> But the remaining node is working without any issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
