Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

David Vossel Tue, 03 Jul 2012 13:31:44 -0700

----- Original Message -----
> From: "Brian J. Murrell" <br...@interlinx.bc.ca>
> To: pacema...@clusterlabs.org
> Sent: Tuesday, July 3, 2012 2:15:09 PM
> Subject: Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond
> 
> On 12-06-27 11:30 PM, Andrew Beekhof wrote:
> > 
> > The updates from you aren't the problem.  It's the number of resource
> > operations (that need to be stored in the CIB) that result from your
> > changes that might be causing the problem.
> 
> Just to follow this up for anyone currently following or anyone finding
> this thread in the future...
> 
> It turns out that the problem is simply the size of the HA cluster that
> I want to create.  The details are in the bug I filed at
> http://bugs.clusterlabs.org/show_bug.cgi?id=5076 but the short story is
> that I can add the number of resources and constraints I want to add
> (i.e. 32-34 of each, as previously described in this thread),
> concurrently even, so long as there are no more than 4 nodes per
> corosync/pacemaker cluster.
> 
> Even adding 4 passive nodes (nodes that do no CIB operations of their
> own) made pacemaker crumble.  I only tried a total of 8 nodes, not
> values between 4 and 8, so the tipping point might be somewhere in
> between.
>
> 
> So the summary seems to be that pacemaker cannot scale to more than a
> handful of nodes, even when the nodes are big: 12 core Xeon nodes with
> gobs of memory.


That is not a given.  You may well be hitting this with the pacemaker
version you are running and the torture test you are running with all
those parallel commands, but I wouldn't go as far as to say pacemaker
cannot scale to more than a handful of nodes.  It depends entirely on the
situation.  16 nodes with 32 resources might work... 3 nodes with 100
resources might not.  There is a limit to how far deployments can scale,
but it is not easy to quantify values that hold true across all
deployments.  I'm sure you know this; I just wanted to be explicit so
that nobody takes your example as a concrete metric.

> 
> I can only guess that everybody is using pacemaker in "pair" (or not
> much bigger) type configurations currently.  Is that accurate?
>

From the deployments I've seen on the mailing list and bug reports, the
most common clusters appear to be around the 2-6 node mark.

> Perhaps there is some tuning that can be done to scale somewhat, but
> realistically, I am looking for pacemaker clusters in the tens, if not
> into the hundreds of nodes.  However, I really wonder if any amount of
> tuning could be done to achieve clusters that large given the small
> number of nodes supported with the default tuning values.

The messaging involved in keeping all the local resource operations in
the CIB synced across that many nodes is pretty insane.  If you are set
on using pacemaker, the best approach to scaling in your situation would
probably be to figure out how to break the nodes into smaller clusters
that are easier to manage.  I have not heard of a single deployment as
large as the one you are thinking of.

-- Vossel

>
> 
> Thoughts?
> 
> b.
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
