On 06/21/2017 02:58 AM, Ferenc Wágner wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
>
>> The most significant change in this release is a new cluster option to
>> improve scalability.
>>
>> As users start to create clusters with hundreds of resources and many
>> nodes, one bottleneck is a complete reprobe of all resources (for
>> example, after a cleanup of all resources).
>
> Hi,
>
> Does crm_resource --cleanup without any --resource specified do this?
> Does this happen any other (automatic or manual) way?
Correct. A full probe also happens at startup, but that is generally spread out over enough time not to matter. Prior to this release, a full write-out of all node attributes also occurred whenever a node joined the cluster, which has similar characteristics (due to fail counts for each resource on each node). With this release, that is skipped when using the corosync 2 stack, since we have extra guarantees there that make it unnecessary.

>> This can generate enough CIB updates to get the crmd's CIB connection
>> dropped for not processing them quickly enough.
>
> Is this a catastrophic scenario, or does the cluster recover gently?

The crmd exits, leading to node fencing.

>> This bottleneck has been addressed with a new cluster option,
>> cluster-ipc-limit, to raise the threshold for dropping the connection.
>> The default is 500. The recommended value is the number of nodes in the
>> cluster multiplied by the number of resources.
>
> I'm running a production cluster with 6 nodes and 159 resources (ATM),
> which gives almost twice the above default. What symptoms should I
> expect to see under 1.1.16? (1.1.16 has just been released with Debian
> stretch. We can't really upgrade it, but changing the built-in default
> is possible if it makes sense.)

Even twice the threshold is fine in most clusters, because it's highly unlikely that all probe results will come back at exactly the same time. The case that prompted this involved 200 resources whose monitor action was a simple pid check, so they executed near-instantaneously (on 9 nodes). The symptom is an "Evicting client" log message from the cib, listing the pid of the crmd, followed by the crmd exiting.
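To make the sizing rule above concrete, here is a quick sketch using the poster's numbers (6 nodes, 159 resources); the figures are illustrative, not a guarantee:

```shell
# Recommended threshold per the release notes: nodes x resources
nodes=6
resources=159
limit=$((nodes * resources))
echo "$limit"   # 954, just under twice the compiled-in default of 500

# Worst-case extra memory if the queue backs up completely, at roughly
# 1K per queued probe result: ~954 KB, and nothing is pre-allocated.
```

On Pacemaker 1.1.17 or later, the computed value can then be set as a cluster property, e.g. with `crm_attribute --type crm_config --name cluster-ipc-limit --update 954` (or your shell's equivalent); on 1.1.16 the option does not exist, which is what the rest of this reply addresses.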
Changing the compiled-in default on older versions is a potential workaround (search for 500 in lib/common/ipc.c), but it is not ideal: it applies to all clusters (even those too small to need it) and to all clients (including command-line clients, whereas the new cluster-ipc-limit option affects only connections from other cluster daemons). The only real downside of raising the threshold is potentially increased memory usage, which is why there is a limit to begin with: to prevent an unresponsive client from causing a memory surge in a cluster daemon. The usage depends on the size of the queued IPC messages; for probe results, it should be under 1K per result. The memory is used only if the queue actually backs up (it is not pre-allocated).

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org