Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
No stonith configured. Not explicitly anyway. Does that factor into this somehow? I've tested stonith, but we aren't doing it for customers. Maybe in the future if someone cries or pays us money. Our solution is deployed onto too many different machines. A couple of bare metal. A couple of

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
We know. We've worked out our application-specific answer to split brain. But, proper fencing is on our to-do list. Currently we only deploy 2-node systems. There is one application and its agent. One resource is configured. We have this in cluster.conf So, we don’t get quorum

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Digimer
On 16/10/15 12:37 PM, Vallevand, Mark K wrote:
> Fencing, yes. I have pcmk-redirect for each node in cluster.conf.
Do you have stonith configured (and tested!) in Pacemaker as well?
> I run with default cman settings for corosync. No totem clause. That gives
> the 20s detection. Not sure

[ClusterLabs] Stopped node detection.

2015-10-16 Thread Vallevand, Mark K
Ubuntu 12.04 LTS
pacemaker 1.1.10
cman 3.1.7
corosync 1.4.6
If my cluster has no resources, it seems like it takes 20s for a stopped node to be detected. Is the value really 20s and is it a parameter that can be adjusted? Regards. Mark K Vallevand

[ClusterLabs] Alternative to resource monitor polling?

2015-10-16 Thread Vallevand, Mark K
Is there an alternative to resource monitor polling to detect a resource failure? If, for example, a resource failure is detected by our own software, could it signal clustering that a resource has failed? Regards. Mark K Vallevand mark.vallev...@unisys.com

[ClusterLabs] Antw: Stopped node detection.

2015-10-16 Thread Ulrich Windl
>>> "Vallevand, Mark K" wrote on 15.10.2015 at 22:55 in message <2f280811793d43418745268be7397...@us-exch13-5.na.uis.unisys.com>:
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
>
> If my cluster has no resources, it seems like it takes 20s

[ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
It looks like it takes 20s for a cluster to detect that a node has been lost. The detection seems to correlate to dlm reporting its lost connection to the node. Not sure if correlation is causation. Anyway, can someone tell me where that 20s might be coming from and if it is adjustable? Ubuntu

Re: [ClusterLabs] Stopped node detection.

2015-10-16 Thread Ken Gaillot
On 10/15/2015 03:55 PM, Vallevand, Mark K wrote:
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
>
> If my cluster has no resources, it seems like it takes 20s for a stopped node
> to be detected. Is the value really 20s and is it a parameter that can be
> adjusted?
The

[ClusterLabs] A question about resource monitoring.

2015-10-16 Thread Vallevand, Mark K
Is there an alternative to resource monitoring? Maybe a 'supplement' to resource polling is a better way to say it. If my application self-detects an error and wants to report it (rather than wait for the monitor to poll it), can it report that to clustering? Suggestions are welcome. And a

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
Thanks. I wasn't completely aware of corosync's role in this. I see new things in the docs every time I read them. I looked up the corosync settings at one time and did it again:
token loss 3000ms
retransmits 10
So 30s. Redid my simple testing and got detection times of 22s,
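A rough sketch of the arithmetic in the message above, next to a second reading of the same settings. The first function reproduces the multiplication used in the message (3000ms times 10 retransmits). The second rests on an assumption that comes from corosync.conf(5) and not from this thread: the token timeout is the whole window before token loss is declared (retransmits happen inside it), consensus defaults to 1.2 times token when unset, and a lost node is noticed roughly after token plus consensus. All function names are made up for illustration, and the result is an estimate, not a measured value.

```python
# Sketch only: two ways to turn corosync totem settings into a detection time.
# Assumption (corosync.conf(5), not this thread): consensus defaults to
# 1.2 * token, and node loss is declared roughly after token + consensus.
from typing import Optional


def retransmit_product_ms(token_ms: int, retransmits: int) -> int:
    """Reading used in the message: token timeout multiplied by retransmits."""
    return token_ms * retransmits


def default_consensus_ms(token_ms: int) -> int:
    """Assumed default consensus timeout: 1.2 * token (integer milliseconds)."""
    return token_ms * 12 // 10


def estimated_detection_ms(token_ms: int, consensus_ms: Optional[int] = None) -> int:
    """Approximate detection time as token timeout plus consensus timeout."""
    if consensus_ms is None:
        consensus_ms = default_consensus_ms(token_ms)
    return token_ms + consensus_ms


if __name__ == "__main__":
    token_ms, retransmits = 3000, 10  # values quoted in the message
    print("token * retransmits:", retransmit_product_ms(token_ms, retransmits), "ms")
    print("token + consensus  :", estimated_detection_ms(token_ms), "ms")
```

Comparing both numbers against a measured detection time is one way to check which set of totem values a running cluster is actually using.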

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Digimer
On 16/10/15 11:40 AM, Vallevand, Mark K wrote:
> Thanks. I wasn't completely aware of corosync's role in this. I see new
> things in the docs every time I read them.
>
> I looked up the corosync settings at one time and did it again:
> token loss 3000ms
> retransmits 10
> So 30s.

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
Oops. Cman starts corosync. Cman has corosync settings of token loss 1ms and retransmit 10. According to the man page, anyway. Experimenting. Regards. Mark K Vallevand mark.vallev...@unisys.com Never try and teach a pig to sing: it's a waste of time,

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Vallevand, Mark K
Fencing, yes. I have pcmk-redirect for each node in cluster.conf. I run with default cman settings for corosync. No totem clause. That gives the 20s detection. Not sure what the defaults really are. I added to cluster.conf and get about a 5s detection. The corosync man page says:

Re: [ClusterLabs] Cluster node loss detection.

2015-10-16 Thread Digimer
On 16/10/15 01:14 PM, Vallevand, Mark K wrote:
> No stonith configured. Not explicitly anyway.
> Does that factor into this somehow?
Yes, you will eventually have a split-brain. All fencing in cman does with 'fence_pcmk' is say "hey, if you need to fence, ask pacemaker to do it". That's useless