Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Andrei Borzenkov
On 01.04.2021 00:21, Antony Stone wrote: > On Wednesday 31 March 2021 at 23:11:50, Reid Wahl wrote: > >> Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs >> operation meta attributes. Good question. > > Returning to my suspicion that it's more likely me that simply did

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 23:09:38, Antony Stone wrote: > On Wednesday 31 March 2021 at 22:53:53, Reid Wahl wrote: > > Hi, Antony. failure-timeout should be a resource meta attribute, not an > > attribute of the monitor operation. At least I'm not aware of it being > > configurable

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 23:11:50, Reid Wahl wrote: > Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs > operation meta attributes. Good question. Returning to my suspicion that it's more likely me that simply did something wrong, what command can I use to find

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Reid Wahl
Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs operation meta attributes. Good question. On Wednesday, March 31, 2021, Antony Stone wrote: > On Wednesday 31 March 2021 at 22:53:53, Reid Wahl wrote: > >> Hi, Antony. failure-timeout should be a resource meta attribute,

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 22:53:53, Reid Wahl wrote: > Hi, Antony. failure-timeout should be a resource meta attribute, not an > attribute of the monitor operation. At least I'm not aware of it being > configurable per-operation -- maybe it is. Can't check at the moment :) Okay, I'll try

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Reid Wahl
Hi, Antony. failure-timeout should be a resource meta attribute, not an attribute of the monitor operation. At least I'm not aware of it being configurable per-operation -- maybe it is. Can't check at the moment :) On Wednesday, March 31, 2021, Antony Stone wrote: > Hi. > > I've pared my
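As a rough illustration of the distinction Reid draws above, the following Python sketch builds a CIB-style <primitive> with failure-timeout set as a resource meta attribute, and reports where the setting sits in a given primitive. The resource id "dummy", the ocf:heartbeat:Dummy agent, the nvpair id scheme, and both helper function names are assumptions made for the example; none of them come from the thread. It only manipulates XML locally and does not talk to a cluster.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def primitive_with_failure_timeout(rsc_id: str, timeout: str) -> ET.Element:
    """Build a <primitive> whose failure-timeout is a resource meta attribute.

    The agent (ocf:heartbeat:Dummy) and the id naming scheme are illustrative.
    """
    prim = ET.Element(
        "primitive",
        {"id": rsc_id, "class": "ocf", "provider": "heartbeat", "type": "Dummy"},
    )
    # Resource-level meta attributes live directly under <primitive>.
    meta = ET.SubElement(prim, "meta_attributes", {"id": f"{rsc_id}-meta_attributes"})
    ET.SubElement(
        meta,
        "nvpair",
        {
            "id": f"{rsc_id}-meta_attributes-failure-timeout",
            "name": "failure-timeout",
            "value": timeout,
        },
    )
    # The monitor operation carries only its own settings, not failure-timeout.
    ops = ET.SubElement(prim, "operations")
    ET.SubElement(
        ops, "op", {"id": f"{rsc_id}-monitor-10s", "name": "monitor", "interval": "10s"}
    )
    return prim


def failure_timeout_location(prim: ET.Element) -> Optional[str]:
    """Return 'resource', 'operation', or None, depending on where
    failure-timeout is set inside the given <primitive>."""
    if prim.find("./meta_attributes/nvpair[@name='failure-timeout']") is not None:
        return "resource"
    for op in prim.findall("./operations/op"):
        on_op_attr = op.get("failure-timeout") is not None
        on_op_meta = (
            op.find("./meta_attributes/nvpair[@name='failure-timeout']") is not None
        )
        if on_op_attr or on_op_meta:
            return "operation"
    return None


if __name__ == "__main__":
    prim = primitive_with_failure_timeout("dummy", "60s")
    print(ET.tostring(prim, encoding="unicode"))
    print(failure_timeout_location(prim))
```

Running failure_timeout_location over each primitive in an exported configuration would be one way to spot a failure-timeout that ended up on the monitor operation instead of on the resource.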

[ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Antony Stone
Hi. I've pared my configuration down to almost a bare minimum to demonstrate the problem I'm having. I have two questions: 1. What command can I use to find out what pacemaker thinks my cluster.cib file really means? I know what I put in it, but I want to see what pacemaker has understood

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Ken Gaillot
On Wed, 2021-03-31 at 17:38 +0200, Antony Stone wrote: > On Wednesday 31 March 2021 at 16:58:30, Antony Stone wrote: > > > I'm only interested in the most recent failure. I'm saying that > > once that > > failure is more than "failure-timeout" seconds old, I want the fact > > that > > the

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 16:58:30, Antony Stone wrote: > I'm only interested in the most recent failure. I'm saying that once that > failure is more than "failure-timeout" seconds old, I want the fact that > the resource failed to be forgotten, so that it can be restarted or moved > between

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 15:48:15, Ken Gaillot wrote: > On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote: > > > > So, what am I misunderstanding about "failure-timeout", and what > > configuration setting do I need to use to tell pacemaker that "provided the > > resource hasn't failed

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 15:48:15, Ken Gaillot wrote: > On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote: > > > So, what am I misunderstanding about "failure-timeout", and what > > configuration setting do I need to use to tell pacemaker that "provided the > > resource hasn't failed

Re: [ClusterLabs] Live migration possible with KSM ?

2021-03-31 Thread Lentes, Bernd
On Mar 30, 2021, at 7:54 PM, hunter86 bg hunter86...@yahoo.com wrote: > Keep in mind that KSM is highly cpu intensive and is most suitable for same > type > of VMs, so similar memory pages will be merged until a change happens (and that > change is allocated elsewhere). > In oVirt

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Ken Gaillot
On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote: > Hi. > > I'm trying to understand what looks to me like incorrect behaviour > between > cluster-recheck-interval and failure-timeout, under pacemaker 2.0.1 > > I have three machines in a corosync (3.0.1 if it matters) cluster, > managing

[ClusterLabs] Corosync 3.1.1 is available at corosync.org!

2021-03-31 Thread Jan Friesse
I am pleased to announce the latest maintenance release of Corosync 3.1.1 available immediately from GitHub release section at https://github.com/corosync/corosync/releases or our website at http://build.clusterlabs.org/corosync/releases/. This release contains important bug fixes and also a

[ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
Hi. I'm trying to understand what looks to me like incorrect behaviour between cluster-recheck-interval and failure-timeout, under pacemaker 2.0.1 I have three machines in a corosync (3.0.1 if it matters) cluster, managing 12 resources in a single group. I'm following documentation from:

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-31 Thread Strahil Nikolov
Actually, I found the error with the help of the RH support. I was colocating with the Master SAPHanaController resource and thus the cluster did what I have told it to do ;) Now, I just colocated (node-attribute=hana_sid_site) with the first backup IP (which is colocated with the master) and

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-31 Thread Strahil Nikolov
Damn... I am too hasty. It seems that the 2 resources I have already configured are also running on the master. The colocation constraint is like: rsc_bkpip3_SAPHana_SID_HDBinst_num with rsc_SAPHana_SID_HDBinst_num-clone (score: INFINITY) (node-attribute:hana_sid_site) (rsc-role:Started)

Re: [ClusterLabs] staggered resource start/stop

2021-03-31 Thread d tbsky
Klaus Wenninger > In this case it might be useful not to wait some defined time > hoping startup of the VM would have gone far enough that > the IO load has already decayed enough. > What about a resource that checks for something running > inside the VM that indicates that startup has completed?

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-31 Thread Strahil Nikolov
Disregard the previous one... it needs 'pcs constraint colocation add' to work. Best Regards, Strahil Nikolov On Wed, Mar 31, 2021 at 8:08, Strahil Nikolov wrote: I guess that feature was added in a later version (still on RHEL 8.2). pcs constraint colocation bkp2 with Master

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-31 Thread Strahil Nikolov
I guess that feature was added in a later version (still on RHEL 8.2). pcs constraint colocation bkp2 with Master rsc_SAPHana__HDB score=INFINITY node-attribute=hana__site (or using any other attribute) brings 'pcs constraint help' output. It seems that it doesn't like the syntax. Maybe someone

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-31 Thread Andrei Borzenkov
On Wed, Mar 31, 2021 at 8:34 AM Strahil Nikolov wrote: > > Damn... I am too hasty. > > It seems that the 2 resources I have already configured are also running on > the master. > > The colocation constraint is like: > > rsc_bkpip3_SAPHana_SID_HDBinst_num with rsc_SAPHana_SID_HDBinst_num-clone >