Re: [ClusterLabs] [EXTERNAL] Users Digest, Vol 55, Issue 19

2019-08-12 Thread Andrei Borzenkov
s master. > 2. Kill services on A, node B will come up as master. > 3. When node A is ready to join the cluster, we have to delete the lock file it > creates on any one of the nodes and execute the cleanup command to get the > node back as standby. > > Step 3 is manual, so HA is not ac
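
For reference, the manual recovery described in step 3 could be scripted roughly as below. This is only a sketch: the lock-file path and resource name are hypothetical (the thread does not spell them out), and the cleanup syntax assumes pcs on RHEL.

    # Remove the lock file left behind by the failed master (hypothetical path
    # used by the pgsql resource agent), then clear the failure history so the
    # node can rejoin as standby.
    ssh nodeA 'rm -f /var/lib/pgsql/tmp/PGSQL.lock'   # path is an assumption
    pcs resource cleanup pgsql-ha                     # hypothetical resource name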

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Harvey Shepherd
ouce agent which supports automatic failback? To avoid generation of lock file and deleting it. 2. If there is no such support, if we need such functionality, do we have to modify existing code? How this can be achieved. Please suggest. Thanks. Thanks.

Re: [ClusterLabs] [EXTERNAL] Users Digest, Vol 55, Issue 21

2019-08-12 Thread Michael Powell
2. If there is no such support, if we need such functionality, do we have to modify existing code? How this can be achieved. Please suggest. Thanks. Thanks.

Re: [ClusterLabs] Querying failed resource operations from the CIB

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 11:15 +0200, Ulrich Windl wrote: > Hi! > > Back in December 2011 I had written a script to retrieve all failed > resource operations by using "cibadmin -Q -o lrm_resources" as data > base. I was querying lrm_rsc_op for op-status != 0. > In a newer release this does not

Re: [ClusterLabs] why is node fenced ?

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 18:09 +0200, Lentes, Bernd wrote: > Hi, > > last Friday (9th of August) I had to install patches on my two-node > cluster. > I put one of the nodes (ha-idg-2) into standby (crm node standby ha- > idg-2), patched it, rebooted, > started the cluster (systemctl start

Re: [ClusterLabs] Q: "crmd[7281]: warning: new_event_notification (7281-97955-15): Broken pipe (32)" as response to resource cleanup

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 17:46 +0200, Ulrich Windl wrote: > Hi! > > I just noticed that a "crm resource cleanup " caused some > unexpected behavior and the syslog message: > crmd[7281]: warning: new_event_notification (7281-97955-15): Broken > pipe (32) > > It's SLES14 SP4 last updated Sept. 2018

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 23:09 +0300, Andrei Borzenkov wrote: > > > On Mon, Aug 12, 2019 at 4:12 PM Michael Powell < > michael.pow...@harmonicinc.com> wrote: > > At 07:44:49, the ss agent discovers that the master instance has > > failed on node mgraid…-0 as a result of a failed ssadm request in >

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Harvey Shepherd
uch support, if we need such functionality, do we have to modify existing code? How this can be achieved. Please suggest. Thanks. Thanks.

Re: [ClusterLabs] [EXTERNAL] Users Digest, Vol 55, Issue 19

2019-08-12 Thread Michael Powell
s automatic failback? To avoid generation of lock file and deleting it. 2. If there is no such support, if we need such functionality, do we have to modify existing code? How this can be achieved. Please suggest. Thanks. Thanks.

Re: [ClusterLabs] Restoring network connection breaks cluster services

2019-08-12 Thread Jan Pokorný
On 07/08/19 16:06 +0200, Momcilo Medic wrote: > On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger wrote: > >> On 8/7/19 12:26 PM, Momcilo Medic wrote: >> >>> We have three node cluster that is setup to stop resources on lost >>> quorum. Failure (network going down) handling is done properly, >>>
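
Stopping resources on lost quorum is normally controlled by the no-quorum-policy cluster property; a minimal sketch, assuming that is how this cluster is configured (the excerpt does not say explicitly):

    # Stop all resources in a partition that loses quorum.
    pcs property set no-quorum-policy=stop
    # crmsh equivalent:
    # crm configure property no-quorum-policy=stop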

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Andrei Borzenkov
On Mon, Aug 12, 2019 at 4:12 PM Michael Powell < michael.pow...@harmonicinc.com> wrote: > At 07:44:49, the ss agent discovers that the master instance has failed on > node *mgraid…-0* as a result of a failed *ssadm* request in response to > an *ss_monitor()* operation. It issues a *crm_master -Q
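
The usual OCF pattern behind a call like this is for the agent's monitor action to raise or clear the node's master score. A minimal sketch, not the actual ss agent code; ssadm_status_ok is a hypothetical helper:

    ss_monitor() {
        if ssadm_status_ok; then
            # Healthy: advertise this node as a promotion candidate.
            crm_master -Q -l reboot -v 100
            return $OCF_SUCCESS
        else
            # Failed: drop our master preference so the peer can be promoted.
            crm_master -Q -l reboot -D
            return $OCF_ERR_GENERIC
        fi
    }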

Re: [ClusterLabs] why is node fenced ?

2019-08-12 Thread Chris Walker
When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2; for example: Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=1320 members=1. After ~20s (the dc-deadtime parameter), ha-idg-2 is marked 'unclean' and STONITHed as
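
If the intent is to give a rebooting peer more time before it is declared unclean, dc-deadtime is an ordinary cluster property; the value below is illustrative, not taken from the thread:

    # Wait up to two minutes for the peer before giving up on it.
    crm configure property dc-deadtime=2min
    # pcs equivalent:
    # pcs property set dc-deadtime=2min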

[ClusterLabs] Postgres HA - pacemaker RA do not support auto failback

2019-08-12 Thread Shital A
Hello, Postgres version : 9.6 OS:Rhel 7.6 We are working on HA setup for postgres cluster of two nodes in active-passive mode. Installed: Pacemaker 1.1.19 Corosync 2.4.3 The pacemaker agent with this installation doesn't support automatic failback. What I mean by that is explained below: 1.

[ClusterLabs] why is node fenced ?

2019-08-12 Thread Lentes, Bernd
Hi, last Friday (9th of August) I had to install patches on my two-node cluster. I put one of the nodes (ha-idg-2) into standby (crm node standby ha-idg-2), patched it, rebooted, started the cluster (systemctl start pacemaker) again, put the node online again, everything fine. Then I wanted
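
The sequence described above, written out as commands (crmsh and systemd, matching the tools named in the post):

    crm node standby ha-idg-2      # move resources off the node
    # ... install patches, then reboot the node ...
    systemctl start pacemaker      # rejoin the cluster after the reboot
    crm node online ha-idg-2       # take the node out of standby again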

[ClusterLabs] Q: "crmd[7281]: warning: new_event_notification (7281-97955-15): Broken pipe (32)" as response to resource cleanup

2019-08-12 Thread Ulrich Windl
Hi! I just noticed that a "crm resource cleanup " caused some unexpected behavior and the syslog message: crmd[7281]: warning: new_event_notification (7281-97955-15): Broken pipe (32) It's SLES14 SP4 last updated Sept. 2018 (up since then, pacemaker-1.1.19+20180928.0d2680780-1.8.x86_64). The

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Yan Gao
On 8/12/19 3:24 PM, Klaus Wenninger wrote: > On 8/12/19 2:30 PM, Yan Gao wrote: >> Hi Klaus, >> >> On 8/12/19 1:39 PM, Klaus Wenninger wrote: >>> On 8/9/19 9:06 PM, Yan Gao wrote: On 8/9/19 6:40 PM, Andrei Borzenkov wrote: > 09.08.2019 16:34, Yan Gao wrote: >> Hi, >> >> With

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Klaus Wenninger
On 8/12/19 2:30 PM, Yan Gao wrote: > Hi Klaus, > > On 8/12/19 1:39 PM, Klaus Wenninger wrote: >> On 8/9/19 9:06 PM, Yan Gao wrote: >>> On 8/9/19 6:40 PM, Andrei Borzenkov wrote: 09.08.2019 16:34, Yan Gao wrote: > Hi, > > With disk-less sbd, it's fine to stop cluster service from

Re: [ClusterLabs] Antw: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Yan Gao
On 8/12/19 8:42 AM, Ulrich Windl wrote: > Hi! > > One motivation to stop all nodes at the same time is to avoid needless moving > of resources, like the following: > You stop node A, then resources are stopped on A and started elsewhere > You stop node B, and resources are stopped and moved to

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Yan Gao
Hi Klaus, On 8/12/19 1:39 PM, Klaus Wenninger wrote: > On 8/9/19 9:06 PM, Yan Gao wrote: >> On 8/9/19 6:40 PM, Andrei Borzenkov wrote: >>> 09.08.2019 16:34, Yan Gao wrote: Hi, With disk-less sbd, it's fine to stop cluster service from the cluster nodes all at the same time.

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Klaus Wenninger
On 8/9/19 9:06 PM, Yan Gao wrote: > On 8/9/19 6:40 PM, Andrei Borzenkov wrote: >> 09.08.2019 16:34, Yan Gao wrote: >>> Hi, >>> >>> With disk-less sbd, it's fine to stop cluster service from the cluster >>> nodes all at the same time. >>> >>> But if to stop the nodes one by one, for example with a

[ClusterLabs] Antw: Re: Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Ulrich Windl
>>> Roger Zhou wrote on 12.08.2019 at 10:55 in message <7249e013-1256-675a-3cea-3572f4615...@suse.com>: > On 8/12/19 2:48 PM, Ulrich Windl wrote: > Andrei Borzenkov wrote on 09.08.2019 at 18:40 in >> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>: >>> 09.08.2019 16:34,

[ClusterLabs] Querying failed resource operations from the CIB

2019-08-12 Thread Ulrich Windl
Hi! Back in December 2011 I had written a script to retrieve all failed resource operations by using "cibadmin -Q -o lrm_resources" as the data base. I was querying lrm_rsc_op for op-status != 0. In a newer release this does not seem to work anymore. I see resource IDs ending with "_last_0",
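
A sketch of the kind of query the script performed, as an XPath filter over the live CIB; whether op-status is still the attribute to test in newer releases is exactly what this question is about:

    # Dump the CIB and list operations whose status is non-zero.
    cibadmin --query > /tmp/cib.xml
    xmllint --xpath '//lrm_rsc_op[@op-status!="0"]' /tmp/cib.xml
    # crm_mon can also report failed actions and fail counts directly:
    crm_mon -1 -f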

Re: [ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Roger Zhou
On 8/12/19 2:48 PM, Ulrich Windl wrote: Andrei Borzenkov wrote on 09.08.2019 at 18:40 in > message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>: >> 09.08.2019 16:34, Yan Gao wrote: [...] >> >> Lack of a cluster-wide shutdown mode was mentioned more than once on this >> list. I

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-12 Thread Jan Friesse
Andrei Borzenkov wrote: Sent from iPhone 12 Aug 2019, at 8:46, Jan Friesse wrote: Олег Самойлов wrote: 9 Aug 2019, at 9:25, Jan Friesse wrote: Please do not set dpd_interval that high. dpd_interval on the qnetd side is not about how often the ping is sent.

Re: [ClusterLabs] Increasing fence timeout

2019-08-12 Thread Oyvind Albrigtsen
You should be able to increase this timeout by running: pcs stonith update shell_timeout=10 Oyvind On 08/08/19 12:13 -0600, Casey & Gina wrote: Hi, I'm currently running into periodic premature killing of nodes due to the fence monitor timeout being set to 5 seconds. Here is an example
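
The stonith device id appears to have been dropped from the command above; with a hypothetical device name it would look like this:

    # "my-fence-device" is a placeholder for the actual stonith resource id.
    pcs stonith update my-fence-device shell_timeout=10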

Re: [ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Andrei Borzenkov
Sent from iPhone 12 Aug 2019, at 9:48, Ulrich Windl wrote: Andrei Borzenkov wrote on 09.08.2019 at 18:40 in > message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>: >> 09.08.2019 16:34, Yan Gao wrote: >>> Hi, >>> >>> With disk-less sbd, it's fine to stop cluster

[ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 09.08.2019 at 18:40 in message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>: > 09.08.2019 16:34, Yan Gao wrote: >> Hi, >> >> With disk-less sbd, it's fine to stop cluster service from the cluster >> nodes all at the same time. >> >> But if to stop the

[ClusterLabs] Antw: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Ulrich Windl
Hi! One motivation to stop all nodes at the same time is to avoid needless moving of resources, like the following: You stop node A, then resources are stopped on A and started elsewhere You stop node B, and resources are stopped and moved to remaining nodes ...until the last node stops, or
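
One way to realize "stop all nodes at the same time" from a single host is sketched below; the use of pcs is an assumption, not something stated in the thread:

    # Stop the cluster stack on every node in one go.
    pcs cluster stop --all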

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-12 Thread Andrei Borzenkov
Sent from iPhone > 12 Aug 2019, at 8:46, Jan Friesse wrote: > > Олег Самойлов wrote: >>> 9 Aug 2019, at 9:25, Jan Friesse wrote: >>> Please do not set dpd_interval that high. dpd_interval on the qnetd side is not >>> about how often the ping is sent. Could you please

[ClusterLabs] Antw: Re: corosync.service (and sbd.service) are not stopper on pacemaker shutdown when corosync-qdevice is used

2019-08-12 Thread Ulrich Windl
>>> Roger Zhou wrote on 09.08.2019 at 10:19 in message <06f700cb-d941-2f53-aee5-2d64c499c...@suse.com>: > > On 8/9/19 3:39 PM, Jan Friesse wrote: >> Roger Zhou wrote: >>> >>> On 8/9/19 2:27 PM, Roger Zhou wrote: On 7/29/19 12:24 AM, Andrei Borzenkov wrote: >

Re: [ClusterLabs] Ubuntu 18.04 and corosync-qdevice

2019-08-12 Thread Jan Friesse
Nickle, Richard wrote: I've built a two-node DRBD cluster with SBD and STONITH, following advice from ClusterLabs, LinBit, Beekhof's blog on SBD. I still cannot get automated failover when I down one of the nodes. I thought that perhaps I needed to have an odd-numbered quorum so I
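
For reference, wiring in a third quorum vote via corosync-qdevice/qnetd typically looks like the sketch below; the hostname is hypothetical and the pcs syntax is an assumption, since the poster (on Ubuntu 18.04) may be using different tooling:

    # On the cluster nodes, register a qnetd server as a quorum device.
    pcs quorum device add model net host=qnetd-host algorithm=ffsplit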