Re: [ClusterLabs] Question: Mount Monitoring for Non-shared File-system

2021-12-07 Thread Andrei Borzenkov
On 07.12.2021 21:35, Asseel Sidique wrote: > Hi Everyone, > I'm looking for some insight on what the best way is to configure mount > monitoring for a cloned database resource. > Consider the resource model below: > * Clone Set: database_1-clone [database_1] (promotable): > * Masters: [

Re: [ClusterLabs] resource start after network reconnected

2021-11-20 Thread Andrei Borzenkov
On 21.11.2021 00:39, Strahil Nikolov via Users wrote: > Nope, as long as you use SBD's integration with pacemaker. As the 2 nodes can > communicate between each other sbd won't act. I think it was an entry like > this in the /etc/sysconfig/sbd: 'SBD_PACEMAKER=yes' > That's correct except it

Re: [ClusterLabs] resource start after network reconnected

2021-11-19 Thread Andrei Borzenkov
On 19.11.2021 20:45, Ken Gaillot wrote: > On Fri, 2021-11-19 at 10:40 -0500, john tillman wrote: > > > >>> If pacemaker tries to stop resources due to out of quorum >>> condition, you >>> could set suitable failure-timeout; this will be equivalent to >>> using "pcs >>> resource refresh". Keep

Re: [ClusterLabs] resource start after network reconnected

2021-11-19 Thread Andrei Borzenkov
On 19.11.2021 19:26, john tillman wrote: ... >>> >>> If pacemaker tries to stop resources due to out of quorum condition, you >>> could set suitable failure-timeout; this will be equivalent to using >>> "pcs >>> resource refresh". Keep in mind that pacemaker only checks for >>> failure-timeout

Re: [ClusterLabs] resource start after network reconnected

2021-11-19 Thread Andrei Borzenkov
On 19.11.2021 17:36, john tillman wrote: >> On 18.11.2021 22:33, john tillman wrote: >>> >>> Greetings all, >>> >>> preamble: RHEL8, PCS 0.10.8, COROSYNC 3.1.0, PACEMAKER 2.0.5 >>> >>> I have a mysql resource, cloned, that is behaving the way I wanted. >>> When >>> the node it is on is unplugged

Re: [ClusterLabs] resource start after network reconnected

2021-11-18 Thread Andrei Borzenkov
On 18.11.2021 22:33, john tillman wrote: > > Greetings all, > > preamble: RHEL8, PCS 0.10.8, COROSYNC 3.1.0, PACEMAKER 2.0.5 > > I have a mysql resource, cloned, that is behaving the way I wanted. When > the node it is on is unplugged from the network quorum is lost and the > mysqld service

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Andrei Borzenkov
On Mon, Nov 15, 2021 at 3:32 PM S Rogers wrote: >> >> The only solution here - as long as fencing node on external >> connectivity loss is acceptable - is modifying ethmonitor RA to fail >> monitor operation in this case. > > I was hoping to find a way to achieve the desired outcome without

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Andrei Borzenkov
On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger wrote: > > > > On Mon, Nov 15, 2021 at 10:37 AM S Rogers wrote: >> >> I had thought about doing that, but the cluster is then dependent on the >> external system, and if that external system was to go down or become >> unreachable for any reason

Re: [ClusterLabs] drbd nfs slave not working

2021-11-14 Thread Andrei Borzenkov
On 14.11.2021 19:47, Neil McFadyen wrote: > I have a Ubuntu 20.04 drbd nfs pacemaker/corosync setup for 2 nodes, it > was working fine before but now I can't get the 2nd node to show as a slave > under the Clone Set. So if I do a failover both nodes show as stopped. > >

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-12 Thread Andrei Borzenkov
On 12.11.2021 20:31, S Rogers wrote: > Hi, I'm hoping someone will be able to point me in the right direction. > > I am configuring a two-node active/passive cluster that utilises the > PostgreSQL PAF resource agent. Each node has two NICs, therefore the > cluster is configured with two corosync

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-11-05 Thread Andrei Borzenkov
On 05.11.2021 01:20, Ken Gaillot wrote: >> >> There are two issues discussed in this thread. >> >> 1. Remote node is fenced when connection with this node is lost. For >> all >> I can tell this is intended and expected behavior. That was the >> original >> question. > > It's expected only because

Re: [ClusterLabs] Favoured node in priority-fencing-delay

2021-11-02 Thread Andrei Borzenkov
On 03.11.2021 05:21, Alex Zarifoglu wrote: > Hello all, > I have a question about the "priority-fencing-delay" parameter. > This parameter, although very helpful, it doesn't handle the scenario where > nodes have equal priority. It is intended exactly for the scenario where nodes have equal

Re: [ClusterLabs] Cannot ping a secondary address apart from the server which it is assigned to (on Azure)

2021-10-31 Thread Andrei Borzenkov
On 01.11.2021 01:56, Paul Warwicker wrote: > On 28/10/2021 14:30, Andrei Borzenkov wrote: >> For virtual IP you can (should?) use Azure >> load balancers - basically,  you create a pool of one address, Azure >> probes each node and detects which node has IP active. >>

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
On 31.10.2021 19:37, Strahil Nikolov wrote: > At least it's worth trying (/etc/sysconfig/pacemaker):PCMK_trace_files=* commit 85040eb19b9405464b01a7e67eb6769d2a03c611 Author: Ken Gaillot Date: Fri Jun 19 17:49:22 2020 -0500 Doc: sysconfig: remove outdated reference to wildcards in

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
On 31.10.2021 16:48, Strahil Nikolov via Users wrote: > Have you checked the options in /etc/sysconfig/pacemaker as recommended in  > https://documentation.suse.com/sle-ha/15-SP3/html/SLE-HA-all/app-ha-troubleshooting.html#sec-ha-troubleshooting-log > ? > And where exactly it explains how to

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
t or all of the > pacemaker processes. > > This might be the environment variable you are looking for ? > It sets log level to debug, while I need trace. > Regards, > > Le 31 octobre 2021 09:20:00 GMT+01:00, Andrei Borzenkov > a écrit : >> I think it worked in the past by pa

[ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
I think it worked in the past by passing a lot of -VVV when starting pacemaker. It does not seem to work now. I can call /usr/sbin/pacemakerd -..., but it does not pass options further to the children it starts. So every other daemon is started without any option and with default log

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-30 Thread Andrei Borzenkov
On 29.10.2021 18:37, Ken Gaillot wrote: ... To address the original question, this is the log sequence I find most relevant: > Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker- > schedulerd[776553] > (unpack_rsc_op_failure) warning: Unexpected result

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-29 Thread Andrei Borzenkov
On 29.10.2021 18:16, Andrei Borzenkov wrote: > On 29.10.2021 17:53, Ken Gaillot wrote: >> On Fri, 2021-10-29 at 13:59 +, Gerry R Sommerville wrote: >>> Hey Andrei, >>> >>> Thanks for your response again. The cluster nodes and remote hosts

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-29 Thread Andrei Borzenkov
On 29.10.2021 17:53, Ken Gaillot wrote: > On Fri, 2021-10-29 at 13:59 +, Gerry R Sommerville wrote: >> Hey Andrei, >> >> Thanks for your response again. The cluster nodes and remote hosts >> each share two networks, however there is no routing between them. I >> don't suppose there is a

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-28 Thread Andrei Borzenkov
On 28.10.2021 20:13, Gerry R Sommerville wrote: > > What we also found to be interesting is that if the cluster is only using a > single heartbeat ring, then srv-2 will get fenced instead, and the So as already suspected you did not actually isolate the node at all. > pacemaker-remote

Re: [ClusterLabs] Cannot ping a secondary address apart from the server which it is assigned to (on Azure)

2021-10-28 Thread Andrei Borzenkov
On Thu, Oct 28, 2021 at 3:43 PM Paul Warwicker wrote: > > Hello, > > I originally posted this in the Azure forums first but have had no replies. > Trying here instead in case anyone has encountered it. > > I am trying to setup up a High Availability Cluster in Azure using CentOS 8, > Pacemaker

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-28 Thread Andrei Borzenkov
On Thu, Oct 28, 2021 at 10:30 AM Ulrich Windl wrote: > > Fencing _is_ a part of failover! > As any blanket answer this is mostly incorrect in this context. There are two separate objects here - remote host itself and pacemaker resource used to connect to and monitor state of remote host.

Re: [ClusterLabs] Inquiry - remote node fencing issue

2021-10-27 Thread Andrei Borzenkov
On Tue, Oct 26, 2021 at 11:09 PM Janghyuk Boo wrote: > > > > Dear Community , > > > > Thank you Ken for your reply last time. > > > > I attached the log messages as requested from the last thread. > > > > I have a Pacemaker cluster with two cluster nodes with two network interfaces > each, and

Re: [ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Andrei Borzenkov
On 20.10.2021 17:54, Ian Diddams wrote: > > > On Wednesday, 20 October 2021, 11:15:48 BST, Andrei Borzenkov > wrote: > > >> You cannot resolve split brain without fencing. This is as simple as >> that. Your pacemaker configuration (from another mail

Re: [ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Andrei Borzenkov
On Wed, Oct 20, 2021 at 11:54 AM Ian Diddams via Users wrote: > > So - system logs recently show this > > ESTRELA > Oct 18th > Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 > drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from > peer node > Oct 18

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-16 Thread Andrei Borzenkov
On 15.10.2021 13:24, Klaus Wenninger wrote: > On Fri, Oct 15, 2021 at 12:01 PM Andrei Borzenkov > wrote: > >> On Fri, Oct 15, 2021 at 9:25 AM Klaus Wenninger >> wrote: >> >>> Main pain-point here is that ping-RA allows us to configure the count of >

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-16 Thread Andrei Borzenkov
On 14.10.2021 23:51, martin doc wrote: > > > > From: Andrei Borzenkov , Friday, 15 October 2021 4:59 AM > ... >> Dampening defines delay before attributes are committed to CIB. >> Private attributes are never ever written into CIB, so

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Andrei Borzenkov
On 15.10.2021 09:24, Klaus Wenninger wrote: > Main pain-point here is that ping-RA allows us to configure the count of > pings sent, but it > is just using the exit-value from ping that becomes negative already when > one of the > answers is missing. Looking closer, this is not true. This is

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Andrei Borzenkov
On Fri, Oct 15, 2021 at 9:25 AM Klaus Wenninger wrote: > Main pain-point here is that ping-RA allows us to configure the count of > pings sent, but it > is just using the exit-value from ping that becomes negative already when one > of the > answers is missing. Use fping instead? Which is

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-14 Thread Andrei Borzenkov
On 13.10.2021 18:01, martin doc wrote: > In the ping resource script, there's support for "dampen" in the use of > attrd_updater. > > My expectation is that it will cause "ping", "no-ping", "ping" to result in > the service being continually presented as up rather than to flap about. > > In

Re: [ClusterLabs] iflabel removed??

2021-10-14 Thread Andrei Borzenkov
On 14.10.2021 18:31, Paul Warwicker wrote: > Hello, > > Has the ability to specify an interface alias been removed? I checked > the archives and the source at > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/IPaddr2 > and it appears to still be valid. Also here >

[ClusterLabs] No link to https://clusterlabs.org/pacemaker/man/ from main page

2021-10-13 Thread Andrei Borzenkov
I found page https://clusterlabs.org/pacemaker/man/ only by accident. There is no link from anywhere else in this site, at least I have not found one. Logically I expect it to be linked from Documentation section.

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Coming in Pacemaker 2.1.2: new fencing configuration options

2021-10-12 Thread Andrei Borzenkov
On 12.10.2021 09:27, Ulrich Windl wrote: >>>> Andrei Borzenkov schrieb am 11.10.2021 um 11:43 in > Nachricht > : >> On Mon, Oct 11, 2021 at 9:29 AM Ulrich Windl >> wrote: >> >>>>> Also how long would such a delay be: Long enough until t

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Move a resource only where another has Started

2021-10-11 Thread Andrei Borzenkov
On 11.10.2021 10:15, Ulrich Windl wrote: >>>> Andrei Borzenkov schrieb am 10.10.2021 um 16:52 in > Nachricht : >> On 10.10.2021 14:29, martin doc wrote: > > ... >> For each resource pacemaker computes allocation scores for each node >> (taking into acc

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.1.2: new fencing configuration options

2021-10-11 Thread Andrei Borzenkov
On Mon, Oct 11, 2021 at 9:29 AM Ulrich Windl wrote: > >> Also how long would such a delay be: Long enough until the other node > >> is > >> fenced, or long enough until the other node was fenced, booted > >> (assuming it > >> does) and is running pacemaker? > > > > The delay should be on the

Re: [ClusterLabs] Antw: [EXT] Move a resource only where another has Started

2021-10-10 Thread Andrei Borzenkov
On 10.10.2021 14:29, martin doc wrote: > ok, I think I've solved my problem or at least part of it. > > The issue was I was not including a "score" in any of my constraint > statements. This meant that "INFINITY" was being used. The result is that the > scores would always be the same. >
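
The point about omitted scores can be illustrated with a small sketch of Pacemaker-style score addition. This is not Pacemaker code: the helper names `add_scores` and `node_score` are hypothetical, and the sketch only assumes the documented rules that INFINITY is 1,000,000, that anything plus INFINITY stays INFINITY, and that -INFINITY wins over +INFINITY.

```python
# Illustrative sketch only; add_scores and node_score are hypothetical helpers.
INFINITY = 1_000_000  # Pacemaker treats +/-1,000,000 as infinite


def add_scores(a: int, b: int) -> int:
    """Add two scores with saturation; -INFINITY wins over +INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def node_score(constraint_scores):
    """Sum the scores of all constraints that match one node."""
    total = 0
    for score in constraint_scores:
        total = add_scores(total, score)
    return total


# Constraints without an explicit score default to INFINITY, so a node
# matched by two of them ends up no better than a node matched by one.
tied_a = node_score([INFINITY, INFINITY])
tied_b = node_score([INFINITY])

# With finite scores the same two nodes can be ranked.
ranked_a = node_score([200, 100])
ranked_b = node_score([200])

print(tied_a == tied_b, ranked_a > ranked_b)
```

With all-INFINITY constraints the two nodes tie, which matches the observation that "the scores would always be the same"; finite scores give the scheduler something to compare.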

Re: [ClusterLabs] cofigured trace for Virtual Domains - automatic restart ?

2021-09-17 Thread Andrei Borzenkov
On 17.09.2021 22:13, Ken Gaillot wrote: > On Fri, 2021-09-17 at 15:54 +0200, Lentes, Bernd wrote: >> Hi, >> >> today i configured tracing for some VirtualDomains: >> >> ha-idg-2:~ # crm resource trace vm_documents-oo migrate_from >> INFO: Trace for vm_documents-oo:migrate_from is written to >>

Re: [ClusterLabs] 8 node cluster

2021-09-08 Thread Andrei Borzenkov
On Tue, Sep 7, 2021 at 8:37 PM M N S H SNGHL wrote: > > Hello Team, > > I am looking for some suggestions here. I have created an 8 node HA cluster > on my SuSE hosts. > Have configured certain group resources on it, which mostly run on a single > node. > > Everything works fine, but I am at a

Re: [ClusterLabs] Cluster node unfencing

2021-08-31 Thread Andrei Borzenkov
https://lists.clusterlabs.org/pipermail/users/2021-August/029584.html Your message is a word for word duplicate of the linked email. On Tue, Aug 31, 2021 at 6:47 PM Kiril Pashin wrote: > > Hi , > > From what I see in the documentation for fabric fencing, Pacemaker requires > an administrator

Re: [ClusterLabs] Question about automating cluster unfencing.

2021-08-28 Thread Andrei Borzenkov
On Fri, Aug 27, 2021 at 8:11 PM Gerry R Sommerville wrote: > > Hey all, > > From what I see in the documentation for fabric fencing, Pacemaker requires > an administrator to login to the node to manually start and unfence the node > after some failure. > >

Re: [ClusterLabs] Problem with a ping resource on CentOS 8

2021-08-27 Thread Andrei Borzenkov
On Fri, Aug 27, 2021 at 6:33 PM wrote: > > Hi, > > I'm running a two node cluster on CentOS Linux release 8.4.2105 > 4.18.0-305.12.1.el8_4.x86_64 with the following pacemaker: > pacemaker-2.0.5-9.el8_4.1.x86_64 > pacemaker-cluster-libs-2.0.5-9.el8_4.1.x86_64 >

Re: [ClusterLabs] Pacemaker multi-state resource stop not running although "pcs status" indicates "Stopped"

2021-08-14 Thread Andrei Borzenkov
On 13.08.2021 22:46, ChittaNagaraj, Raghav wrote: > Hello Team, > > Hope you doing well. > > Running into an issue with multi-state resources not running stop function on > a node but failing over to start the resource on another node part of the > cluster when corosync process is killed. > >

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Andrei Borzenkov
On 09.08.2021 22:57, Reid Wahl wrote: > On Mon, Aug 9, 2021 at 6:19 AM Andrei Borzenkov wrote: > >> On 09.08.2021 16:00, Andreas Janning wrote: >>> Hi, >>> >>> yes, by "service" I meant the apache-clone resource. >>> >>> Maybe I

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Andrei Borzenkov
On 09.08.2021 16:00, Andreas Janning wrote: > Hi, > > yes, by "service" I meant the apache-clone resource. > > Maybe I can give a more stripped down and detailed example: > > *Given the following configuration:* > [root@pacemaker-test-1 cluster]# pcs cluster cib --config > > > >

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Andrei Borzenkov
On Mon, Aug 9, 2021 at 3:07 PM Andreas Janning wrote: > > Hi, > > I have just tried your suggestion by adding > name="interleave" value="true"/> > to the clone configuration. > Unfortunately, the behavior stays the same. The service is still restarted on > the passive node when

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Fri, Aug 6, 2021 at 3:47 PM Ulrich Windl wrote: > > >>> Antony Stone schrieb am 06.08.2021 um > 14:41 in > Nachricht <202108061441.59936.antony.st...@ha.open.source.it>: > ... > > location pref_A GroupA rule ‑inf: site ne cityA > > location pref_B GroupB rule ‑inf: site ne cityB >

Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Fri, Aug 6, 2021 at 3:42 PM Antony Stone wrote: > > On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote: > > > On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote: > > > > > > For anyone interested in the detail of how to do this (without needing >

Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote: > > On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote: > > > On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote: > > > > > > Have you ever tried to find out why this happens? (Talking about logs) > > > > Not in detail, no, but

Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-06 Thread Andrei Borzenkov
On Thu, Aug 5, 2021 at 9:25 PM Andrei Borzenkov wrote: > > Three nodes A, B, C. Communication between A and B is blocked > (completely - no packet can come in both direction). A and B can > communicate with C. > > I expected that result will be two partitions - (A, C) and (B, C)

[ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-05 Thread Andrei Borzenkov
Three nodes A, B, C. Communication between A and B is blocked (completely - no packet can come in both direction). A and B can communicate with C. I expected that result will be two partitions - (A, C) and (B, C). To my surprise, A went offline leaving (B, C) running. It was always the same node
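
The scenario above can be modelled as a connectivity graph. The sketch below is only an illustration of why (A, C) and (B, C) cannot both exist as memberships at once; it does not model corosync's actual membership protocol, and the function name `fully_connected_groups` is hypothetical.

```python
from itertools import combinations


def fully_connected_groups(nodes, blocked_links):
    """Return the maximal groups in which every pair of nodes can communicate."""
    blocked = {frozenset(link) for link in blocked_links}

    def all_pairs_ok(group):
        return all(frozenset(pair) not in blocked for pair in combinations(group, 2))

    # Enumerate every subset (fine for a handful of nodes), largest first.
    candidates = [
        set(group)
        for size in range(len(nodes), 0, -1)
        for group in combinations(nodes, size)
        if all_pairs_ok(group)
    ]
    # Keep only groups that are not strictly contained in another valid group.
    return [g for g in candidates if not any(g < other for other in candidates)]


groups = fully_connected_groups(["A", "B", "C"], [("A", "B")])
shared = set.intersection(*groups)

print([sorted(g) for g in groups])  # the two candidate partitions
print(sorted(shared))               # the node that appears in both
```

Both candidate groups contain C, and a node can belong to only one membership at a time, so at most one of them can actually form. Which one survives is decided by the cluster stack and is not something this sketch predicts.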

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On 05.08.2021 00:01, Antony Stone wrote: > On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote: > >> There is no safe way to do what you are trying to do. >> >> If the resource is on cluster A and contact is lost between clusters A >> and B due to a network failure, how does

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote: > > On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote: > > > On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote: > > > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote: > > >

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote: > > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote: > > > Won't something like this work ? Each node in LA will have same score of > > 5000, while other cities will be -5000. > > > > pcs constraint location DummyRes1 rule

Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-03 Thread Andrei Borzenkov
On 03.08.2021 20:19, Ervin Hegedüs wrote: > Hi there, > > Okay, so I thought I'm done, but today I ran into an issue. There are two > nodes, here is the config: > > node 1: sles15-1 > node 2: sles15-2 > primitive virtualip IPaddr2 \ > params ip=192.168.72.27 nic=eth0 cidr_netmask=24 \ >

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-03 Thread Andrei Borzenkov
On Tue, Aug 3, 2021 at 11:40 AM Antony Stone wrote: > > To implement the above "one resource which can run anywhere, but only a single > instance", I joined together clusters A and B, and placed the corresponding > location constraints on the resources I want only at A and the ones I want > only

Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-26 Thread Andrei Borzenkov
On Mon, Jul 26, 2021 at 4:53 PM john tillman wrote: > >> > >> Maybe explain how it should work: > >> If the two nodes cannot reach each other, but each can reach the ping > >> node, > >> which node has the quorum then? > >> > > > > Guess both - which is what is played down as 'disadvantage' in the

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-22 Thread Andrei Borzenkov
On Thu, Jul 22, 2021 at 1:05 PM Jehan-Guillaume de Rorthais wrote: > To do some rewording in regard with the current topic: if Pacemaker is able to > stop its resources after a quorum lost, it will not reboot, no "death" either. > And how exactly is the remaining quorate partition supposed to

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-22 Thread Andrei Borzenkov
On Thu, Jul 22, 2021 at 12:43 PM Jehan-Guillaume de Rorthais wrote: > > On Wed, 21 Jul 2021 12:45:40 -0400 > Digimer wrote: > > > On 2021-07-21 3:26 a.m., Jehan-Guillaume de Rorthais wrote: > > > Hi, > > > > > > On Wed, 21 Jul 2021 04:28:30 + (UTC) > > > Strahil Nikolov via Users wrote: > >

Re: [ClusterLabs] [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-21 Thread Andrei Borzenkov
pad.com/managing_computers/2007/10/split-brain-quo.html > There are eight possible states that I tried to illustrate on the attached > sketch (S="Split Brain", "Q=Quorum, F=Fencing). > > ;-) > > Regards, > Ulrich > > > >>> Andrei Borzenko

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-21 Thread Andrei Borzenkov
On Wed, Jul 21, 2021 at 11:50 AM Frank D. Engel, Jr. wrote: > > OpenVMS can do this sort of thing without a requirement for fencing (you > still need a third disk as a quorum device in a 2-node cluster), but > Linux (at least in its current form) cannot. From what I can tell the > fencing

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-20 Thread Andrei Borzenkov
On 21.07.2021 07:28, Strahil Nikolov via Users wrote: > Hi, > consider using a 3rd system as a Q disk. What was not clear in "Quorum is a different concept and doesn't remove the need for fencing"? > Also, you can use iscsi from that node as a SBD device, so you will have > proper fencing .If

Re: [ClusterLabs] pcs stonith update problems

2021-07-15 Thread Andrei Borzenkov
On 16.07.2021 01:02, Digimer wrote: > Hi all, > > I've got a predicament... I want to update a stonith resource to > remove an argument. Specifically, when resource move nodes, I want to > change the stonith delay to favour the new host. This involves adding > the 'delay="x"' argument to one

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: @ maillist Admins ‑ DMARC (yahoo)

2021-07-14 Thread Andrei Borzenkov
On Wed, Jul 14, 2021 at 10:21 AM Ulrich Windl wrote: > > What I meant is: > The original signature confirms that the message is from the submitter > (author). > After mangling the message, you can't re-testify that the message is still > from that author, but you can testify that the message is

Re: [ClusterLabs] unexpected fenced node and promotion of the new master PAF - postgres

2021-07-13 Thread Andrei Borzenkov
On 13.07.2021 23:09, damiano giuliani wrote: > Hi Klaus, thanks for helping, im quite lost because cant find out the > causes. > i attached the corosync logs of all three nodes hoping you guys can find > and hint me something i cant see. i really appreciate the effort. > the old master log seems

Re: [ClusterLabs] QDevice vs 3rd host for majority node quorum

2021-07-13 Thread Andrei Borzenkov
On 13.07.2021 19:52, Gerry R Sommerville wrote: > Hello everyone, > I am currently comparing using QDevice vs adding a 3rd host to my > even-number-node cluster and I am wondering about the details concerning > network > communication. > For example, say my cluster is utilizing multiple

Re: [ClusterLabs] @ maillist Admins - DMARC (yahoo)

2021-07-13 Thread Andrei Borzenkov
On Mon, Jul 12, 2021 at 5:50 PM wrote: > > On Sat, 2021-07-10 at 12:34 +0100, lejeczek wrote: > > Hi Admins(of this mailing list) > > > > Could you please fix in DMARC(s) so those of us who are on > > Yahoo would be able to receive own emails/thread. > > > > many thanks, L. > > I suppose we

Re: [ClusterLabs] detecting MySQL database corruption

2021-06-29 Thread Andrei Borzenkov
On 29.06.2021 19:23, john tillman wrote: >> On 29.06.2021 18:14, john tillman wrote: >>> Hello All, >>> >>> I was wondering if there was a way I can move a resource in response to >>> a >>> corrupted MySQL innodb database? So, MySQL service would be running, >>> just >>> the database/table I need

Re: [ClusterLabs] detecting MySQL database corruption

2021-06-29 Thread Andrei Borzenkov
On 29.06.2021 18:14, john tillman wrote: > Hello All, > > I was wondering if there was a way I can move a resource in response to a > corrupted MySQL innodb database? So, MySQL service would be running, just > the database/table I need access to is corrupted. Define "corrupted". MySQL resource

Re: [ClusterLabs] Antw: [EXT] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-16 Thread Andrei Borzenkov
On Wed, Jun 16, 2021 at 9:05 AM Ulrich Windl wrote: > > >>> Andrei Borzenkov schrieb am 15.06.2021 um 17:20 in > Nachricht > : > > We had the following situation > > > > 2‑node cluster with single device (just single external storage > > a

Re: [ClusterLabs] Issue with Pacemaker config related to VIP and an LSB resource

2021-06-15 Thread Andrei Borzenkov
On 16.06.2021 01:49, Michael Romero wrote: > > At which point an administrator or an automated script could intervene If you are going to always use manual intervention outside of pacemaker, just leave failure timeout on default 0 so cluster will never clear failure count automatically on a

Re: [ClusterLabs] Issue with Pacemaker config related to VIP and an LSB resource

2021-06-15 Thread Andrei Borzenkov
On 16.06.2021 01:49, Michael Romero wrote: > Hello, > > I currently have Pacemaker v2.0.3-3ubuntu4.2 running on two Ubuntu 20.04 > LTS systems. My config consists of two service groups, both of which have > an LSB resource and a floating IP resource. The LSB resource is > configured with a

Re: [ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-15 Thread Andrei Borzenkov
On 15.06.2021 20:48, Strahil Nikolov wrote: > I'm using 'pcs cluster stop' (or it's crm alternative),yet I'm not sure if it > will help in this case. > No it won't. It will still stop pacemaker. > Most probably the safest way is to wait for the storage to be recovered, as > without the

Re: [ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-15 Thread Andrei Borzenkov
On Tue, Jun 15, 2021 at 6:43 PM Strahil Nikolov wrote: > > How did you stop pacemaker ? systemctl stop pacemaker surprise :) > Usually I use 'pcs cluster stop' or it's crm alternative. > > Best Regards, > Strahil Nikolov > > On Tue, Jun 15, 2021 at 18:21, Andrei Borz

[ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-15 Thread Andrei Borzenkov
We had the following situation 2-node cluster with single device (just single external storage available). Storage failed. So SBD lost access to the device. Cluster was still up, both nodes were running. We thought that access to storage was restored, but one step was missing so devices appeared

Re: [ClusterLabs] Pacemaker 2.1.0 final release now available

2021-06-11 Thread Andrei Borzenkov
On Wed, Jun 9, 2021 at 8:58 PM wrote: > > I had generated the docs from a host with older versions of some of the > doc tools. I regenerated them from a newer host. Some tables still have > issues, but long lines are now wrapped. > Yes, that is fixed, thank you. > > > > > > There are problems

Re: [ClusterLabs] Pacemaker 2.1.0 final release now available

2021-06-09 Thread Andrei Borzenkov
On Wed, Jun 9, 2021 at 12:24 AM wrote: > > Hi all, > > Pacemaker 2.1.0 has officially been released, with source code > available at: > > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.0 > > Highlights include OCF Resource Agent API 1.1 compatibility, > noncritical

Re: [ClusterLabs] One Failed Resource = Failover the Cluster?

2021-06-07 Thread Andrei Borzenkov
On 07.06.2021 22:49, Eric Robinson wrote: > > Which is what I don't want to happen. I only want the cluster to failover if > one of the lower dependencies fails (drbd or filesystem). If one of the MySQL > instances fails, I do not want the cluster to move everything for the sake of > that one

Re: [ClusterLabs] What Does the Monitor Action of the IPaddr2 RA Actually Do?

2021-06-02 Thread Andrei Borzenkov
On 02.06.2021 23:38, Eric Robinson wrote: >> -Original Message- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Tuesday, June 1, 2021 1:14 PM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] What Does the Monitor Act

Re: [ClusterLabs] What Does the Monitor Action of the IPaddr2 RA Actually Do?

2021-06-01 Thread Andrei Borzenkov
On 01.06.2021 19:50, Eric Robinson wrote: > This is related to another question I currently have ongoing. > > I see in the logs that monitoring failed for a VIP resource, and that may > have been responsible for node failover. I read the code for the IPaddr2 RA > but it is not clear to me

Re: [ClusterLabs] Cluster Stopped, No Messages?

2021-06-01 Thread Andrei Borzenkov
On 01.06.2021 19:21, Eric Robinson wrote: > >> -Original Message- >> From: Users On Behalf Of Klaus >> Wenninger >> Sent: Monday, May 31, 2021 12:54 AM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Cluster Stopped, No Messages? >> >> On 5/29/21 12:21 AM, Strahil Nikolov

Re: [ClusterLabs] Pacemaker Cluster help

2021-06-01 Thread Andrei Borzenkov
On 01.06.2021 18:20, kgail...@redhat.com wrote: > On Thu, 2021-05-27 at 20:46 +0300, Andrei Borzenkov wrote: >> On 27.05.2021 15:36, Nathan Mazarelo wrote: >>> Is there a way to have pacemaker resource groups failover if all >>> floating IP resources are unavailable?

Re: [ClusterLabs] Pacemaker Cluster help

2021-05-27 Thread Andrei Borzenkov
On 27.05.2021 15:36, Nathan Mazarelo wrote: > Is there a way to have pacemaker resource groups failover if all floating IP > resources are unavailable? > > I want to have multiple floating IPs in a resource group that will only > failover if all IPs cannot work. Each floating IP is on a

Re: [ClusterLabs] Coming in Pacemaker 2.1.0: OCF resource agent path

2021-05-19 Thread Andrei Borzenkov
On 20.05.2021 00:33, kgail...@redhat.com wrote: > Hi all, > > We squeezed one more feature into the Pacemaker 2.1.0 release for rc2: > the ability to search multiple directories for OCF resource agents. > > Previously, the OCF root (typically /usr/lib/ocf) could be specified at > build time via

Re: [ClusterLabs] OCFS2 fragmentation with snapshots

2021-05-18 Thread Andrei Borzenkov
On Tue, May 18, 2021 at 1:52 PM Ulrich Windl wrote: > > Hi! > > I thought using the reflink feature of OCFS2 would be just a nice way to make > crash-consistent VM snapshots while they are running. > As it is a bit tricky to find out how much data is shared between snapshots, > I started to

Re: [ClusterLabs] DRBD + VDO HowTo?

2021-05-18 Thread Andrei Borzenkov
On Tue, May 18, 2021 at 11:43 AM Eric Robinson wrote: > > But really, the most simple would be to use systemd service. Then you do > > not really need to monitor anything. Resource is assumed to be active when > > service is started. That is enough to quickly get it going. > > > > That was the

Re: [ClusterLabs] DRBD + VDO HowTo?

2021-05-18 Thread Andrei Borzenkov
; > > > fi > > > > if [ "$R" == "online" ]; then > > > > echo "running on $MY_HOSTNAME" > > > > exit 0 #--lsb: success > > > > fi > >

Re: [ClusterLabs] DRBD + VDO HowTo?

2021-05-18 Thread Andrei Borzenkov
need to monitor anything. Resource is assumed to be active when service is started. That is enough to quickly get it going. > > > -----Original Message----- > > From: Users On Behalf Of Andrei > > Borzenkov > > Sent: Tuesday, May 18, 2021 12:22 AM > > To: users@clusterl

Re: [ClusterLabs] DRBD + VDO HowTo?

2021-05-17 Thread Andrei Borzenkov
On 17.05.2021 18:18, Eric Robinson wrote: > To Strahil and Klaus – > > I created the vdo devices using default parameters, so ‘auto’ mode was > selected by default. vdostatus shows that the current mode is async. The > underlying drbd devices are running protocol C, so I assume that vdo should

Re: [ClusterLabs] Antw: [EXT] Moving multi-state resources

2021-05-13 Thread Andrei Borzenkov
On Wed, May 12, 2021 at 8:15 PM Alastair Basden wrote: > > > > On 12.05.2021 20:02, Alastair Basden wrote: > Oddly enough, if I: > pcs resource move resourcedrbdClone node2 > it moves it to node 2 okay. > > But then if I > pcs resource clear resourcedrbdClone >

Re: [ClusterLabs] Antw: [EXT] Re: VirtualDomain & "deeper" monitors - what/how?

2021-05-12 Thread Andrei Borzenkov
On 03.05.2021 09:48, Ulrich Windl wrote: Ken Gaillot schrieb am 30.04.2021 um 16:57 in > Nachricht > <3acef4bc31923fb019619c713300444c2dcd354a.ca...@redhat.com>: >> On Fri, 2021‑04‑30 at 11:00 +0100, lejeczek wrote: >>> Hi guys >>> >>> I'd like to ask around for thoughts & suggestions on any

Re: [ClusterLabs] Antw: [EXT] Moving multi-state resources

2021-05-12 Thread Andrei Borzenkov
On 12.05.2021 20:02, Alastair Basden wrote: >>> Oddly enough, if I: >>> pcs resource move resourcedrbdClone node2 >>> it moves it to node 2 okay. >>> >>> But then if I >>> pcs resource clear resourcedrbdClone >>> it moves it back to node1. >>> >>> Which is odd, given its score for role Master is

Re: [ClusterLabs] 2 node mariadb-cluster - constraint-problems ?

2021-05-12 Thread Andrei Borzenkov
d take care > > fatcharly > > > >> Sorry but this is new for me. >> >> Best regards and take care >> >> fatcharly >> >> >> >> >>> Gesendet: Dienstag, 11. Mai 2021 um 17:19 Uhr >>> Von: "Andre

Re: [ClusterLabs] Antw: [EXT] Moving multi-state resources

2021-05-12 Thread Andrei Borzenkov
On 12.05.2021 17:16, Alastair Basden wrote: > Oddly enough, if I: > pcs resource move resourcedrbdClone node2 > it moves it to node 2 okay. > > But then if I > pcs resource clear resourcedrbdClone > it moves it back to node1. > > Which is odd, given its score for role Master is higher on node 2.

Re: [ClusterLabs] Is reverse order for "promote" supposed to be "demote"?

2021-05-11 Thread Andrei Borzenkov
On 11.05.2021 20:30, Andrei Borzenkov wrote: > On 11.05.2021 19:03, Vladislav Bogdanov wrote: >> Hi. >> >> Try >> order o_fs_drbd0_after_ms_drbd0 Mandatory: ms_drbd0:promote fs_drbd0:start >> > > This seems to work, but is not "start" implied when

Re: [ClusterLabs] Is reverse order for "promote" supposed to be "demote"?

2021-05-11 Thread Andrei Borzenkov
o be completely identical? > > > On May 11, 2021 6:35:58 PM Andrei Borzenkov wrote: > >> While testing drbd cluster I found errors (drbd device busy) when >> stopping drbd master with mounted filesystem. I do have >> >> order o_fs_drbd0_after_ms_drbd0 Manda

Re: [ClusterLabs] multi-state constraints

2021-05-11 Thread Andrei Borzenkov
l attribute. > So perhaps: > pcs constraint location resourceClone rule role=master score=50 \#uname > eq node2 > > Cheers, > Alastair. > > On Tue, 11 May 2021, Andrei Borzenkov wrote: > >> [EXTERNAL EMAIL] >> >> On Tue, May 11, 2021 at 10:50 AM Alastai

[ClusterLabs] Is reverse order for "promote" supposed to be "demote"?

2021-05-11 Thread Andrei Borzenkov
While testing drbd cluster I found errors (drbd device busy) when stopping drbd master with mounted filesystem. I do have order o_fs_drbd0_after_ms_drbd0 Mandatory: ms_drbd0:promote fs_drbd0 and I assumed pacemaker automatically does reverse as "first stop then demote". It does not - umount and

Re: [ClusterLabs] 2 node mariadb-cluster - constraint-problems ?

2021-05-11 Thread Andrei Borzenkov
On 11.05.2021 17:43, fatcha...@gmx.de wrote: > Hi, > > I'm using a CentOS 8.3.2011 with a pacemaker-2.0.4-6.el8_3.1.x86_64 + > corosync-3.0.3-4.el8.x86_64 and kmod-drbd90-9.0.25-2.el8_3.elrepo.x86_64. > The cluster consists of two nodes which are providing a ha-mariadb with the > help of two

Re: [ClusterLabs] multi-state constraints

2021-05-11 Thread Andrei Borzenkov
> here, and so I can forget about configuring drbd in pacemaker? Is that > how it is supposed to work? i.e. I can just concentrate on the overlying > file system. > > Sorry that I'm being a bit slow about all this. > > Thanks, > Alastair. > > On Tue, 11 May 2021, And
