Re: [ClusterLabs] multi-state constraints

2021-05-10 Thread Andrei Borzenkov
On 10.05.2021 20:36, Alastair Basden wrote: > Hi Andrei, > > Thanks.  So, in summary, I need to: > pcs resource create resourcedrbd0 ocf:linbit:drbd drbd_resource=disk0 op > monitor interval=60s > pcs resource master resourcedrbd0Clone resourcedrbd0 master-max=1 > master-node-max=1 clone-max=2

Re: [ClusterLabs] multi-state constraints

2021-05-10 Thread Andrei Borzenkov
On 10.05.2021 19:17, Andrei Borzenkov wrote: > On 10.05.2021 16:50, Alastair Basden wrote: >> Hi, >> >> We have a 4 node cluster, on which we want a drbd service to run >> master/slave on 2 nodes (with 1 of those preferred as the master).  It >> should not be st

Re: [ClusterLabs] multi-state constraints

2021-05-10 Thread Andrei Borzenkov
On 10.05.2021 16:50, Alastair Basden wrote: > Hi, > > We have a 4 node cluster, on which we want a drbd service to run > master/slave on 2 nodes (with 1 of those preferred as the master).  It > should not be started at all on the other 2 nodes. > > To set this up, we've done: > pcs resource

Re: [ClusterLabs] bit of wizardry bit of trickery needed.

2021-05-10 Thread Andrei Borzenkov
On 10.05.2021 16:48, lejeczek wrote: > Hi guys > > Before I begin my adventure with this I thought I would ask experts if > something like below is possible. > > resourceA if started on nodeA, then nodes B & C start resourceB (or > resourceC) > Configure colocation with negative score between

Re: [ClusterLabs] fencing

2021-05-08 Thread Andrei Borzenkov
On 07.05.2021 13:36, Kyle O'Donnell wrote: > Hi Everyone. > > We've setup fencing with our ilo/idrac interfaces and things generally work > well but during some of our failover scenario testing we ran into issues when > we "failed' the switches in which those ilo/idrac interfaces were

Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.

2021-05-04 Thread Andrei Borzenkov
On 04.05.2021 18:43, Matthew Schumacher wrote: > On 5/3/21 7:19 AM, Andrei Borzenkov wrote: >> This was already asked for the same reason. No, there is not. The goal >> of monitor is to find out whether resource is active or not. If >> prerequisite resources are not t

Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.

2021-05-03 Thread Andrei Borzenkov
On 03.05.2021 16:12, Matthew Schumacher wrote: ... > > You are right Andrei.  Looking at the logs: > ... > May 03 03:02:41 node2 pacemaker-controld  [1283] (do_lrm_rsc_op) info: > Performing key=7:1:7:b8b0100c-2951-4d07-83da-27cfc1225718 > op=vm-testvm_monitor_0 This is probe operation. > May

Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.

2021-05-03 Thread Andrei Borzenkov
On 03.05.2021 06:27, Matthew Schumacher wrote: > On 4/30/21 12:08 PM, Matthew Schumacher wrote: >> On 4/30/21 11:51 AM, Ken Gaillot wrote: >>> On Fri, 2021-04-30 at 16:20 +, Strahil Nikolov wrote: Ken meant you to use a 'Filesystem' resource for mounting that NFS server and then clone that

Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.

2021-04-30 Thread Andrei Borzenkov
On 30.04.2021 17:26, Matthew Schumacher wrote: > I have an issue that I'm not sure how to resolve so feedback is welcome. > > I need to mount a local NFS file system on my node before I start a > VirtualDomain resource which depends on it, however, the NFS server is > itself a resource on the

Re: [ClusterLabs] VirtualDomain & "deeper" monitors - what/how?

2021-04-30 Thread Andrei Borzenkov
On 30.04.2021 17:57, Ken Gaillot wrote: > On Fri, 2021-04-30 at 11:00 +0100, lejeczek wrote: >> Hi guys >> >> I'd like to ask around for thoughts & suggestions on any >> semi/official ways to monitor VirtualDomain. >> Something beyond what included RA does - such as actual >> health testing of

Re: [ClusterLabs] Question about ping nodes

2021-04-22 Thread Andrei Borzenkov
On Mon, Apr 19, 2021 at 8:20 PM Andrei Borzenkov wrote: > > Although I guess the same can be achieved by using positive score. > Instead of banning node without connectivity just prefer node with > connectivity. It sounds more simple. > And is even documented on pace

Re: [ClusterLabs] Preventing multiple resources from moving at the same time.

2021-04-21 Thread Andrei Borzenkov
On 21.04.2021 19:27, Matthew Schumacher wrote: > On 4/21/21 12:48 AM, Klaus Wenninger wrote: >> Just to better understand the issue ... >> Does the first resource implement storage that is being used >> by the resource that is being migrated/moved? >> Or is it just the combination of 2 parallel

Re: [ClusterLabs] Question about ping nodes

2021-04-19 Thread Andrei Borzenkov
>> To: Cluster Labs - All topics related to open-source clustering >> welcomed >> Subject: Re: [ClusterLabs] Question about ping nodes >> >> On Sun, 2021-04-18 at 17:31 +0300, Andrei Borzenkov wrote: >>> On 18.04.2021 08:41, Andrei Borzenkov wrote: >>>

Re: [ClusterLabs] Question about ping nodes

2021-04-18 Thread Andrei Borzenkov
On 18.04.2021 08:41, Andrei Borzenkov wrote: > On 17.04.2021 22:41, Piotr Kandziora wrote: >> Hi, >> >> Hope some guru will advise here ;) >> >> I've got two nodes cluster with some resource placement dependent on ping >> node visibility ( >>

Re: [ClusterLabs] Question about ping nodes

2021-04-17 Thread Andrei Borzenkov
On 17.04.2021 23:01, Antony Stone wrote: > On Saturday 17 April 2021 at 21:41:16, Piotr Kandziora wrote: > >> Hi, >> >> Hope some guru will advise here ;) >> >> I've got two nodes cluster with some resource placement dependent on ping >> node visibility ( >>

Re: [ClusterLabs] Question about ping nodes

2021-04-17 Thread Andrei Borzenkov
On 17.04.2021 22:41, Piotr Kandziora wrote: > Hi, > > Hope some guru will advise here ;) > > I've got two nodes cluster with some resource placement dependent on ping > node visibility ( >

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Node fenced for unknown reason

2021-04-15 Thread Andrei Borzenkov
On 15.04.2021 23:09, Steffen Vinther Sørensen wrote: > On Thu, Apr 15, 2021 at 3:39 PM Klaus Wenninger wrote: >> >> On 4/15/21 3:26 PM, Ulrich Windl wrote: >> Steffen Vinther Sørensen wrote on 15.04.2021 at >>> 14:56 in >>> message >>> : On Thu, Apr 15, 2021 at 2:29 PM Ulrich Windl

Re: [ClusterLabs] Node fenced for unknown reason

2021-04-15 Thread Andrei Borzenkov
On 15.04.2021 14:10, Steffen Vinther Sørensen wrote: > Hi there, > > In this 3 node cluster, node03 been offline for a while, and being > brought up to service. Then a migration of a VirtualDomain is being > attempted, and node02 is then fenced. > > Provided is logs from all 2 nodes, and the

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Node fenced for unknown reason

2021-04-15 Thread Andrei Borzenkov
On 15.04.2021 16:39, Klaus Wenninger wrote: > On 4/15/21 3:26 PM, Ulrich Windl wrote: > Steffen Vinther Sørensen wrote on 15.04.2021 at >> 14:56 in >> message >> : >>> On Thu, Apr 15, 2021 at 2:29 PM Ulrich Windl >>> wrote: >>> Steffen Vinther Sørensen wrote on >>> 15.04.2021

Re: [ClusterLabs] Single-node automated startup question

2021-04-14 Thread Andrei Borzenkov
On 14.04.2021 17:50, Digimer wrote: > Hi all, > > As we get close to finish our Anvil! switch to pacemaker, I'm trying > to tie up loose ends. One that I want feedback on is the pacemaker > version of cman's old 'post_join_delay' feature. > > Use case example; > > A common use for the

Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?

2021-04-11 Thread Andrei Borzenkov
On 11.04.2021 21:47, Eric Robinson wrote: >> -Original Message- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Sunday, April 11, 2021 1:20 PM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL? >

Re: [ClusterLabs] VirtualDomain - monitor misses to report & plays up

2021-04-11 Thread Andrei Borzenkov
On 11.04.2021 21:38, lejeczek wrote: > Hi guys. > > I'm experiencing weird "handling" of VirtualDomain by the cluster. It > seems that the cluster sometimes fails to report the real state of a VM, which > sometimes results in trouble - like when the cluster thinks a VM is not > running, which is running then

Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?

2021-04-11 Thread Andrei Borzenkov
On 11.04.2021 20:07, Eric Robinson wrote: > We're writing a custom RA for a multi-tenant MySQL cluster that runs in > active/standby mode. I've read the RA documentation about what exit codes > should be returned for various outcomes, but something is still unclear to me. > > We run multiple

Re: [ClusterLabs] how to setup single node cluster

2021-04-08 Thread Andrei Borzenkov
On 08.04.2021 09:26, d tbsky wrote: > Reid Wahl >> I don't think we do require fencing for single-node clusters. (Anyone at Red >> Hat, feel free to comment.) I vaguely recall an internal mailing list or IRC >> conversation where we discussed this months ago, but I can't find it now. >> I've

Re: [ClusterLabs] "iscsi.service: Unit cannot be reloaded because it is inactive."

2021-04-03 Thread Andrei Borzenkov
On 03.04.2021 17:35, Jason Long wrote: > Hello, > I configure my clustering labs with three nodes. You have two node cluster. What is running on nodes outside of cluster is out of scope of pacemaker. > One of my nodes is iSCSI Shared Storage. Everything was OK until I restarted > my iSCSI

Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Andrei Borzenkov
ng list. > I have noticed that the cluster is killing those when the cluster is being > stopped (including NFS) . > > > Best Regards, > Strahil Nikolov > > On Fri, Apr 2, 2021 at 14:31, Andrei Borzenkov > wrote: > On Fri, Apr 2, 2021 at 12:30 PM Strahil Nikolov wrote:

Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Andrei Borzenkov
On Fri, Apr 2, 2021 at 12:30 PM Strahil Nikolov wrote: > > To be more specific, the processes left are 'hdbrsutil' This process holds database content in memory after shutdown (I believe, for 1 hour by default) to facilitate fast startup. You can disable it. See SAP note 2159435. > and the

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-04-01 Thread Andrei Borzenkov
On 01.04.2021 08:20, Andrei Borzenkov wrote: > On 01.04.2021 00:21, Antony Stone wrote: >> On Wednesday 31 March 2021 at 23:11:50, Reid Wahl wrote: >> >>> Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs >>> operation meta attributes. Goo

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Andrei Borzenkov
On 01.04.2021 00:21, Antony Stone wrote: > On Wednesday 31 March 2021 at 23:11:50, Reid Wahl wrote: > >> Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs >> operation meta attributes. Good question. > > Returning to my suspicion that it's more likely me that simply did

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-31 Thread Andrei Borzenkov
On Wed, Mar 31, 2021 at 8:34 AM Strahil Nikolov wrote: > > Damn... I am too hasty. > > It seems that the 2 resources I have already configured are also running on > the master. > > The colocation constraint is like: > > rsc_bkpip3_SAPHana_SID_HDBinst_num with rsc_SAPHana_SID_HDBinst_num-clone >

Re: [ClusterLabs] Live migration possible with KSM ?

2021-03-30 Thread Andrei Borzenkov
On 30.03.2021 18:16, Lentes, Bernd wrote: > Hi, > > currently i'm reading "Mastering KVM Virtualization", published by Packt > Publishing, a book i can really recommend. > There are some proposals for tuning guests. One is KSM (kernel samepage > merging), which sounds quite interesting. >

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-30 Thread Andrei Borzenkov
On 30.03.2021 17:42, Ken Gaillot wrote: >> >> Colocation does not work, this will force everything on the same node >> where master is active and that is not what we want. > > Nope, you can colocate by node attribute instead of node. > > Colocating by node attribute says "put this resource on a

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-29 Thread Andrei Borzenkov
On 29.03.2021 20:12, Ken Gaillot wrote: > On Sun, 2021-03-28 at 09:20 +0300, Andrei Borzenkov wrote: >> On 28.03.2021 07:16, Strahil Nikolov wrote: >>> I didn't mean DC as a designated coordinator, but as a physical >>> Datacenter location. >>> Last time I

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-29 Thread Andrei Borzenkov
On 29.03.2021 11:11, Ulrich Windl wrote: >>>> Andrei Borzenkov wrote on 27.03.2021 at 06:37 in > message <7c294034-56c3-baab-73c6-7909ab554...@gmail.com>: >> On 26.03.2021 22:18, Reid Wahl wrote: >>> On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov >

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-28 Thread Andrei Borzenkov
na_${SID}_vhost attribute for each node and this attribute must be unique and different between two sites. May be worth to look into it. > Best Regards,Strahil Nikolov > > > On Fri, Feb 19, 2021 at 16:51, Andrei Borzenkov wrote: > On Fri, Feb 19, 2021 at 2:44 PM Strahil Nikolov

Re: [ClusterLabs] Which fence agent is needed for an Apache web server cluster?

2021-03-27 Thread Andrei Borzenkov
On 28.03.2021 02:42, Reid Wahl wrote: > On Sat, Mar 27, 2021 at 4:28 PM Strahil Nikolov > wrote: > >> I had to tune the fence_ipmi recently on some older HPE blades. The >> default settings were working, but also returning some output about >> problems negotiating the cypher. >> As that output

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-26 Thread Andrei Borzenkov
On 26.03.2021 22:18, Reid Wahl wrote: > On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov > wrote: > >> On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl >> wrote: >>> >>>>>> Andrei Borzenkov wrote on 26.03.2021 at >> 06:19 in >>

Re: [ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Andrei Borzenkov
On 26.03.2021 17:28, Antony Stone wrote: > Hi. > > I've just signed up to the list. I've been using corosync and pacemaker for > several years, mostly under Debian 9, which means: > > corosync 2.4.2 > pacemaker 1.1.16 > > I've recently upgraded a test cluster to Debian 10, which

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-26 Thread Andrei Borzenkov
On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl wrote: > > >>> Andrei Borzenkov wrote on 26.03.2021 at 06:19 in > message <534274b3-a6de-5fac-0ae4-d02c305f1...@gmail.com>: > > On 25.03.2021 21:45, Reid Wahl wrote: > >> FWIW we have this KB article (

Re: [ClusterLabs] Antw: [EXT] Re: Order set troubles

2021-03-25 Thread Andrei Borzenkov
rt (systemd racing condition with dnsmasq) >> >> Best Regards, >> Strahil Nikolov >> >> On Thu, Mar 25, 2021 at 12:18, Andrei Borzenkov >> wrote: >> On Thu, Mar 25, 2021 at 10:31 AM Strahil Nikolov >> wrote: >>> >>> Use Case: >>>

Re: [ClusterLabs] Antw: [EXT] Re: Order set troubles

2021-03-25 Thread Andrei Borzenkov
On Thu, Mar 25, 2021 at 10:31 AM Strahil Nikolov wrote: > > Use Case: > > nfsA is shared filesystem for HANA running in site A > nfsB is shared filesystem for HANA running in site B > > clusterized resource of type SAPHanaTopology must run on all systems if the > FS for the HANA is running >

Re: [ClusterLabs] Order set troubles

2021-03-24 Thread Andrei Borzenkov
On 24.03.2021 20:56, Ken Gaillot wrote: > On Wed, 2021-03-24 at 09:27 +, Strahil Nikolov wrote: >> Hello All, >> >> I have trouble creating an order set. >> The end goal is to create a 2 node cluster where nodeA will mount >> nfsA, while nodeB will mount nfsB. On top of that a depended

Re: [ClusterLabs] How to use dnsupdate?

2021-03-09 Thread Andrei Borzenkov
On 10.03.2021 04:47, Ross Sponholtz wrote: > Hi, > I've been working with Linux clustering for several years, mostly in Azure. > However I've got a bit of a challenge right now. I'm trying to set up a > "geo-cluster" and would like to direct client machines to one geo or the > other based on

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: constrain or delay "probes"?

2021-03-08 Thread Andrei Borzenkov
On 08.03.2021 11:57, Ulrich Windl wrote: Reid Wahl wrote on 08.03.2021 at 08:42 in message > : >> Did the "active on too many nodes" message happen right after a probe? If >> so, then it does sound like the probe returned code 0. > > Events were like this (I greatly condensed the logs):

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-03 Thread Andrei Borzenkov
On 01.03.2021 16:45, Jan Friesse wrote: > Andrei, > >> On 01.03.2021 15:45, Jan Friesse wrote: >>> Andrei, >>> On 01.03.2021 12:26, Jan Friesse wrote: >> > > Thanks for digging into logs. I believe Eric is hitting > https://github.com/corosync/corosync-qdevice/issues/10

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-01 Thread Andrei Borzenkov
On 01.03.2021 15:45, Jan Friesse wrote: > Andrei, > >> On 01.03.2021 12:26, Jan Friesse wrote: >>> >>> Thanks for digging into logs. I believe Eric is hitting >>> https://github.com/corosync/corosync-qdevice/issues/10 (already fixed, >>> but may take some time to get into distributions) - it

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-01 Thread Andrei Borzenkov
On 01.03.2021 12:26, Jan Friesse wrote: >> > > Thanks for digging into logs. I believe Eric is hitting > https://github.com/corosync/corosync-qdevice/issues/10 (already fixed, > but may take some time to get into distributions) - it also contains > workaround. > I tested corosync-qnetd at

Re: [ClusterLabs] [EXTERNAL] - Antw: [EXT] OCF resource agent is not starting up

2021-02-28 Thread Andrei Borzenkov
On 01.03.2021 08:25, Niveditha U wrote: > Hi Team, > > Can ocft be used in place of ocf-tester? > No, it's different tool.

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-28 Thread Andrei Borzenkov
On 27.02.2021 22:12, Andrei Borzenkov wrote: > On 27.02.2021 17:08, Eric Robinson wrote: >> >> I agree, one node is expected to go out of quorum. Still the question is, >> why didn't 001db01b take over the services? I just remembered that 001db01b >> has servic

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-27 Thread Andrei Borzenkov
On 27.02.2021 17:08, Eric Robinson wrote: > > I agree, one node is expected to go out of quorum. Still the question is, why > didn't 001db01b take over the services? I just remembered that 001db01b has > services running on it, and those services did not stop, so it seems that > 001db01b did

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
On 27.02.2021 09:05, Eric Robinson wrote: >> -Original Message- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Friday, February 26, 2021 1:25 PM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
On 26.02.2021 21:58, Eric Robinson wrote: >> -Original Message- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Friday, February 26, 2021 11:27 AM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
26.02.2021 20:23, Eric Robinson wrote: >> -Original Message- >> From: Digimer >> Sent: Friday, February 26, 2021 10:35 AM >> To: Cluster Labs - All topics related to open-source clustering welcomed >> ; Eric Robinson >> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a Separate

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
26.02.2021 19:19, Eric Robinson wrote: > At 5:16 am Pacific time Monday, one of our cluster nodes failed and its mysql > services went down. The cluster did not automatically recover. > > We're trying to figure out: > > > 1. Why did it fail? Pacemaker only registered loss of connection

Re: [ClusterLabs] Latest PDF documents have truncated lines

2021-02-20 Thread Andrei Borzenkov
On Fri, Feb 19, 2021 at 7:48 PM Ken Gaillot wrote: > > On Fri, 2021-02-19 at 17:54 +0300, Andrei Borzenkov wrote: > > In the latest PDF versions I downloaded recently code samples appear > > truncated quite often - they do not fit on page. I compared with > >

[ClusterLabs] Latest PDF documents have truncated lines

2021-02-19 Thread Andrei Borzenkov
In the latest PDF versions I downloaded recently code samples appear truncated quite often - they do not fit on page. I compared with previous versions I have and they have smaller fonts for code samples so it usually fits. Of course it is still an issue for overly long lines, so wrapping such

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-02-19 Thread Andrei Borzenkov
On Fri, Feb 19, 2021 at 2:44 PM Strahil Nikolov wrote: > > > >Do you have a fixed relation between node >pairs and VIPs? I.e. must > >A/D always get VIP1, B/E - VIP2 etc? > > I have to verify it again, but generally speaking - yes , VIP1 is always on > nodeA/D (master), VIP2 on nodeB/E (worker1)

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-02-19 Thread Andrei Borzenkov
On Fri, Feb 19, 2021 at 10:41 AM Strahil Nikolov wrote: > > DC1: > - nodeA > - nodeB > - nodeC > > DC2: > - nodeD > - nodeE > - nodeF > > DC3: > - majority maker > > I will have 3 VIPs: > VIP1 > VIP2 > VIP3 > > I will have to setup the cluster to: > 1. Find where is the master HANA resource > 2.

Re: [ClusterLabs] Question: 2 node pcs cluster required quorum and separate Heartbeat Network

2021-02-10 Thread Andrei Borzenkov
10.02.2021 21:56, Ben .T.George wrote: > HI > > Is it mandatory for 2 node pcs cluster require a quorum and separate > Heartbeat Network? > A two node cluster by definition cannot use quorum - there is no way to split the cluster so that any part has a majority of votes. You can artificially increase
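
The point made in the reply above can be checked with a little arithmetic. The following is a minimal sketch that assumes a plain "strict majority of votes" rule; the function names are hypothetical, and it does not model corosync's actual votequorum options. It shows that a 1/1 split of two votes leaves neither side quorate, whereas adding a third vote lets exactly one side of any split keep quorum.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """True if a partition holding partition_votes has a strict majority."""
    return partition_votes >= quorum_threshold(total_votes)


def split_results(total_votes: int) -> list:
    """List every two-way split (a, b) with the quorum state of each side."""
    results = []
    for a in range(total_votes + 1):
        b = total_votes - a
        results.append((a, b, has_quorum(a, total_votes), has_quorum(b, total_votes)))
    return results


if __name__ == "__main__":
    for total in (2, 3):
        print(f"total votes = {total}, threshold = {quorum_threshold(total)}")
        for a, b, a_ok, b_ok in split_results(total):
            print(f"  split {a}/{b}: side A quorate={a_ok}, side B quorate={b_ok}")
```

Under this simplified rule, the only way to keep quorum with two votes is for both nodes to stay connected, which is why an extra vote is usually introduced from outside the two nodes.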

Re: [ClusterLabs] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-09 Thread Andrei Borzenkov
09.02.2021 17:00, Ulrich Windl wrote: > Hi! > > I had made a mistake, leading to node h16 to be fenced. After recovery (h16 > had re-joined the cluster) I had stopped the node, reconfigured the network, > then started the node again. > Then I did the same thing (not the unwanted fencing) with

Re: [ClusterLabs] Antw: [EXT] Re: failed migration handled the wrong way

2021-02-05 Thread Andrei Borzenkov
05.02.2021 12:54, Ulrich Windl wrote: >>>> Ulrich Windl wrote on 01.02.2021 at 11:59 in message <6017DF04.888 : > 161 : > 60728>: >>>>> Andrei Borzenkov wrote on 01.02.2021 at 11:05 in >> message >> : >>> On Mon, Feb 1, 2021 at 1

Re: [ClusterLabs] Antw: [EXT] Re: failed migration handled the wrong way

2021-02-01 Thread Andrei Borzenkov
On Mon, Feb 1, 2021 at 1:59 PM Ulrich Windl wrote: > > But the VM *wasn't* stopped on h16! > I am not sure what you mean here. It was not stopped during migration? Yes, pacemaker knew it and it tried to stop it explicitly when migration failed. It was not stopped when pacemaker tried to stop it?

Re: [ClusterLabs] failed migration handled the wrong way

2021-02-01 Thread Andrei Borzenkov
On Mon, Feb 1, 2021 at 12:53 PM Ulrich Windl wrote: > > Hi! > > While fighting to get the wrong configuration, I broke libvirt live-migration > by not enabling the TLS socket. > > When testing to live-migrate a VM from h16 to h18, these are the essential > events: > Feb 01 10:30:10 h16

Re: [ClusterLabs] Antw: [EXT] Re: Problem with systemd socket service (start fails when running already)

2021-01-31 Thread Andrei Borzenkov
On Mon, Feb 1, 2021 at 10:07 AM Ulrich Windl wrote: > > You are saying starting libvirtd does not require the ro and tls socket units > to be started? > So far I am not aware of any service that would *require* socket activation. Socket activation is optimization that allows you to avoid

Re: [ClusterLabs] Peer (slave) node deleting master's transient_attributes

2021-01-30 Thread Andrei Borzenkov
29.01.2021 20:37, Stuart Massey wrote: > Can someone help me with this? > Background: > > "node01" is failing, and has been placed in "maintenance" mode. It > occasionally loses connectivity. > > "node02" is able to run our resources > > Consider the following messages from pacemaker.log on

Re: [ClusterLabs] Problem with systemd socket service (start fails when running already)

2021-01-29 Thread Andrei Borzenkov
29.01.2021 14:19, Ulrich Windl wrote: > Hi! > > I'm having an odd failure using a systemd socket unit controlled by the > cluster. Why do you need the socket unit to be controlled by the cluster in the first place? The whole point of a socket unit is to auto-start services on access and that defeats

Re: [ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-01-28 Thread Andrei Borzenkov
27.01.2021 22:03, Ken Gaillot wrote: > > With a group, later members depend on earlier members. If an earlier > member can't run, then no members after it can run. > > However we can't make the dependency go in both directions. If an > earlier member can't run unless a later member is active,

Re: [ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-01-27 Thread Andrei Borzenkov
27.01.2021 19:06, damiano giuliani wrote: > Hi all, I'm pretty new to clusters. I'm struggling trying to configure a > bunch of resources and test how they fail over. My need is to start and > manage a group of resources as one (in order to achieve this a resource > group has been created), and if

Re: [ClusterLabs] Stopping all nodes causes servers to migrate

2021-01-25 Thread Andrei Borzenkov
On Mon, Jan 25, 2021 at 12:07 PM Jehan-Guillaume de Rorthais wrote: > As actions during a cluster shutdown cannot be handled in the same transition > for each nodes, I usually add a step to disable all resources using property > "stop-all-resources" before shutting down the cluster: > > pcs

Re: [ClusterLabs] CCIB migration from Pacemaker 1.x to 2.x

2021-01-23 Thread Andrei Borzenkov
23.01.2021 19:10, Sharma, Jaikumar wrote: > Hi guys, > > I'm a newbie to high availability clusters, pls excuse me - learning tools > stack (corosync & pacemaker). > > In fact, our high availability solution is based on Debian 9.x (pacemaker 1.x > and corosync 2.x) - which worked as expected. >

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Cluster breaks after pcs unstandby node

2021-01-18 Thread Andrei Borzenkov
On Mon, Jan 18, 2021 at 12:00 PM Steffen Vinther Sørensen wrote: > > Hi, > > I have persistent journal, but 'journalctl -b -1' was empty in this > case, so it might not be optimally configured. And centralized logging > is on the todo list > > > btw. about the fencing, I have set '

Re: [ClusterLabs] Q: When do I need virtlockd?

2021-01-18 Thread Andrei Borzenkov
On Mon, Jan 18, 2021 at 11:55 AM Ulrich Windl wrote: . > > So can someone explain, or direct me to some helpful docs? > Are you aware of https://libvirt.org/kbase/locking.html which links further to virtlockd description?

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2020-12-18 Thread Andrei Borzenkov
18.12.2020 21:54, Ken Gaillot wrote: > On Fri, 2020-12-18 at 17:51 +, Animesh Pande wrote: >> Hello, >> >> Is there a tool that would allow for commands to be run on remote >> nodes in the cluster through the corosync messaging layer? I have a >> cluster configured with multiple corosync

Re: [ClusterLabs] Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)

2020-12-18 Thread Andrei Borzenkov
18.12.2020 12:00, Ulrich Windl wrote: > > Maybe a related question: Do STONITH resources have special rules, meaning > they don't wait for successful fencing? Pacemaker resources in the CIB do not perform fencing. They only register fencing devices with fenced, which does the actual job. In particular

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
18.12.2020 10:09, Ulrich Windl wrote: >>>> Andrei Borzenkov wrote on 18.12.2020 at 08:01 in > message : >> 17.12.2020 21:30, Ken Gaillot wrote: >>> >>> This reminded me that some IPMI implementations return "success" for >>> co

Re: [ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
17.12.2020 21:30, Ken Gaillot wrote: > > This reminded me that some IPMI implementations return "success" for > commands before they've actually been completed. This is why > fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds. > But in this case we also do not know whether

Re: [ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
17.12.2020 14:02, Ulrich Windl wrote: >>>> Andrei Borzenkov wrote on 17.12.2020 at 09:50 in > message > : > > ... >> According to logs from xstha1, it started to activate resources only >> after stonith was confirmed >> >> Dec 16 15

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
ched off. You really need to test how ipmi behaves with your specific hardware to make sure it is not possible or to adjust stonith agent to handle delays. To reiterate: > > From: Andrei Borzenkov > > It is possible that your IPMI/BMC/whatever implementation responds > with success bef

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
On Thu, Dec 17, 2020 at 11:11 AM Gabriele Bulfon wrote: > > Yes, sorry took same bash by mistake...here are the correct logs. > > Yes, xstha1 has delay 10s so that I'm giving him precedence, xstha2 has delay > 1s and will be stonithed earlier. > During the short time before xstha2 got powered

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-16 Thread Andrei Borzenkov
16.12.2020 19:05, Gabriele Bulfon wrote: > Looking at the two logs, looks like corosync decided that xst1 was offline, > while xst was still online. > I just issued an "ifconfig ha0 down" on xst1, so I expect both nodes cannot > see other one, while I see these same lines both on xst1 and xst2

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-16 Thread Andrei Borzenkov
16.12.2020 17:56, Gabriele Bulfon wrote: > Thanks, here are the logs, there is info about how it tried to start > resources on the nodes. Both logs are from the same node. > Keep in mind the node1 was already running the resources, and I simulated a > problem by turning down the ha

Re: [ClusterLabs] Best way to create a floating identity file

2020-12-15 Thread Andrei Borzenkov
15.12.2020 17:10, Tony Stocker wrote: > On Tue, Dec 15, 2020 at 9:02 AM Andrei Borzenkov wrote: >> >> On Tue, Dec 15, 2020 at 4:58 PM Tony Stocker wrote: >>> >> >> You could simply query whether a specific resource (group) is active >> on the nod

Re: [ClusterLabs] Best way to create a floating identity file

2020-12-15 Thread Andrei Borzenkov
On Tue, Dec 15, 2020 at 4:58 PM Tony Stocker wrote: > > I'm trying to figure out the best way to do the following on our > 2-node clusters. > > Whichever node is the primary (all services run on a single node) I > want to create a file that contains an identity descriptor, e.g. >

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure

2020-12-15 Thread Andrei Borzenkov
> Gabriele > > > Sonicle S.r.l. : http://www.sonicle.com > Music: http://www.gabrielebulfon.com > eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets > > > > > ------ > > From:

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure

2020-12-14 Thread Andrei Borzenkov
On Mon, Dec 14, 2020 at 2:40 PM Gabriele Bulfon wrote: > > I isolated the log when everything happens (when I disable the ha interface), > attached here. > And where are matching logs from the second node?

Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure

2020-12-12 Thread Andrei Borzenkov
Sonicle S.r.l. : http://www.sonicle.com > Music: http://www.gabrielebulfon.com > eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets > > > > > > -- > > From: Andrei Borz

Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure

2020-12-11 Thread Andrei Borzenkov
11.12.2020 18:37, Gabriele Bulfon wrote: > I found I can do this temporarily: > > crm config property cib-bootstrap-options: no-quorum-policy=ignore > All two-node clusters I remember run with this setting forever :) > then once node 2 is up again: > > crm config property cib-bootstrap-options:

Re: [ClusterLabs] Can't have 2 nodes as master with galera resource agent

2020-12-11 Thread Andrei Borzenkov
11.12.2020 16:13, Raphael Laguerre wrote: > Hello, > > I'm trying to set up a 2-node cluster with 2 galera instances. I use the > ocf:heartbeat:galera resource agent, however, after I create the resource, > only one node appears to be in master role, the other one can't be promoted > and stays

Re: [ClusterLabs] Antw: [EXT] Re: Preferred node for a service (not constrained)

2020-12-03 Thread Andrei Borzenkov
On Thu, Dec 3, 2020 at 11:11 AM Ulrich Windl wrote: > > >>> Strahil Nikolov wrote on 02.12.2020 at 22:42 in > message <311137659.2419591.1606945369...@mail.yahoo.com>: > > Constraints' values are varying from: > > infinity which equals to score of 100 > > to: > > - infinity which equals

Re: [ClusterLabs] Q: LVM-activate: "WARNING: You are recommended to activate one LV at a time or use exclusive activation mode."

2020-11-30 Thread Andrei Borzenkov
30.11.2020 15:36, Ulrich Windl wrote: > Hi! > > I configured a shared LVM activation as per instructions (I hope) in SLES15 > SP2. However I get this warning: > LVM-activate(prm_testVG_activate)[57281]: WARNING: You are recommended to > activate one LV at a time or use exclusive activation

Re: [ClusterLabs] Antw: [EXT] Re: resource management of standby node

2020-11-30 Thread Andrei Borzenkov
30.11.2020 17:05, Ulrich Windl wrote: >>>> Andrei Borzenkov wrote on 30.11.2020 at 14:18 in > message > : >> On Mon, Nov 30, 2020 at 3:11 PM Ulrich Windl >> wrote: >>> >>> Hi! >>> >>> In SLES15 I'm surprised what a stan

Re: [ClusterLabs] resource management of standby node

2020-11-30 Thread Andrei Borzenkov
On Mon, Nov 30, 2020 at 3:11 PM Ulrich Windl wrote: > > Hi! > > In SLES15 I'm surprised what a standby node does: My guess was that a standby > node would stop all resources and then just "shut up", but it seems it still > tried to place resources and calls monitor operations. > Standby nodes

Re: [ClusterLabs] stop a node

2020-11-15 Thread Andrei Borzenkov
15.11.2020 20:00, Guy Przytula wrote: > a question would be : > > we have maintenance to perform on a node of the cluster > > to avoid that the cluster starts the resource that we stopped - we want > to disable a node temporarily - is this possible without deleting the node > Put node in

Re: [ClusterLabs] fence_scsi problem

2020-10-28 Thread Andrei Borzenkov
On Wed, Oct 28, 2020 at 3:18 PM Patrick Vranckx wrote: > > Hi, > > I am trying to set up an HA cluster for ZFS. I think fence_scsi is not working > properly. I can reproduce the problem on two kinds of hardware: iSCSI and > SAS storage. > > Here is what I did: > > - set up a storage server with 3 iscsi

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Andrei Borzenkov
23.10.2020 21:08, Lentes, Bernd wrote: > > Surprisingly if the virsh destroy is successful the RA waits until the > domain isn't running anymore: > ... > > I need something like that which waits for some time (maybe 30s) if the domain > nevertheless stops although > "virsh destroy" gave an

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-22 Thread Andrei Borzenkov
22.10.2020 23:29, Lentes, Bernd wrote: > Hi guys, > > occasionally stopping a VirtualDomain resource via "crm resource stop" does > not work, and in the end the node is fenced, which is ugly. > I had a look at the RA to see what it does. After trying to stop the domain > via "virsh shutdown

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-21 Thread Andrei Borzenkov
21.10.2020 20:47, Strahil Nikolov wrote: > Both SUSE and RedHat provide utilities to add the node without messing with > the configs manually. Which are crmsh and pcs respectively :) > > What is your distro? > > > Best Regards, > Strahil Nikolov > > > > > > > On Wednesday, 21 October 2020

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-21 Thread Andrei Borzenkov
nges may be overwritten by pacemaker? > 2. Do you have idea where(which config file) crm_node command retrieves its > data? CIB > Thanks, > Jiaqi Tian > > - Original message - > From: Andrei Borzenkov > Sent by: "Users" > To: Cluster

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-21 Thread Andrei Borzenkov
On Wed, Oct 21, 2020 at 5:03 PM Jiaqi Tian1 wrote: > > Hi, > I'm trying to add a new node into an active pacemaker cluster with resources > up and running. > After steps: > 1. update corosync.conf files among all hosts in cluster including the new > node > 2. copy corosync auth file to the new

Re: [ClusterLabs] ocf:pacemaker:ping every X seconds

2020-10-09 Thread Andrei Borzenkov
09.10.2020 08:21, Rohit Saini wrote: > Hi Team, > I am using ocf:pacemaker:ping resource to check aliveness of a machine > every X seconds. As I understand, monitor interval 'Y' will cause ping to > happen every 'Y' seconds. So, for my case, Y should be equal to X? > I do not see this behavior
