Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connection disrupted.

2024-05-06 Thread Klaus Wenninger
On Fri, May 3, 2024 at 8:59 PM wrote: > Hi, > > > > Also, I've done wireshark capture and found great mess in TCP, it > > > seems like connection between qdevice and qnetd really stops for some > > > time and packets won't deliver. > > > > Could you check UDP? I guess there is a lot of UDP

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-23 Thread Klaus Wenninger
On Tue, Apr 23, 2024 at 10:34 AM Klaus Wenninger wrote: > > > On Tue, Apr 23, 2024 at 9:53 AM NOLIBOS Christophe < > christophe.noli...@thalesgroup.com> wrote: > >> Classified as: {OPEN} >> >> >> >> Other strange thing. >> >> On RHE

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-23 Thread Klaus Wenninger
it would be restarted. Klaus > > > *De :* Klaus Wenninger > *Envoyé :* lundi 22 avril 2024 12:41 > *À :* NOLIBOS Christophe > *Cc :* Cluster Labs - All topics related to open-source clustering > welcomed > *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-22 Thread Klaus Wenninger
Maybe pacemaker changed behavior here without syncing enough with corosync behavior. We'll look into that to see which approach is better - restart corosync on failure - or have pacemaker be restarted by systemd which should in turn restart corosync as well. Klaus > > > Thanks a lot. >

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-22 Thread Klaus Wenninger
rocess - so that the exit-code could be set to 0 - should be fine. Klaus > > *De :* Klaus Wenninger > *Envoyé :* jeudi 18 avril 2024 20:17 > *À :* NOLIBOS Christophe > *Cc :* Cluster Labs - All topics related to open-source clustering > welcomed > *Objet :* Re: [ClusterLabs]

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-18 Thread Klaus Wenninger
*De la part de* NOLIBOS > Christophe via Users > *Envoyé :* jeudi 18 avril 2024 18:34 > *À :* Klaus Wenninger ; Cluster Labs - All topics > related to open-source clustering welcomed > *Cc :* NOLIBOS Christophe > *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-18 Thread Klaus Wenninger
On Thu, Apr 18, 2024 at 6:09 PM Klaus Wenninger wrote: > > > On Thu, Apr 18, 2024 at 6:06 PM NOLIBOS Christophe < > christophe.noli...@thalesgroup.com> wrote: > >> Classified as: {OPEN} >> >> >> >> Well… why do you say that « Well if c

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-18 Thread Klaus Wenninger
On Thu, Apr 18, 2024 at 5:07 PM NOLIBOS Christophe via Users < users@clusterlabs.org> wrote: > Classified as: {OPEN} > > I'm using RedHat 8.8 (4.18.0-477.21.1.el8_8.x86_64). > When I kill Corosync, no new corosync process is created and pacemaker is > in failure. > The only solution is to restart

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-30 Thread Klaus Wenninger
On Tue, Jan 30, 2024 at 2:21 PM Walker, Chris wrote: > >>> However, now it seems to wait that amount of time before it elects a > >>> DC, even when quorum is acquired earlier. In my log snippet below, > >>> with dc-deadtime 300s, > >> > >> The dc-deadtime is not waiting for quorum, but for

Re: [ClusterLabs] trigger something at ?

2024-01-29 Thread Klaus Wenninger
On Mon, Jan 29, 2024 at 5:22 PM Ken Gaillot wrote: > On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote: > > Hi guys. > > > > Is it possible to trigger some... action - I'm thinking specifically > > at shutdown/start. > > If not within the cluster then - if you do that - perhaps

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Klaus Wenninger
On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov wrote: > On Tue, Dec 19, 2023 at 10:41 AM Artem wrote: > ... > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] > (update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is > unrunnable (node is offline) > > Dec

[ClusterLabs] Pacemaker 2.1.7-rc2 now available

2023-11-24 Thread Klaus Wenninger
Hi all, Source code for the 2nd release candidate for Pacemaker version 2.1.7 is available at: https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc2 This is primarily a bug fix release. See the ChangeLog or the link above for details. Everyone is encouraged to download,

Re: [ClusterLabs] PCS ACL for the "pcs cluster stop" command

2023-10-16 Thread Klaus Wenninger
On Fri, Oct 13, 2023 at 9:21 PM Reid Wahl wrote: > On Fri, Oct 13, 2023 at 12:19 PM Reid Wahl wrote: > > > > On Fri, Oct 13, 2023 at 9:56 AM Roberto Rodrigos > wrote: > > > > > > good day! > > > I use the configuration to create an ACL, it is shown below. How can I > restrict access to the

Re: [ClusterLabs] Synchronous primary doesn't switch to async mode on replica power off

2023-10-06 Thread Klaus Wenninger
On Fri, Oct 6, 2023 at 8:46 AM Sergey Cherukhin wrote: > Hello! > > I used Microsoft Outlook to send this message and it was sent in the wrong > format. I'm sorry. I won't do it again. > > I use Postgresql+Pacemaker+Corosync cluster with 2 Postgresql instances in > synchronous replication mode.

Re: [ClusterLabs] Users Digest, Vol 104, Issue 5

2023-09-05 Thread Klaus Wenninger via Users
r body 'help' to >> users-requ...@clusterlabs.org >> >> You can reach the person managing the list at >> users-ow...@clusterlabs.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of Users d

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov wrote: > On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger > wrote: > > > > > > > > On Mon, Sep 4, 2023 at 12:45 PM David Dolan > wrote: > >> > >> Hi Klaus, > >> > >> With defau

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
On Mon, Sep 4, 2023 at 1:44 PM Andrei Borzenkov wrote: > On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger > wrote: > > > > > > Or go for qdevice with LMS where I would expect it to be able to really > go down to > > a single node left - any of the 2 last ones - as

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
On Mon, Sep 4, 2023 at 1:18 PM Klaus Wenninger wrote: > > > On Mon, Sep 4, 2023 at 12:45 PM David Dolan wrote: > >> Hi Klaus, >> >> With default quorum options I've performed the following on my 3 node >> cluster >> >> Bring down cluster ser
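
The quorum arithmetic behind the 3-node test above, as a minimal Python sketch. It assumes corosync votequorum defaults: one vote per node, expected votes fixed at the configured node count, and no two_node, last_man_standing, auto_tie_breaker, or qdevice options. The function names are hypothetical and are not part of any cluster tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(nodes_up: int, expected_votes: int) -> bool:
    """Simplified check assuming every node carries exactly one vote."""
    return nodes_up >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Take the nodes of a 3-node cluster down one at a time.
    for up in (3, 2, 1):
        state = "quorate" if is_quorate(up, expected_votes=3) else "inquorate"
        print(f"{up} of 3 nodes up -> {state}")
```

With default settings the last remaining node is inquorate, so under the default no-quorum-policy (stop) its resources are stopped. Options such as last_man_standing, or a qdevice using the lms algorithm as suggested in this thread, change the outcome by recalculating expected votes or contributing extra votes.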

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
I bring >> down services on two nodes. >> Thanks >> David >> >> On Thu, 31 Aug 2023 at 11:44, Klaus Wenninger >> wrote: >> >>> >>> >>> On Thu, Aug 31, 2023 at 12:28 PM David Dolan >>> wrote: >>> >>>>

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-31 Thread Klaus Wenninger
On Thu, Aug 31, 2023 at 12:28 PM David Dolan wrote: > > > On Wed, 30 Aug 2023 at 17:35, David Dolan wrote: > >> >> >> > Hi All, >>> > >>> > I'm running Pacemaker on Centos7 >>> > Name: pcs >>> > Version : 0.9.169 >>> > Release : 3.el7.centos.3 >>> > Architecture: x86_64 >>> >

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Klaus Wenninger
On Wed, Aug 30, 2023 at 2:34 PM David Dolan wrote: > Hi All, > > I'm running Pacemaker on Centos7 > Name: pcs > Version : 0.9.169 > Release : 3.el7.centos.3 > Architecture: x86_64 > > Besides the pcs version, the versions of the other cluster-stack components could be interesting.

Re: [ClusterLabs] Redis Resource error

2023-08-23 Thread Klaus Wenninger
On Tue, Aug 22, 2023 at 11:31 PM Social Boh wrote: > Hello List, > > I know is not really a Pacemaker/Corosync question relate but I don't > know how solve this error: > > redis_start_0 on kam1.kamailio.xyz 'error' (1): call=148, status='Timed > Out', exitreason='Resource agent did not complete

Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated) and replaced with other options but not an effective solution

2023-06-28 Thread Klaus Wenninger
On Wed, Jun 28, 2023 at 7:38 AM Klaus Wenninger wrote: > > > On Wed, Jun 28, 2023 at 3:30 AM Priyanka Balotra < > priyanka.14balo...@gmail.com> wrote: > >> I am using SLES 15 SP4. Is the no-quorum-policy still supported? >> >> > Thanks >> Pri

Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated) and replaced with other options but not an effective solution

2023-06-27 Thread Klaus Wenninger
rum-policy=ignore is actually what >> you want. >> > Still dangerous without something like wait-for-all - right? With LMS I guess you should get the same effect without having specified it explicitly, though. Klaus > >> > >> > Thanks >> > Priyanka >>

Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated) and replaced with other options but not an effective solution

2023-06-27 Thread Klaus Wenninger
On Tue, Jun 27, 2023 at 5:24 PM Andrei Borzenkov wrote: > On 27.06.2023 07:21, Priyanka Balotra wrote: > > Hi Andrei, > > After this state the system went through some more fencings and we saw > the > > following state: > > > > :~ # crm status > > Cluster Summary: > >* Stack: corosync > >

Re: [ClusterLabs] Pacemaker logs written on message which is not expected as per configuration

2023-06-26 Thread Klaus Wenninger
On Fri, Jun 23, 2023 at 3:57 PM S Sathish S via Users wrote: > Hi Team, > > > > The pacemaker logs is written in both '/var/log/messages' and > '/var/log/pacemaker/pacemaker.log'. > > Could you please help us for not write pacemaker processes in > /var/log/messages? Even corosync configuration

Re: [ClusterLabs] How to block/stop a resource from running twice?

2023-04-24 Thread Klaus Wenninger
On Fri, Apr 21, 2023 at 12:24 PM fs3000 via Users wrote: > Hello all, > > I'm configuring a two node cluster. Pacemaker 0.9.169 on Centos 7. > > guess this is rather the pcs-version ... > How can i configure a specific service to run just on one node and avoid > having it running on more than

Re: [ClusterLabs] Offtopic - role migration

2023-04-19 Thread Klaus Wenninger
On Tue, Apr 18, 2023 at 9:09 PM Ken Gaillot wrote: > On Tue, 2023-04-18 at 19:50 +0200, Vladislav Bogdanov wrote: > > Btw, an interesting question. How much efforts would it take to > > support a migration of a Master role over the nodes? An use-case is > > drbd, configured for a multi-master

Re: [ClusterLabs] VirtualDomain - node map - ?

2023-04-17 Thread Klaus Wenninger
On Mon, Apr 17, 2023 at 6:17 AM Andrei Borzenkov wrote: > On 16.04.2023 16:29, lejeczek via Users wrote: > > > > > > On 16/04/2023 12:54, Andrei Borzenkov wrote: > >> On 16.04.2023 13:40, lejeczek via Users wrote: > >>> Hi guys > >>> > >>> Some agents do employ that concept of node/host map

Re: [ClusterLabs] resource going to blocked status while we restart service via systemctl twice

2023-04-17 Thread Klaus Wenninger
On Mon, Apr 17, 2023 at 9:25 AM S Sathish S via Users wrote: > Hi Team, > > > > TEST_node1 resource going to blocked status while we restart service via > systemctl twice in less time/before completion of 1st systemctl command. > > In older pacemaker version 2.0.2 we don’t see this issue, only

Re: [ClusterLabs] Location not working [FIXED]

2023-04-12 Thread Klaus Wenninger
On Wed, Apr 12, 2023 at 9:27 AM Andrei Borzenkov wrote: > On Tue, Apr 11, 2023 at 6:27 PM Ken Gaillot wrote: > > > > On Tue, 2023-04-11 at 17:31 +0300, Miro Igov wrote: > > > I fixed the issue by changing location definition from: > > > > > > location intranet-ip_on_any_nginx intranet-ip \ > >

Re: [ClusterLabs] pacemaker-remoted /dev/shm errors

2023-03-06 Thread Klaus Wenninger
On Mon, Mar 6, 2023 at 3:32 PM Christine caulfield wrote: > Hi, > > The error is coming from libqb - which is what manages the local IPC > connections between local clients and the server. > > I'm the libqb maintainer but I've never seen that error before! Is there > anything unusual about the

Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Klaus Wenninger
On Thu, Mar 2, 2023 at 8:41 AM Gerald Vogt wrote: > Hi, > > I am setting up a mail relay cluster which main purpose is to maintain > the service ips via IPaddr2 and move them between cluster nodes when > necessary. > > The service ips should only be active on nodes which are running all >

Re: [ClusterLabs] cluster with redundant links - PCSD offline

2023-02-28 Thread Klaus Wenninger
On Mon, Feb 27, 2023 at 6:25 PM Ken Gaillot wrote: > On Sun, 2023-02-26 at 18:15 +0100, lejeczek via Users wrote: > > Hi guys. > > > > I have a simple 2-node cluster with redundant links and I wonder why > > status reports like this: > > ... > > Node List: > > * Node swir (1): online, feature

[ClusterLabs] sbd v1.5.2

2023-01-09 Thread Klaus Wenninger
Hi sbd - developers & users! Thanks to everybody for contributing to tests and further development. The only functional change is the first topic in the list below, and even that is 'just' refusing startup in a case where the config wouldn't have led to a successful cluster startup anyway. Improved

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Stonith

2022-12-21 Thread Klaus Wenninger
On Wed, Dec 21, 2022 at 4:51 PM Ken Gaillot wrote: > On Wed, 2022-12-21 at 10:45 +0100, Ulrich Windl wrote: > > > > > Ken Gaillot schrieb am 20.12.2022 um > > > > > 16:21 in > > Nachricht > > <3a5960c2331f97496119720f6b5a760b3fe3bbcf.ca...@redhat.com>: > > > On Tue, 2022‑12‑20 at 11:33 +0300,

Re: [ClusterLabs] Antw: [EXT] Re: Bug pacemaker with multiple IP

2022-12-21 Thread Klaus Wenninger
On Wed, Dec 21, 2022 at 11:26 AM Reid Wahl wrote: > On Wed, Dec 21, 2022 at 2:15 AM Ulrich Windl > wrote: > > > > Hi! > > > > I wonder: Could the error message be triggered by adding an exclusive > mandatory > > lock in the ip binary? > > If that triggers the bug, I'm rather sure that the error

Re: [ClusterLabs] Samba failover and Windows access

2022-12-12 Thread Klaus Wenninger
On Sat, Dec 10, 2022 at 6:39 PM Dave Withheld wrote: > On Thu, Dec 8, 2022 at 8:03 AM Dave Withheld > wrote: > > In our production factory, we run a 2-node cluster on CentOS 8 with > pacemaker, a virtual IP, and drbd for shared storage with samba (among > other services) running as a resource

Re: [ClusterLabs] Samba failover and Windows access

2022-12-07 Thread Klaus Wenninger
On Thu, Dec 8, 2022 at 8:03 AM Dave Withheld wrote: > In our production factory, we run a 2-node cluster on CentOS 8 with > pacemaker, a virtual IP, and drbd for shared storage with samba (among > other services) running as a resource on the active node. Everything works > great except when we

Re: [ClusterLabs] Unable to build rpm using make rpm command for pacemaker-2.1.4.

2022-11-22 Thread Klaus Wenninger
On Tue, Nov 22, 2022 at 1:16 PM S Sathish S via Users wrote: > Hi Ken/Team, > > We have tried on pacemaker 2.1.1 also faced same issue , later we have > perform below steps as workaround to build pacemaker rpm as you said it run > from a git checkout and build rpm. > > #./autogen.sh >

Re: [ClusterLabs] [External] : Re: Fence Agent tests

2022-11-15 Thread Klaus Wenninger
On Sat, Nov 5, 2022 at 9:45 PM Jehan-Guillaume de Rorthais via Users < users@clusterlabs.org> wrote: > On Sat, 5 Nov 2022 20:53:09 +0100 > Valentin Vidić via Users wrote: > > > On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote: > > > That was my impression as well...so I may have

Re: [ClusterLabs] [External] : Re: Fence Agent tests

2022-11-15 Thread Klaus Wenninger
On Wed, Nov 9, 2022 at 2:58 PM Robert Hayden wrote: > > > -Original Message- > > From: Users On Behalf Of Andrei > > Borzenkov > > Sent: Wednesday, November 9, 2022 2:59 AM > > To: Cluster Labs - All topics related to open-source clustering welcomed > > > > Subject: Re: [ClusterLabs]

Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
ker ML > *Subject:* Re: [ClusterLabs] crm resource trace > > > - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com > wrote: > > > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org>

Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 10:46 AM Lentes, Bernd < bernd.len...@helmholtz-muenchen.de> wrote: > > - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com > wrote: > > > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users < [ > > mailto:users@cluste

Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users wrote: > Hi Bernd, > > I got it, you are on SLE12SP5, and the crmsh version > is crmsh-4.1.1+git.1647830282.d380378a-2.74.2.noarch, right? > > I try to reproduce this inconsistent behavior, add an IPaddr2 agent vip, > run `crm resource trace

Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Klaus Wenninger
On Mon, Oct 17, 2022 at 9:42 PM Ken Gaillot wrote: > This turned out to be interesting. > > In the first case, the resource history contains a start action and a > recurring monitor. The parameters to both change, so the resource > requires a restart. > > In the second case, the resource's

Re: [ClusterLabs] RFE: sbd clone

2022-09-27 Thread Klaus Wenninger
On Tue, Sep 20, 2022 at 3:59 PM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > Hi! > > I have a proposal (request) for enhancing sbd: > (I'm not suggesting a complete rewrite with reasonable options, as I had > done that before already ;-)) > When configuring an additional disk device,

Re: [ClusterLabs] (no subject)

2022-09-07 Thread Klaus Wenninger
On Wed, Sep 7, 2022 at 12:28 PM Jehan-Guillaume de Rorthais via Users wrote: > > Hey, > > On Wed, 7 Sep 2022 19:12:53 +0900 > 권오성 wrote: > > > Hello. > > I am a student who wants to implement a redundancy system with raspberry pi. > > Last time, I posted about how to proceed with installation on

Re: [ClusterLabs] Cluster does not start resources

2022-08-25 Thread Klaus Wenninger
On Wed, Aug 24, 2022 at 6:29 PM Lentes, Bernd wrote: > > > - On 24 Aug, 2022, at 16:26, kwenning kwenn...@redhat.com wrote: > > >> > >> if I get Ulrich right - and my fading memory of when I really used crmsh > >> the > >> last time is telling me the same thing ... > >> > > I get the

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Klaus Wenninger
On Wed, Aug 24, 2022 at 4:24 PM Klaus Wenninger wrote: > > On Wed, Aug 24, 2022 at 2:40 PM Lentes, Bernd > wrote: > > > > > > - On 24 Aug, 2022, at 07:21, Reid Wahl nw...@redhat.com wrote: > > > > > > > As a result, your command might

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Klaus Wenninger
On Wed, Aug 24, 2022 at 2:40 PM Lentes, Bernd wrote: > > > - On 24 Aug, 2022, at 07:21, Reid Wahl nw...@redhat.com wrote: > > > > As a result, your command might start the virtual machines, but > > Pacemaker will still show that the resources are "Stopped (disabled)". > > To fix that, you'll

Re: [ClusterLabs] Start resource only if another resource is stopped

2022-08-19 Thread Klaus Wenninger
On Thu, Aug 18, 2022 at 8:26 PM Andrei Borzenkov wrote: > > On 17.08.2022 16:58, Miro Igov wrote: > > As you guessed i am using crm res stop nfs_export_1. > > I tried the solution with attribute and it does not work correct. > > > > It does what you asked for originally, but you are shifting the

Re: [ClusterLabs] node1 and node2 communication time question

2022-08-10 Thread Klaus Wenninger
On Wed, Aug 10, 2022 at 3:49 AM 권오성 wrote: > > Thank you for your reply. > Then, can I think of it as being able to adjust the time by changing the > token in /etc/corosync/corosync.conf? That would basically be the time after which a non-responsive node in a cluster would be declared dead and
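
The token value from corosync.conf is not always used verbatim. Per corosync.conf(5), when a nodelist with at least 3 nodes is configured, corosync adds token_coefficient (650 ms by default) for each node beyond two. A small Python sketch of that formula; the helper name and the example values are hypothetical:

```python
def effective_token_ms(token_ms: int, node_count: int,
                       token_coefficient_ms: int = 650) -> int:
    """Approximate the token timeout totem actually uses (hypothetical helper).

    Assumes a nodelist section is configured. With at least 3 nodes the
    real timeout is token + (node_count - 2) * token_coefficient;
    otherwise the configured token is used as-is.
    """
    if node_count >= 3:
        return token_ms + (node_count - 2) * token_coefficient_ms
    return token_ms


if __name__ == "__main__":
    # Example values: "token: 3000" on a 2-node and on a 5-node cluster.
    print(effective_token_ms(3000, 2))  # 3000
    print(effective_token_ms(3000, 5))  # 3000 + 3 * 650 = 4950
```

corosync.conf(5) also points to the runtime.config.totem.token cmap key for reading the value totem actually uses, which is a safer check than recomputing it by hand.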

Re: [ClusterLabs] Q: About a false negative of storage_mon

2022-08-05 Thread Klaus Wenninger
On Fri, Aug 5, 2022 at 9:30 AM Kazunori INOUE wrote: > > On Tue, Aug 2, 2022 at 11:09 PM Ken Gaillot wrote: > > > > On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote: > > > Hi, > > > > > > Since O_DIRECT is not specified in open() [1], it reads the buffer > > > cache and > > > may result in a false

Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Klaus Wenninger
On Wed, Aug 3, 2022 at 4:02 PM Ulrich Windl wrote: > > >>> Klaus Wenninger schrieb am 03.08.2022 um 15:51 in > Nachricht > : > > On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot wrote: > >> > >> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote: > >

Re: [ClusterLabs] Q: About a false negative of storage_mon

2022-08-03 Thread Klaus Wenninger
On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot wrote: > > On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote: > > Hi, > > > > Since O_DIRECT is not specified in open() [1], it reads the buffer > > cache and > > may result in a false negative. I fear that this possibility > > increases > > in environments
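
The false negative discussed here comes from reads being served from the page cache. A minimal Python sketch of the O_DIRECT alternative, assuming Linux (os.O_DIRECT is not available on every platform) and a 4096-byte read size that is a multiple of the device's logical block size. read_first_block is a hypothetical helper, not storage_mon's actual code.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumption: a multiple of the device's logical block size


def read_first_block(path: str, direct: bool = True) -> bytes:
    """Read up to BLOCK_SIZE bytes from the start of `path` (hypothetical helper).

    With direct=True the file is opened with O_DIRECT (Linux-only), so the
    read normally goes to the device instead of being answered from the
    page cache. O_DIRECT needs an aligned buffer; an anonymous mmap is
    page-aligned, so it is used as the read buffer.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    fd = os.open(path, flags)
    try:
        buf = mmap.mmap(-1, BLOCK_SIZE)
        try:
            nread = os.readv(fd, [buf])  # fills the aligned buffer in place
            return buf[:nread]           # slicing an mmap returns a bytes copy
        finally:
            buf.close()
    finally:
        os.close(fd)
```

Against a block device one would call something like read_first_block("/dev/sdX") (placeholder path). Because O_DIRECT requires aligned buffers, offsets, and lengths, the sketch reads into an anonymous mmap rather than a bytes object; filesystems that do not support O_DIRECT reject the open with EINVAL.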

Re: [ClusterLabs] pacemaker-fenced[11637]: warning: Can't create a sane reply

2022-06-22 Thread Klaus Wenninger
On Wed, Jun 22, 2022 at 1:46 PM Priyanka Balotra wrote: > > Hi All, > > We are seeing an issue where we performed cluster shutdown followed by > cluster boot operation. All the nodes joined the cluster except one (the first > node). Here are some pacemaker logs around that timestamp: > >

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Why not retry a monitor (pacemaker‑execd) that got a segmentation fault?

2022-06-15 Thread Klaus Wenninger
On Wed, Jun 15, 2022 at 2:10 PM Ulrich Windl wrote: > > >>> Klaus Wenninger schrieb am 15.06.2022 um 13:22 in > Nachricht > : > > On Wed, Jun 15, 2022 at 10:33 AM Ulrich Windl > > wrote: > >> > > ... > > >> (As said abov

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Why not retry a monitor (pacemaker‑execd) that got a segmentation fault?

2022-06-15 Thread Klaus Wenninger
On Wed, Jun 15, 2022 at 10:33 AM Ulrich Windl wrote: > > >>> Klaus Wenninger schrieb am 15.06.2022 um 10:00 in > Nachricht > : > > On Wed, Jun 15, 2022 at 8:32 AM Ulrich Windl > > wrote: > >> > >> >>> Ulrich Windl schri

Re: [ClusterLabs] Antw: [EXT] Re: Why not retry a monitor (pacemaker‑execd) that got a segmentation fault?

2022-06-15 Thread Klaus Wenninger
On Wed, Jun 15, 2022 at 8:32 AM Ulrich Windl wrote: > > >>> Ulrich Windl schrieb am 14.06.2022 um 15:53 in Nachricht <62A892F0.174 : > >>> 161 : > 60728>: > > ... > > Yes it's odd, but isn't the cluster just to protect us from odd situations? > > ;-) > > I have more odd stuff: > Jun 14 20:40:09

Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Klaus Wenninger
On Tue, Jun 7, 2022 at 10:27 AM Zoran Bošnjak wrote: > > Hi, I need some help with correct fencing configuration in 5-node cluster. > > The specific issue is that there are 3 rooms, where in addition to node > failure scenario, each room can fail too (for example in case of room power >

Re: [ClusterLabs] Antw: [EXT] Re: normal reboot with active sbd does not work

2022-06-07 Thread Klaus Wenninger
On Tue, Jun 7, 2022 at 7:53 AM Ulrich Windl wrote: > > >>> Andrei Borzenkov schrieb am 03.06.2022 um 17:04 in > Nachricht <99f7746a-c962-33bb-6737-f88ba0128...@gmail.com>: > > On 03.06.2022 16:51, Zoran Bošnjak wrote: > >> Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 3:51 PM Zoran Bošnjak wrote: > > Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed > OK. I was first experimenting with "softdog", which is blacklisted. So the > reasonable question is how to properly start "softdog" on ubuntu. > > The reason to

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 11:03 AM Klaus Wenninger wrote: > > On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak wrote: > > > > Hi all, > > I would appreciate an advice about sbd fencing (without shared storage). > > > > I am using ubuntu 20.04., with default packages

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak wrote: > > Hi all, > I would appreciate an advice about sbd fencing (without shared storage). > > I am using ubuntu 20.04., with default packages from the repository > (pacemaker, corosync, fence-agents, ipmitool, pcs...). > > HW watchdog is present

Re: [ClusterLabs] Antw: [EXT] Re: Cluster unable to find back together

2022-05-23 Thread Klaus Wenninger
On Fri, May 20, 2022 at 7:43 AM Ulrich Windl wrote: > > >>> Jan Friesse schrieb am 19.05.2022 um 14:55 in > Nachricht > <1abb8468-6619-329f-cb01-3f51112db...@redhat.com>: > > Hi, > > > > On 19/05/2022 10:16, Leditzky, Fabian via Users wrote: > >> Hello > >> > >> We have been dealing with our

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: How to clean up a failed fencing operation?

2022-05-13 Thread Klaus Wenninger
On Fri, May 13, 2022 at 12:12 PM Ulrich Windl wrote: > > >>> Klaus Wenninger schrieb am 13.05.2022 um 08:22 in > Nachricht > : > > On Tue, May 3, 2022 at 11:53 AM Ulrich Windl > > wrote: > >> > >> >>> Reid Wahl schrieb am 03.05.20

Re: [ClusterLabs] Antw: [EXT] Re: Q: How to clean up a failed fencing operation?

2022-05-13 Thread Klaus Wenninger
On Tue, May 3, 2022 at 11:53 AM Ulrich Windl wrote: > > >>> Reid Wahl schrieb am 03.05.2022 um 10:16 in Nachricht > : > > On Tue, May 3, 2022 at 12:36 AM Ulrich Windl > > wrote: > >> > >> Hi! > >> > >> I'm familiar with cleaning up various failed resource actions via > > "crm_resource -C -r

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Klaus Wenninger
On Thu, Apr 21, 2022 at 8:18 PM john tillman wrote: > > > On 21.04.2022 18:26, john tillman wrote: > >>> Dne 20. 04. 22 v 20:21 john tillman napsal(a): > > On 20.04.2022 19:53, john tillman wrote: > >> I have a two node cluster that won't start any resources if only one > >> node >
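
A likely reason a two-node cluster will not start resources with only one node booted, assuming corosync votequorum with two_node: 1: votequorum(5) states that two_node implicitly enables wait_for_all, so a node that boots alone is not quorate until it has seen its peer at least once. A toy Python model of that rule (the class is hypothetical, not a corosync API):

```python
from dataclasses import dataclass


@dataclass
class TwoNodeQuorum:
    """Toy model of corosync votequorum with two_node: 1 (hypothetical class).

    Assumptions: each node has one vote, two_node lowers the threshold to a
    single vote, and wait_for_all withholds quorum until both nodes have been
    visible at the same time at least once.
    """

    wait_for_all: bool = True  # two_node enables this unless set to 0
    seen_all_once: bool = False

    def update(self, nodes_visible: int) -> bool:
        """Return whether the local partition is quorate right now."""
        if nodes_visible >= 2:
            self.seen_all_once = True
        if self.wait_for_all and not self.seen_all_once:
            return False
        return nodes_visible >= 1


if __name__ == "__main__":
    q = TwoNodeQuorum()
    print(q.update(1))  # booted alone, peer never seen -> False
    print(q.update(2))  # peer joined -> True
    print(q.update(1))  # peer lost later; two_node keeps the survivor quorate -> True
```

Setting wait_for_all: 0 explicitly removes the startup wait, at the cost of letting a node that boots alone become quorate on its own, which is exactly the situation wait_for_all exists to prevent.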

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in 2.1.3: node health monitoring improvements

2022-04-14 Thread Klaus Wenninger
On Thu, Apr 14, 2022 at 7:57 AM Ulrich Windl wrote: > > Ken, > > thanks for the explanations! Maybe it would be best (next time) if you > present the documentation for a new feature first (as a base for discussion), > and _then_ implement it. > I know: People first implement it, and later, if

Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-29 Thread Klaus Wenninger
On Thu, Mar 24, 2022 at 4:12 PM Ken Gaillot wrote: > > On Wed, 2022-03-23 at 05:30 +, Balotra, Priyanka wrote: > > Hi All, > > > > We have a scenario on SLES 12 SP3 cluster. > > The scenario is explained as follows in the order of events: > > There is a 2-node cluster (FILE-1, FILE-2) > >

Re: [ClusterLabs] Request for ideas: Cluster node summary in 14 characters

2022-03-17 Thread Klaus Wenninger
On Thu, Mar 17, 2022 at 4:16 PM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > Hi! > > I had the idea to display the status of a cluster node on the 14-character > LCD display of a Dell PowerEdge server; preferably displaying the hostname > at least partially, too ;-) > > Now, what

Re: [ClusterLabs] Noticed oddity when DC is going to be fenced

2022-03-01 Thread Klaus Wenninger
On Tue, Mar 1, 2022 at 10:05 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > Hi! > > For current SLES15 SP3 I noticed an oddity when the node running the DC is > going to be fenced: > It seems that another node is performing recovery operations while the old > DC is not confirmed to

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-28 Thread Klaus Wenninger
On Mon, Feb 28, 2022 at 2:46 PM Klaus Wenninger wrote: > > > On Sat, Feb 26, 2022 at 7:14 AM Strahil Nikolov via Users < > users@clusterlabs.org> wrote: > >> I always used this one for triggering kdump when using sbd: >> https://www.suse.com/support/kb/doc/?

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-28 Thread Klaus Wenninger
On Sat, Feb 26, 2022 at 7:14 AM Strahil Nikolov via Users < users@clusterlabs.org> wrote: > I always used this one for triggering kdump when using sbd: > https://www.suse.com/support/kb/doc/?id=19873 > > On Fri, Feb 25, 2022 at 21:34, Reid Wahl > wrote: > On Fri, Feb 25, 2022 at 3:47 AM

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Klaus Wenninger
On Thu, Feb 17, 2022 at 12:38 PM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > >>> Klaus Wenninger schrieb am 17.02.2022 um 10:49 > in > Nachricht > : > ... > >> For completeness: Yes, sbd did recover: > >> Feb 14 13:01:42 h18 sbd[

Re: [ClusterLabs] Antw: [EXT] Re: Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Klaus Wenninger
On Thu, Feb 17, 2022 at 10:14 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > >>> Klaus Wenninger schrieb am 16.02.2022 um 16:59 > in > Nachricht > : > > On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger > wrote: > > > >> >

Re: [ClusterLabs] Antw: [EXT] Re: Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Klaus Wenninger
On Thu, Feb 17, 2022 at 9:27 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > >>> Klaus Wenninger schrieb am 16.02.2022 um 16:26 > in > Nachricht > : > > On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl < > > ulrich.wi...@rz.uni-regensburg.de>

Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-16 Thread Klaus Wenninger
On Wed, Feb 16, 2022 at 4:59 PM Klaus Wenninger wrote: > > > On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger > wrote: > >> >> >> On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl < >> ulrich.wi...@rz.uni-regensburg.de> wrote: >> >>>

Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-16 Thread Klaus Wenninger
On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger wrote: > > > On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl < > ulrich.wi...@rz.uni-regensburg.de> wrote: > >> Hi! >> >> When changing some FC cables I noticed that sbd complained 2 seconds >> after the

Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-16 Thread Klaus Wenninger
On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > Hi! > > When changing some FC cables I noticed that sbd complained 2 seconds after > the connection went down (even though the device is multi-pathed with > other paths being still up). > I don't know any

Re: [ClusterLabs] ethernet link up/down - ?

2022-02-16 Thread Klaus Wenninger
On Tue, Feb 15, 2022 at 5:25 PM lejeczek via Users wrote: > > > On 07/02/2022 19:21, Antony Stone wrote: > > On Monday 07 February 2022 at 20:09:02, lejeczek via Users wrote: > > > >> Hi guys > >> > >> How do you guys go about doing link up/down as a resource? > > I apply or remove addresses on

Re: [ClusterLabs] Antw: [EXT] Cluster Removing VIP and Not Following Order Constraint

2022-02-11 Thread Klaus Wenninger
On Fri, Feb 11, 2022 at 9:13 AM Strahil Nikolov via Users < users@clusterlabs.org> wrote: > Shouldn't you use kind 'Mandatory' and symmetrical TRUE? > > If true, the reverse of the constraint applies for the opposite action > (for example, if B starts after A starts, then B stops before A
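
The quoted description of symmetrical ordering boils down to this: the stop sequence is the reverse of the start sequence. A toy Python sketch of that rule using the standard library's graphlib (Python 3.9+); it ignores the kind attribute entirely, and the resource names are hypothetical:

```python
from graphlib import TopologicalSorter


def action_orders(constraints, symmetrical=True):
    """Derive start and stop sequences from (first, then) order constraints.

    Toy model: `then` may start only after `first` has started. With
    symmetrical=True the stop sequence is the start sequence reversed, so
    `then` stops before `first`; with False no stop order is implied (None).
    """
    graph = {}
    for first, then in constraints:
        graph.setdefault(first, set())
        graph.setdefault(then, set()).add(first)  # predecessors of `then`
    start = list(TopologicalSorter(graph).static_order())
    stop = list(reversed(start)) if symmetrical else None
    return start, stop


if __name__ == "__main__":
    start, stop = action_orders([("fs", "app"), ("app", "vip")])
    print("start:", start)  # ['fs', 'app', 'vip']
    print("stop: ", stop)   # ['vip', 'app', 'fs']
```

Reversing any valid start order satisfies every reversed constraint at once, which is why a single reversal is enough in this model.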

Re: [ClusterLabs] Is there a python package for pacemaker ?

2022-02-03 Thread Klaus Wenninger
On Wed, Feb 2, 2022 at 7:06 PM Ken Gaillot wrote: > On Wed, 2022-02-02 at 18:46 +0100, Lentes, Bernd wrote: > > Hi, > > > > i need to write some scripts for our cluster. Until now i wrote bash > > scripts. > > But i like to learn python. Is there a package for pacemaker ? > > What i found is:

Re: [ClusterLabs] Antw: [EXT] Removing a resource without stopping it

2022-01-31 Thread Klaus Wenninger
On Mon, Jan 31, 2022 at 2:43 PM Jehan-Guillaume de Rorthais wrote: > On Mon, 31 Jan 2022 08:49:44 +0100 > Klaus Wenninger wrote: > ... > > Depending on the environment it might make sense to think about > > having the manual migration-step controlled by the cluster(s)

Re: [ClusterLabs] Antw: [EXT] Removing a resource without stopping it

2022-01-30 Thread Klaus Wenninger
On Mon, Jan 31, 2022 at 8:19 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > >>> Digimer wrote on 28.01.2022 at 22:38 in message > : > > Hi all, > > > > I'm trying to figure out how to move a running VM from one pacemaker > > cluster to another. I've got the storage and VM

[ClusterLabs] sbd v1.5.1

2021-11-15 Thread Klaus Wenninger
Hi sbd - developers & users! Thanks to everybody for contributing to tests and further development. Changes since 1.5.0 - improve/fix cmdline handling - tell the actual watchdog device specified with -w - tolerate and strip any leading spaces of commandline option values - Sanitize
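The changelog item about tolerating leading spaces in option values can be illustrated outside sbd. A minimal sketch in Python (sbd itself is written in C) using the standard `getopt` module; `parse_watchdog_option` is a hypothetical helper, and stripping only the space character follows the wording "leading spaces" rather than sbd's actual implementation.

```python
import getopt
from typing import List, Optional


def parse_watchdog_option(argv: List[str]) -> Optional[str]:
    """Hypothetical helper: return the value of -w with leading spaces removed."""
    opts, _rest = getopt.getopt(argv, "w:")
    watchdog = None
    for flag, value in opts:
        if flag == "-w":
            # Tolerate values such as "  /dev/watchdog" (e.g. from a config
            # file or a quoted shell variable) by stripping leading spaces only.
            watchdog = value.lstrip(" ")
    return watchdog
```

With this, `-w "  /dev/watchdog"` and `-w/dev/watchdog` both resolve to `/dev/watchdog`.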

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Klaus Wenninger
On Mon, Nov 15, 2021 at 12:19 PM Andrei Borzenkov wrote: > On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger > wrote: > > > > > > > > On Mon, Nov 15, 2021 at 10:37 AM S Rogers > wrote: > >> > >> I had thought about doing that, but the cl

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Klaus Wenninger
On Mon, Nov 15, 2021 at 10:37 AM S Rogers wrote: > I had thought about doing that, but the cluster is then dependent on the > external system, and if that external system was to go down or become > unreachable for any reason then it would falsely cause the cluster to > failover or worse it could

Re: [ClusterLabs] Antw: [EXT] Re: VirtualDomain & "deeper" monitors - what/how?

2021-10-26 Thread Klaus Wenninger
On Mon, Oct 25, 2021 at 9:34 PM Kyle O'Donnell wrote: > Finally got around to working on this. > > I spoke with someone on the #clusterlabs IRC channel who mentioned that > the monitor_scripts param does indeed run at some frequency (op monitor > timeout=? interval=?), not just during the

Re: [ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Klaus Wenninger
On Wed, Oct 20, 2021 at 12:06 PM Ian Diddams via Users < users@clusterlabs.org> wrote: > FWIW here is the basis for my implementation being the "best" and easily > followed drbd/clustering guide/explanation I could find when I searched > > Lisenet.com :: Linux | Security | Networking | Admin

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Klaus Wenninger
On Fri, Oct 15, 2021 at 12:01 PM Andrei Borzenkov wrote: > On Fri, Oct 15, 2021 at 9:25 AM Klaus Wenninger > wrote: > > > Main pain-point here is that ping-RA allows us to configure the count of > pings sent, but it > > is just using the exit-value from ping that becomes
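The pain point above is that configuring a higher ping count does not give the agent more information when it only looks at the exit value. A minimal sketch, assuming iputils `ping -c COUNT` invoked without a `-w` deadline; per the iputils man page it exits 0 if at least one reply arrived and 1 if none did. `ping_exit_code` is a hypothetical model, not code from the ping resource agent.

```python
def ping_exit_code(replies: int, count: int) -> int:
    """Model of iputils `ping -c count` exit status (no -w deadline given).

    Assumption from the iputils man page: 0 if at least one reply was
    received, 1 if none was. Exit code 2 (other errors) is not modelled.
    """
    if not 0 <= replies <= count:
        raise ValueError("replies must be between 0 and count")
    return 0 if replies > 0 else 1
```

With `count=5`, one reply out of five and five out of five both produce exit code 0, so anything that checks only the exit value cannot tell partial packet loss from a clean link.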

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Klaus Wenninger
On Thu, Oct 14, 2021 at 10:51 PM martin doc wrote: > > > -- > *From: *Andrei Borzenkov , Friday, 15 October 2021 > 4:59 AM > *...* > > Dampening defines delay before attributes are committed to CIB. > > Private attributes are never ever written into CIB, so dampening

Re: [ClusterLabs] Antw: [EXT] Trying to understand dampening (ping)

2021-10-14 Thread Klaus Wenninger
afair the idea of dampening isn't to configure the behavior of the cluster - like be robust against some kind of glitches. It is rather there to keep resources used to write content to the CIB under control. Klaus On Thu, Oct 14, 2021 at 8:29 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de>
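The point that dampening exists to limit CIB writes, rather than to tune cluster behavior, can be seen in a small simulation. This is an illustrative model only: it assumes the first change after a write starts a timer of `dampen` seconds, later changes before it fires just replace the pending value, and the latest value is written when it fires (skipped if unchanged). Pacemaker's attribute manager may differ in detail, for example in whether a new change restarts the timer; `dampened_writes` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def dampened_writes(updates: List[Tuple[float, str]],
                    dampen: float) -> List[Tuple[float, str]]:
    """Return the (write_time, value) pairs that would reach the CIB.

    `updates` are (time, value) pairs in chronological order.
    Assumed model: a fixed window opened by the first change after a write.
    """
    writes: List[Tuple[float, str]] = []
    committed: Optional[str] = None  # value last written to the CIB
    pending: Optional[str] = None    # latest value seen while the timer runs
    deadline: Optional[float] = None # when the running timer fires
    for t, value in updates:
        if deadline is not None and t >= deadline:
            # The timer fired before this update arrived: flush it.
            if pending != committed:
                writes.append((deadline, pending))
                committed = pending
            deadline = None
        if deadline is None:
            if value == committed:
                continue  # no change relative to the CIB, nothing to schedule
            deadline = t + dampen
        pending = value
    if deadline is not None and pending != committed:
        writes.append((deadline, pending))
    return writes
```

Under this model, updates `(0, "1"), (1, "0"), (2, "1"), (10, "0")` with `dampen=5` produce only two writes, `(5, "1")` and `(15, "0")`: the three changes inside the first window collapse into one CIB update, and a value that flaps back to what is already written causes no write at all.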

Re: [ClusterLabs] Problem with high load (IO)

2021-09-28 Thread Klaus Wenninger
On Tue, Sep 28, 2021 at 11:23 AM Lentes, Bernd < bernd.len...@helmholtz-muenchen.de> wrote: > > > - On Sep 27, 2021, at 2:51 PM, Pacemaker ML users@clusterlabs.org > wrote: > > > I would use something like this: > > > > ionice -c 2 -n 7 nice cp XXX YYY > > > > Best Regards, > > Strahil

Re: [ClusterLabs] Qemu VM resources - cannot acquire state change lock

2021-08-26 Thread Klaus Wenninger
On Thu, Aug 26, 2021 at 11:13 AM lejeczek via Users wrote: > Hi guys. > > I sometimes - I think I know when in terms of any pattern - > get resources stuck on one node (two-node cluster) with > these in libvirtd's logs: > ... > Cannot start job (query, none, none) for domain > c8kubermaster1;

Re: [ClusterLabs] Pacemaker problems with pingd

2021-08-05 Thread Klaus Wenninger
On Wed, Aug 4, 2021 at 5:30 PM Janusz Jaskiewicz < janusz.jaskiew...@gmail.com> wrote: > Hello. > > Please forgive the length of this email but I wanted to provide as much > details as possible. > > I'm trying to set up a cluster of two nodes for my service. > I have a problem with a scenario

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-03 Thread Klaus Wenninger
On Tue, Aug 3, 2021 at 10:41 AM Antony Stone wrote: > On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote: > > > Here is the example I had promised: > > > > pcs node attribute server1 city=LA > > pcs node attribute server2 city=NY > > > > # Don't run on any node that is not in LA > > pcs

Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-23 Thread Klaus Wenninger
On Fri, Jul 23, 2021 at 8:55 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > >>> "john tillman" wrote on 22.07.2021 at 16:48 in > message > <1175ffcec0033015e13d11d7821d5acb.squir...@mail.panix.com>: > > There was a lot of discussion on this topic which might have overshadowed

Re: [ClusterLabs] Antw: [EXT] Re: unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-07-14 Thread Klaus Wenninger
oring unknown cluster health > > Jul 13 20:42:15 ltaoperdbs02 sbd[185357]: notice: inquisitor_child: > > Servant cluster is healthy (age: 0) > > Jul 13 20:42:15 ltaoperdbs02 sbd[185357]: notice: watchdog_init: Using > > watchdog device '/dev/watchdog' > >
