Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connection disrupted.

2024-05-06 Thread Klaus Wenninger
On Fri, May 3, 2024 at 8:59 PM  wrote:

> Hi,
>
> > > Also, I've done a wireshark capture and found a great mess in TCP; it
> > > seems like the connection between qdevice and qnetd really stops for some
> > > time and packets don't get delivered.
> >
> > Could you check UDP? I guess there are a lot of UDP packets sent by
> > corosync which probably keep TCP from going thru.
> Very improbable.  UDP itself can't prevent TCP from working, and 1 Gb links
> seem too wide for corosync to overload them.
> Also, overload usually leads to SOME packets being dropped, but this is an
> entirely different case: NO TCP packets passed. I got two captures from the
> two sides and I see that for some time each party sends TCP packets, but the
> other party does not receive them at all.
>
> > >
> > > My guess is that it matches corosync syncing activities, and I suspect
> > > that corosync prevents any other traffic on the interface it uses for rings.
> > >
> > > When I switch qnetd and qdevice to use a different interface it seems to
> > > work fine.
> >
> > Actually having a dedicated interface just for corosync/knet traffic is the
> > optimal solution. qdevice+qnetd on the other hand should be as close to the
> > "customer" as possible.
> >
> I am sure qnetd is not intended as a proof of network reachability; it is
> only an arbiter to provide quorum resolution. Therefore, to me it is better
> to keep it on the intra-cluster network with a high-priority transport. If we
> need a solution based on network reachability, there are other ways to
> provide it.
>

This is an example of how you could use network reachability to give
preference to the node with better reachability in a 2-node fencing race.
There is text in the code that should give you an idea of how it is supposed
to work.
https://github.com/ClusterLabs/fence-agents/blob/main/agents/heuristics_ping/fence_heuristics_ping.py
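
Just as an untested sketch of how that could be wired up with pcs (the device
names are made up here, and the ping_targets parameter name is from memory -
please check 'pcs stonith describe fence_heuristics_ping' on your version):

# assumes per-node fencing devices fence-node1/fence-node2 already exist
pcs stonith create ping-heuristic fence_heuristics_ping \
    ping_targets="192.0.2.1"
# the heuristic goes first in each topology level, so a node that cannot
# reach the ping target fails this step and effectively loses the fencing
# race to the better-connected node
pcs stonith level add 1 node1 ping-heuristic,fence-node1
pcs stonith level add 1 node2 ping-heuristic,fence-node2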

If you think of combining it with priority-fencing ...
Of course this idea can be applied to other ways of evaluating a
running node. I did implement fence_heuristics_ping back then both for an
explicit use-case and to convey the basic idea - keeping in mind that
others might come up with different examples.

Guess the main idea of having qdevice+qnetd outside of each of the
2 data-centers (if we're talking about a scenario of this kind) is to be
able to cover the case where one of these data-centers becomes
disconnected for whatever reason. Correct me please if there is more to it!
In this scenario you could use e.g. SBD watchdog-fencing to be able
to safely recover resources from a disconnected data-center (or site of any
kind).
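
Roughly - and this is just a hedged sketch, the timeout value is an example
that has to fit your SBD_WATCHDOG_TIMEOUT (see sbd(8)) - watchdog-only SBD
without a shared disk would be set up something like:

# on every cluster node, with a working watchdog device available
pcs stonith sbd enable
# tell pacemaker how long to wait before assuming the watchdog has fired
pcs property set stonith-watchdog-timeout=10s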

Klaus


> > So if you could have two interfaces (one just for corosync, second for
> > qnetd+qdevice+publicly accessible services) it might be a solution?
> >
> Yes, this way it works, but I wish to know WHY it won't work on the shared
> interface.
>
> > > So, the question is: does corosync really temporarily block any other
> > > traffic on the interface it uses? Or is it just a coincidence? If it
> > > blocks, is
> >
> > Nope, no "blocking". But it sends quite some few UDP packets and I guess
> it can
> > really use all available bandwidth so no TCP goes thru.
> Use all available 1GBps? Impossible.
>
>


Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-23 Thread Klaus Wenninger
On Tue, Apr 23, 2024 at 10:34 AM Klaus Wenninger 
wrote:

>
>
> On Tue, Apr 23, 2024 at 9:53 AM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>
>> Classified as: {OPEN}
>>
>>
>>
>> Other strange thing.
>>
>> On RHEL 7, corosync is restarted while the “Restart=on-failure » line is
>> commented.
>>
>> I think also that something changed in the pacemaker behavior, or
>> somewhere else.
>>
>
> That is how it was working before introduction of the reconnection to
> corosync.
> Previously pacemaker would fail and systemd would restart it checking the
> services
> pacemaker depends on. And finding corosync not running it would be
> restarted.
>

From what I've read there has also been a change a while back in how systemd
handles restarting of dependent services, so changed behavior can come from
that as well. Just for completeness ...

Klaus

>
> Klaus
>
>
>>
>>
>> *De :* Klaus Wenninger 
>> *Envoyé :* lundi 22 avril 2024 12:41
>> *À :* NOLIBOS Christophe 
>> *Cc :* Cluster Labs - All topics related to open-source clustering
>> welcomed 
>> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
>> crash" fix
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Apr 22, 2024 at 12:32 PM NOLIBOS Christophe <
>> christophe.noli...@thalesgroup.com> wrote:
>>
>> Classified as: {OPEN}
>>
>>
>>
>> You are right : the “Restart=on-failure” line is commented and so,
>> disabled per default.
>>
>> Uncommenting it resolves my issue.
>>
>>
>>
>> Maybe pacemaker changed behavior here without syncing enough with
>> corosync behavior.
>>
>> We'll look into that to see which approach is better - restart corosync
>> on failure - or have
>>
>> pacemaker be restarted by systemd which should in turn restart corosync
>> as well.
>>
>>
>>
>> Klaus
>>
>>
>>
>> Thanks a lot.
>>
>> Christophe.
>>
>>
>>
>> *De :* Klaus Wenninger 
>> *Envoyé :* lundi 22 avril 2024 11:06
>> *À :* NOLIBOS Christophe 
>> *Cc :* Cluster Labs - All topics related to open-source clustering
>> welcomed 
>> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
>> crash" fix
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Apr 22, 2024 at 9:51 AM NOLIBOS Christophe <
>> christophe.noli...@thalesgroup.com> wrote:
>>
>> Classified as: {OPEN}
>>
>>
>>
>> ‘kill -9’ command.
>>
>> Is it gracefully exit?
>>
>>
>>
>> Looking as if corosync-unit-file has Restart=on-failure disabled per
>> default.
>>
>> I'm not aware of another mechanism that would restart corosync and I
>>
>> think default behavior is not to restart.
>>
>> Comments suggest just to enable if using watchdog but that might just
>>
>> reference the RestartSec to provoke a watchdog-reboot instead of a
>>
>> restart via systemd.
>>
>> Any signal that isn't handled by the process - so that the exit-code could
>>
>> be set to 0 - should be fine.
>>
>>
>>
>> Klaus
>>
>>
>>
>>
>>
>> *De :* Klaus Wenninger 
>> *Envoyé :* jeudi 18 avril 2024 20:17
>> *À :* NOLIBOS Christophe 
>> *Cc :* Cluster Labs - All topics related to open-source clustering
>> welcomed 
>> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
>> crash" fix
>>
>>
>>
>>
>>
>> NOLIBOS Christophe  schrieb am Do.,
>> 18. Apr. 2024, 19:01:
>>
>> Classified as: {OPEN}
>>
>>
>>
>> Hummm… my RHEL 8.8 OS has been hardened.
>>
>> I am wondering if the problem does not come from that.
>>
>>
>>
>> On another side, I get the same issue (i.e. corosync not restarted by
>> system) with Pacemaker 2.1.5-8 deployed on RHEL 8.4 (not hardened).
>>
>>
>>
>> I’m checking.
>>
>>
>>
>> How did, you kill corosync? If it exits gracefully might not be
>> restarted. Check journal. Sry cant try am on my mobile ATM. Klaus
>>
>>
>>
>>
>>
>> {OPEN}
>>
>>
>>
>> {OPEN}
>>
>>
>>
>> {OPEN}
>>
>>
>>
>> {OPEN}
>>
>> *De :* Users  *De la part de* NOLIBOS
>> Christophe via Users
>> *Envoyé :* jeudi 18 avril 2024 18:34
>> 

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-23 Thread Klaus Wenninger
On Tue, Apr 23, 2024 at 9:53 AM NOLIBOS Christophe <
christophe.noli...@thalesgroup.com> wrote:

> Classified as: {OPEN}
>
>
>
> Other strange thing.
>
> On RHEL 7, corosync is restarted while the “Restart=on-failure » line is
> commented.
>
> I think also that something changed in the pacemaker behavior, or
> somewhere else.
>

That is how it was working before the introduction of the reconnection to
corosync.
Previously pacemaker would fail and systemd would restart it, checking the
services pacemaker depends on - and, finding corosync not running, it would
restart corosync as well.

Klaus


>
>
> *De :* Klaus Wenninger 
> *Envoyé :* lundi 22 avril 2024 12:41
> *À :* NOLIBOS Christophe 
> *Cc :* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
>
>
> On Mon, Apr 22, 2024 at 12:32 PM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>
> Classified as: {OPEN}
>
>
>
> You are right : the “Restart=on-failure” line is commented and so,
> disabled per default.
>
> Uncommenting it resolves my issue.
>
>
>
> Maybe pacemaker changed behavior here without syncing enough with corosync
> behavior.
>
> We'll look into that to see which approach is better - restart corosync on
> failure - or have
>
> pacemaker be restarted by systemd which should in turn restart corosync as
> well.
>
>
>
> Klaus
>
>
>
> Thanks a lot.
>
> Christophe.
>
>
>
> *De :* Klaus Wenninger 
> *Envoyé :* lundi 22 avril 2024 11:06
> *À :* NOLIBOS Christophe 
> *Cc :* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
>
>
> On Mon, Apr 22, 2024 at 9:51 AM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>
> Classified as: {OPEN}
>
>
>
> ‘kill -9’ command.
>
> Is it gracefully exit?
>
>
>
> Looking as if corosync-unit-file has Restart=on-failure disabled per
> default.
>
> I'm not aware of another mechanism that would restart corosync and I
>
> think default behavior is not to restart.
>
> Comments suggest just to enable if using watchdog but that might just
>
> reference the RestartSec to provoke a watchdog-reboot instead of a
>
> restart via systemd.
>
> Any signal that isn't handled by the process - so that the exit-code could
>
> be set to 0 - should be fine.
>
>
>
> Klaus
>
>
>
>
>
> *De :* Klaus Wenninger 
> *Envoyé :* jeudi 18 avril 2024 20:17
> *À :* NOLIBOS Christophe 
> *Cc :* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
> NOLIBOS Christophe  schrieb am Do.,
> 18. Apr. 2024, 19:01:
>
> Classified as: {OPEN}
>
>
>
> Hummm… my RHEL 8.8 OS has been hardened.
>
> I am wondering if the problem does not come from that.
>
>
>
> On another side, I get the same issue (i.e. corosync not restarted by
> system) with Pacemaker 2.1.5-8 deployed on RHEL 8.4 (not hardened).
>
>
>
> I’m checking.
>
>
>
> How did, you kill corosync? If it exits gracefully might not be restarted.
> Check journal. Sry cant try am on my mobile ATM. Klaus
>
>
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
> *De :* Users  *De la part de* NOLIBOS
> Christophe via Users
> *Envoyé :* jeudi 18 avril 2024 18:34
> *À :* Klaus Wenninger ; Cluster Labs - All topics
> related to open-source clustering welcomed 
> *Cc :* NOLIBOS Christophe 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
> Classified as: {OPEN}
>
>
>
> So, the issue is on systemd?
>
>
>
> If I run the same test on RHEL 7 (3.10.0-693.11.1.el7) with pacemaker
> 1.1.13-10, corosync is correctly restarted by systemd.
>
>
>
> [RHEL7 ~]# journalctl -f
>
> -- Logs begin at Wed 2024-01-03 13:15:41 UTC. --
>
> Apr 18 16:26:55 - systemd[1]: corosync.service failed.
>
> Apr 18 16:26:55 - systemd[1]: pacemaker.service holdoff time over,
> scheduling restart.
>
> Apr 18 16:26:55 - systemd[1]: Starting Corosync Cluster Engine...
>
> Apr 18 16:26:55 - corosync[12179]: Starting Corosync Cluster Engine
> (corosync): [  OK  ]
>
> Apr 18 16:26:55 - systemd[1]: Started Corosync Cluster Engine.
>
> Apr 18 16:26:55 - systemd[1]: S

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-22 Thread Klaus Wenninger
On Mon, Apr 22, 2024 at 12:32 PM NOLIBOS Christophe <
christophe.noli...@thalesgroup.com> wrote:

> Classified as: {OPEN}
>
>
>
> You are right : the “Restart=on-failure” line is commented and so,
> disabled per default.
>
> Uncommenting it resolves my issue.
>

Maybe pacemaker changed behavior here without syncing enough with corosync
behavior.
We'll look into that to see which approach is better - restart corosync on
failure - or have
pacemaker be restarted by systemd which should in turn restart corosync as
well.

Klaus

>
>
> Thanks a lot.
>
> Christophe.
>
>
>
> *De :* Klaus Wenninger 
> *Envoyé :* lundi 22 avril 2024 11:06
> *À :* NOLIBOS Christophe 
> *Cc :* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
>
>
> On Mon, Apr 22, 2024 at 9:51 AM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>
> Classified as: {OPEN}
>
>
>
> ‘kill -9’ command.
>
> Is it gracefully exit?
>
>
>
> Looking as if corosync-unit-file has Restart=on-failure disabled per
> default.
>
> I'm not aware of another mechanism that would restart corosync and I
>
> think default behavior is not to restart.
>
> Comments suggest just to enable if using watchdog but that might just
>
> reference the RestartSec to provoke a watchdog-reboot instead of a
>
> restart via systemd.
>
> Any signal that isn't handled by the process - so that the exit-code could
>
> be set to 0 - should be fine.
>
>
>
> Klaus
>
>
>
>
>
> *De :* Klaus Wenninger 
> *Envoyé :* jeudi 18 avril 2024 20:17
> *À :* NOLIBOS Christophe 
> *Cc :* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
> NOLIBOS Christophe  schrieb am Do.,
> 18. Apr. 2024, 19:01:
>
> Classified as: {OPEN}
>
>
>
> Hummm… my RHEL 8.8 OS has been hardened.
>
> I am wondering if the problem does not come from that.
>
>
>
> On another side, I get the same issue (i.e. corosync not restarted by
> system) with Pacemaker 2.1.5-8 deployed on RHEL 8.4 (not hardened).
>
>
>
> I’m checking.
>
>
>
> How did, you kill corosync? If it exits gracefully might not be restarted.
> Check journal. Sry cant try am on my mobile ATM. Klaus
>
>
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
> *De :* Users  *De la part de* NOLIBOS
> Christophe via Users
> *Envoyé :* jeudi 18 avril 2024 18:34
> *À :* Klaus Wenninger ; Cluster Labs - All topics
> related to open-source clustering welcomed 
> *Cc :* NOLIBOS Christophe 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
> Classified as: {OPEN}
>
>
>
> So, the issue is on systemd?
>
>
>
> If I run the same test on RHEL 7 (3.10.0-693.11.1.el7) with pacemaker
> 1.1.13-10, corosync is correctly restarted by systemd.
>
>
>
> [RHEL7 ~]# journalctl -f
>
> -- Logs begin at Wed 2024-01-03 13:15:41 UTC. --
>
> Apr 18 16:26:55 - systemd[1]: corosync.service failed.
>
> Apr 18 16:26:55 - systemd[1]: pacemaker.service holdoff time over,
> scheduling restart.
>
> Apr 18 16:26:55 - systemd[1]: Starting Corosync Cluster Engine...
>
> Apr 18 16:26:55 - corosync[12179]: Starting Corosync Cluster Engine
> (corosync): [  OK  ]
>
> Apr 18 16:26:55 - systemd[1]: Started Corosync Cluster Engine.
>
> Apr 18 16:26:55 - systemd[1]: Started Pacemaker High Availability Cluster
> Manager.
>
> Apr 18 16:26:55 - systemd[1]: Starting Pacemaker High Availability Cluster
> Manager...
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Additional logging
> available in /var/log/pacemaker.log
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Switching to
> /var/log/cluster/corosync.log
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Additional logging
> available in /var/log/cluster/corosync.log
>
>
>
> *De :* Klaus Wenninger 
> *Envoyé :* jeudi 18 avril 2024 18:12
> *À :* NOLIBOS Christophe ; Cluster
> Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
>
>
> On Thu, Apr 18, 2024 at 6:09 PM Klaus Wenninger 
> wrote:
>
>
>
>
>
> On Thu, Apr 18, 2024 at 6:06 PM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-22 Thread Klaus Wenninger
On Mon, Apr 22, 2024 at 9:51 AM NOLIBOS Christophe <
christophe.noli...@thalesgroup.com> wrote:

> Classified as: {OPEN}
>
>
>
> ‘kill -9’ command.
>
> Is it gracefully exit?
>

Looks as if the corosync unit file has Restart=on-failure disabled per
default.
I'm not aware of another mechanism that would restart corosync, and I
think the default behavior is not to restart.
Comments suggest enabling it only if using a watchdog, but that might just
refer to the RestartSec used to provoke a watchdog-reboot instead of a
restart via systemd.
Any signal that the process doesn't handle (and thus can't turn into a clean
exit with code 0) should be fine.
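
If you want to double-check what systemd makes of the different exits,
something like this on a test node (untested sketch) should show it:

# clean stop - exit code 0, Restart=on-failure would not kick in
systemctl stop corosync
systemctl start corosync
# simulated crash - death by an unhandled signal counts as a failure
kill -KILL "$(pidof corosync)"
systemctl status corosync   # expected to report failed (Result: signal)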

Klaus


>
> *De :* Klaus Wenninger 
> *Envoyé :* jeudi 18 avril 2024 20:17
> *À :* NOLIBOS Christophe 
> *Cc :* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
> NOLIBOS Christophe  schrieb am Do.,
> 18. Apr. 2024, 19:01:
>
> Classified as: {OPEN}
>
>
>
> Hummm… my RHEL 8.8 OS has been hardened.
>
> I am wondering if the problem does not come from that.
>
>
>
> On another side, I get the same issue (i.e. corosync not restarted by
> system) with Pacemaker 2.1.5-8 deployed on RHEL 8.4 (not hardened).
>
>
>
> I’m checking.
>
>
>
> How did, you kill corosync? If it exits gracefully might not be restarted.
> Check journal. Sry cant try am on my mobile ATM. Klaus
>
>
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
> *De :* Users  *De la part de* NOLIBOS
> Christophe via Users
> *Envoyé :* jeudi 18 avril 2024 18:34
> *À :* Klaus Wenninger ; Cluster Labs - All topics
> related to open-source clustering welcomed 
> *Cc :* NOLIBOS Christophe 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
> Classified as: {OPEN}
>
>
>
> So, the issue is on systemd?
>
>
>
> If I run the same test on RHEL 7 (3.10.0-693.11.1.el7) with pacemaker
> 1.1.13-10, corosync is correctly restarted by systemd.
>
>
>
> [RHEL7 ~]# journalctl -f
>
> -- Logs begin at Wed 2024-01-03 13:15:41 UTC. --
>
> Apr 18 16:26:55 - systemd[1]: corosync.service failed.
>
> Apr 18 16:26:55 - systemd[1]: pacemaker.service holdoff time over,
> scheduling restart.
>
> Apr 18 16:26:55 - systemd[1]: Starting Corosync Cluster Engine...
>
> Apr 18 16:26:55 - corosync[12179]: Starting Corosync Cluster Engine
> (corosync): [  OK  ]
>
> Apr 18 16:26:55 - systemd[1]: Started Corosync Cluster Engine.
>
> Apr 18 16:26:55 - systemd[1]: Started Pacemaker High Availability Cluster
> Manager.
>
> Apr 18 16:26:55 - systemd[1]: Starting Pacemaker High Availability Cluster
> Manager...
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Additional logging
> available in /var/log/pacemaker.log
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Switching to
> /var/log/cluster/corosync.log
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Additional logging
> available in /var/log/cluster/corosync.log
>
>
>
> *De :* Klaus Wenninger 
> *Envoyé :* jeudi 18 avril 2024 18:12
> *À :* NOLIBOS Christophe ; Cluster
> Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
>
>
> On Thu, Apr 18, 2024 at 6:09 PM Klaus Wenninger 
> wrote:
>
>
>
>
>
> On Thu, Apr 18, 2024 at 6:06 PM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>
> Classified as: {OPEN}
>
>
>
> Well… why do you say that « Well if corosync isn't  there that this is to
> be expected and pacemaker won't recover corosync.”?
>
> In my mind, Corosync is managed by Pacemaker as any other cluster resource
> and the "pacemakerd: recover properly from > Corosync crash" fix
> implemented in version 2.1.2 seems confirm that.
>
>
>
> Nope. Startup of the stack is done by systemd. And pacemaker is just
> started after corosync is up and
>
> systemd should be responsible for keeping the stack up.
>
> For completeness: if you have sbd in the mix that is as well being started
> by systemd but kind of
>
> parallel with corosync as part of it (systemd terminology).
>
>
>
> The "recover" above is referring to pacemaker recovering from corosync
> going away and coming back.
>
>
>
>
>
> Klaus
>
>
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
> *De :* NOLIBOS Christophe
> *Envoyé :* jeudi 18 avril 2024 17:56
> *À :* 'Klaus Wenninger' ; Cluste

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-18 Thread Klaus Wenninger
NOLIBOS Christophe  schrieb am Do., 18.
Apr. 2024, 19:01:

> Classified as: {OPEN}
>
>
>
> Hummm… my RHEL 8.8 OS has been hardened.
>
> I am wondering if the problem does not come from that.
>
>
>
> On another side, I get the same issue (i.e. corosync not restarted by
> system) with Pacemaker 2.1.5-8 deployed on RHEL 8.4 (not hardened).
>
>
>
> I’m checking.
>
>
>
How did you kill corosync? If it exits gracefully it might not be restarted.
Check the journal. Sry, can't try - am on my mobile ATM. Klaus


>
> {OPEN}
>
> *De :* Users  *De la part de* NOLIBOS
> Christophe via Users
> *Envoyé :* jeudi 18 avril 2024 18:34
> *À :* Klaus Wenninger ; Cluster Labs - All topics
> related to open-source clustering welcomed 
> *Cc :* NOLIBOS Christophe 
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
> Classified as: {OPEN}
>
>
>
> So, the issue is on systemd?
>
>
>
> If I run the same test on RHEL 7 (3.10.0-693.11.1.el7) with pacemaker
> 1.1.13-10, corosync is correctly restarted by systemd.
>
>
>
> [RHEL7 ~]# journalctl -f
>
> -- Logs begin at Wed 2024-01-03 13:15:41 UTC. --
>
> Apr 18 16:26:55 - systemd[1]: corosync.service failed.
>
> Apr 18 16:26:55 - systemd[1]: pacemaker.service holdoff time over,
> scheduling restart.
>
> Apr 18 16:26:55 - systemd[1]: Starting Corosync Cluster Engine...
>
> Apr 18 16:26:55 - corosync[12179]: Starting Corosync Cluster Engine
> (corosync): [  OK  ]
>
> Apr 18 16:26:55 - systemd[1]: Started Corosync Cluster Engine.
>
> Apr 18 16:26:55 - systemd[1]: Started Pacemaker High Availability Cluster
> Manager.
>
> Apr 18 16:26:55 - systemd[1]: Starting Pacemaker High Availability Cluster
> Manager...
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Additional logging
> available in /var/log/pacemaker.log
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Switching to
> /var/log/cluster/corosync.log
>
> Apr 18 16:26:55 - pacemakerd[12192]:   notice: Additional logging
> available in /var/log/cluster/corosync.log
>
>
>
> *De :* Klaus Wenninger 
> *Envoyé :* jeudi 18 avril 2024 18:12
> *À :* NOLIBOS Christophe ; Cluster
> Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>
> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
>
>
>
>
> On Thu, Apr 18, 2024 at 6:09 PM Klaus Wenninger 
> wrote:
>
>
>
>
>
> On Thu, Apr 18, 2024 at 6:06 PM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>
> Classified as: {OPEN}
>
>
>
> Well… why do you say that « Well if corosync isn't  there that this is to
> be expected and pacemaker won't recover corosync.”?
>
> In my mind, Corosync is managed by Pacemaker as any other cluster resource
> and the "pacemakerd: recover properly from > Corosync crash" fix
> implemented in version 2.1.2 seems confirm that.
>
>
>
> Nope. Startup of the stack is done by systemd. And pacemaker is just
> started after corosync is up and
>
> systemd should be responsible for keeping the stack up.
>
> For completeness: if you have sbd in the mix that is as well being started
> by systemd but kind of
>
> parallel with corosync as part of it (systemd terminology).
>
>
>
> The "recover" above is referring to pacemaker recovering from corosync
> going away and coming back.
>
>
>
>
>
> Klaus
>
>
>
>
>
> {OPEN}
>
>
>
> {OPEN}
>
> *De :* NOLIBOS Christophe
> *Envoyé :* jeudi 18 avril 2024 17:56
> *À :* 'Klaus Wenninger' ; Cluster Labs - All topics
> related to open-source clustering welcomed 
> *Cc :* Ken Gaillot 
> *Objet :* RE: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
>
>
> Classified as: {OPEN}
>
>
>
>
>
> [~]$ systemctl status corosync
>
> ● corosync.service - Corosync Cluster Engine
>
>Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled;
> vendor preset: disabled)
>
>Active: failed (Result: signal) since Thu 2024-04-18 14:58:42 UTC;
> 53min ago
>
>  Docs: man:corosync
>
>man:corosync.conf
>
>man:corosync_overview
>
>   Process: 2027251 ExecStop=/usr/sbin/corosync-cfgtool -H --force
> (code=exited, status=0/SUCCESS)
>
>   Process: 1324906 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS
> (code=killed, signal=KILL)
>
> Main PID: 1324906 (code=killed, signal=KILL)
>
>
>
> Apr 18 13:16:04 - corosync[1324906]:   [QUORUM] Sync joined[1]: 1
>
> Apr 18 13

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-18 Thread Klaus Wenninger
On Thu, Apr 18, 2024 at 6:09 PM Klaus Wenninger  wrote:

>
>
> On Thu, Apr 18, 2024 at 6:06 PM NOLIBOS Christophe <
> christophe.noli...@thalesgroup.com> wrote:
>
>> Classified as: {OPEN}
>>
>>
>>
>> Well… why do you say that « Well if corosync isn't  there that this is
>> to be expected and pacemaker won't recover corosync.”?
>>
>> In my mind, Corosync is managed by Pacemaker as any other cluster
>> resource and the "pacemakerd: recover properly from > Corosync crash" fix
>> implemented in version 2.1.2 seems confirm that.
>>
>
> Nope. Startup of the stack is done by systemd. And pacemaker is just
> started after corosync is up and
> systemd should be responsible for keeping the stack up.
> For completeness: if you have sbd in the mix that is as well being started
> by systemd but kind of
> parallel with corosync as part of it (systemd terminology).
>

The "recover" above is referring to pacemaker recovering from corosync
going away and coming back.
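
If you want to see how that wiring looks on your installation (the exact
directives differ per distro and package version), something like this shows
the ordering systemd uses:

systemctl cat pacemaker | grep -E '^(After|Requires|Wants)='
systemctl list-dependencies pacemaker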


>
> Klaus
>
>>
>>
>>
>>
>> {OPEN}
>>
>> *De :* NOLIBOS Christophe
>> *Envoyé :* jeudi 18 avril 2024 17:56
>> *À :* 'Klaus Wenninger' ; Cluster Labs - All topics
>> related to open-source clustering welcomed 
>> *Cc :* Ken Gaillot 
>> *Objet :* RE: [ClusterLabs] "pacemakerd: recover properly from Corosync
>> crash" fix
>>
>>
>>
>> Classified as: {OPEN}
>>
>>
>>
>>
>>
>> [~]$ systemctl status corosync
>>
>> ● corosync.service - Corosync Cluster Engine
>>
>>Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled;
>> vendor preset: disabled)
>>
>>Active: failed (Result: signal) since Thu 2024-04-18 14:58:42 UTC;
>> 53min ago
>>
>>  Docs: man:corosync
>>
>>man:corosync.conf
>>
>>man:corosync_overview
>>
>>   Process: 2027251 ExecStop=/usr/sbin/corosync-cfgtool -H --force
>> (code=exited, status=0/SUCCESS)
>>
>>   Process: 1324906 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS
>> (code=killed, signal=KILL)
>>
>> Main PID: 1324906 (code=killed, signal=KILL)
>>
>>
>>
>> Apr 18 13:16:04 - corosync[1324906]:   [QUORUM] Sync joined[1]: 1
>>
>> Apr 18 13:16:04 - corosync[1324906]:   [TOTEM ] A new membership (1.1c8)
>> was formed. Members joined: 1
>>
>> Apr 18 13:16:04 - corosync[1324906]:   [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>>
>> Apr 18 13:16:04 - corosync[1324906]:   [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>>
>> Apr 18 13:16:04 - corosync[1324906]:   [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>>
>> Apr 18 13:16:04 - corosync[1324906]:   [QUORUM] Members[1]: 1
>>
>> Apr 18 13:16:04 - corosync[1324906]:   [MAIN  ] Completed service
>> synchronization, ready to provide service.
>>
>> Apr 18 13:16:04 - systemd[1]: Started Corosync Cluster Engine.
>>
>> Apr 18 14:58:42 - systemd[1]: corosync.service: Main process exited,
>> code=killed, status=9/KILL
>>
>> Apr 18 14:58:42 - systemd[1]: corosync.service: Failed with result
>> 'signal'.
>>
>> [~]$
>>
>>
>>
>>
>>
>> *De :* Klaus Wenninger 
>> *Envoyé :* jeudi 18 avril 2024 17:43
>> *À :* Cluster Labs - All topics related to open-source clustering
>> welcomed 
>> *Cc :* Ken Gaillot ; NOLIBOS Christophe <
>> christophe.noli...@thalesgroup.com>
>> *Objet :* Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
>> crash" fix
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 18, 2024 at 5:07 PM NOLIBOS Christophe via Users <
>> users@clusterlabs.org> wrote:
>>
>> Classified as: {OPEN}
>>
>> I'm using RedHat 8.8 (4.18.0-477.21.1.el8_8.x86_64).
>> When I kill Corosync, no new corosync process is created and pacemaker is
>> in failure.
>> The only solution is to restart the pacemaker service.
>>
>> [~]$ pcs status
>> Error: unable to get cib
>> [~]$
>>
>> [~]$systemctl status pacemaker
>> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>>Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled;
>> vendor preset: disabled)
>>Active: active (running) since Thu 2024-04-18 13:16:04 UTC; 1h 43min
>> ago
>>  Docs: man:pacemakerd
>>https://clusterlabs.org/pac

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-18 Thread Klaus Wenninger
On Thu, Apr 18, 2024 at 5:07 PM NOLIBOS Christophe via Users <
users@clusterlabs.org> wrote:

> Classified as: {OPEN}
>
> I'm using RedHat 8.8 (4.18.0-477.21.1.el8_8.x86_64).
> When I kill Corosync, no new corosync process is created and pacemaker is
> in failure.
> The only solution is to restart the pacemaker service.
>
> [~]$ pcs status
> Error: unable to get cib
> [~]$
>
> [~]$systemctl status pacemaker
> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled;
> vendor preset: disabled)
>Active: active (running) since Thu 2024-04-18 13:16:04 UTC; 1h 43min ago
>  Docs: man:pacemakerd
>https://clusterlabs.org/pacemaker/doc/
>  Main PID: 1324923 (pacemakerd)
> Tasks: 91
>Memory: 132.1M
>CGroup: /system.slice/pacemaker.service
> ...
> Apr 18 14:59:02 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:03 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:04 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:05 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:06 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:07 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:08 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:09 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:10 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> Apr 18 14:59:11 - pacemakerd[1324923]:  crit: Could not connect to
> Corosync CFG: CS_ERR_LIBRARY
> [~]$
>
>
Well if corosync isn't there then this is to be expected and pacemaker
won't recover corosync.
Can you check what systemd thinks about corosync (status/journal)?
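
E.g. on the node where corosync was killed:

systemctl status corosync
journalctl -b -u corosync -u pacemaker --no-pager | tail -n 100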

Klaus

>
> {OPEN}
>
> -Message d'origine-
> De : Ken Gaillot 
> Envoyé : jeudi 18 avril 2024 16:40
> À : Cluster Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>
> Cc : NOLIBOS Christophe 
> Objet : Re: [ClusterLabs] "pacemakerd: recover properly from Corosync
> crash" fix
>
> What OS are you using? Does it use systemd?
>
> What does happen when you kill Corosync?
>
> On Thu, 2024-04-18 at 13:13 +, NOLIBOS Christophe via Users wrote:
> > Classified as: {OPEN}
> >
> > Dear All,
> >
> > I have a question about the "pacemakerd: recover properly from
> > Corosync crash" fix implemented in version 2.1.2.
> > I have observed the issue when testing pacemaker version 2.0.5, just
> > by killing the ‘corosync’ process: Corosync was not recovered.
> >
> > I am using now pacemaker version 2.1.5-8.
> > Doing the same test, I have the same result: Corosync is still not
> > recovered.
> >
> > Please confirm the "pacemakerd: recover properly from Corosync crash"
> > fix implemented in version 2.1.2 covers this scenario.
> > If it is, did I miss something in the configuration of my cluster?
> >
> > Best Regard.
> >
> > Christophe.
> >
> >
> >
> > {OPEN}
> --
> Ken Gaillot 


Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-30 Thread Klaus Wenninger
On Tue, Jan 30, 2024 at 2:21 PM Walker, Chris 
wrote:

> >>> However, now it seems to wait that amount of time before it elects a
> >>> DC, even when quorum is acquired earlier.  In my log snippet below,
> >>> with dc-deadtime 300s,
> >>
> >> The dc-deadtime is not waiting for quorum, but for another DC to show
> >> up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> > I believe all the nodes showed up by 14:17:04, but it still waited until
> 14:19:26 to elect a DC:
>
> > Jan 29 14:14:25 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher12 is now member (was in
> unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher11 is now member (was in
> unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (quorum_notification_cb)  notice: Quorum acquired | membership=54 members=2
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info:
> Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> > This is a cluster with 2 nodes, gopher11 and gopher12.
>
> This is our experience with dc-deadtime too: even if both nodes in the
> cluster show up, dc-deadtime must elapse before the cluster starts.  This
> was discussed on this list a while back (
> https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and an
> RFE came out of it (https://bugs.clusterlabs.org/show_bug.cgi?id=5310).
>
>
>
> I’ve worked around this by having an ExecStartPre directive for Corosync
> that does essentially:
>
>
>
> while ! systemctl -H ${peer} is-active corosync; do sleep 5; done
>
>
>
> With this in place, the nodes wait for each other before starting Corosync
> and Pacemaker.  We can then use the default 20s dc-deadtime so that the DC
> election happens quickly once both nodes are up.
>

Actually wait-for-all, which comes per default with 2-node, should lead to
quorum being delayed till both nodes have shown up.
And if we make the cluster not ignore quorum, it shouldn't start fencing
before it sees the peer - right?
Running a 2-node cluster ignoring quorum or without wait-for-all is a
delicate thing anyway, I would say, and shouldn't work in the generic case.
Not saying it is an issue here - guess there just isn't enough info about
the cluster to say.
So you shouldn't need this raised dc-deadtime and thus wouldn't experience
large startup delays.
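
For reference, the votequorum part of corosync.conf for such a 2-node setup
would look roughly like this (sketch - two_node: 1 implies wait_for_all
unless that is explicitly disabled):

quorum {
    provider: corosync_votequorum
    two_node: 1
    # wait_for_all: 1 is implied by two_node: 1
}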

Regards,
Klaus


> Thanks,
>
> Chris
>
>
>
> *From: *Users  on behalf of Faaland, Olaf
> P. via Users 
> *Date: *Monday, January 29, 2024 at 7:46 PM
> *To: *Ken Gaillot , Cluster Labs - All topics
> related to open-source clustering welcomed 
> *Cc: *Faaland, Olaf P. 
> *Subject: *Re: [ClusterLabs] controlling cluster behavior on startup
>
> >> However, now it seems to wait that amount of time before it elects a
> >> DC, even when quorum is acquired earlier.  In my log snippet below,
> >> with dc-deadtime 300s,
> >
> > The dc-deadtime is not waiting for quorum, but for another DC to show
> > up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> I believe all the nodes showed up by 14:17:04, but it still waited until
> 14:19:26 to elect a DC:
>
> Jan 29 14:14:25 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher12 is now member (was in
> unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher11 is now member (was in
> unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (quorum_notification_cb)  notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info:
> Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> This is a cluster with 2 nodes, gopher11 and gopher12.
>
> Am I misreading that?
>
> thanks,
> Olaf
>
> 
> From: Ken Gaillot 
> Sent: Monday, January 29, 2024 3:49 PM
> To: Faaland, Olaf P.; Cluster Labs - All topics related to open-source
> clustering welcomed
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> On Mon, 2024-01-29 at 22:48 +, Faaland, Olaf P. wrote:
> > Thank you, Ken.
> >
> > I changed my configuration management system to put an initial
> > cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> > values I was setting via pcs commands, including dc-deadtime.  I
> > removed those "pcs property set" commands from the ones that are run
> > at startup time.
> >
> > That worked in the sense that after Pacemaker start, the node waits
> > my newly specified dc-deadtime of 300s before giving up on the
> > partner node and fencing it, if the partner never appears as a
> > member.
> >
> > However, now it seems to wait that amount of time before it elects a
> > DC, even when quorum is acquired earlier.  In my log snippet below,
> > with dc-deadtime 300s,
>
> The dc-deadtime is not waiting for 

Re: [ClusterLabs] trigger something at ?

2024-01-29 Thread Klaus Wenninger
On Mon, Jan 29, 2024 at 5:22 PM Ken Gaillot  wrote:

> On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote:
> > Hi guys.
> >
> > Is it possible to trigger some... action - I'm thinking specifically
> > at shutdown/start.
> > If not within the cluster then - if you do that - perhaps outside.
> > I would like to create/remove constraints, when cluster starts &
> > stops, respectively.
> >
> > many thanks, L.
> >
>
> You could use node status alerts for that, but it's risky for alert
> agents to change the configuration (since that may result in more
> alerts and potentially some sort of infinite loop).
>
> Pacemaker has no concept of a full cluster start/stop, only node
> start/stop. You could approximate that by checking whether the node
> receiving the alert is the only active node.
>
> Another possibility would be to write a resource agent that does what
> you want and order everything else after it. However it's even more
> risky for a resource agent to modify the configuration.
>
> Finally you could write a systemd unit to do what you want and order it
> after pacemaker.
>
> What's wrong with leaving the constraints permanently configured?
>

My gut feeling tells me there is something wrong with the constraints
that will probably hit you as well when recovering from a problem.
But maybe it would be easier to tell with some kind of example.

Klaus

> --
> Ken Gaillot 
>


Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Klaus Wenninger
On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov 
wrote:

> On Tue, Dec 19, 2023 at 10:41 AM Artem  wrote:
> ...
> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is
> unrunnable (node is offline)
> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (recurring_op_for_active)info: Start 20s-interval monitor for OST4 on
> lustre3
> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (log_list_item)  notice: Actions: Stop   OST4( lustre4
> )  blocked
>
> This is the default for the failed stop operation. The only way
> pacemaker can resolve failure to stop a resource is to fence the node
> where this resource was active. If it is not possible (and IIRC you
> refuse to use stonith), pacemaker has no other choice as to block it.
> If you insist, you can of course sert on-fail=ignore, but this means
> unreachable node will continue to run resources. Whether it can lead
> to some corruption in your case I cannot guess.
>

Don't know if I'm reading that correctly, but I understand from what you had
written above that you try to trigger the failover by stopping the VM
(lustre4) without an ordered shutdown.
With fencing disabled, what we are seeing is exactly what we would expect:
the state of the resource is unknown - pacemaker tries to stop it - that
doesn't work as the node is offline - no fencing is configured - so
everything it can do is wait until there is info on whether the resource is
up or not.
I guess the strange output below is because of fencing being disabled -
quite an unusual - and also not recommended - configuration, so this might
not have shown up too often in that way.
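
Just for completeness - whether fencing is active at all is controlled by
the stonith-enabled cluster property. A quick way to check respectively
enable it (a configured fence device is of course still needed):

pcs property list --all | grep stonith-enabled
pcs property set stonith-enabled=true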

Klaus

>
> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (pcmk__create_graph) crit: Cannot fence lustre4 because of OST4:
> blocked (OST4_stop_0)
>
> That is a rather strange phrase. The resource is blocked because the
> pacemaker could not fence the node, not the other way round.


[ClusterLabs] Pacemaker 2.1.7-rc2 now available

2023-11-24 Thread Klaus Wenninger
Hi all,

Source code for the 2nd release candidate for Pacemaker version 2.1.7
is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc2

This is primarily a bug fix release. See the ChangeLog or the link
above for details.

Everyone is encouraged to download, build, and test the new release. We
do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors of source code to this release, including
Chris Lumens, Gao Yan, Grace Chin, Hideo Yamauchi, Jan Pokorný, Ken Gaillot,
liupei, Oyvind Albrigtsen, Reid Wahl, xin liang, xuezhixin.

Klaus


Re: [ClusterLabs] PCS ACL for the "pcs cluster stop" command

2023-10-16 Thread Klaus Wenninger
On Fri, Oct 13, 2023 at 9:21 PM Reid Wahl  wrote:

> On Fri, Oct 13, 2023 at 12:19 PM Reid Wahl  wrote:
> >
> > On Fri, Oct 13, 2023 at 9:56 AM Roberto Rodrigos 
> wrote:
> > >
> > > good day!
> > > I use the configuration to create an ACL, it is shown below. How can I
> restrict access to the "pcs cluster stop" command for a user?
> >
> > I don't think you can. ACLs are implemented in Pacemaker; pcs simply
> > provides an interface to manage them.
> >
> > `pcs cluster stop` basically runs `systemctl stop pacemaker; systemctl
> > stop corosync`. So it doesn't interact with the Pacemaker ACLs. It
> > just stops the service.
>
> In my experience only the root user can run `pcs cluster stop`
> successfully anyway
>

Haven't actually tried it, but in a setup running pcsd the stop commands
would run in the context of pcsd, and so it might still be possible for a
non-root user to trigger commands that wouldn't work when called directly.

Klaus

>
> >
> > > useradd rouser -m -G haclient
> > > useradd rwuser -m -G haclient
> > > passwd rwuser
> > > passwd rouser
> > > pcs acl enable
> > > pcs acl role create read-only description="Read access to cluster"
> read xpath /cib
> > > pcs acl role create write-access description="Full access" write xpath
> /cib
> > > pcs acl permission add write_config write xpath /cib/configuration
> > > pcs acl permission add write_config write xpath
> //crm_config//nvpair[@name='maintenance-mode']
> > > pcs acl permission add write_config write xpath
> //nvpair[@name='maintenance']
> > > pcs acl permission add write_config write xpath //resources
> > > pcs acl permission add write_config write xpath //constraints
> > > pcs acl user create rouser read-only
> > > pcs acl user create rwuser write-access
> > > pcs acl role assign read-only to rouser
> > > pcs acl role assign write_config to rwuser
> > >
> > > User: rouser
> > >   Roles: read-only
> > > User: rwuser
> > >   Roles: write-access write_config
> > > Role: read-only
> > >   Description: Read access to cluster
> > >   Permission: read xpath /cib (read-only-read)
> > > Role: write-access
> > >   Description: Full access
> > >   Permission: write xpath /cib (write-access-write)
> > > Role: write_config
> > >   Permission: write xpath /cib/configuration (write_config-write)
> > >   Permission: write xpath //crm_config//nvpair[@name=maintenance-mode]
> (write_config-write-1)
> > >   Permission: write xpath //nvpair[@name=maintenance]
> (write_config-write-2)
> > >   Permission: write xpath //resources (write_config-write-3)
> > >   Permission: write xpath //constraints (write_config-write-4)
> > >
> > > su rouser
> > > Username: rouser
> > > Password:
> > > localhost: Authorized
> > > pcs cluster stop
> > > Stopping Cluster (pacemaker)...
> > > Stopping Cluster (corosync)...
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Reid Wahl (He/Him)
> > Senior Software Engineer, Red Hat
> > RHEL High Availability - Pacemaker
>
>
>
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker
>


Re: [ClusterLabs] Syncronous primary doesn't switch to async mode on replica power off

2023-10-06 Thread Klaus Wenninger
On Fri, Oct 6, 2023 at 8:46 AM Sergey Cherukhin 
wrote:

> Hello!
>
> I used Microsoft Outlook to send this message and it was sent in the wrong
> format. I'm sorry. I won't do it again.
>
> I use Postgresql+Pacemaker+Corosync cluster with 2 Postgresql instances in
> synchronous replication mode. Parameter “rep_mode” is set to "sync", and
> when I shut down the replica normal way, the primary node  switches to the
> async mode. But when I  shut down the replica by powering it off to emulate
> power unit failure, primary remains in sync mode and clients hang on INSERT
> operations  until "pcs resource cleanup" is performed.  I created an alert
> agent to run "pcs resource cleanup" when any node is lost, but this
> approach doesn’t work.
>
> What should I do to be sure the primary node will switch to async mode if
> the replica becomes lost for any cause?
>

One idea might be running (a) small daemon(s) colocated with the Postgresql
instance(s) that uses pacemaker tooling to check the state of the partner
node and, if it isn't there, switches to async mode. You can solve this as a
small custom resource agent.
Actually it wouldn't even be necessary to have a persistently running
process - it could be done in the monitoring as well.
Of course you could also enhance the monitoring of the Postgresql resource
agent so that it supports this switching.
As this would be quite a generic change it would probably be interesting
for the community as well.

On the other hand I would have considered this issue so generic that it is
hard to believe there is no ready-made / tested solution around already.

To get it more reactive (without setting the monitoring interval to
incredibly low values), using an alert-agent (as you already tried) - but
maybe having it directly switch to async mode - might be worth trying.
Did you investigate what actually went wrong when you experimented with the
alert-agent? Interesting that the resource cleanup that obviously works from
the cmdline doesn't do the trick when run as an alert-agent - maybe an
selinux issue ...
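
Just to illustrate the alert-agent idea - an untested minimal sketch; the
CRM_alert_* environment variables are pacemaker's alert interface, while the
switch-to-async command is a placeholder for whatever mechanism your setup
uses on the primary:

#!/bin/sh
# hypothetical /usr/local/bin/peer_lost_alert.sh, registered e.g. via
#   pcs alert create path=/usr/local/bin/peer_lost_alert.sh
case "$CRM_alert_kind" in
    node)
        if [ "$CRM_alert_desc" = "lost" ]; then
            logger "peer $CRM_alert_node lost - switching primary to async"
            /usr/local/bin/switch-primary-to-async.sh   # placeholder
        fi
        ;;
esac
exit 0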

Regards,
Klaus

>
>
> Best regards,
> Sergey Cherukhin
>


Re: [ClusterLabs] Users Digest, Vol 104, Issue 5

2023-09-05 Thread Klaus Wenninger via Users
Down below you replied to 2 threads. I think the latter is the one you
intended to reply to ... very confusing ...
Sry for adding more spam - was hesitant - but I think there is a chance it
removes some confusion ...

Klaus

On Mon, Sep 4, 2023 at 10:29 PM Adil Bouazzaoui  wrote:

> Hi Jan,
>
> to add more information, we deployed Centreon 2 Node HA Cluster (Master in
> DC 1 & Slave in DC 2), quorum device which is responsible for split-brain
> is on DC 1 too, and the poller which is responsible for monitoring is in DC
> 1 too. The problem is that a VIP address is required (attached to Master
> node, in case of failover it will be moved to Slave) and we don't know what
> VIP we should use? also we don't know what is the perfect setup for our
> current scenario so if DC 1 goes down then the Slave on DC 2 will be the
> Master, that's why we don't know where to place the Quorum device and the
> poller?
>
> i hope to get some ideas so we can setup this cluster correctly.
> thanks in advance.
>
> Adil Bouazzaoui
> IT Infrastructure engineer
> adil.bouazza...@tmandis.ma
> adilb...@gmail.com
>
> Le lun. 4 sept. 2023 à 15:24,  a écrit :
>
>>
>> Today's Topics:
>>
>>1. Re: issue during Pacemaker failover testing (Klaus Wenninger)
>>2. Re: issue during Pacemaker failover testing (Klaus Wenninger)
>>3. Re: issue during Pacemaker failover testing (David Dolan)
>>    4. Re: Centreon HA Cluster - VIP issue (Jan Friesse)
>>
>>
>> --
>>
>> Message: 1
>> Date: Mon, 4 Sep 2023 14:15:52 +0200
>> From: Klaus Wenninger 
>> To: Cluster Labs - All topics related to open-source clustering
>> welcomed 
>> Cc: David Dolan 
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>> Message-ID:
>> > wody...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> On Mon, Sep 4, 2023 at 1:44?PM Andrei Borzenkov 
>> wrote:
>>
>> > On Mon, Sep 4, 2023 at 2:25?PM Klaus Wenninger 
>> > wrote:
>> > >
>> > >
>> > > Or go for qdevice with LMS where I would expect it to be able to
>> really
>> > go down to
>> > > a single node left - any of the 2 last ones - as there is still
>> qdevice.#
>> > > Sry for the confusion btw.
>> > >
>> >
>> > According to documentation, "LMS is also incompatible with quorum
>> > devices, if last_man_standing is specified in corosync.conf then the
>> > quorum device will be disabled".
>> >
>>
>> That is why I said qdevice with LMS - but it was probably not explicit
>> enough without telling that I meant the qdevice algorithm and not
>> the corosync flag.
>>
>> Klaus
>>
>> >
>>
>> --
>>
>> Message: 2
>> Date: Mon, 4 Sep 2023 14:32:39 +0200
>> From: Klaus Wenninger 
>> To: Cluster Labs - All topics related to open-source clustering
>> welcomed 
>> Cc: David Dolan 
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>> Message-ID:
>> <
>> calrdao0v8bxp4ajwcobkeae6pimvgg2xme6ia+ohxshesx9...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> On Mon, Sep 4, 2023 at 1:50?PM Andrei Borzenkov 
>> wrote:
>>
>> > On Mon, Sep 4, 2023 at 2:18?PM Klaus Wenninger 
>> > wrote:
>> > >
>> > >
>> > >
>> > > On Mon, Sep 4, 2023 at 12:45?PM David Do

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov  wrote:

> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger 
> wrote:
> >
> >
> >
> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan 
> wrote:
> >>
> >> Hi Klaus,
> >>
> >> With default quorum options I've performed the following on my 3 node
> cluster
> >>
> >> Bring down cluster services on one node - the running services migrate
> to another node
> >> Wait 3 minutes
> >> Bring down cluster services on one of the two remaining nodes - the
> surviving node in the cluster is then fenced
> >>
> >> Instead of the surviving node being fenced, I hoped that the services
> would migrate and run on that remaining node.
> >>
> >> Just looking for confirmation that my understanding is ok and if I'm
> missing something?
> >
> >
> > As said I've never used it ...
> > Well when down to 2 nodes LMS per definition is getting into trouble as
> after another
> > outage any of them is gonna be alone. In case of an ordered shutdown
> this could
> > possibly be circumvented though. So I guess your fist attempt to enable
> auto-tie-breaker
> > was the right idea. Like this you will have further service at least on
> one of the nodes.
> > So I guess what you were seeing is the right - and unfortunately only
> possible - behavior.
>
> I still do not see where fencing comes from. Pacemaker requests
> fencing of the missing nodes. It also may request self-fencing, but
> not in the default settings. It is rather hard to tell what happens
> without logs from the last remaining node.
>
> That said, the default action is to stop all resources, so the end
> result is not very different :)
>

But you are of course right. The expected behaviour would be that
the leftover node stops the resources.
But maybe we're missing something here. Hard to tell without
the exact configuration including fencing.
Again, as already said, I don't know anything about the LMS
implementation in corosync. In theory there were arguments both for
suicide (but that would have to be done by pacemaker) and for
automatically switching to some 2-node mode once the remaining
partition is reduced to just 2, followed by a fence race (when done
without the precautions otherwise used for 2-node clusters).
But I guess in this case it is none of those 2.

Klaus



Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
On Mon, Sep 4, 2023 at 1:44 PM Andrei Borzenkov  wrote:

> On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger 
> wrote:
> >
> >
> > Or go for qdevice with LMS where I would expect it to be able to really
> go down to
> > a single node left - any of the 2 last ones - as there is still qdevice.#
> > Sry for the confusion btw.
> >
>
> According to documentation, "LMS is also incompatible with quorum
> devices, if last_man_standing is specified in corosync.conf then the
> quorum device will be disabled".
>

That is why I said qdevice with LMS - but it was probably not explicit
enough without saying that I meant the qdevice algorithm and not
the corosync flag.

Klaus



Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
On Mon, Sep 4, 2023 at 1:18 PM Klaus Wenninger  wrote:

>
>
> On Mon, Sep 4, 2023 at 12:45 PM David Dolan  wrote:
>
>> Hi Klaus,
>>
>> With default quorum options I've performed the following on my 3 node
>> cluster
>>
>> Bring down cluster services on one node - the running services migrate to
>> another node
>> Wait 3 minutes
>> Bring down cluster services on one of the two remaining nodes - the
>> surviving node in the cluster is then fenced
>>
>> Instead of the surviving node being fenced, I hoped that the services
>> would migrate and run on that remaining node.
>>
>> Just looking for confirmation that my understanding is ok and if I'm
>> missing something?
>>
>
> As said I've never used it ...
> Well when down to 2 nodes LMS per definition is getting into trouble as
> after another
> outage any of them is gonna be alone. In case of an ordered shutdown this
> could
> possibly be circumvented though. So I guess your first attempt to enable
> auto-tie-breaker
> was the right idea. Like this you will have further service at least on
> one of the nodes.
> So I guess what you were seeing is the right - and unfortunately only
> possible - behavior.
> Where LMS shines is probably scenarios with substantially more nodes.
>

Or go for qdevice with LMS where I would expect it to be able to really go
down to
a single node left - any of the 2 last ones - as there is still qdevice.
Sry for the confusion btw.

Klaus

>
> Klaus
>
>>
>> Thanks
>> David
>>
>>
>>
>> On Thu, 31 Aug 2023 at 11:59, David Dolan  wrote:
>>
>>> I just tried removing all the quorum options setting back to defaults so
>>> no last_man_standing or wait_for_all.
>>> I still see the same behaviour where the third node is fenced if I bring
>>> down services on two nodes.
>>> Thanks
>>> David
>>>
>>> On Thu, 31 Aug 2023 at 11:44, Klaus Wenninger 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Aug 31, 2023 at 12:28 PM David Dolan 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, 30 Aug 2023 at 17:35, David Dolan 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> > Hi All,
>>>>>>> >
>>>>>>> > I'm running Pacemaker on Centos7
>>>>>>> > Name: pcs
>>>>>>> > Version : 0.9.169
>>>>>>> > Release : 3.el7.centos.3
>>>>>>> > Architecture: x86_64
>>>>>>> >
>>>>>>> >
>>>>>>> Besides the pcs-version versions of the other
>>>>>>> cluster-stack-components
>>>>>>> could be interesting. (pacemaker, corosync)
>>>>>>>
>>>>>>  rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
>>>>>> fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
>>>>>> corosynclib-2.4.5-7.el7_9.2.x86_64
>>>>>> pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
>>>>>> fence-agents-common-4.2.1-41.el7_9.6.x86_64
>>>>>> corosync-2.4.5-7.el7_9.2.x86_64
>>>>>> pacemaker-cli-1.1.23-1.el7_9.1.x86_64
>>>>>> pacemaker-1.1.23-1.el7_9.1.x86_64
>>>>>> pcs-0.9.169-3.el7.centos.3.x86_64
>>>>>> pacemaker-libs-1.1.23-1.el7_9.1.x86_64
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> > I'm performing some cluster failover tests in a 3 node cluster. We
>>>>>>> have 3
>>>>>>> > resources in the cluster.
>>>>>>> > I was trying to see if I could get it working if 2 nodes fail at
>>>>>>> different
>>>>>>> > times. I'd like the 3 resources to then run on one node.
>>>>>>> >
>>>>>>> > The quorum options I've configured are as follows
>>>>>>> > [root@node1 ~]# pcs quorum config
>>>>>>> > Options:
>>>>>>> >   auto_tie_breaker: 1
>>>>>>> >   last_man_standing: 1
>>>>>>> >   last_man_standing_window: 1
>>>>>>> >   wait_for_all: 1
>>>>>>> >
>>>>>>> >
>>>>>>> Not sure if the combination of auto_tie_breaker and
>>>>>>> last_man_standing m

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Klaus Wenninger
On Mon, Sep 4, 2023 at 12:45 PM David Dolan  wrote:

> Hi Klaus,
>
> With default quorum options I've performed the following on my 3 node
> cluster
>
> Bring down cluster services on one node - the running services migrate to
> another node
> Wait 3 minutes
> Bring down cluster services on one of the two remaining nodes - the
> surviving node in the cluster is then fenced
>
> Instead of the surviving node being fenced, I hoped that the services
> would migrate and run on that remaining node.
>
> Just looking for confirmation that my understanding is ok and if I'm
> missing something?
>

As said I've never used it ...
Well when down to 2 nodes LMS per definition is getting into trouble as
after another
outage any of them is gonna be alone. In case of an ordered shutdown this
could
possibly be circumvented though. So I guess your first attempt to enable
auto-tie-breaker
was the right idea. Like this you will have further service at least on one
of the nodes.
So I guess what you were seeing is the right - and unfortunately only
possible - behavior.
Where LMS shines is probably scenarios with substantially more nodes.

Klaus

>
> Thanks
> David
>
>
>
> On Thu, 31 Aug 2023 at 11:59, David Dolan  wrote:
>
>> I just tried removing all the quorum options setting back to defaults so
>> no last_man_standing or wait_for_all.
>> I still see the same behaviour where the third node is fenced if I bring
>> down services on two nodes.
>> Thanks
>> David
>>
>> On Thu, 31 Aug 2023 at 11:44, Klaus Wenninger 
>> wrote:
>>
>>>
>>>
>>> On Thu, Aug 31, 2023 at 12:28 PM David Dolan 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, 30 Aug 2023 at 17:35, David Dolan 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> > Hi All,
>>>>>> >
>>>>>> > I'm running Pacemaker on Centos7
>>>>>> > Name: pcs
>>>>>> > Version : 0.9.169
>>>>>> > Release : 3.el7.centos.3
>>>>>> > Architecture: x86_64
>>>>>> >
>>>>>> >
>>>>>> Besides the pcs-version versions of the other cluster-stack-components
>>>>>> could be interesting. (pacemaker, corosync)
>>>>>>
>>>>>  rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
>>>>> fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
>>>>> corosynclib-2.4.5-7.el7_9.2.x86_64
>>>>> pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
>>>>> fence-agents-common-4.2.1-41.el7_9.6.x86_64
>>>>> corosync-2.4.5-7.el7_9.2.x86_64
>>>>> pacemaker-cli-1.1.23-1.el7_9.1.x86_64
>>>>> pacemaker-1.1.23-1.el7_9.1.x86_64
>>>>> pcs-0.9.169-3.el7.centos.3.x86_64
>>>>> pacemaker-libs-1.1.23-1.el7_9.1.x86_64
>>>>>
>>>>>>
>>>>>>
>>>>>> > I'm performing some cluster failover tests in a 3 node cluster. We
>>>>>> have 3
>>>>>> > resources in the cluster.
>>>>>> > I was trying to see if I could get it working if 2 nodes fail at
>>>>>> different
>>>>>> > times. I'd like the 3 resources to then run on one node.
>>>>>> >
>>>>>> > The quorum options I've configured are as follows
>>>>>> > [root@node1 ~]# pcs quorum config
>>>>>> > Options:
>>>>>> >   auto_tie_breaker: 1
>>>>>> >   last_man_standing: 1
>>>>>> >   last_man_standing_window: 1
>>>>>> >   wait_for_all: 1
>>>>>> >
>>>>>> >
>>>>>> Not sure if the combination of auto_tie_breaker and last_man_standing
>>>>>> makes
>>>>>> sense.
>>>>>> And as you have a cluster with an odd number of nodes auto_tie_breaker
>>>>>> should be
>>>>>> disabled anyway I guess.
>>>>>>
>>>>> Ah ok I'll try removing auto_tie_breaker and leave last_man_standing
>>>>>
>>>>>>
>>>>>>
>>>>>> > [root@node1 ~]# pcs quorum status
>>>>>> > Quorum information
>>>>>> > --
>>>>>> > Date: Wed Aug 30 11:20:04 2023
>>>>>> > Quorum provider:  corosync_votequorum
>>>>>> > Nod

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-31 Thread Klaus Wenninger
On Thu, Aug 31, 2023 at 12:28 PM David Dolan  wrote:

>
>
> On Wed, 30 Aug 2023 at 17:35, David Dolan  wrote:
>
>>
>>
>> > Hi All,
>>> >
>>> > I'm running Pacemaker on Centos7
>>> > Name: pcs
>>> > Version : 0.9.169
>>> > Release : 3.el7.centos.3
>>> > Architecture: x86_64
>>> >
>>> >
>>> Besides the pcs-version versions of the other cluster-stack-components
>>> could be interesting. (pacemaker, corosync)
>>>
>>  rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
>> fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
>> corosynclib-2.4.5-7.el7_9.2.x86_64
>> pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
>> fence-agents-common-4.2.1-41.el7_9.6.x86_64
>> corosync-2.4.5-7.el7_9.2.x86_64
>> pacemaker-cli-1.1.23-1.el7_9.1.x86_64
>> pacemaker-1.1.23-1.el7_9.1.x86_64
>> pcs-0.9.169-3.el7.centos.3.x86_64
>> pacemaker-libs-1.1.23-1.el7_9.1.x86_64
>>
>>>
>>>
>>> > I'm performing some cluster failover tests in a 3 node cluster. We
>>> have 3
>>> > resources in the cluster.
>>> > I was trying to see if I could get it working if 2 nodes fail at
>>> different
>>> > times. I'd like the 3 resources to then run on one node.
>>> >
>>> > The quorum options I've configured are as follows
>>> > [root@node1 ~]# pcs quorum config
>>> > Options:
>>> >   auto_tie_breaker: 1
>>> >   last_man_standing: 1
>>> >   last_man_standing_window: 1
>>> >   wait_for_all: 1
>>> >
>>> >
>>> Not sure if the combination of auto_tie_breaker and last_man_standing
>>> makes
>>> sense.
>>> And as you have a cluster with an odd number of nodes auto_tie_breaker
>>> should be
>>> disabled anyway I guess.
>>>
>> Ah ok I'll try removing auto_tie_breaker and leave last_man_standing
>>
>>>
>>>
>>> > [root@node1 ~]# pcs quorum status
>>> > Quorum information
>>> > --
>>> > Date: Wed Aug 30 11:20:04 2023
>>> > Quorum provider:  corosync_votequorum
>>> > Nodes:3
>>> > Node ID:  1
>>> > Ring ID:  1/1538
>>> > Quorate:  Yes
>>> >
>>> > Votequorum information
>>> > --
>>> > Expected votes:   3
>>> > Highest expected: 3
>>> > Total votes:  3
>>> > Quorum:   2
>>> > Flags:Quorate WaitForAll LastManStanding AutoTieBreaker
>>> >
>>> > Membership information
>>> > --
>>> > Nodeid  VotesQdevice Name
>>> >  1  1 NR node1 (local)
>>> >  2  1 NR node2
>>> >  3  1 NR node3
>>> >
>>> > If I stop the cluster services on node 2 and 3, the groups all
>>> failover to
>>> > node 1 since it is the node with the lowest ID
>>> > But if I stop them on node1 and node 2 or node1 and node3, the cluster
>>> > fails.
>>> >
>>> > I tried adding this line to corosync.conf and I could then bring down
>>> the
>>> > services on node 1 and 2 or node 2 and 3 but if I left node 2 until
>>> last,
>>> > the cluster failed
>>> > auto_tie_breaker_node: 1  3
>>> >
>>> > This line had the same outcome as using 1 3
>>> > auto_tie_breaker_node: 1  2 3
>>> >
>>> >
>>> Giving multiple auto_tie_breaker-nodes doesn't make sense to me but
>>> rather
>>> sounds dangerous if that configuration is possible at all.
>>>
>>> Maybe the misbehavior of last_man_standing is due to this (maybe not
>>> recognized) misconfiguration.
>>> Did you wait long enough between letting the 2 nodes fail?
>>>
>> I've done it so many times so I believe so. But I'll try remove the
>> auto_tie_breaker config, leaving the last_man_standing. I'll also make sure
>> I leave a couple of minutes between bringing down the nodes and post back.
>>
> Just confirming I removed the auto_tie_breaker config and tested. Quorum
> configuration is as follows:
>  Options:
>   last_man_standing: 1
>   last_man_standing_window: 1
>   wait_for_all: 1
>
> I waited 2-3 minutes between stopping cluster services on two nodes via
> pcs cluster stop
> The remaining cluster node is then fenced. I was hoping the remaining node
> would stay online running the resources.
>

Yep - that would've been my understanding as well.
But honestly I've never used last_man_standing in this context - I wasn't
even aware that it was offered without qdevice, nor have I checked how it is
implemented.

Klaus

>
>
>>> Klaus
>>>
>>>
>>> > So I'd like it to failover when any combination of two nodes fail but
>>> I've
>>> > only had success when the middle node isn't last.
>>> >
>>> > Thanks
>>> > David
>>>
>>>
>>>
>>>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Klaus Wenninger
On Wed, Aug 30, 2023 at 2:34 PM David Dolan  wrote:

> Hi All,
>
> I'm running Pacemaker on Centos7
> Name: pcs
> Version : 0.9.169
> Release : 3.el7.centos.3
> Architecture: x86_64
>
>
Besides the pcs-version versions of the other cluster-stack-components
could be interesting. (pacemaker, corosync)


> I'm performing some cluster failover tests in a 3 node cluster. We have 3
> resources in the cluster.
> I was trying to see if I could get it working if 2 nodes fail at different
> times. I'd like the 3 resources to then run on one node.
>
> The quorum options I've configured are as follows
> [root@node1 ~]# pcs quorum config
> Options:
>   auto_tie_breaker: 1
>   last_man_standing: 1
>   last_man_standing_window: 1
>   wait_for_all: 1
>
>
Not sure if the combination of auto_tie_breaker and last_man_standing makes
sense.
And as you have a cluster with an odd number of nodes auto_tie_breaker
should be
disabled anyway I guess.
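If you want to drop it, iirc something roughly like this should do
(just a sketch - pcs wants the cluster stopped for quorum changes):

  pcs cluster stop --all
  pcs quorum update auto_tie_breaker=0
  pcs cluster start --all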


> [root@node1 ~]# pcs quorum status
> Quorum information
> --
> Date: Wed Aug 30 11:20:04 2023
> Quorum provider:  corosync_votequorum
> Nodes:3
> Node ID:  1
> Ring ID:  1/1538
> Quorate:  Yes
>
> Votequorum information
> --
> Expected votes:   3
> Highest expected: 3
> Total votes:  3
> Quorum:   2
> Flags:Quorate WaitForAll LastManStanding AutoTieBreaker
>
> Membership information
> --
> Nodeid  VotesQdevice Name
>  1  1 NR node1 (local)
>  2  1 NR node2
>  3  1 NR node3
>
> If I stop the cluster services on node 2 and 3, the groups all failover to
> node 1 since it is the node with the lowest ID
> But if I stop them on node1 and node 2 or node1 and node3, the cluster
> fails.
>
> I tried adding this line to corosync.conf and I could then bring down the
> services on node 1 and 2 or node 2 and 3 but if I left node 2 until last,
> the cluster failed
> auto_tie_breaker_node: 1  3
>
> This line had the same outcome as using 1 3
> auto_tie_breaker_node: 1  2 3
>
>
Giving multiple auto_tie_breaker-nodes doesn't make sense to me but rather
sounds dangerous if that configuration is possible at all.

Maybe the misbehavior of last_man_standing is due to this (maybe not
recognized) misconfiguration.
Did you wait long enough between letting the 2 nodes fail?

Klaus


> So I'd like it to failover when any combination of two nodes fail but I've
> only had success when the middle node isn't last.
>
> Thanks
> David
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Redis Resource error

2023-08-23 Thread Klaus Wenninger
On Tue, Aug 22, 2023 at 11:31 PM Social Boh  wrote:

> Hello List,
>
> I know is not really a Pacemaker/Corosync question relate but I don't
> know how solve this error:
>
> redis_start_0 on kam1.kamailio.xyz 'error' (1): call=148, status='Timed
> Out', exitreason='Resource agent did not complete within 2m',
> last-rc-change='Tue Aug 22 16:13:46 2023', queued=0ms, exec=120002ms
>
> If you need more info, please ping me.
>

Well that is really not much info.
I don't know much about redis, but I've heard that in-memory databases
might take some time to start as they load their content from some
kind of storage into memory. Depending on speed and size it might really
not manage to start up within 2 min - and if you are OK with that you might
increase the start timeout.
The other possibility is that it is actually hanging and would never complete.
So you might check the logs for hints, simply try to give it more time and
check later whether it is OK for it to take that long, ...
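If it turns out redis just legitimately needs longer, bumping the start
timeout would be roughly (sketch, assuming pcs and the resource name from
your log, the value is just an example):

  pcs resource update redis op start timeout=300s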
I don't know if there are any peculiarities in the 'official' redis
resource agent regarding how it finds out that the service is running,
and whether something could fail with that mechanism - maybe it is possible
to manually check the service while pacemaker thinks it isn't started.

It is usually helpful to provide a bit of info about the
cluster configuration, the platform the cluster is running on and which
versions of the cluster components are in use.

Regards,
Klaus

>
> Regards
>
> --
> ---
> I'm SoCIaL, MayBe
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated ) and replaced with other options but not an effective solution

2023-06-28 Thread Klaus Wenninger
On Wed, Jun 28, 2023 at 7:38 AM Klaus Wenninger  wrote:

>
>
> On Wed, Jun 28, 2023 at 3:30 AM Priyanka Balotra <
> priyanka.14balo...@gmail.com> wrote:
>
>> I am using SLES 15 SP4. Is the no-quorum-policy still supported?
>>
>>
> Thanks
>> Priyanka
>>
>> On Wed, 28 Jun 2023 at 12:46 AM, Ken Gaillot  wrote:
>>
>>> On Tue, 2023-06-27 at 22:38 +0530, Priyanka Balotra wrote:
>>> > In this case stonith has been configured as a resource,
>>> > primitive stonith-sbd stonith:external/sbd
>>>
>>
> Then the error scenario you described looks like everybody lost connection
> to the shared-storage. The nodes rebooting then probably rather suicided
> instead of reading the poison-pill. And the quorate partition is staying
> alive because it is quorate, but since it doesn't see the shared-storage it
> can't verify that it was able to write the poison-pill, which makes the
> other nodes stay unclean.
> But again just guessing ...
>

That said, and without knowing details about your scenario and the
failure-scenarios you want to cover, you might consider watchdog-fencing.
Afaik Suse has supported that as well for a while now.
It gives you service-recovery from nodes that are cut off via network,
including from their physical fencing-devices. I know that poison-pill-fencing
should do that as well, as long as the quorate part of the cluster is able
to access the shared-disk, but in your scenario this doesn't seem to be
the case.
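Roughly, watchdog-only sbd means no SBD_DEVICE configured, e.g. (just a
sketch, the timeouts are examples):

  # /etc/sysconfig/sbd
  SBD_WATCHDOG_DEV=/dev/watchdog
  SBD_WATCHDOG_TIMEOUT=5

plus on the pacemaker side

  crm configure property stonith-watchdog-timeout=10s

with the property value comfortably above SBD_WATCHDOG_TIMEOUT.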
Just out of curiosity: Are you using poison-pill with multiple shared disks?
Asking because in that case the poison-pill may still be passed via a single
disk and the target would reboot, but the side that initiated fencing might
not recover resources, as it might not have been able to write the
poison-pill to a quorate number of disks.

Klaus

>
>
>> >
>>> > For it to be functional properly , the resource needs to be up, which
>>> > is only possible if the system is quorate.
>>>
>>> Pacemaker can use a fence device even if its resource is not active.
>>> The resource being active just allows Pacemaker to monitor the device
>>> regularly.
>>>
>>> >
>>> > Hence our requirement is to make the system quorate even if one Node
>>> > of the cluster is up.
>>> > Stonith will then take care of any split-brain scenarios.
>>>
>>> In that case it sounds like no-quorum-policy=ignore is actually what
>>> you want.
>>>
>>
> Still dangerous without something like wait_for_all - right?
> With LMS I guess you should get the same effect without having specified it
> explicitly though.
>
> Klaus
>
>
>>
>>> >
>>> > Thanks
>>> > Priyanka
>>> >
>>> > On Tue, Jun 27, 2023 at 9:06 PM Klaus Wenninger 
>>> > wrote:
>>> > >
>>> > > On Tue, Jun 27, 2023 at 5:24 PM Andrei Borzenkov <
>>> > > arvidj...@gmail.com> wrote:
>>> > > > On 27.06.2023 07:21, Priyanka Balotra wrote:
>>> > > > > Hi Andrei,
>>> > > > > After this state the system went through some more fencings and
>>> > > > we saw the
>>> > > > > following state:
>>> > > > >
>>> > > > > :~ # crm status
>>> > > > > Cluster Summary:
>>> > > > >* Stack: corosync
>>> > > > >* Current DC: FILE-2 (version
>>> > > > > 2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36)
>>> > > > - partition
>>> > > > > with quorum
>>> > > >
>>> > > > It says "partition with quorum" so what exactly is the problem?
>>> > >
>>> > > I guess the problem is that resources aren't being recovered on
>>> > > the nodes in the quorate partition.
>>> > > Reason for that is probably that - as Ken was already suggesting -
>>> > > fencing isn't
>>> > > working properly or fencing-devices used are simply inappropriate
>>> > > for the
>>> > > purpose (e.g. onboard IPMI).
>>> > > The fact that a node is rebooting isn't enough. The node that
>>> > > initiated fencing
>>> > > has to know that it did actually work. But we're just guessing
>>> > > here. Logs should
>>> > > show what is actually going on.
>>> > >
>>> > > Klaus
>>> > > > >* Last updated: Mon Jun 26 12:44:15 2023
>>> > > > &

Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated ) and replaced with other options but not an effective solution

2023-06-27 Thread Klaus Wenninger
On Wed, Jun 28, 2023 at 3:30 AM Priyanka Balotra <
priyanka.14balo...@gmail.com> wrote:

> I am using SLES 15 SP4. Is the no-quorum-policy still supported?
>
> Thanks
> Priyanka
>
> On Wed, 28 Jun 2023 at 12:46 AM, Ken Gaillot  wrote:
>
>> On Tue, 2023-06-27 at 22:38 +0530, Priyanka Balotra wrote:
>> > In this case stonith has been configured as a resource,
>> > primitive stonith-sbd stonith:external/sbd
>>
>
Then the error scenario you described looks like everybody lost connection
to the shared-storage. The nodes rebooting then probably rather suicided
instead of reading the poison-pill. And the quorate partition is staying
alive because it is quorate, but since it doesn't see the shared-storage it
can't verify that it was able to write the poison-pill, which makes the
other nodes stay unclean.
But again just guessing ...


> >
>> > For it to be functional properly , the resource needs to be up, which
>> > is only possible if the system is quorate.
>>
>> Pacemaker can use a fence device even if its resource is not active.
>> The resource being active just allows Pacemaker to monitor the device
>> regularly.
>>
>> >
>> > Hence our requirement is to make the system quorate even if one Node
>> > of the cluster is up.
>> > Stonith will then take care of any split-brain scenarios.
>>
>> In that case it sounds like no-quorum-policy=ignore is actually what
>> you want.
>>
>
Still dangerous without something like wait_for_all - right?
With LMS I guess you should get the same effect without having specified it
explicitly though.

Klaus


>
>> >
>> > Thanks
>> > Priyanka
>> >
>> > On Tue, Jun 27, 2023 at 9:06 PM Klaus Wenninger 
>> > wrote:
>> > >
>> > > On Tue, Jun 27, 2023 at 5:24 PM Andrei Borzenkov <
>> > > arvidj...@gmail.com> wrote:
>> > > > On 27.06.2023 07:21, Priyanka Balotra wrote:
>> > > > > Hi Andrei,
>> > > > > After this state the system went through some more fencings and
>> > > > we saw the
>> > > > > following state:
>> > > > >
>> > > > > :~ # crm status
>> > > > > Cluster Summary:
>> > > > >* Stack: corosync
>> > > > >* Current DC: FILE-2 (version
>> > > > > 2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36)
>> > > > - partition
>> > > > > with quorum
>> > > >
>> > > > It says "partition with quorum" so what exactly is the problem?
>> > >
>> > > I guess the problem is that resources aren't being recovered on
>> > > the nodes in the quorate partition.
>> > > Reason for that is probably that - as Ken was already suggesting -
>> > > fencing isn't
>> > > working properly or fencing-devices used are simply inappropriate
>> > > for the
>> > > purpose (e.g. onboard IPMI).
>> > > The fact that a node is rebooting isn't enough. The node that
>> > > initiated fencing
>> > > has to know that it did actually work. But we're just guessing
>> > > here. Logs should
>> > > show what is actually going on.
>> > >
>> > > Klaus
>> > > > >* Last updated: Mon Jun 26 12:44:15 2023
>> > > > >* Last change:  Mon Jun 26 12:41:12 2023 by root via
>> > > > cibadmin on FILE-2
>> > > > >* 4 nodes configured
>> > > > >* 11 resource instances configured
>> > > > >
>> > > > > Node List:
>> > > > >* Node FILE-1: UNCLEAN (offline)
>> > > > >* Node FILE-4: UNCLEAN (offline)
>> > > > >* Online: [ FILE-2 ]
>> > > > >* Online: [ FILE-3 ]
>> > > > >
>> > > > > At this stage FILE-1 and FILE-4 were continuously getting
>> > > > fenced (we have
>> > > > > device based stonith configured but the resource was not up ) .
>> > > > > Two nodes were online and two were offline. So quorum wasn't
>> > > > attained
>> > > > > again.
>> > > > > 1)  For such a scenario we need help to be able to have one
>> > > > cluster live .
>> > > > > 2)  And in cases where only one node of the cluster is up and
>> > > > others are
>> > > > > down we need the resources and cluster to be up .
>> > > > >

Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated ) and replaced with other options but not an effective solution

2023-06-27 Thread Klaus Wenninger
On Tue, Jun 27, 2023 at 5:24 PM Andrei Borzenkov 
wrote:

> On 27.06.2023 07:21, Priyanka Balotra wrote:
> > Hi Andrei,
> > After this state the system went through some more fencings and we saw
> the
> > following state:
> >
> > :~ # crm status
> > Cluster Summary:
> >* Stack: corosync
> >* Current DC: FILE-2 (version
> > 2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36) -
> partition
> > with quorum
>
> It says "partition with quorum" so what exactly is the problem?
>

I guess the problem is that resources aren't being recovered on
the nodes in the quorate partition.
Reason for that is probably that - as Ken was already suggesting - fencing
isn't
working properly or fencing-devices used are simply inappropriate for the
purpose (e.g. onboard IPMI).
The fact that a node is rebooting isn't enough. The node that initiated
fencing
has to know that it did actually work. But we're just guessing here. Logs
should
show what is actually going on.

Klaus

>
> >* Last updated: Mon Jun 26 12:44:15 2023
> >* Last change:  Mon Jun 26 12:41:12 2023 by root via cibadmin on
> FILE-2
> >* 4 nodes configured
> >* 11 resource instances configured
> >
> > Node List:
> >* Node FILE-1: UNCLEAN (offline)
> >* Node FILE-4: UNCLEAN (offline)
> >* Online: [ FILE-2 ]
> >* Online: [ FILE-3 ]
> >
> > At this stage FILE-1 and FILE-4 were continuously getting fenced (we have
> > device based stonith configured but the resource was not up ) .
> > Two nodes were online and two were offline. So quorum wasn't attained
> > again.
> > 1)  For such a scenario we need help to be able to have one cluster live
> .
> > 2)  And in cases where only one node of the cluster is up and others are
> > down we need the resources and cluster to be up .
> >
> > Thanks
> > Priyanka
> >
> > On Tue, Jun 27, 2023 at 12:25 AM Andrei Borzenkov 
> > wrote:
> >
> >> On 26.06.2023 21:14, Priyanka Balotra wrote:
> >>> Hi All,
> >>> We are seeing an issue where we replaced no-quorum-policy=ignore with
> >> other
> >>> options in corosync.conf order to simulate the same behaviour :
> >>>
> >>> wait_for_all: 0
> >>> last_man_standing: 1
> >>> last_man_standing_window: 2
> >>>
> >>> There was another property (auto-tie-breaker) tried but couldn't
> >> configure
> >>> it as crm did not recognise this property.
> >>>
> >>> But even after using these options, we are seeing that system is not
> >>> quorate if at least half of the nodes are not up.
> >>>
> >>> Some properties from crm config are as follows:
> >>>
> >>>
> >>>
> >>> primitive stonith-sbd stonith:external/sbd \
> >>> params pcmk_delay_base=5s
> >>> ...
> >>> property cib-bootstrap-options: \
> >>> have-watchdog=true \
> >>> dc-version="2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36" \
> >>> cluster-infrastructure=corosync \
> >>> cluster-name=FILE \
> >>> stonith-enabled=true \
> >>> stonith-timeout=172 \
> >>> stonith-action=reboot \
> >>> stop-all-resources=false \
> >>> no-quorum-policy=ignore
> >>> rsc_defaults build-resource-defaults: \
> >>> resource-stickiness=1
> >>> rsc_defaults rsc-options: \
> >>> resource-stickiness=100 \
> >>> migration-threshold=3 \
> >>> failure-timeout=1m \
> >>> cluster-recheck-interval=10min
> >>> op_defaults op-options: \
> >>> timeout=600 \
> >>> record-pending=true
> >>>
> >>> On a 4-node setup when the whole cluster is brought up together we see
> >>> error logs like:
> >>>
> >>> 2023-06-26T11:35:17.231104+00:00 FILE-1 pacemaker-schedulerd[26359]:
> >>> warning: Fencing and resource management disabled due to lack of quorum
> >>>
> >>> 2023-06-26T11:35:17.231338+00:00 FILE-1 pacemaker-schedulerd[26359]:
> >>> warning: Ignoring malformed node_state entry without uname
> >>>
> >>> 2023-06-26T11:35:17.233771+00:00 FILE-1 pacemaker-schedulerd[26359]:
> >>> warning: Node FILE-2 is unclean!
> >>>
> >>> 2023-06-26T11:35:17.233857+00:00 FILE-1 pacemaker-schedulerd[26359]:
> >>> warning: Node FILE-3 is unclean!
> >>>
> >>> 2023-06-26T11:35:17.233957+00:00 FILE-1 pacemaker-schedulerd[26359]:
> >>> warning: Node FILE-4 is unclean!
> >>>
> >>
> >> According to this output FILE-1 lost connection to three other nodes, in
> >> which case it cannot be quorate.
> >>
> >>>
> >>> Kindly help correct the configuration to make the system function
> >> normally
> >>> with all resources up, even if there is just one node up.
> >>>
> >>> Please let me know if any more info is needed.
> >>>
> >>> Thanks
> >>> Priyanka
> >>>
> >>>
> >>> ___
> >>> Manage your subscription:
> >>> https://lists.clusterlabs.org/mailman/listinfo/users
> >>>
> >>> ClusterLabs home: https://www.clusterlabs.org/
> >>
> >> ___
> >> Manage your subscription:
> >> 

Re: [ClusterLabs] Pacemaker logs written on message which is not expected as per configuration

2023-06-26 Thread Klaus Wenninger
On Fri, Jun 23, 2023 at 3:57 PM S Sathish S via Users 
wrote:

> Hi Team,
>
>
>
> The pacemaker logs is written in both '/var/log/messages' and
> '/var/log/pacemaker/pacemaker.log'.
>
> Could you please help us for not write pacemaker processes in
> /var/log/messages? Even corosync configuration we have set to_syslog: no.
>
> Attached the corosync.conf file.
>
>
>
> Pacemaker 2.1.6
>
>
>
> [root@node1 username]# tail -f /var/log/messages
>
> Jun 23 13:45:38 node1 ESAFMA_RA(ESAFMA_node1)[3593054]: INFO: 
> component is running with 10502  number
>
> Jun 23 13:45:38 node1 HealthMonitor_RA(HEALTHMONITOR_node1)[3593055]:
> INFO: Health Monitor component is running with 3046  number
>
> Jun 23 13:45:38 node1 ESAPMA_RA(ESAPMA_OCC)[3593056]: INFO: 
> component is running with 10902  number
>
> Jun 23 13:45:38 node1 HP_AMSD_RA(HP_AMSD_node1)[3593057]: INFO: 
> component is running with 2540  number
>
> Jun 23 13:45:38 node1 HP_SMAD_RA(HP_SMAD_node1)[3593050]: INFO: 
> component is running with 2536  number
>
> Jun 23 13:45:38 node1 SSMAGENT_RA(SSMAGENT_node1)[3593068]: INFO: 
> component is running with 2771  number
>
> Jun 23 13:45:38 node1 HazelCast_RA(HAZELCAST_node1)[3593059]: INFO:
>  component is running with 13355 number
>
> Jun 23 13:45:38 node1 HP_SMADREV_RA(HP_SMADREV_node1)[3593062]: INFO:
>  component is running with 2735  number
>
> Jun 23 13:45:38 node1 ESAMA_RA(ESAMA_node1)[3593065]: INFO: 
> component is running with 9572  number
>
> Jun 23 13:45:38 node1 MANAGER_RA(MANAGER_OCC)[3593071]: INFO: 
> component is running with 10069 number
>

What did you configure in /etc/sysconfig/pacemaker?
  PCMK_logfacility=none
should disable all syslogging.
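So roughly (the path may differ per distribution, e.g. /etc/default/pacemaker
on Debian-like systems):

  # /etc/sysconfig/pacemaker
  PCMK_logfacility=none

and restart pacemaker afterwards - the detailed log still goes to
/var/log/pacemaker/pacemaker.log.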

Klaus

>
>
>
>
> cat /etc/corosync/corosync.conf
>
> totem {
>
> version: 2
>
> cluster_name: OCC
>
> transport: knet
>
> crypto_cipher: aes256
>
> crypto_hash: sha256
>
> cluster_uuid: 20572748740a4ac2a7bcc3a3bb6889e9
>
> }
>
>
>
> nodelist {
>
> node {
>
> ring0_addr: node1
>
> name: node1
>
> nodeid: 1
>
> }
>
> }
>
>
>
> quorum {
>
> provider: corosync_votequorum
>
> }
>
>
>
> logging {
>
> to_logfile: yes
>
> logfile: /var/log/cluster/corosync.log
>
> to_syslog: no
>
> timestamp: on
>
> }
>
>
>
> Thanks and Regards,
> S Sathish S
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] How to block/stop a resource from running twice?

2023-04-24 Thread Klaus Wenninger
On Fri, Apr 21, 2023 at 12:24 PM fs3000 via Users 
wrote:

> Hello all,
>
> I'm configuring a two node cluster. Pacemaker 0.9.169 on Centos 7.
>
> guess this is rather the pcs-version ...


> How can i configure a specific service to run just on one node and avoid
> having it running on more than one node simultaneously. If i start the
> service on the other node, it keeps running, pacemaker does not kill it. I
> have googled and searched the docs, but can't find a solution. This is for
> systemd resources. Any ideas please?
>
> Example:
>
> node1# pcs resource create httpd_service systemd:httpd  op monitor
> interval=10s
> node2# systemctl start httpd
>
> httpd keeps running on both nodes. Ideally, pacemaker should kill a second
> instance of that service.
>

When you start pacemaker on a node it will check which resources are
already running there (called a 'probe'), and thus if you had multiple
instances of a primitive running (without a clone) pacemaker would
take care of that.
But you are starting the systemd unit after pacemaker.
Checking for a running resource that isn't expected to be running isn't
done periodically (at least not by default, and I don't know a way to
achieve that off the top of my mind).
Resources that have been started by pacemaker (or found legitimately
running on startup) are of course monitored using the interval you have
given.
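If you want the cluster to re-check without restarting pacemaker you can
wipe the operation history of the resource so that it gets probed again,
e.g. something like (the exact command depends on the pcs version):

  pcs resource cleanup httpd_service

Once the probe finds the unexpected instance pacemaker should stop it.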

Klaus

>
> Thanks in advance for any tips you might have.
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Offtopic - role migration

2023-04-19 Thread Klaus Wenninger
On Tue, Apr 18, 2023 at 9:09 PM Ken Gaillot  wrote:

> On Tue, 2023-04-18 at 19:50 +0200, Vladislav Bogdanov wrote:
> > Btw, an interesting question. How much efforts would it take to
> > support a migration of a Master role over the nodes? An use-case is
> > drbd, configured for a multi-master mode internally, but with master-
> > max=1 in the resource definition. Assuming that resource-agent
> > supports that flow -
> > 1. Do nothing.
> > 2. Promote on a dest node.
> > 3. Demote on a source node.
> >
> > Actually just wonder, because may be it could be some-how achievable
> > to migrate VM which are on top of drbd which is not a multi-master in
> > pacemaker. Fully theoretical case. Didn't verify the flow in-the-
> > mind.
> >
> > I believe that currently only the top-most resource is allowed to
> > migrate, but may be there is some room for impovement?
> >
> > Sorry for the off-topic.
> >
> > Best
> > Vlad
>
> It would be worthwhile, but conceptually it's difficult to imagine a
> solution.
>
> If a resource must be colocated with the promoted role of another
> resource, and only one instance can be promoted at a time, how would it
> be possible to live-migrate? You couldn't promote the new instance
> before demoting the old one, and you couldn't demote the old one
> without stopping the dependent resource.
>
> You would probably need some really complex new constraint types and/or
> resource agent actions. Something like "colocate rsc1 with the promoted
> role of rsc2-clone, unless it needs to migrate, in which case call this
> special agent action to prepare it for running with a demoted instance,
> then demote the instance, then migrate the resource, then promote the
> new instance, then call this other agent action to return it to normal
> operation".
>

Don't know if I got that correctly but wouldn't it rather have to be the
other way round?

- Migration source running on the promoted device
- Start target in receiving mode but without automatically starting once
  state is complete (don't know if qemu supports that) and without access
  to block-devices
- Start migration on the source side
- Once state difference is small enough disable VM on source side +
  transfer the leftover state
- Switch underlying filesystem/block-device to the destination
- Make qemu pick up at the state is has memorized and transferred

If we make the VM a promotable resource as well we could try to pull
the promoted state of fliesystem & VM to the other side once migration
is kicked off on the source-side.
The source-side VM would refuse demotion as long as the state is being passed
to the other side. We would need the same 'state difference is small enough'
measure that qemu uses when it kicks off the activation of the target.
At the end of demotion it would stop qemu and transfer the leftovers.
Then pacemaker could demote the source filesystem and promote on
the destination-side. That would trigger promoting the destination VM
which makes it run on the transfered state.
The above concept would probably work with cold migration to a
state-image on the destination side. But that is probably not interesting.
Don't know if qemu allows interception at the interesting points.
Don't know if we need the resource-interface to be extended for this.

Klaus


-- 
> Ken Gaillot 
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] VirtualDomain - node map - ?

2023-04-17 Thread Klaus Wenninger
On Mon, Apr 17, 2023 at 6:17 AM Andrei Borzenkov 
wrote:

> On 16.04.2023 16:29, lejeczek via Users wrote:
> >
> >
> > On 16/04/2023 12:54, Andrei Borzenkov wrote:
> >> On 16.04.2023 13:40, lejeczek via Users wrote:
> >>> Hi guys
> >>>
> >>> Some agents do employ that concept of node/host map which I
> >>> do not see in any manual/docs that this agent does - would
> >>> you suggest some technique or tips on how to achieve
> >>> similar?
> >>> I'm thinking specifically of 'migrate' here, as I understand
> >>> 'migration' just uses OS' own resolver to call migrate_to
> >>> node.
> >>>
> >>
> >> No, pacemaker decides where to migrate the resource and
> >> calls agents on the current source and then on the
> >> intended target passing this information.
> >>
> >
> > Yes pacemaker does that but - as I mentioned - some agents
> > do employ that "internal" nodes map "technique".
> > I see no mention of that/similar in the manual for
> > VirtualDomain so I asked if perhaps somebody had an idea of
> > how to archive such result by some other means.
> >
>
> What about showing example of these "some agents" or better describing
> what you want to achieve?
>

yep more context would be useful.
Or are you referring to the host-mapping used for fencing?

Klaus

> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] resource going to blocked status while we restart service via systemctl twice

2023-04-17 Thread Klaus Wenninger
On Mon, Apr 17, 2023 at 9:25 AM S Sathish S via Users 
wrote:

> Hi Team,
>
>
>
> TEST_node1 resource going to blocked status while we restart service via
> systemctl twice in less time/before completion of 1st systemctl command.
>
> In older pacemaker version 2.0.2 we don’t see this issue, only observing
> this issue on latest pacemaker version 2.1.5.
>

I'm not sure which change in particular with 2.1.5 would have created the
behavioral change in your configuration. (I remember a discussion about
reacting to systemd-events in pacemaker but didn't find anything already
implemented on a quick check of the sources.)
But basically, afaik you are not expected to interfere with resources that
are under pacemaker control via anything other than the pacemaker
administration tooling (high or low level, e.g. pcs, crmsh, crm_resource, ...).
Otherwise you will see unexpected behavior. If you manage to do a restart
within a monitoring interval on the pacemaker side you may get away without
any impact on the pacemaker side though.
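I.e. rather something like

  pcs resource restart TEST_node1

than restarting the systemd service underneath pacemaker directly.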

Klaus


>
> [root@node1 ~]# pcs resource status TEST_node1
>
>   * TEST_node1  (ocf::provider:TEST_RA):  Started node1
>
> [root@node1 ~]# systemctl restart TESTec
>
> [root@node1 ~]# cat /var/pid/TEST.pid
>
> 271466
>
> [root@node1 ~]# systemctl restart TESTec
>
> [root@node1 ~]# cat /var/pid/TEST.pid
>
> 271466
>
> [root@node1 ~]# pcs resource status TEST_node1
>
>   * TEST_node1  (ocf::provider:TEST_RA):  FAILED node1 (blocked)
>
> [root@node1 ~]#
>
>
>
>
>
> [root@node1 ~]# pcs resource config TEST_node1
>
> Resource: TEST_node1 (class=ocf provider=provider type=TEST_RA)
>
>   Meta Attributes: TEST_node1-meta_attributes
>
> failure-timeout=120s
>
> migration-threshold=5
>
> priority=60
>
>   Operations:
>
> migrate_from: TEST_node1-migrate_from-interval-0s
>
>   interval=0s
>
>   timeout=20
>
> migrate_to: TEST_node1-migrate_to-interval-0s
>
>   interval=0s
>
>   timeout=20
>
> monitor: TEST_node1-monitor-interval-10s
>
>   interval=10s
>
>   timeout=120s
>
>   on-fail=restart
>
> reload: TEST_node1-reload-interval-0s
>
>   interval=0s
>
>   timeout=20
>
> start: TEST_node1-start-interval-0s
>
>   interval=0s
>
>   timeout=120s
>
>   on-fail=restart
>
> stop: TEST_node1-stop-interval-0s
>
>   interval=0s
>
>   timeout=120s
>
>   on-fail=block
>
> [root@node1 ~]#
>
>
>
> Thanks and Regards,
>
> S Sathish S
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Location not working [FIXED]

2023-04-12 Thread Klaus Wenninger
On Wed, Apr 12, 2023 at 9:27 AM Andrei Borzenkov 
wrote:

> On Tue, Apr 11, 2023 at 6:27 PM Ken Gaillot  wrote:
> >
> > On Tue, 2023-04-11 at 17:31 +0300, Miro Igov wrote:
> > > I fixed the issue by changing location definition from:
> > >
> > > location intranet-ip_on_any_nginx intranet-ip \
> > > rule -inf: opa-nginx_1_active eq 0 \
> > > rule -inf: opa-nginx_2_active eq 0
> > >
> > > To:
> > >
> > > location intranet-ip_on_any_nginx intranet-ip \
> > > rule opa-nginx_1_active eq 1 \
> > >rule opa-nginx_2_active eq 1
> > >
> > > Now it works fine and shows the constraint with: crm res constraint
> > > intranet-ip
> >
> > Ah, I suspect the issue was that the original constraint compared only
> > against 0, when initially (before the resources ever start) the
> > attribute is undefined.
> >
>
> This does not really explain the original question. Apparently the
> attribute *was* defined but somehow ignored.
>
> Apr 10 12:11:02 intranet-test2 pacemaker-attrd[1511]:  notice: Setting
> opa-nginx_1_active[intranet-test1]: 1 -> 0
> ...
>   * intranet-ip (ocf::heartbeat:IPaddr2):Started intranet-test1
>

But that log is from a different node that sets the node-attribute for
another node, and some time went by until the IP was detected to be
running unwantedly (a failure of the nfs-service was recorded 14 min
later - so at least that much). A lot can have happened in between, like
rejoins - not saying that what happened (like merging the CIBs) did
happen as it should have - but the log line isn't necessarily proof
that the attribute was still at 0. Searching the logs for changes
in the cluster topology in the time in between may give some insights.

Klaus

> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-remoted /dev/shm errors

2023-03-06 Thread Klaus Wenninger
On Mon, Mar 6, 2023 at 3:32 PM Christine caulfield 
wrote:

> Hi,
>
> The error is coming from libqb - which is what manages the local IPC
> connections between local clients and the server.
>
> I'm the libqb maintainer but I've never seen that error before! Is there
> anything unusual about the setup on this node? Like filesystems on NFS
> or some other networked filesystem?
>
> Other basic things to check are that /dev/shm is not full. Yes, normally
> you'd get ENOSPC in that case but it's always worth checking because odd
> things can happen when filesystems get full.
>
> It might be helpful strace the client and server processes when the
> error occurs (if that's possible). I'm not 100% sure which operation is
> failing with EREMOTEIO - though I can't find many useful references to
> that error in the kernel which is also slightly weird.
>

EREMOTEIO is being used for the obvious purpose in pacemaker.

Klaus


>
> Chrissie
>
> On 06/03/2023 13:03, Alexander Epaneshnikov via Users wrote:
> > Hello. we are using pacemaker 2.1.4-5.el8  and seeing strange errors in
> the
> > logs when a request is made to the cluster.
> >
> > Feb 17 08:18:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1077673-18-7xR8Y0/qb): Remote I/O error (121)
> > Feb 17 08:19:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1077927-18-dX5NSt/qb): Remote I/O error (121)
> > Feb 17 08:20:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1078160-18-RjzD4K/qb): Remote I/O error (121)
> > Feb 17 08:21:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1078400-18-YyJmJJ/qb): Remote I/O error (121)
> >
> > other than that pacemaker/corosync works fine.
> >
> > any suggestions on the cause of the error, or at least where to start
> debugging, are welcome.
> >
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Klaus Wenninger
On Thu, Mar 2, 2023 at 8:41 AM Gerald Vogt  wrote:

> Hi,
>
> I am setting up a mail relay cluster which main purpose is to maintain
> the service ips via IPaddr2 and move them between cluster nodes when
> necessary.
>
> The service ips should only be active on nodes which are running all
> necessary mail (systemd) services.
>
> So I have set up a resource for each of those services, put them into a
> group in order they should start, cloned the group as they are normally
> supposed to run on the nodes at all times.
>
> Then I added an order constraint
>start mail-services-clone then start mail1-ip
>start mail-services-clone then start mail2-ip
>
> and colocations to prefer running the ips on different nodes but only
> with the clone running:
>
>colocation add mail2-ip with mail1-ip -1000
>colocation ip1 with mail-services-clone
>colocation ip2 with mail-services-clone
>
> as well as a location constraint to prefer running the first ip on the
> first node and the second on the second
>
>location ip1 prefers ha1=2000
>location ip2 prefers ha2=2000
>
> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
> fine. ip2 will be moved immediately to ha3. Good.
>
> However, if pacemaker on ha2 starts up again, it will immediately remove
> ip2 from ha3 and keep it offline, while the services in the group are
> starting on ha2. As the services unfortunately take some time to come
> up, ip2 is offline for more than a minute.
>
> It seems the colocations with the clone are already good once the clone
> group begins to start services and thus allows the ip to be removed from
> the current node.
>

To achieve this you have to add orders on top of the colocations.
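I.e. in addition to the colocations something roughly like (just a sketch
in crmsh-style syntax with your resource names):

  order mail1-ip-after-services Mandatory: mail-services-clone mail1-ip
  order mail2-ip-after-services Mandatory: mail-services-clone mail2-ip

so that the IPs are not just placed with the clone but also only started
after it.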

Klaus


>
> I was wondering how can I define the colocation to be accepted only if
> all services in the clone have been started? And not once the first
> service in the clone is starting?
>
> Thanks,
>
> Gerald
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] cluster with redundant links - PCSD offline

2023-02-28 Thread Klaus Wenninger
On Mon, Feb 27, 2023 at 6:25 PM Ken Gaillot  wrote:

> On Sun, 2023-02-26 at 18:15 +0100, lejeczek via Users wrote:
> > Hi guys.
> >
> > I have a simple 2-node cluster with redundant links and I wonder why
> > status reports like this:
> > ...
> > Node List:
> >   * Node swir (1): online, feature set 3.16.2
> >   * Node whale (2): online, feature set 3.16.2
> > ...
> > PCSD Status:
> >   swir: Online
> >   whale: Offline
> > ...
> >
> > Cluster's config:
> > ...
> > Nodes:
> >   swir:
> > Link 0 address: 10.1.1.100
> > Link 1 address: 10.3.1.100
> > nodeid: 1
> >   whale:
> > Link 0 address: 10.1.1.101
> > Link 1 address: 10.3.1.101
> > nodeid: 2
> > ...
> >
> > Is that all normal for a cluster when 'Link 0' is down?
> > I thought redundant link(s) would be for that exact reason - so
> > cluster would remain fully operational.
> >
> > many thanks, L.
>
> If you're referring to the "Offline" under "PCSD Status", yes, that's
> normal. That only affects the pcsd daemon used to coordinate pcs
> commands across all nodes, not the cluster itself. As far as I know,
> pcsd has no way to use multiple links.
>
Well ... chicken and egg

The idea of pcsd is to be a basis for setting up all the layers of the
cluster stack - so relying on the knet layer would be an issue.

That said it would of course be interesting - once the knet layer or
whatever is set up - to be able to use redundancy for pcs.
Excuse my ignorance if there is something available already.
One thing that might be relatively easy to set up might be a
virtual ethernet connection via knet. I haven't played with that
myself so others might be of better help configuring that - if
pcs doesn't already have support for such a setup.

Klaus

>
> The "online" under "Nodes" is what's relevant to the operation of the
> cluster itself.
> --
> Ken Gaillot 
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] sbd v1.5.2

2023-01-09 Thread Klaus Wenninger
Hi sbd - developers & users!

Thanks to everybody for contributing to tests and
further development.

Only functional change is the first topic in the list below.
And even that is 'just' refusing startup in a case where
the config anyway wouldn't have led to a successful cluster
startup.

Improved logs/build/test should make things more convenient
and less error prone.


Changes since 1.5.1

- fail startup if pacemaker integration is disabled while
  SBD_SYNC_RESOURCE_STARTUP is conflicting (+ hint to overcome;
  see the sketch below the change list)
- improve logs
  - when logging state of SBD_PACEMAKER tell it is just that as
this might still be overridden via cmdline options
  - log a warning if SBD_PACEMAKER is overridden by -P or -PP option
  - do not warn about startup syncing with pacemaker integration disabled
  - when watchdog-device is busy give a hint on who is hogging it
- improve build environment
  - have --with-runstatedir overrule --runstatedir
  - use new package name for pacemaker devel on opensuse
  - make config location configurable for man-page-creation
  - reverse alloc/de-alloc order to make gcc-12 static analysis happy
- improve test environment
  - have image-files in /dev/shm to assure they are in memory and
sbd opening the files with O_SYNC doesn't trigger unnecessary
syncs on a heavily loaded test-machine
fallback to /tmp if /dev/shm doesn't exist
  - wrapping away libaio and usage of device-mapper for block-device
simulation can now be passed into make via
SBD_USE_DM & SBD_TRANSLATE_AIO
  - have variables that configure test-environment be printed
out prior to running tests
  - finally assure we clean up the environment when interrupted by a
    signal (bash should have done it with just setting the EXIT handler -
    but avoiding bashisms might come in handy one day)
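
For illustration, a combination that sbd now refuses at startup would look
roughly like this (just a sketch of /etc/sysconfig/sbd):

  SBD_PACEMAKER=no                 # pacemaker integration disabled
  SBD_SYNC_RESOURCE_STARTUP=yes    # needs pacemaker integration

Resolve it by either re-enabling SBD_PACEMAKER or setting
SBD_SYNC_RESOURCE_STARTUP=no.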

Regards,
Klaus
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Stonith

2022-12-21 Thread Klaus Wenninger
On Wed, Dec 21, 2022 at 4:51 PM Ken Gaillot  wrote:

> On Wed, 2022-12-21 at 10:45 +0100, Ulrich Windl wrote:
> > > > > Ken Gaillot  schrieb am 20.12.2022 um
> > > > > 16:21 in
> > Nachricht
> > <3a5960c2331f97496119720f6b5a760b3fe3bbcf.ca...@redhat.com>:
> > > On Tue, 2022‑12‑20 at 11:33 +0300, Andrei Borzenkov wrote:
> > > > On Tue, Dec 20, 2022 at 10:07 AM Ulrich Windl
> > > >  wrote:
> > > > > > But keep in mind that if the whole site is down (or
> > > > > > unaccessible)
> > > > > > you
> > > > > > will not have access to IPMI/PDU/whatever on this site so
> > > > > > your
> > > > > > stonith
> > > > > > agents will fail ...
> > > > >
> > > > > But, considering the design, such site won't have a quorum and
> > > > > should commit suicide, right?
> > > > >
> > > >
> > > > Not by default.
> > >
> > > And even if it does, the rest of the cluster can't assume that it
> > > did,
> > > so resources can't be recovered. It could work with sbd, but the
> > > poster
> > > said that the physical hosts aren't accessible.
> >
> > Why? Assuming fencing is configured, the nodes part of the quorum
> > should wait
> > for fencing delay, assuming fencing (or suicide) was done.
> > Then they can manage resources. OK, a non-working fencing or suicide
> > mechanism
> > is a different story...
> >
> > Regards,
> > Ulrich
>
> Right, that would be using watchdog-based SBD for self-fencing, but the
> poster can't use SBD in this case.
>

I read it in a way that this would just be a PoC setup.
Just as ssh-fencing can serve as a replacement for a real fencing-device,
one can use softdog (or whatever the virtual environment offers that is
supported by the kernel as a watchdog-device) with watchdog-fencing, at
least for PoC purposes.
I guess it depends on how the final setup is gonna differ from the PoC
setup. Knowing that things like live-migration, pausing a machine,
running on heavily overcommitted hosts, snapshots, ... would
be critical for the scenario, one could simply try to avoid these things
during the PoC tests if they are not relevant for a final production setup.
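For the softdog part that would be roughly (sketch):

  modprobe softdog
  echo softdog > /etc/modules-load.d/softdog.conf   # load at boot as well

keeping in mind that softdog is purely software and thus gives weaker
guarantees than a hardware watchdog.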

Klaus


> --
> Ken Gaillot 
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Bug pacemaker with multiple IP

2022-12-21 Thread Klaus Wenninger
On Wed, Dec 21, 2022 at 11:26 AM Reid Wahl  wrote:

> On Wed, Dec 21, 2022 at 2:15 AM Ulrich Windl
>  wrote:
> >
> > Hi!
> >
> > I wonder: Could the error message be triggered by adding an exclusive
> manatory
> > lock in the ip binary?
> > If that triggers the bug, I'm rather sure that the error message is bad.
> > Shouldn't that be EWOULDBLOCK then?
>
> I did some cursory reading earlier today, and it seems that ETXTBSY is
> becoming less common: https://lwn.net/Articles/866493/
>
> Either way, that would be a question for kernel maintainers.
>

Maybe the network-stack guys there, or somebody with deeper insight into
how the ip tool currently interacts with the kernel.
Without knowing any details, certain things might be handled by calling
bpf-binaries, and with ip being the userspace application the error might
still show up there even if it was actually rather about a bpf-binary being
executed. Thinking of race-conditions at that front ...


>
> > (I have no idea how Sophos AV works, though. If they open the files to
> check
> > in write-mode, it's really stupid then IMHO)
> >
> > Regards,
> > Ulrich
> >
> >
> > >>> Reid Wahl  schrieb am 21.12.2022 um 10:19 in
> Nachricht
> > :
> > > On Wed, Dec 21, 2022 at 12:24 AM Thomas CAS  wrote:
> > >>
> > >> Ken,
> > >>
> > >> Antivirus (sophos-av) is running but not in "real time access
> scanning",
> > the
> > > scheduled scan is however at 9pm every day.
> > >> 7 minutes later, we got these alerts.
> > >> The anti virus may indeed be the cause.
> > >
> > > I see. That does seem fairly likely. At least, there's no other
> > > obvious candidate for the cause.
> > >
> > > I used to work on a customer-facing support team for the ClusterLabs
> > > suite, and we received a fair number of cases where bizarre issues
> > > (such as hangs and access errors) were apparently caused by an
> > > antivirus. In those cases, all other usual lines of investigation were
> > > exhausted, and when we asked the customer to disable their AV, the
> > > issue disappeared. This happened with several different AV products.
> > >
> > > I can't say with any certainty that the AV is causing your issue, and
> > > I know it's frustrating that you won't know whether any given
> > > intervention worked, since this only happens once every few months.
> > >
> > > You may want to either exclude certain files from the scan, or write a
> > > short script to place the cluster in maintenance mode before the scan
> > > and take it out of maintenance after the scan is complete.
> > >
> > >>
> > >> I had the case on December 13 (with systemctl here):
> > >>
> > >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01
> pacemaker-controld
> >
> > > [5082] (process_lrm_event)  notice:
> wd-websqlng01-NGINX_monitor_15000:454 [
> >
> > > /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd:
> systemctl: Text
> >
> > > file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
> > > /bin/systemctl: Text file busy\n ]
> > >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01
> pacemaker-controld
> >
> > > [5082] (process_lrm_event)  notice:
> wd-websqlng01-NGINX_monitor_15000:454 [
> >
> > > /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd:
> systemctl: Text
> >
> > > file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd:
> > > /bin/systemctl: Text file busy\n ]
> > >>
> > >> After, this happens rarely, we had the case in August:
> > >>
> > >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01
> pacemaker-controld
> >
> > > [3718] (process_lrm_event)  notice:
> > > wd-websqlng01-NGINX-VIP-232_monitor_1:2877 [
> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
> > > busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> > >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01
> pacemaker-controld
> >
> > > [3718] (process_lrm_event)  notice:
> > > wd-websqlng01-NGINX-VIP-231_monitor_1:2880 [
> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1:
> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file
> > > busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> > >>
> > >> It's always around 9:00-9:07 pm,
> > >> I'll move the virus scan to 10pm and see.
> > >
> > > That also sounds like a good plan to confirm the cause :) It might
> > > take a while to find out though.
> > >
> > >>
> > >> Thanks,
> > >> Best regards,
> > >>
> > >> Thomas Cas  |  Technicien du support infogérance
> > >> PHONE : +33 3 51 25 23 26   WEB : www.ikoula.com/en
> > >> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> > >> Before printing this letter, think about the impact on the
> environment!
> > >>
> > >> -Message d'origine-
> > >> De : Reid Wahl 
> > >> Envoyé : mardi 20 décembre 2022 20:34
> > >> À : Cluster Labs - All topics related to open-source clustering
> welcomed
> > > 
> > >> Cc : Ken Gaillot ; Service Infogérance
> > > 
> > >> Objet : Re: [ClusterLabs] Bug pacemaker with 

Re: [ClusterLabs] Samba failover and Windows access

2022-12-12 Thread Klaus Wenninger
On Sat, Dec 10, 2022 at 6:39 PM Dave Withheld 
wrote:

> On Thu, Dec 8, 2022 at 8:03 AM Dave Withheld 
> wrote:
>
> In our production factory, we run a 2-node cluster on CentOS 8 with
> pacemaker, a virtual IP, and drbd for shared storage with samba (among
> other services) running as a resource on the active node.  Everything works
> great except when we fail over.  All resources are moved to the other node
> and start just fine, but Windows hosts that have connections to the samba
> shares all have to be rebooted before they can reconnect.  Clients that
> were not connected can connect.  We have samba configured for only SMB1
> protocol and all Windows clients are configured to allow it.
>
> >>Did you test if it is samba/smb-client related or windows IP-stack
> related - like ping the samba-host from the windows machines?
> >>Is the virtual IP using the physical MAC address of the interface - like
> windows missing the gratuitous ARP?
>
> Not just ping, but several other services (custom daemons, http, Mariadb,
> etc) all connect seamlessly.  It's only the samba connections that don't
> (obviously ping works, too).
>
> As for the MAC address, it is the same:  ip a shows two IPs for the
> interface but only one link/ether.
>
> This server (2-node cluster) is replacing an old system I built in 2008,
> which used heartbeat (no pacemaker or corosync) and had a much older
> version of samba.  It had no problem failing over:  mapped drives on the
> Windows clients worked just as well after a failover as they did before and
> UNCs worked seamlessly, as well.  In fact, the few times it failed over, no
> one even knew it until we saw a message in our emails sent by the servers
> when the resources moved.
>
> On the old system, ifconfig showed an eth0 interface, as well as an eth0:0
> interface on the active node which the virtual IP.  The docs called the
> virtual IP an "alias".  On the new server, ifconfig does not show the
> virtual IP at all and I have to use "ip a" to see the two addresses on one
> interface.  I tried using the command "ifconfig eno1:0 XXX.XXX.XXX.XXX up"
> to manually add an IP in a similar manner to the old server and the address
> I added did show up in ifconfig.  The point is, the virtual address is
> being added differently and I suspect the Windows clients treat it
> differently.
>
> I will be looking closely at the resource agents and see how they
> compare.  If any of this rings a bell, I would love to hear more from
> anyong with experience.  Thanks!
>

Sorry - I didn't say I had a clue for you. Just thought this info was missing ;-)
My personal experience with samba is both minimal and dated.
What you could do - if nobody has a clue - would be capturing the traffic,
in both setups if the old one is still available.
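
E.g. something along these lines on the node holding the virtual IP, once with
the new setup and - if still possible - once with the old one (interface name
and capture file name are just placeholders):

  # capture the SMB/CIFS traffic around a failover for comparison
  tcpdump -i eno1 -s 0 -w samba-failover.pcap 'port 445 or port 139'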

Klaus

>
> >>Klaus
>
> Maybe this is a question for the samba folks, but thought I'd try here
> first since it's only a problem when the other node takes over the samba
> resource.  Anyone seen this problem and solved it?
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Samba failover and Windows access

2022-12-07 Thread Klaus Wenninger
On Thu, Dec 8, 2022 at 8:03 AM Dave Withheld 
wrote:

> In our production factory, we run a 2-node cluster on CentOS 8 with
> pacemaker, a virtual IP, and drbd for shared storage with samba (among
> other services) running as a resource on the active node.  Everything works
> great except when we fail over.  All resources are moved to the other node
> and start just fine, but Windows hosts that have connections to the samba
> shares all have to be rebooted before they can reconnect.  Clients that
> were not connected can connect.  We have samba configured for only SMB1
> protocol and all Windows clients are configured to allow it.
>
> Did you test if it is samba/smb-client related or windows IP-stack related
- like ping the samba-host from the windows machines?
Is the virtual IP using the physical MAC address of the interface - like
windows missing the gratuitous ARP?
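
A quick way to check the gratuitous-ARP theory would be to capture ARP traffic
on a Windows client during failover, or to send an unsolicited ARP manually
from the node that just took over the address (iputils arping; interface and
IP are placeholders - the IPaddr2 agent should normally do this itself):

  # announce the virtual IP on eno1 so clients update their ARP caches
  arping -U -I eno1 -c 3 192.0.2.10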

Klaus

> Maybe this is a question for the samba folks, but thought I'd try here
> first since it's only a problem when the other node takes over the samba
> resource.  Anyone seen this problem and solved it?
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Unable to build rpm using make rpm command for pacemaker-2.1.4.

2022-11-22 Thread Klaus Wenninger
On Tue, Nov 22, 2022 at 1:16 PM S Sathish S via Users 
wrote:

> Hi Ken/Team,
>
> We have tried on pacemaker 2.1.1 also faced same issue , later we have
> perform below steps as workaround to build pacemaker rpm as you said it run
> from a git checkout and build rpm.
>
> #./autogen.sh
> #./configure/
> #make
> #make dist
> #rpmbuild -v -bb spec.in
>
> RPM output in below format:
> pacemaker-cluster-libs-2.1.4-1.2.1.4.git.el8.x86_64
> pacemaker-schemas-2.1.4-1.2.1.4.git.el8.noarch
> pacemaker-2.1.4-1.2.1.4.git.el8.x86_64
> pacemaker-libs-2.1.4-1.2.1.4.git.el8.x86_64
> pacemaker-cli-2.1.4-1.2.1.4.git.el8.x86_64
>
> Please let us know once it is fixed on 2.1.5-rc3 ,we need to build rpm
> without git checkout method.
>

2.1.5-rc3 isn't out yet, but the patch set from PR
https://github.com/ClusterLabs/pacemaker/pull/2949
merged into the 2.1 branch should solve the issue.
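
Until a release with that fix is out, the workaround remains building from a
git checkout, roughly:

  git clone https://github.com/ClusterLabs/pacemaker.git
  cd pacemaker
  git checkout Pacemaker-2.1.4   # or the 2.1 branch once the fix is merged
  make rpm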

Klaus

>
> Thanks and Regards,
> S Sathish S
> -Original Message-
> From: Ken Gaillot 
> Sent: 21 November 2022 21:52
> To: Cluster Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>
> Cc: S Sathish S 
> Subject: Re: [ClusterLabs] Unable to build rpm using make rpm command for
> pacemaker-2.1.4.
>
> Hi,
>
> Currently the RPM targets can only be run from a git checkout, not a
> distribution tarball. It looks like that's a regression introduced in
> 2.1.2 so I'll try to fix it for 2.1.5-rc3 (expected this week).
>
> On Mon, 2022-11-21 at 13:04 +, S Sathish S via Users wrote:
> > Hi Team,
> >
> > I am getting the below error when executing make rpm command to build
> > pacemaker-2.1.4 package on linux 8 server.
> >
> > [root@node1 pacemaker-Pacemaker-2.1.4]# make rpm make  -C rpm  "rpm"
> > make[1]: Entering directory '/root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm'
> > cd /root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/..;   \
> > if [ -n "" ]; then  \
> > git commit -m "DO-NOT-PUSH"
> > -a; \
> > git archive --prefix=pacemaker-DIST/ -o
> > "/root/smf_source/pacemaker-Pacemaker-2.1.4/rpm/../pacemaker-
> > DIST.tar.gz" HEAD^{tree};  \
> > git reset --mixed
> > HEAD^;\
> > echo "`date`: Rebuilt /root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/../pacemaker-
> > DIST.tar.gz"; \
> > elif [ -f "/root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/../pacemaker-DIST.tar.gz" ];
> > then \
> > echo "`date`: Using existing tarball: /root/smf_source/pacemaker-
> > Pacemaker-2.1.4/rpm/../pacemaker-DIST.tar.gz"; \
> > else
> >\
> > git archive --prefix=pacemaker-DIST/ -o
> > "/root/smf_source/pacemaker-Pacemaker-2.1.4/rpm/../pacemaker-
> > DIST.tar.gz" DIST^{tree};  \
> > echo "`date`: Rebuilt /root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/../pacemaker-
> > DIST.tar.gz"; \
> > fi
> > fatal: not a git repository (or any of the parent directories): .git
> > Mon Nov 21 07:42:25 EST 2022: Rebuilt /root/smf_source/pacemaker-
> > Pacemaker-2.1.4/rpm/../pacemaker-DIST.tar.gz
> > rm -f "/root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/SRPMS"/*.src.rpm
> > rm -f "/root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/SPECS/pacemaker.spec"
> > fatal: not a git repository (or any of the parent directories): .git
> > fatal: not a git repository (or any of the parent directories): .git
> > fatal: not a git repository (or any of the parent directories): .git
> > fatal: not a git repository (or any of the parent directories): .git
> > fatal: not a git repository (or any of the parent directories): .git
> > fatal: not a git repository (or any of the parent directories): .git
> > fatal: not a git repository (or any of the parent directories): .git
> > fatal: not a git repository (or any of the parent directories): .git
> > /usr/bin/mkdir -p "/root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/SPECS"
> > if [ x"`git ls-files -m
> > > pacemaker.spec.in 2>/dev/null`" != x ];
> > then\
> > cat "/root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/pacemaker.spec.in"; \
> > elif git cat-file -e DIST:rpm/pacemaker.spec.in 2>/dev/null;
> > then   \
> > git show
> > DIST:rpm/pacemaker.spec.in;\
> > elif git cat-file -e DIST:pacemaker.spec.in 2>/dev/null;
> > then   \
> > git show
> > DIST:pacemaker.spec.in;\
> > else
> >\
> > cat "/root/smf_source/pacemaker-Pacemaker-
> > 2.1.4/rpm/pacemaker.spec.in"; \

Re: [ClusterLabs] [External] : Re: Fence Agent tests

2022-11-15 Thread Klaus Wenninger
On Sat, Nov 5, 2022 at 9:45 PM Jehan-Guillaume de Rorthais via Users <
users@clusterlabs.org> wrote:

> On Sat, 5 Nov 2022 20:53:09 +0100
> Valentin Vidić via Users  wrote:
>
> > On Sat, Nov 05, 2022 at 06:47:59PM +, Robert Hayden wrote:
> > > That was my impression as well...so I may have something wrong.  My
> > > expectation was that SBD daemon should be writing to the /dev/watchdog
> > > within 20 seconds and the kernel watchdog would self fence.
> >
> > I don't see anything unusual in the config except that pacemaker mode is
> > also enabled. This means that the cluster is providing signal for sbd
> even
> > when the storage device is down, for example:
> >
> > 883 ?SL 0:00 sbd: inquisitor
> > 892 ?SL 0:00  \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid:
> ...
> > 893 ?SL 0:00  \_ sbd: watcher: Pacemaker
> > 894 ?SL 0:00  \_ sbd: watcher: Cluster
> >
> > You can strace different sbd processes to see what they are doing at any
> > point.
>
> I suspect both watchers should detect the loss of network/communication
> with
> the other node.
>
> BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the
> local **Pacemaker** is still quorate (via corosync). See the full chapter:
> «If Pacemaker integration is activated, SBD will not self-fence if
> **device**
> majority is lost [...]»
>
> https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-storage-protect.html
>
> Would it be possible that no node is shutting down because the cluster is
> in
> two-node mode? Because of this mode, both would keep the quorum expecting
> the
> fencing to kill the other one... Except there's no active fencing here,
> only
> "self-fencing".
>

Seems not to be the case here, but for completeness:
this fact should be recognized automatically by sbd (upstream since some time
in 2017 iirc), and instead of checking quorum sbd would then check for the
presence of 2 nodes in the cpg group. I hope corosync prevents 2-node &
qdevice being set at the same time. But even in that case I would rather
expect unexpected self-fencing instead of the opposite.

Klaus


>
> To verify this guess, check the corosync conf for the "two_node" parameter
> and
> if both nodes still report as quorate during network outage using:
>
>   corosync-quorumtool -s
>
> If this turn to be a good guess, without **active** fencing, I suppose a
> cluster
> can not rely on the two-node mode. I'm not sure what would be the best
> setup
> though.
>
> Regards,
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [External] : Re: Fence Agent tests

2022-11-15 Thread Klaus Wenninger
On Wed, Nov 9, 2022 at 2:58 PM Robert Hayden 
wrote:

>
> > -Original Message-
> > From: Users  On Behalf Of Andrei
> > Borzenkov
> > Sent: Wednesday, November 9, 2022 2:59 AM
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > 
> > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> >
> > On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden
> >  wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Users  On Behalf Of Valentin
> > Vidic
> > > > via Users
> > > > Sent: Sunday, November 6, 2022 5:20 PM
> > > > To: users@clusterlabs.org
> > > > Cc: Valentin Vidić 
> > > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> > > >
> > > > On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote:
> > > > > When SBD_PACEMAKER was set to "yes", the lack of network
> > connectivity
> > > > to the node
> > > > > would be seen and acted upon by the remote nodes (evicts and takes
> > > > > over ownership of the resources).  But the impacted node would just
> > > > > sit logging IO errors.  Pacemaker would keep updating the
> > /dev/watchdog
> > > > > device so SBD would not self evict.   Once I re-enabled the
> network,
> > then
> > > > the
> > > >
> > > > Interesting, not sure if this is the expected behaviour based on:
> > > >
> > > >
> >
> https://lists.clusterlabs.org/pipermail/users/2017-August/022699.html


Which versions of pacemaker/corosync/sbd are you using?
iirc one result of the discussion linked was sbd checking the watchdog-timeout
against the sync-timeout in case qdevice is being used. The default
sync-timeout is 30s and your watchdog-timeout is 20s, so I would expect a
reasonably current sbd to refuse startup.
But iirc in the discussion linked the pacemaker node finally became
non-quorate; there was just a possible split-brain gap when sync-timeout >
watchdog-timeout.
So if your pacemaker instance stays quorate it has to be something else.
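
If you want to double-check the two timeouts involved, something like this
(device path, file and key names may differ slightly between versions and
distributions):

  # watchdog timeout sbd is running with (disk header and/or sysconfig)
  sbd -d /dev/disk/by-id/sbd-disk dump
  grep SBD_WATCHDOG_TIMEOUT /etc/sysconfig/sbd
  # qdevice settings corosync knows about (sync_timeout only shows up if set)
  corosync-cmapctl | grep quorum.device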


>
> > > >
> > > > Does SBD log "Majority of devices lost - surviving on pacemaker" or
> > > > some other messages related to Pacemaker?
> > >
> > > Yes.
> > >
> > > >
> > > > Also what is the status of Pacemaker when the network is down? Does
> it
> > > > report no quorum or something else?
> > > >
> > >
> > > Pacemaker on the failing node shows quorum even though it has lost
> > > communication to the Quorum Device and to the other node in the
> cluster.
> > > The non-failing node of the cluster can see the Quorum Device system
> and
> > > thus correctly determines to fence the failing node and take over its
> > > resources.
>

Hmm ... maybe some problem with the qdevice setup and/or quorum strategy (LMS
for instance).
If quorum doesn't work correctly your cluster won't work correctly, regardless
of whether sbd kills the node properly or not.


> > >
> > > Only after I run firewall-cmd --panic-off, will the failing node start
> to log
> > > messages about loss of TOTEM and getting a new consensus with the
> > > now visible members.
> > >
> >
> > Where exactly do you use firewalld panic mode? You have hosts, you
> > have VM, you have qnode ...
> >
> > Have you verified that the network is blocked bidirectionally? I had
> > rather mixed experience with asymmetrical firewalls which resembles
> > your description.
>
> In my testing harness, I will send a script to the remote node which
> contains the firewall-cmd --panic-on, a sleep command, and then
> turn off the panic mode.  That way I can adjust the length of time
> network is unavailable on a single node.  I used to log into a network
> switch to turn ports off, but that is not possible in a Cloud environment.
> I have also played with manually creating iptables rules, but the panic
> mode
> is simply easier and accomplishes the task.
>
> I have verified that when panic mode is on, no inbound or outbound
> network traffic is allowed.   This includes iSCSI packets as well.  You
> better
> have access to the console or the ability to reset the system.
>
>
> >
> > Also it may depend on the corosync driver in use.
> >
> > > I think all of that explains the lack of self-fencing when the sbd
> setting of
> > > SBD_PACEMAKER=yes is used.
>

Are you aware that when setting SBD_PACEMAKER=no with just a single
disk this disk will become a SPOF?

Klaus


> > >
> >
> > Correct. This means that at least under some conditions
> > pacemaker/corosync fail to detect isolation.
> > ___
> > Manage your subscription:
> >
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home:
> > https://www.clusterlabs.org/
> > 

Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 11:10 AM Xin Liang via Users 
wrote:

> Hi Bernd,
>
> The behaviors between the SLE15SP4 and SLE12SP5 are different.
>
> On 12sp5:
>
>- run `crm_resource --cleanup --resource `, then the resource
>is not restarted when trace/untrace
>
> On 15sp4:
>
>- run `crm_resource --cleanup --resource `, then the resource
>still restarted when trace/untrace
>
>
Hmm ... thanks for the update!
I do remember having reviewed some PRs dealing with digest but
obviously not detailed enough to tell if some upstream change
might have 'fixed' the issue.
Maybe Ken can still tell off the top of his head.

Klaus

>
>-
>
> --
> *From:* Users  on behalf of Lentes, Bernd <
> bernd.len...@helmholtz-muenchen.de>
> *Sent:* Monday, October 24, 2022 4:46 PM
> *To:* Pacemaker ML 
> *Subject:* Re: [ClusterLabs] crm resource trace
>
>
> - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com
> wrote:
>
> > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users < [
> > mailto:users@clusterlabs.org  |
> users@clusterlabs.org ] > wrote:
>
>
>
> > Did you try a cleanup in between?
>
> When i do a cleanup before trace/untrace the resource is not restarted.
> When i don't do a cleanup it is restarted.
>
> Bernd
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 10:46 AM Lentes, Bernd <
bernd.len...@helmholtz-muenchen.de> wrote:

>
> - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com
> wrote:
>
> > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users < [
> > mailto:users@clusterlabs.org | users@clusterlabs.org ] > wrote:
>
>
>
> > Did you try a cleanup in between?
>
> When i do a cleanup before trace/untrace the resource is not restarted.
> When i don't do a cleanup it is restarted.
>

Sorry Bernd for not being explicit - I did get it that far ;-)
Wanted to see if Xin Liang had tried the cleanup as well.

Klaus

>
> Bernd___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users 
wrote:

> Hi Bernd,
>
> I got it, you are on SLE12SP5, and the crmsh version
> is crmsh-4.1.1+git.1647830282.d380378a-2.74.2.noarch, right?
>
> I try to reproduce this inconsistent behavior, add an IPaddr2 agent vip,
> run `crm resource trace vip` and `crm resource untrace vip`
>
> On each time, the resource vip will be restarted("due to resource
> definition change")
>

Did you try a cleanup in between?

Klaus

>
> I can't see the resource don't restart when trace/untrace resource
>
>
> Regards,
> Xin
>
>
> --
> *From:* Users  on behalf of Xin Liang via
> Users 
> *Sent:* Monday, October 24, 2022 10:29 AM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Cc:* Xin Liang 
> *Subject:* Re: [ClusterLabs] crm resource trace
>
> Hi Bernd,
>
> On which version you're running for crmsh and SLE?
>
>
> Regards,
> Xin
> --
> *From:* Users  on behalf of Lentes, Bernd <
> bernd.len...@helmholtz-muenchen.de>
> *Sent:* Monday, October 17, 2022 6:43 PM
> *To:* Pacemaker ML 
> *Subject:* Re: [ClusterLabs] crm resource trace
>
> Hi,
>
> i try to find out why there is sometimes a restart of the resource and
> sometimes not.
> Unpredictable behaviour is someting i expect from Windows, not from Linux.
> Here you see two "crm resource trace "resource"".
> In the first case the resource is restarted , in the second not.
> The command i used is identical in both cases.
>
> ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
> Fri Oct 14 19:05:51 CEST 2022
> INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
> INFO: Trace set, restart vm-genetrap to trace non-monitor operations
>
>
> ==
>
> 1st try:
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> Diff: --- 7.28974.3 2
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  +
> /cib:  @epoch=28975, @num_updates=0
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [26001] ha-idg-1   crmd: info:
> abort_transition_graph:  Transition 791 aborted by
> instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create':
> Configuration change | cib=7.28975.0 source=te_update_diff_v2:483
> path=/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']
> complete=true
> Oct 14 19:05:52 [26001] ha-idg-1   crmd:   notice:
> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE |
> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_process_request: Completed cib_apply_diff operation for section
> 'all': OK (rc=0, origin=ha-idg-2/cibadmin/2, version=7.28975.0)
> Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info:
> update_cib_stonith_devices_v2:   Updating device list from the cib: create
> op[@id='vm-genetrap-monitor-30']
> Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info:
> cib_devices_update:  Updating devices to version 7.28975.0
> 

Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Klaus Wenninger
On Mon, Oct 17, 2022 at 9:42 PM Ken Gaillot  wrote:

> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.
>
> It would probably make sense to distinguish between probes that found
> the resource running and probes that found it not running. Parameter
> changes in the former should probably be treated like start.
>

Which leaves the non-trivial task to the RA of determining during a probe
whether a resource is not just running or stopped, but has been started with
exactly those parameters - right? That may be easy for some RAs and
a real issue for others. Not to mention the question of which RAs already
implement it like that. The error code would be a generic error to trigger a
stop/start - right?
If I'm getting it right, even without that in-depth checking on probe it
would have worked in this case, as the probe happens before the parameter
change.

Klaus

>
> On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote:
> > Hi,
> >
> > i try to find out why there is sometimes a restart of the resource
> > and sometimes not.
> > Unpredictable behaviour is someting i expect from Windows, not from
> > Linux.
> > Here you see two "crm resource trace "resource"".
> > In the first case the resource is restarted , in the second not.
> > The command i used is identical in both cases.
> >
> > ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
> > Fri Oct 14 19:05:51 CEST 2022
> > INFO: Trace for vm-genetrap is written to
> > /var/lib/heartbeat/trace_ra/
> > INFO: Trace set, restart vm-genetrap to trace non-monitor operations
> >
> > =
> > =
> >
> > 1st try:
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  Diff: --- 7.28974.3 2
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  +  /cib:  @epoch=28975, @num_updates=0
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-monitor-
> > 30']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >   > name="trace_ra" value="1" id="vm-genetrap-monito
> > r-30-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> > > ributes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-stop-
> > 0']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >   > name="trace_ra" value="1" id="vm-genetrap-stop-0-ins
> > tance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> > > tes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-start-
> > 0']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >> name="trace_ra" value="1" id="vm-genetrap-start-0-i
> > nstance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >  > utes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-migrate_from-
> > 0']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >   > name="trace_ra" value="1" id="vm-genetrap-mi
> > grate_from-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> > > _attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-migrate_to-
> > 0']:  
> > Oct 14 19:05:52 

Re: [ClusterLabs] RFE: sdb clone

2022-09-27 Thread Klaus Wenninger
On Tue, Sep 20, 2022 at 3:59 PM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> Hi!
>
> I have a proposal (request) for enhancing sbd:
> (I'm not suggesting a complete rewrite with reasonable options, as I had
> don that before already ;-))
> When configuring an additional disk device, it would be quite handy to be
> able to "clone" the configuration from an existing device.
>
What you're suggesting is that, instead of entering the parameters for the
creation of a device, you would like the possibility to point to an already
configured device?
Sounds like a good idea.

> As I understand it, sbd uses the physical block size for disk layout, so a
> simple dd won't do the job when old and new device use different physical
> block sizes, right?
>
Yep - the physical block size is queried and used for message-box entries -
for good reason.
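
Until there is such a clone option, a manual equivalent would be roughly
(device paths and timeouts are placeholders - take the actual values from the
dump output of the existing device):

  # show the header of the already configured device
  sbd -d /dev/disk/by-id/old-sbd-disk dump
  # create the new device with the same timing parameters,
  # e.g. watchdog timeout 5s (-1) and msgwait 10s (-4) taken from the dump
  sbd -d /dev/disk/by-id/new-sbd-disk -1 5 -4 10 create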

Klaus

> I think manually entering the parameters for the sbd header (the way it
> has to be done now) is quite error-prone, and it's easy to configure
> devices using different timing parameters.
>
> Regards,
> Ulrich
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] (no subject)

2022-09-07 Thread Klaus Wenninger
On Wed, Sep 7, 2022 at 12:28 PM Jehan-Guillaume de Rorthais via Users
 wrote:
>
> Hey,
>
> On Wed, 7 Sep 2022 19:12:53 +0900
> 권오성  wrote:
>
> > Hello.
> > I am a student who wants to implement a redundancy system with raspberry pi.
> > Last time, I posted about how to proceed with installation on raspberry pi
> > and received a lot of comments.
> > Among them, I searched a lot after looking at the comments saying that
> > fencing stonith should not be false.
> > (ex -> sudo pcs property set stonith-enabled=false)
> > However, I saw a lot of posts saying that there is no choice but to do
> > false because there is no ipmi in raspberry pi, and I wonder how we can
> > solve it in this situation.
>
> Fencing is not juste about IPMI:
> * you can use external smart devices to shut down your nodes (eg. PDU, UPS, a
>   self-made fencing device
>   (https://www.alteeve.com/w/Building_a_Node_Assassin_v1.1.4))
> * you can fence a node by disabling its access to the network from a
>   manageable switch, without shutting down the node

For a small device that can run from PoE (an external dongle or HAT is needed,
I guess) one might even think of using a manageable switch to power the whole
device off/on.
I don't know of a fence-agent implementing that, but it shouldn't be that hard
to do and would be an interesting project.

> * you can use 1/2/3 shared storage + the hardware RPi watchdog using the SBD
>   service
> * you can use the internal hardware RPi watchdog using the SBD service, 
> without
>   shared disk

There is a standard to control the power of individual ports on USB hubs.
Not all hubs implement it of course, but the one built into my Lenovo W541
does (be careful: if your laptop connects touch, keyboard or mouse via USB you
might lock yourself out ;-) ), and I bought a hub from Plugable that does as
well. Have a look at https://github.com/mvp/uhubctl for the command-line tool
(and a list of actual devices supporting the standard). You could easily call
it on an extra Pi (connected to the host port of the hub) via ssh - or, with a
little more effort, convert it into some daemon triggered via the network -
from a simple fence-agent.
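
The core of such a fence-agent would then be not much more than the following
(hub location 1-1, port 2, and the user/hostname of the extra Pi are all
placeholders; uhubctl may need root or matching udev rules on that Pi):

  # power the hub port feeding the node off, back on, or query it
  ssh pi@fence-pi uhubctl -l 1-1 -p 2 -a off
  ssh pi@fence-pi uhubctl -l 1-1 -p 2 -a on
  ssh pi@fence-pi uhubctl -l 1-1 -p 2          # no action = just report status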

Klaus

>
> Regards,
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Cluster does not start resources

2022-08-25 Thread Klaus Wenninger
On Wed, Aug 24, 2022 at 6:29 PM Lentes, Bernd
 wrote:
>
>
> - On 24 Aug, 2022, at 16:26, kwenning kwenn...@redhat.com wrote:
>
> >>
> >> if I get Ulrich right - and my fading memory of when I really used crmsh 
> >> the
> >> last time is telling me the same thing ...
> >>
>
> I get the impression many people prefer pcs to crm. Is there any reason for 
> that ?
> And can i use pcs on Suse ? If yes, how ?

I guess both are possible - pcs on SUSE and crmsh on Red Hat - unsupported by
the distribution in both cases, of course.
But you'll probably be happiest using what comes with the distribution.

Apologies for a potential misunderstanding I might have triggered.
My saying that I haven't been using crmsh for some time has nothing to do with
the quality & usability of either crmsh or pcs.
It solely has to do with how my CV has developed recently ;-)

Klaus
>
> Bernd___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Klaus Wenninger
On Wed, Aug 24, 2022 at 4:24 PM Klaus Wenninger  wrote:
>
> On Wed, Aug 24, 2022 at 2:40 PM Lentes, Bernd
>  wrote:
> >
> >
> > - On 24 Aug, 2022, at 07:21, Reid Wahl nw...@redhat.com wrote:
> >
> >
> > > As a result, your command might start the virtual machines, but
> > > Pacemaker will still show that the resources are "Stopped (disabled)".
> > > To fix that, you'll need to enable the resources.
> >
> > How do i achieve that ?
>
> crm resource start ...
>
> if I get Ulrich right - and my fading memory of when I really used crmsh the
> last time is telling me the same thing ...
>

Guess the resources running now are those you tried to enable before
while they were globally stopped.

> Klaus
> >
> > Bernd___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Klaus Wenninger
On Wed, Aug 24, 2022 at 2:40 PM Lentes, Bernd
 wrote:
>
>
> - On 24 Aug, 2022, at 07:21, Reid Wahl nw...@redhat.com wrote:
>
>
> > As a result, your command might start the virtual machines, but
> > Pacemaker will still show that the resources are "Stopped (disabled)".
> > To fix that, you'll need to enable the resources.
>
> How do i achieve that ?

crm resource start ...

if I get Ulrich right - and my fading memory of when I really used crmsh the
last time is telling me the same thing ...

Klaus
>
> Bernd___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Start resource only if another resource is stopped

2022-08-19 Thread Klaus Wenninger
On Thu, Aug 18, 2022 at 8:26 PM Andrei Borzenkov  wrote:
>
> On 17.08.2022 16:58, Miro Igov wrote:
> > As you guessed i am using crm res stop nfs_export_1.
> > I tried the solution with attribute and it does not work correct.
> >
>
> It does what you asked for originally, but you are shifting the
> goalposts ...
>
> > When i stop nfs_export_1 it stops data_1 data_1_active, then it starts
> > data_2_failover - so far so good.
> >
> > When i start nfs_export_1 it starts data_1, starts data_1_active and then
> > stops data_2_failover as result of order data_1_active_after_data_1 and
> > location data_2_failover_if_data_1_inactive.
> >
> > But stopping data_2_failover unmounts the mount and end result is having no
> > NFS export mounted:
> >
>
> Nowhere before did you mention that you have two resources managing the
> same mount point.
>
> ...
> > Aug 17 15:24:52 intranet-test1 Filesystem(data_1)[16382]: INFO: Running
> > start for nas-sync-test1:/home/pharmya/NAS on
> > /data/synology/pharmya_office/NAS_Sync/NAS
> > Aug 17 15:24:52 intranet-test1 Filesystem(data_1)[16382]: INFO: Filesystem
> > /data/synology/pharmya_office/NAS_Sync/NAS is already mounted.
> ...
> > Aug 17 15:24:52 intranet-test1 Filesystem(data_2_failover)[16456]: INFO:
> > Trying to unmount /data/synology/pharmya_office/NAS_Sync/NAS
> > Aug 17 15:24:52 intranet-test1 systemd[1]:
> > data-synology-pharmya_office-NAS_Sync-NAS.mount: Succeeded.
>
> This configuration is wrong - period. Filesystem agent monitor action
> checks for mounted mountpoint, so pacemaker cannot determine which
> resource is started. You may get away with it because by default
> pacemaker does not run recurrent monitor for inactive resource, but any
> probe will give wrong results.
>
> It is almost always wrong to have multiple independent pacemaker
> resources managing the same underlying physical resource.
>
> It looks like you attempt to reimplement high available NFS server on
> client side. If you insist on this, I see as the only solution separate
> resource agent that monitors state of export/data resources and sets
> attribute accordingly. But effectively you will be duplicating pacemaker
> logic.

As Ulrich already pointed out before in this thread, that sounds
a bit as if the concept of promotable resources might be helpful
here - so as to have at least part of the logic done by pacemaker.
But as Andrei is saying - you'll need a custom resource-agent here.
Maybe it could be done in a generic way, so that the community
might adopt it in the end. I'm at least not aware that such a thing
is out there already, but ...
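
Just to illustrate the direction - a very rough, untested sketch of what the
core of such an agent could look like. Mount point and attribute name are
taken from this thread as defaults, exit codes follow the OCF conventions,
and meta-data/validation are left out; everything here is only an assumption
about how one might cut it:

  #!/bin/sh
  # publish whether the export is really mounted as a transient node attribute
  # that location rules (like data_2_failover_if_data_1_inactive) can use
  MOUNTPOINT="${OCF_RESKEY_mountpoint:-/data/synology/pharmya_office/NAS_Sync/NAS}"
  ATTR="${OCF_RESKEY_attribute:-data_1_active}"

  case "$1" in
    start|monitor)
      if mountpoint -q "$MOUNTPOINT"; then
        attrd_updater -n "$ATTR" -U 1   # mark as active on this node
        exit 0                          # OCF_SUCCESS
      else
        attrd_updater -n "$ATTR" -D     # drop the attribute again
        [ "$1" = monitor ] && exit 7    # OCF_NOT_RUNNING
        exit 1                          # OCF_ERR_GENERIC for a failed start
      fi ;;
    stop)
      attrd_updater -n "$ATTR" -D
      exit 0 ;;
    meta-data|validate-all)
      exit 0 ;;                         # omitted in this sketch
    *)
      exit 3 ;;                         # OCF_ERR_UNIMPLEMENTED
  esac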

Klaus

> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] node1 and node2 communication time question

2022-08-10 Thread Klaus Wenninger
On Wed, Aug 10, 2022 at 3:49 AM 권오성  wrote:
>
> Thank you for your reply.
> Then, can I think of it as being able to adjust the time by changing the 
> token in /etc/corosync/corosync.conf?

That would basically be the time after which a non-responsive
node in a cluster is declared dead and drops out of
the cluster.
But be careful when setting that time too low, as your node
might drop out of the cluster because of a hiccup in the
network, or because load prevents corosync from
being scheduled.
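
For reference, that timeout lives in the totem section of
/etc/corosync/corosync.conf and has to be the same on all nodes (the value
below is just an example, in milliseconds; corosync needs a restart or config
reload to pick it up):

  # relevant snippet of /etc/corosync/corosync.conf:
  #   totem {
  #       token: 3000
  #   }
  # check what corosync is actually running with:
  corosync-cmapctl | grep totem.token
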
Then think about what it really means that a node isn't reachable
via the network. It doesn't necessarily mean it is totally dead, that it
can't interfere with anything anymore, or that it won't come back
a second later.
With this uncertainty, recovering a service on another node
is risky.
And this is where fencing kicks in: to assure that this potentially
dead node is dead for sure before you proceed with recovering
services from it.

> And the site I searched and found was explaining to disable fencing.
> If so, could you introduce me to a site or blog that explains by activating 
> fencing?
> I am a college student studying about ha.
> I first learned about the concept of ha, and I don't know how to set it up or 
> what options to change.
> And I am using a translator because I am not good at English, but I do not 
> understand how to apply it by looking at the document in the cluster lab.

https://www.clusterlabs.org/pacemaker/doc/2.1/Clusters_from_Scratch/html/
should give you an introduction to all the important concepts and
run you through an example.
I don't know how well a translator does with that though.

Klaus

> Please check it out.
> Thank you.
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: About a false negative of storage_mon

2022-08-05 Thread Klaus Wenninger
On Fri, Aug 5, 2022 at 9:30 AM Kazunori INOUE
 wrote:
>
> On Tue, Aug 2, 2022 at 11:09 PM Ken Gaillot  wrote:
> >
> > On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
> > > Hi,
> > >
> > > Since O_DIRECT is not specified in open() [1], it reads the buffer
> > > cache and
> > > may result in a false negative. I fear that this possibility
> > > increases
> > > in environments with large buffer cache and running disk-reading
> > > applications
> > > such as database.
> > >
> > > So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
> > > it?
> > > (in this case, lseek() processing is unnecessary.)
> > >
> > > # I am ready to create a patch that works with O_DIRECT. Also, I
> > > wouldn't mind
> > > # a "change to add a new mode of inspection with O_DIRECT
> > > # (add a option to storage_mon) while keeping the current inspection
> > > process".
> > >
> > > [1]
> > > https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
> > >
> > > Best Regards,
> > > Kazunori INOUE
> >
> > I agree, it makes sense to use O_DIRECT when available. I don't think
> > an option is necessary.
> >
> > However, O_DIRECT is not available on all OSes, so the configure script
> > should detect support. Also, it is not supported by all filesystems, so
> > if the open fails, we should retry without O_DIRECT.
> > --
> > Ken Gaillot 
> >
>
> Thank you, everyone.
> I will create a patch using O_DIRECT.
> (I'm also interested in AIO, but I don't have a track record of using it,

Not saying you have to use it nor that it is a perfect reference implementation
but it is proven to work to a certain extent - and well - I know it ;-)

https://github.com/ClusterLabs/sbd/blob/main/src/sbd-md.c

Not sure either if it is of much benefit for a usage pattern without
a persistent daemon.
And it is Linux-specific. Unfortunately the POSIX AIO implementation seems
not to be using the kernel API (yet).

What you could adopt from the sbd code might be getting the block size
from the device instead of assuming 512 bytes (which iirc the current code does).
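
For reference, the sizes the device reports (and a quick direct read bypassing
the buffer cache, similar to what storage_mon with O_DIRECT would do) can be
checked from the shell - the device name is a placeholder:

  blockdev --getss /dev/sdb      # logical sector size
  blockdev --getpbsz /dev/sdb    # physical block size
  # read a single block directly from the device, bypassing the page cache
  dd if=/dev/sdb of=/dev/null bs=4096 count=1 iflag=direct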

Klaus

> so I'm going to see it off this time.)
>
> Kazunori INOUE
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Klaus Wenninger
On Wed, Aug 3, 2022 at 4:02 PM Ulrich Windl
 wrote:
>
> >>> Klaus Wenninger  schrieb am 03.08.2022 um 15:51 in
> Nachricht
> :
> > On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot  wrote:
> >>
> >> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
> >> > Hi,
> >> >
> >> > Since O_DIRECT is not specified in open() [1], it reads the buffer
> >> > cache and
> >> > may result in a false negative. I fear that this possibility
> >> > increases
> >> > in environments with large buffer cache and running disk-reading
> >> > applications
> >> > such as database.
> >> >
> >> > So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
> >> > it?
> >> > (in this case, lseek() processing is unnecessary.)
> >> >
> >> > # I am ready to create a patch that works with O_DIRECT. Also, I
> >> > wouldn't mind
> >> > # a "change to add a new mode of inspection with O_DIRECT
> >> > # (add a option to storage_mon) while keeping the current inspection
> >> > process".
> >> >
> >> > [1]
> >> >
> >
> https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#
>
> > L47-L90
> >> >
> >> > Best Regards,
> >> > Kazunori INOUE
> >>
> >> I agree, it makes sense to use O_DIRECT when available. I don't think
> >> an option is necessary.
> >
> > Might as well be interesting to adjust block-size/alignment to the
> > device.
> > Another consideration could be to on top directly access the block-layer
> > using aio.
>
> Again AIO is POSIX; it depends on the implementation what it really does.

Wasn't speaking of the Linux POSIX AIO implementation in userspace
(guess that is still the case) but of what is available as syscalls
(io_submit, io_setup, io_cancel, io_destroy, io_getevents), which is afaik
Linux-specific and can't be wrapped into the POSIX interface.

>
> > Both is being done in sbd (storage-based-death) and yes it as well
> > adds Linux specific stuff that might have to be conditional for other OSs.
> >
> > Klaus
> >
> >>
> >> However, O_DIRECT is not available on all OSes, so the configure script
> >> should detect support. Also, it is not supported by all filesystems, so
> >> if the open fails, we should retry without O_DIRECT.
> >> --
> >> Ken Gaillot 
> >>
> >> ___
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: About a false negative of storage_mon

2022-08-03 Thread Klaus Wenninger
On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot  wrote:
>
> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
> > Hi,
> >
> > Since O_DIRECT is not specified in open() [1], it reads the buffer
> > cache and
> > may result in a false negative. I fear that this possibility
> > increases
> > in environments with large buffer cache and running disk-reading
> > applications
> > such as database.
> >
> > So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
> > it?
> > (in this case, lseek() processing is unnecessary.)
> >
> > # I am ready to create a patch that works with O_DIRECT. Also, I
> > wouldn't mind
> > # a "change to add a new mode of inspection with O_DIRECT
> > # (add a option to storage_mon) while keeping the current inspection
> > process".
> >
> > [1]
> > https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
> >
> > Best Regards,
> > Kazunori INOUE
>
> I agree, it makes sense to use O_DIRECT when available. I don't think
> an option is necessary.

It might as well be interesting to adjust the block size/alignment to the
device.
Another consideration could be to, on top of that, access the block layer
directly using AIO.
Both are done in sbd (storage-based death), and yes, that also
adds Linux-specific stuff that might have to be conditional for other OSs.

Klaus

>
> However, O_DIRECT is not available on all OSes, so the configure script
> should detect support. Also, it is not supported by all filesystems, so
> if the open fails, we should retry without O_DIRECT.
> --
> Ken Gaillot 
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced[11637]: warning: Can't create a sane reply

2022-06-22 Thread Klaus Wenninger
On Wed, Jun 22, 2022 at 1:46 PM Priyanka Balotra
 wrote:
>
> Hi All,
>
> We are seeing an issue where we performed cluster shutdown followed by 
> cluster boot operation. All the nodes joined the cluster excet one (the first 
> node). Here are some pacemaker logs around that timestamp:
>
> 2022-06-19T07:02:08.690213+00:00 FILE-1 pacemaker-fenced[11637]:  notice: 
> Operation 'off' targeting FILE-1 on FILE-2 for 
> pacemaker-controld.11523@FILE-2.0b09e949: OK
>
> 2022-06-19T07:02:08.690604+00:00 FILE-1 pacemaker-fenced[11637]:  error: 
> stonith_construct_reply: Triggered assert at fenced_commands.c:2363 : request 
> != NULL
>
> 2022-06-19T07:02:08.690781+00:00 FILE-1 pacemaker-fenced[11637]:  warning: 
> Can't create a sane reply
>
> 2022-06-19T07:02:08.691872+00:00 FILE-1 pacemaker-controld[11643]:  crit: We 
> were allegedly just fenced by FILE-2 for FILE-2!
>
> 2022-06-19T07:02:08.693994+00:00 FILE-1 pacemakerd[11622]:  warning: Shutting 
> cluster down because pacemaker-controld[11643] had fatal failure
>
> 2022-06-19T07:02:08.694209+00:00 FILE-1 pacemakerd[11622]:  notice: Shutting 
> down Pacemaker
>
> 2022-06-19T07:02:08.694381+00:00 FILE-1 pacemakerd[11622]:  notice: Stopping 
> pacemaker-schedulerd
>
>
>
> Let us know if you need any more logs to find an rca to this.

A little bit more info about your configuration and the pacemaker-version (cib?)
used would definitely be helpful.
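
E.g. the output of something like the following, with the time window adjusted
to the incident, would help (paths and timestamps are just examples):

  pacemakerd --features        # pacemaker version and build features
  corosync -v                  # corosync version
  cibadmin --query > cib.xml   # the current configuration
  crm_report --from "2022-06-19 06:45" --to "2022-06-19 07:15" /tmp/fence-incident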

Klaus
>
> Thanks
> Priyanka
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Why not retry a monitor (pacemaker‑execd) that got a segmentation fault?

2022-06-15 Thread Klaus Wenninger
On Wed, Jun 15, 2022 at 2:10 PM Ulrich Windl
 wrote:
>
> >>> Klaus Wenninger  schrieb am 15.06.2022 um 13:22 in
> Nachricht
> :
> > On Wed, Jun 15, 2022 at 10:33 AM Ulrich Windl
> >  wrote:
> >>
>
> ...
>
> >> (As said above it may be some RAM corruption where SMI (system management
> >> interrupts, or so) play a role, but Dell says the hardware is OK, and using
> >> SLES we don't have software support with Dell, so they won't even consider
> > that
> >> fact.)
> >
> > That happens inside of VMs right? I mean nodes being VMs.
>
> No, it happens on the hypervisor nodes that are part of the cluster.
>

What I described below froze the whole machine as well - till
it was taken down by the hardware watchdog.

> > A couple of years back I had an issue running protected mode inside
> > of kvm-virtual machines on Lenovo laptops.
> > That was really an SMI issue (obviously issues when an SMI interrupt
> > was invoked during the CPU being in protected mode) that went away
> > disabling SMI interrupts.
> > I have no idea if that is still possible with current chipsets. And I'm not
> > telling you to do that in production but it might be interesting to narrow
> > the issue down still. One might run into thermal issues and such
> > SMI is taking care of on that hardware.
>
> Well, as I have no better idea, I'd probably even give "kick it hard with the 
> foot" a chance ;-)

Don't know if it is of much use but this is what I was using iirc
https://github.com/zultron/smictrl.
Jan wrote it for his laptop back then; mine showed the same behavior, and
being close enough chipset-wise it did the trick on mine as well.

Obviously reading UEFI variables from the OS triggers some SMI action as well.
So booting with a legacy BIOS - if possible - might be an interesting test case.

>
> Regards,
> Ulrich
>
> >
> > Klaus
> >>
> >> But actually I start believing such a system is a good playground for any 
> >> HA
> >> solution ;-)
> >> Unfortunately here it's much more production than playground...
> >>
> >> Regards,
> >> Ulrich
> >>
> >>
> >> ___
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Why not retry a monitor (pacemaker‑execd) that got a segmentation fault?

2022-06-15 Thread Klaus Wenninger
On Wed, Jun 15, 2022 at 10:33 AM Ulrich Windl
 wrote:
>
> >>> Klaus Wenninger  schrieb am 15.06.2022 um 10:00 in
> Nachricht
> :
> > On Wed, Jun 15, 2022 at 8:32 AM Ulrich Windl
> >  wrote:
> >>
> >> >>> Ulrich Windl schrieb am 14.06.2022 um 15:53 in Nachricht <62A892F0.174
> : 161
> > :
> >> 60728>:
> >>
> >> ...
> >> > Yes it's odd, but isn't the cluster just to protect us from odd
> situations?
> >> > ;‑)
> >>
> >> I have more odd stuff:
> >> Jun 14 20:40:09 rksaph18 pacemaker‑execd[7020]:  warning:
> > prm_lockspace_ocfs2_monitor_12 process (PID 30234) timed out
> >> ...
> >> Jun 14 20:40:14 h18 pacemaker‑execd[7020]:  crit:
> > prm_lockspace_ocfs2_monitor_12 process (PID 30234) will not die!
> >> ...
> >> Jun 14 20:40:53 h18 pacemaker‑controld[7026]:  warning: lrmd IPC request
> 525
> > failed: Connection timed out after 5000ms
> >> Jun 14 20:40:53 h18 pacemaker‑controld[7026]:  error: Couldn't perform
> > lrmd_rsc_cancel operation (timeout=0): ‑110: Connection timed out (110)
> >> ...
> >> Jun 14 20:42:23 h18 pacemaker‑controld[7026]:  error: Couldn't perform
> > lrmd_rsc_exec operation (timeout=9): ‑114: Connection timed out (110)
> >> Jun 14 20:42:23 h18 pacemaker‑controld[7026]:  error: Operation stop on
> > prm_lockspace_ocfs2 failed: ‑70
> >> ...
> >> Jun 14 20:42:23 h18 pacemaker‑controld[7026]:  warning: Input I_FAIL
> received
> > in state S_NOT_DC from do_lrm_rsc_op
> >> Jun 14 20:42:23 h18 pacemaker‑controld[7026]:  notice: State transition
> > S_NOT_DC ‑> S_RECOVERY
> >> Jun 14 20:42:23 h18 pacemaker‑controld[7026]:  warning: Fast‑tracking
> shutdown
> > in response to errors
> >> Jun 14 20:42:23 h18 pacemaker‑controld[7026]:  error: Input I_TERMINATE
> > received in state S_RECOVERY from do_recover
> >> Jun 14 20:42:28 h18 pacemaker‑controld[7026]:  warning: Sending IPC to lrmd
>
> > disabled until pending reply received
> >> Jun 14 20:42:28 h18 pacemaker‑controld[7026]:  error: Couldn't perform
> > lrmd_rsc_cancel operation (timeout=0): ‑114: Connection timed out (110)
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  warning: Sending IPC to lrmd
>
> > disabled until pending reply received
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  error: Couldn't perform
> > lrmd_rsc_cancel operation (timeout=0): ‑114: Connection timed out (110)
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  notice: Stopped 2 recurring
>
> > operations at shutdown (0 remaining)
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  error: 3 resources were
> active
> > at shutdown
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  notice: Disconnected from
> the
> > executor
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  notice: Disconnected from
> > Corosync
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  notice: Disconnected from
> the
> > CIB manager
> >> Jun 14 20:42:33 h18 pacemaker‑controld[7026]:  error: Could not recover
> from
> > internal error
> >> Jun 14 20:42:33 h18 pacemakerd[7003]:  error: pacemaker‑controld[7026]
> exited
> > with status 1 (Error occurred)
> >> Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping
> pacemaker‑schedulerd
> >> Jun 14 20:42:33 h18 pacemaker‑schedulerd[7024]:  notice: Caught
> 'Terminated'
> > signal
> >> Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker‑attrd
> >> Jun 14 20:42:33 h18 pacemaker‑attrd[7022]:  notice: Caught 'Terminated'
> > signal
> >> Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker‑execd
> >> Jun 14 20:42:34 h18 sbd[6856]:  warning: inquisitor_child: pcmk health
> > check: UNHEALTHY
> >> Jun 14 20:42:34 h18 sbd[6856]:  warning: inquisitor_child: Servant pcmk is
>
> > outdated (age: 41877)
> >> (SBD Fencing)
> >>
> >
> > Rolling it up from the back I guess the reaction to self‑fence in case
> > pacemaker
> > is telling it doesn't know ‑ and isn't able to find out ‑ about the
> > state of the resources
> > is basically correct.
> >
> > Seeing the issue with the fake‑age being printed ‑ possibly causing
> > confusion ‑ it reminds
> > me that this should be addressed. Thought we had already but obviously
> > a false memory.
>
> Hi Klaus and others!
>
> Well that is the current update state of SLES15 SP3; maybe upstream updates
> did not make it into SLES yet; I don't know.

Re: [ClusterLabs] Antw: [EXT] Re: Why not retry a monitor (pacemaker‑execd) that got a segmentation fault?

2022-06-15 Thread Klaus Wenninger
On Wed, Jun 15, 2022 at 8:32 AM Ulrich Windl
 wrote:
>
> >>> Ulrich Windl schrieb am 14.06.2022 um 15:53 in Nachricht <62A892F0.174 : 
> >>> 161 :
> 60728>:
>
> ...
> > Yes it's odd, but isn't the cluster just to protect us from odd situations?
> > ;-)
>
> I have more odd stuff:
> Jun 14 20:40:09 rksaph18 pacemaker-execd[7020]:  warning: 
> prm_lockspace_ocfs2_monitor_12 process (PID 30234) timed out
> ...
> Jun 14 20:40:14 h18 pacemaker-execd[7020]:  crit: 
> prm_lockspace_ocfs2_monitor_12 process (PID 30234) will not die!
> ...
> Jun 14 20:40:53 h18 pacemaker-controld[7026]:  warning: lrmd IPC request 525 
> failed: Connection timed out after 5000ms
> Jun 14 20:40:53 h18 pacemaker-controld[7026]:  error: Couldn't perform 
> lrmd_rsc_cancel operation (timeout=0): -110: Connection timed out (110)
> ...
> Jun 14 20:42:23 h18 pacemaker-controld[7026]:  error: Couldn't perform 
> lrmd_rsc_exec operation (timeout=9): -114: Connection timed out (110)
> Jun 14 20:42:23 h18 pacemaker-controld[7026]:  error: Operation stop on 
> prm_lockspace_ocfs2 failed: -70
> ...
> Jun 14 20:42:23 h18 pacemaker-controld[7026]:  warning: Input I_FAIL received 
> in state S_NOT_DC from do_lrm_rsc_op
> Jun 14 20:42:23 h18 pacemaker-controld[7026]:  notice: State transition 
> S_NOT_DC -> S_RECOVERY
> Jun 14 20:42:23 h18 pacemaker-controld[7026]:  warning: Fast-tracking 
> shutdown in response to errors
> Jun 14 20:42:23 h18 pacemaker-controld[7026]:  error: Input I_TERMINATE 
> received in state S_RECOVERY from do_recover
> Jun 14 20:42:28 h18 pacemaker-controld[7026]:  warning: Sending IPC to lrmd 
> disabled until pending reply received
> Jun 14 20:42:28 h18 pacemaker-controld[7026]:  error: Couldn't perform 
> lrmd_rsc_cancel operation (timeout=0): -114: Connection timed out (110)
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  warning: Sending IPC to lrmd 
> disabled until pending reply received
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  error: Couldn't perform 
> lrmd_rsc_cancel operation (timeout=0): -114: Connection timed out (110)
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Stopped 2 recurring 
> operations at shutdown (0 remaining)
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  error: 3 resources were active 
> at shutdown
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Disconnected from the 
> executor
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Disconnected from 
> Corosync
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Disconnected from the 
> CIB manager
> Jun 14 20:42:33 h18 pacemaker-controld[7026]:  error: Could not recover from 
> internal error
> Jun 14 20:42:33 h18 pacemakerd[7003]:  error: pacemaker-controld[7026] exited 
> with status 1 (Error occurred)
> Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker-schedulerd
> Jun 14 20:42:33 h18 pacemaker-schedulerd[7024]:  notice: Caught 'Terminated' 
> signal
> Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker-attrd
> Jun 14 20:42:33 h18 pacemaker-attrd[7022]:  notice: Caught 'Terminated' signal
> Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker-execd
> Jun 14 20:42:34 h18 sbd[6856]:  warning: inquisitor_child: pcmk health check: 
> UNHEALTHY
> Jun 14 20:42:34 h18 sbd[6856]:  warning: inquisitor_child: Servant pcmk is 
> outdated (age: 41877)
> (SBD Fencing)
>

Rolling it up from the back: I guess the reaction to self-fence - in case
pacemaker says it doesn't know, and isn't able to find out, the state of
the resources - is basically correct.

Seeing the issue with the fake age being printed - possibly causing
confusion - reminds me that this should be addressed. I thought we had
already, but that is obviously a false memory.

It would be interesting to see whether pacemaker would recover the
sub-processes without sbd around, and whether other ways of fencing -
which should kick in in a similar way - would need a significant time.
As pacemakerd recently started to ping the sub-daemons via IPC - instead
of just listening for signals - it would also be interesting to know
whether the logs we are seeing already come from that code.

That the monitor process kicked off by execd seems to hog the IPC for a
significant time might be an issue to look after, although the new
implementation in pacemakerd might kick in and recover execd - for what
that is worth in the end.

This all seems to be kicked off by an RA that might not be robust enough,
or by the node being in a state that just doesn't allow a better answer.
The timeouts and retries required to give a timely answer about the state
of a resource should be taken care of inside the RA.
The last two points are at least something entirely different from the
fork segfaulting, although that might just as well be a sign that there is
something really wrong with the node.

Klaus

> Regards,
> Ulrich
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs 

Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Klaus Wenninger
On Tue, Jun 7, 2022 at 10:27 AM Zoran Bošnjak  wrote:
>
> Hi, I need some help with correct fencing configuration in 5-node cluster.
>
> The speciffic issue is that there are 3 rooms, where in addition to node 
> failure scenario, each room can fail too (for example in case of room power 
> failure or room network failure).
>
> room0: [ node0 ]
> roomA: [ node1, node2 ]
> roomB: [ node3, node4 ]
>
> - ipmi board is present on each node
> - watchdog timer is available
> - shared storage is not available
>
> Please advice, what would be a proper fencing configuration in this case.
>
> The intention is to configure ipmi fencing (using "fence_idrac" agent) plus 
> watchdog timer as a fallback. In other words, I would like to tell the 
> pacemaker: "If fencing is required, try to fence via ipmi. In case of ipmi 
> fence failure, after some timeout assume watchdog has rebooted the node, so 
> it is safe to proceed, as if the (self)fencing had succeeded)."
>
> From the documentation is not clear to me whether this would be:
> a) multiple fencing where ipmi would be first level and sbd would be a second 
> level fencing (where sbd always succeeds)
> b) or this is considered a single level fencing with a timeout

With b), falling back to watchdog-fencing wouldn't work properly, although
I remember a recent change that might make it fall back without issues.
I would try to go for a): with a reasonably current pacemaker version
(iirc 2.1.0 and above) you should be able to make the watchdog-fencing
device visible like any other fencing device (just use fence_watchdog as
the fence agent - the fencing itself is still implemented inside pacemaker;
the fence_watchdog binary actually just provides the meta-data).
This way you can limit watchdog-fencing to the nodes that actually provide
a proper hardware watchdog, and you can add it to a fencing topology, as
sketched below.
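
For illustration, a minimal sketch with pcs (the names are hypothetical
and the parameters may need adjusting for your setup):

# make the watchdog "device" visible so it can be placed in a topology
pcs stonith create watchdog-fence fence_watchdog \
    pcmk_host_list="node1 node2 node3 node4"
# per node: level 1 tries IPMI first, level 2 falls back to watchdog self-fencing
pcs stonith level add 1 node1 fence_ipmi_node1
pcs stonith level add 2 node1 watchdog-fence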

Depending on your infrastructure, an alternative to watchdog-fencing for
your case (where you can't access the IPMIs in a room with a power outage)
might be fabric fencing.

Klaus
>
> I have tried to followed option b) and create stonith resource for each node 
> and setup the stonith-watchdog-timeout, like this:
>
> ---
> # for each node... [0..4]
> export name=...
> export ip=...
> export password=...
> sudo pcs stonith create "fence_ipmi_$name" fence_idrac \
> lanplus=1 ip="$ip" \
> username="admin"  password="$password" \
> pcmk_host_list="$name" op monitor interval=10m timeout=10s
>
> sudo pcs property set stonith-watchdog-timeout=20
>
> # start dummy resource
> sudo pcs resource create dummy ocf:heartbeat:Dummy op monitor interval=30s
> ---
>
> I am not sure if additional location constraints have to be specified for 
> stonith resources. For example: I have noticed that pacemaker will start a 
> stonith resource on the same node as the fencing target. Is this OK?
>
> Should there be any location constraints regarding fencing and rooms?
>
> 'sbd' is running, properties are as follows:
>
> ---
> $ sudo pcs property show
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: debian
>  dc-version: 2.0.3-4b1f869f0f
>  have-watchdog: true
>  last-lrm-refresh: 1654583431
>  stonith-enabled: true
>  stonith-watchdog-timeout: 20
> ---
>
> Ipmi fencing (when the ipmi connection is alive) works correctly for each 
> node. The watchdog timer also seems to be working correctly. The problem is 
> that dummy resource is not restarted as expected.
>
> In the test scenario, the dummy resource is currently running on node1. I 
> have simulated node failure by unplugging the ipmi AND host network 
> interfaces from node1. The result was that node1 gets rebooted (by watchdog), 
> but the rest of the pacemaker cluster was unable to fence node1 (this is 
> expected, since node1's ipmi is not accessible). The problem is that dummy 
> resource remains stopped and node1 unclean. I was expecting that 
> stonith-watchdog-timeout kicks in, so that dummy resource gets restarted on 
> some other node which has quorum.
>
> Obviously there is something wrong with my configuration, since this seems to 
> be a reasonably simple scenario for the pacemaker. Appreciate your help.
>
> regards,
> Zoran
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: normal reboot with active sbd does not work

2022-06-07 Thread Klaus Wenninger
On Tue, Jun 7, 2022 at 7:53 AM Ulrich Windl
 wrote:
>
> >>> Andrei Borzenkov  schrieb am 03.06.2022 um 17:04 in
> Nachricht <99f7746a-c962-33bb-6737-f88ba0128...@gmail.com>:
> > On 03.06.2022 16:51, Zoran Bošnjak wrote:
> >> Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed
>
> > OK. I was first experimenting with "softdog", which is blacklisted. So the
> > reasonable question is how to properly start "softdog" on ubuntu.
> >>
> >
> > blacklist prevents autoloading of modules by alias during hardware
> > detection. Neither softdog or ipmi_watchdog have any alias so they
> > cannot be autoloaded and blacklist is irrelevant here.
> >
> >> The reason to unload watchdog module (ipmi or softdog) is that there seems
>
> > to be a difference between normal reboot and watchdog reboot.
> >> In case of ipmi watchdog timer reboot:
> >> - the system hangs at the end of reboot cycle for some time
> >> - restart seems to be harder (like power off/on cycle), BIOS runs more
> > diagnostics at startup
>
> maybe kdump is enabled in that case?
>
> >> - it turns on HW diagnostic indication on the server front panel (dell
> > server) which stays on forever
> >> - it logs the event to IDRAC, which is unnecessary, because it was not a
> > hardware event, but just a normal reboot
>
> If the hardware watchdog times out and fires, it is consoidered to be an
> exceptional event that will be logged and reported.
>
> >>
> >> In case of "sudo reboot" command, I would like to skip this... so the idea
>
> > is to fully stop the watchdog just before reboot. I am not sure how to do
> > this properly.
> >>
> >> The "softdog" is better in this respect. It does not trigger nothing from
> > the list above, but I still get the message during reboot
> >> [ ... ] watchdog: watchdog0: watchdog did not stop!
> >> ... with some small timeout.
> >>
> >
> > The first obvious question - is there only one watchdog? Some watchdog
> > drivers *are* autoloaded.
> >
> > Is there only one user of watchdog? systemd may use it too as example.
>
> Don't mix timers with a watchdog: It makes little sense to habe multipe
> watchdogs enabled IMHO.

Yep, that is an issue at the moment.

You can have multiple users of a hardware watchdog, such as
a watchdog daemon, sbd, corosync, systemd, ...

I'm not aware of an implementation that would provide multiple watchdog
timers with the usual char-device interface out of one physical device.
Of course this should be relatively easy to implement - even in user space.
On our embedded devices we usually had something like a service that
offered multiple timers to other instances.
The implementation of that service was itself guarded by a hardware
watchdog, so that the derived timers were as reliable as a hardware watchdog.
The last implementation was built into the watchdog daemon and offered a
D-Bus interface.
What systemd has implemented is similarly interesting, but the current
systemd implementation has a suspicious loop around it that makes it unfit
for sbd's purposes, as it doesn't guarantee a reboot within a reasonably
short time.
This is why I haven't implemented the systemd file-descriptor approach in
sbd yet (as a configurable alternative to going for the device directly).
Approaching the systemd folks and asking why it is implemented the way it
is has been on my todo list for a while now.

If you are running multiple services on a host that don't share something
like a common supervision main loop, it may make sense to provide a common
instance that offers something like a watchdog service.
For a node that has all services under pacemaker control this shouldn't be
needed, as we have sbd observing pacemakerd, and pacemakerd in turn
observes the other pacemaker subdaemons (released with RHEL 8.6 and
iirc 2.1.3 upstream), guaranteeing that the monitors on the resources
don't get stuck.

Klaus
>
> >
> >> So after some additional testing, the situation is the following:
> >>
> >> - without any watchdog and without sbd package, the server reboots
> normally
> >> - with "softdog" module loaded, I only get "watchdog did not stop message"
>
> > at reboot
> >> - with "softdog" loaded, but unloaded with "ExecStop=...rmmod", reboot is
> > normal again
> >> - same as above, but with "sbd" package loaded, I am getting "watchdog did
>
> > not stop message" again
> >> - switching from "softdog" to "ipmi_watchdog" gets me to the original list
>
> > of problems
> >>
> >> It looks like the "sbd" is preventing the watchdog to close, so that
> > watchdog triggers always, even in the case of normal reboot. What am I
> > missing here?
>
> The watchdog may have a "no way out" parameter that prevents disabling it
> after enabled once.
>
> >
> > While the only way I can reproduce it on my QEMU VM is "reboot -f"
> > (without stopping all services), there is certainly a race condition in
> > sbd.service.
> >
> > ExecStop=@bindir@/kill -TERM $MAINPID
> >
> >
> > systemd will continue as soon as "kill" completes without waiting for
> > sbd 

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 3:51 PM Zoran Bošnjak  wrote:
>
> Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed 
> OK. I was first experimenting with "softdog", which is blacklisted. So the 
> reasonable question is how to properly start "softdog" on ubuntu.
>
> The reason to unload watchdog module (ipmi or softdog) is that there seems to 
> be a difference between normal reboot and watchdog reboot.
> In case of ipmi watchdog timer reboot:
> - the system hangs at the end of reboot cycle for some time
> - restart seems to be harder (like power off/on cycle), BIOS runs more 
> diagnostics at startup
> - it turns on HW diagnostic indication on the server front panel (dell 
> server) which stays on forever
> - it logs the event to IDRAC, which is unnecessary, because it was not a 
> hardware event, but just a normal reboot
>
> In case of "sudo reboot" command, I would like to skip this... so the idea is 
> to fully stop the watchdog just before reboot. I am not sure how to do this 
> properly.
>
> The "softdog" is better in this respect. It does not trigger nothing from the 
> list above, but I still get the message during reboot
> [ ... ] watchdog: watchdog0: watchdog did not stop!
> ... with some small timeout.
>
> So after some additional testing, the situation is the following:
>
> - without any watchdog and without sbd package, the server reboots normally
> - with "softdog" module loaded, I only get "watchdog did not stop message" at 
> reboot
> - with "softdog" loaded, but unloaded with "ExecStop=...rmmod", reboot is 
> normal again
> - same as above, but with "sbd" package loaded, I am getting "watchdog did 
> not stop message" again
> - switching from "softdog" to "ipmi_watchdog" gets me to the original list of 
> problems
>
> It looks like the "sbd" is preventing the watchdog to close, so that watchdog 
> triggers always, even in the case of normal reboot. What am I missing here?

sbd has the watchdog device open and is thus preventing the module from
being unloaded.
Without any ordering instructions in your unit file, systemd will try to
stop the unit immediately and thus fail.
Have you tried

[Unit]
Before=sbd.service

[Install]
RequiredBy=sbd.service
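
i.e. roughly (a sketch of the resulting unit, assuming the ipmi_watchdog
module from your original file, with the ExecStop/rmmod dropped as
suggested above):

--- file: /etc/systemd/system/watchdog.service
[Unit]
Description=Load watchdog timer module
Before=sbd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/modprobe ipmi_watchdog
# no ExecStop/rmmod here - sbd keeps /dev/watchdog open, so unloading would fail anyway

[Install]
WantedBy=multi-user.target
RequiredBy=sbd.service
---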

I would have expected that rebooting with the device disabled again after
sbd shuts down behaves similarly to rebooting with the module unloaded.
You could check whether the kernel module has something like a 'nowayout'
parameter that would prevent disabling the watchdog once it has been
opened - see the check sketched below.
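
A quick way to check (a sketch, assuming the ipmi_watchdog module):

modinfo ipmi_watchdog | grep -i nowayout
cat /sys/module/ipmi_watchdog/parameters/nowayout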

Klaus
>
> Zoran
>
> - Original Message -
> From: "Andrei Borzenkov" 
> To: "users" 
> Sent: Friday, June 3, 2022 11:24:03 AM
> Subject: Re: [ClusterLabs] normal reboot with active sbd does not work
>
> On 03.06.2022 11:18, Zoran Bošnjak wrote:
> > Hi all,
> > I would appreciate an advice about sbd fencing (without shared storage).
> >
> > I am using ubuntu 20.04., with default packages from the repository 
> > (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> >
> > HW watchdog is present on servers. The first problem was to load/unload the 
> > watchdog module. For some reason the module is blacklisted on ubuntu,
>
> What makes you think so?
>
> bor@bor-Latitude-E5450:~$ lsb_release  -d
>
> Description:Ubuntu 20.04.4 LTS
>
> bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog
>
> bor@bor-Latitude-E5450:~$
>
>
>
>
>
> > so I've created a service for this purpose.
> >
>
> man modules-load.d
>
>
> > --- file: /etc/systemd/system/watchdog.service
> > [Unit]
> > Description=Load watchdog timer module
> > After=syslog.target
> >
>
> Without any explicit dependencies stop will be attempted as soon as
> possible.
>
> > [Service]
> > Type=oneshot
> > RemainAfterExit=yes
> > ExecStart=/sbin/modprobe ipmi_watchdog
> > ExecStop=/sbin/rmmod ipmi_watchdog
> >
>
> Why on earth do you need to unload kernel driver when system reboots?
>
> > [Install]
> > WantedBy=multi-user.target
> > ---
> >
> > Is this a proper way to load watchdog module under ubuntu?
> >
>
> There is standard way to load non-autoloaded drivers on *any* systemd
> based distribution. Which is modules-load.d.
>
> > Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> > 'sbd') is present.
> > Next, the 'sbd' is installed by
> >
> > sudo apt install sbd
> > (followed by one reboot to get the sbd active)
> >
> > The configuration of the 'sbd' is default. The sbd reacts to network 
> > failure as expected (reboots the server). However, when the 'sbd' is 
> > active, the server won't reboot normally any more. For example from the 
> > command line "sudo reboot", it gets stuck at the end of the reboot 
> > sequence. There is a message on the console:
> >
> > ... reboot progress
> > [ OK ] Finished Reboot.
> > [ OK ] Reached target Reboot.
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > ... it gets stuck at this point
> >
> > After some long timeout, it looks like the watchdog timer expires 

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 11:03 AM Klaus Wenninger  wrote:
>
> On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak  wrote:
> >
> > Hi all,
> > I would appreciate an advice about sbd fencing (without shared storage).
> >
> > I am using ubuntu 20.04., with default packages from the repository 
> > (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> >
> > HW watchdog is present on servers. The first problem was to load/unload the 
> > watchdog module. For some reason the module is blacklisted on ubuntu, so 
> > I've created a service for this purpose.
> >
> > --- file: /etc/systemd/system/watchdog.service
> > [Unit]
> > Description=Load watchdog timer module
> > After=syslog.target
> >
> > [Service]
> > Type=oneshot
> > RemainAfterExit=yes
> > ExecStart=/sbin/modprobe ipmi_watchdog
> > ExecStop=/sbin/rmmod ipmi_watchdog
> >
> > [Install]
> > WantedBy=multi-user.target
> > ---
> >
> > Is this a proper way to load watchdog module under ubuntu?
> >
> > Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> > 'sbd') is present.
> > Next, the 'sbd' is installed by
> >
> > sudo apt install sbd
> > (followed by one reboot to get the sbd active)
> >
> > The configuration of the 'sbd' is default. The sbd reacts to network 
> > failure as expected (reboots the server). However, when the 'sbd' is 
> > active, the server won't reboot normally any more. For example from the 
> > command line "sudo reboot", it gets stuck at the end of the reboot 
> > sequence. There is a message on the console:
> >
> > ... reboot progress
> > [ OK ] Finished Reboot.
> > [ OK ] Reached target Reboot.
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > ... it gets stuck at this point
> >
> > After some long timeout, it looks like the watchdog timer expires and 
> > server boots, but the failure indication remains on the front panel of the 
> > server. If I uninstall the 'sbd' package, the "sudo reboot" works normally 
> > again.
> >
> > My question is: How do I configure the system, to have the 'sbd' function 
> > present, but still be able to reboot the system normally.
>
> Loading modules - depending on distribution an version - should probably 
> rather
> be done editing /etc/modules or putting some files under /etc/modprobe-d/.
Of course that would require removing the driver from the blacklist.
Any reason why you didn't consider that?
> Guess in your case stopping the unit won't work as the watchdog-device is
> still opened by sbd. In general I don't see why the watchdog-module should
> be unloaded upon shutdown. So as a first try you just might remove that part.
>
> Klaus
>
> >
> > regards,
> > Zoran
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Klaus Wenninger
On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak  wrote:
>
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).
>
> I am using ubuntu 20.04., with default packages from the repository 
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
>
> HW watchdog is present on servers. The first problem was to load/unload the 
> watchdog module. For some reason the module is blacklisted on ubuntu, so I've 
> created a service for this purpose.
>
> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
>
> [Install]
> WantedBy=multi-user.target
> ---
>
> Is this a proper way to load watchdog module under ubuntu?
>
> Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> 'sbd') is present.
> Next, the 'sbd' is installed by
>
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
>
> The configuration of the 'sbd' is default. The sbd reacts to network failure 
> as expected (reboots the server). However, when the 'sbd' is active, the 
> server won't reboot normally any more. For example from the command line 
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a 
> message on the console:
>
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
>
> After some long timeout, it looks like the watchdog timer expires and server 
> boots, but the failure indication remains on the front panel of the server. 
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
>
> My question is: How do I configure the system, to have the 'sbd' function 
> present, but still be able to reboot the system normally.

Loading modules - depending on distribution and version - should probably
rather be done by editing /etc/modules or putting a file under /etc/modprobe.d/.
I guess in your case stopping the unit won't work, as the watchdog device
is still held open by sbd. In general I don't see why the watchdog module
should be unloaded on shutdown, so as a first try you might just remove that part.
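
On a systemd-based system the standard mechanism (also pointed out
elsewhere in this thread) is modules-load.d; a minimal sketch, assuming
the ipmi_watchdog module:

echo ipmi_watchdog > /etc/modules-load.d/watchdog.conf
# systemd-modules-load.service then loads it at boot, no custom unit needed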

Klaus

>
> regards,
> Zoran
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Cluster unable to find back together

2022-05-23 Thread Klaus Wenninger
On Fri, May 20, 2022 at 7:43 AM Ulrich Windl
 wrote:
>
> >>> Jan Friesse  schrieb am 19.05.2022 um 14:55 in
> Nachricht
> <1abb8468-6619-329f-cb01-3f51112db...@redhat.com>:
> > Hi,
> >
> > On 19/05/2022 10:16, Leditzky, Fabian via Users wrote:
> >> Hello
> >>
> >> We have been dealing with our pacemaker/corosync clusters becoming
> unstable.
> >> The OS is Debian 10 and we use Debian packages for pacemaker and corosync,
> >> version 3.0.1‑5+deb10u1 and 3.0.1‑2+deb10u1 respectively.
> >
> > Seems like pcmk version is not so important for behavior you've
> > described. Corosync 3.0.1 is super old, are you able to reproduce the
>
> I'm running corosync-2.4.5-12.7.1.x86_64 (SLES15 SP3) here ;-)
>
> Are you mixing "super old" with "super buggy"?

Actually, 3.0.1 is older than 2.4.5, and on top of that, 2.4.5 is the head
of a mature branch while 3.0.1 is the beginning of a new branch that
brought substantial changes.

Klaus
>
> Regards,
> Ulrich
>
> > behavior with 3.1.6? What is the version of knet? There were quite a few
> > fixes so last one (1.23) is really recommended.
> >
> > You can try to compile yourself, or use proxmox repo
> > (http://download.proxmox.com/debian/pve/) which contains newer version
> > of packages.
> >
> >> We use knet over UDP transport.
> >>
> >> We run multiple 2‑node and 4‑8 node clusters, primarily managing VIP
> > resources.
> >> The issue we experience presents itself as a spontaneous disagreement of
> >> the status of cluster members. In two node clusters, each node
> spontaneously
> >> sees the other node as offline, despite network connectivity being OK.
> >> In larger clusters, the status can be inconsistent across the nodes.
> >> E.g.: node1 sees 2,4 as offline, node 2 sees 1,4 as offline while node 3
> and
> > 4 see every node as online.
> >
> > This really shouldn't happen.
> >
> >> The cluster becomes generally unresponsive to resource actions in this
> > state.
> >
> > Expected
> >
> >> Thus far we have been unable to restore cluster health without restarting
> > corosync.
> >>
> >> We are running packet captures 24/7 on the clusters and have custom
> tooling
> >> to detect lost UDP packets on knet ports. So far we could not see
> > significant
> >> packet loss trigger an event, at most we have seen a single UDP packet
> > dropped
> >> some seconds before the cluster fails.
> >>
> >> However, even if the root cause is indeed a flaky network, we do not
> > understand
> >> why the cluster cannot recover on its own in any way. The issues definitely
>
> > persist
> >> beyond the presence of any intermittent network problem.
> >
> > Try newer version. If problem persist, it's good idea to monitor if
> > packets are really passed thru. Corosync always (at least) creates
> > single node membership.
> >
> > Regards,
> >Honza
> >
> >>
> >> We were able to artificially break clusters by inducing packet loss with an
>
> > iptables rule.
> >> Dropping packets on a single node of an 8‑node cluster can cause
> malfunctions
> > on
> >> multiple other cluster nodes. The expected behavior would be detecting that
>
> > the
> >> artificially broken node failed but keeping the rest of the cluster
> stable.
> >> We were able to reproduce this also on Debian 11 with more recent
> > corosync/pacemaker
> >> versions.
> >>
> >> Our configuration basic, we do not significantly deviate from the
> defaults.
> >>
> >> We will be very grateful for any insights into this problem.
> >>
> >> Thanks,
> >> Fabian
> >>
> >> // corosync.conf
> >> totem {
> >>  version: 2
> >>  cluster_name: cluster01
> >>  crypto_cipher: aes256
> >>  crypto_hash: sha512
> >>  transport: knet
> >> }
> >> logging {
> >>  fileline: off
> >>  to_stderr: no
> >>  to_logfile: no
> >>  to_syslog: yes
> >>  debug: off
> >>  timestamp: on
> >>  logger_subsys {
> >>  subsys: QUORUM
> >>  debug: off
> >>  }
> >> }
> >> quorum {
> >>  provider: corosync_votequorum
> >>  two_node: 1
> >>  expected_votes: 2
> >> }
> >> nodelist {
> >>  node {
> >>  name: node01
> >>  nodeid: 01
> >>  ring0_addr: 10.0.0.10
> >>  }
> >>  node {
> >>  name: node02
> >>  nodeid: 02
> >>  ring0_addr: 10.0.0.11
> >>  }
> >> }
> >>
> >> // crm config show
> >> node 1: node01 \
> >>  attributes standby=off
> >> node 2: node02 \
> >>  attributes standby=off maintenance=off
> >> primitive IP‑clusterC1 IPaddr2 \
> >>  params ip=10.0.0.20 nic=eth0 cidr_netmask=24 \
> >>  meta migration‑threshold=2 target‑role=Started is‑managed=true \
> >>  op monitor interval=20 timeout=60 on‑fail=restart
> >> primitive IP‑clusterC2 IPaddr2 \
> >>  params ip=10.0.0.21 nic=eth0 cidr_netmask=24 \
> >>  meta migration‑threshold=2 target‑role=Started is‑managed=true \
> >>  op monitor interval=20 timeout=60 on‑fail=restart
> >> location STICKY‑IP‑clusterC1 IP‑clusterC1 100: node01
> >> location STICKY‑IP‑clusterC2 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: How to clean up a failed fencing operation?

2022-05-13 Thread Klaus Wenninger
On Fri, May 13, 2022 at 12:12 PM Ulrich Windl
 wrote:
>
> >>> Klaus Wenninger  schrieb am 13.05.2022 um 08:22 in
> Nachricht
> :
> > On Tue, May 3, 2022 at 11:53 AM Ulrich Windl
> >  wrote:
> >>
> >> >>> Reid Wahl  schrieb am 03.05.2022 um 10:16 in
> Nachricht
> >> :
> >> > On Tue, May 3, 2022 at 12:36 AM Ulrich Windl
> >> >  wrote:
> >> >>
> >> >> Hi!
> >> >>
> >> >> I'm familiar with cleaning up various failed resource actions via
> >> > "crm_resource ‑C ‑r resource_name ‑N node_name ‑n operation".
> >> >> However I wonder wha tthe correct paraneters for a failed fencing
> operation
> >>
> >> > (that lingers around) are.
> >> >
> >> > stonith_admin ‑‑history '*' ‑‑cleanup
> >>
> >> Ah, a completely different command! Interestingly this does not produce
> any
> >> logs in syslog (no DC action).
> >
> > Fencing history is totally independent from failure history for
> > resources that is
> > recorded in the cib. That is part of the strategy to have the fencing
> > framework
> > operate kind of independently from DC, scheduler and stuff.
> > It lives purely within the framework built by the fenced instances and
> > broadcasted
> > between those instances to keep it current - or purged if requested.
> > DC role thus isn't relevant for working with the fencing history.
> > Of course operations on the fencing history can create logs but they may be
> > below the usually enabled level. Nothing there should influence the
> behavior
> > of the cluster (you can't purge pending actions).
>
> It's probably not formally defined, but I always felt that any state changes
> should be logged with severity "info" at least; actually I prefer "notice"
> (using "info" for "interesting" events).

Fencing history is just a memory-based distributed log.

>
> Regards,
> Ulrich
>
>
> >
> > Klaus
> >>
> >> Regards,
> >> Ulrich
> >>
> >>
> >> >
> >> >>
> >> >> crm_mon found:
> >> >> Failed Fencing Actions:
> >> >>   * reboot of h18 failed: delegate=h16,
> >> client=stonith_admin.controld.22336,
> >> > origin=h18, last‑failed='2022‑04‑27 02:22:52 +02:00' (a later attempt
> >> succeeded)
> >> >>
> >> >> Regards,
> >> >> Ulrich
> >> >>
> >> >>
> >> >>
> >> >> ___
> >> >> Manage your subscription:
> >> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >> >>
> >> >> ClusterLabs home: https://www.clusterlabs.org/
> >> >>
> >> >
> >> >
> >> > ‑‑
> >> > Regards,
> >> >
> >> > Reid Wahl (He/Him), RHCA
> >> > Senior Software Maintenance Engineer, Red Hat
> >> > CEE ‑ Platform Support Delivery ‑ ClusterHA
> >> >
> >> > ___
> >> > Manage your subscription:
> >> > https://lists.clusterlabs.org/mailman/listinfo/users
> >> >
> >> > ClusterLabs home: https://www.clusterlabs.org/
> >>
> >>
> >>
> >> ___
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Q: How to clean up a failed fencing operation?

2022-05-13 Thread Klaus Wenninger
On Tue, May 3, 2022 at 11:53 AM Ulrich Windl
 wrote:
>
> >>> Reid Wahl  schrieb am 03.05.2022 um 10:16 in Nachricht
> :
> > On Tue, May 3, 2022 at 12:36 AM Ulrich Windl
> >  wrote:
> >>
> >> Hi!
> >>
> >> I'm familiar with cleaning up various failed resource actions via
> > "crm_resource ‑C ‑r resource_name ‑N node_name ‑n operation".
> >> However I wonder wha tthe correct paraneters for a failed fencing operation
>
> > (that lingers around) are.
> >
> > stonith_admin ‑‑history '*' ‑‑cleanup
>
> Ah, a completely different command! Interestingly this does not produce any
> logs in syslog (no DC action).

Fencing history is entirely independent from the failure history for
resources that is recorded in the CIB. That is part of the strategy of
having the fencing framework operate more or less independently from the
DC, the scheduler, and so on.
It lives purely within the framework built by the fenced instances and is
broadcast between those instances to keep it current - or purged on request.
The DC role thus isn't relevant for working with the fencing history.
Operations on the fencing history can of course create logs, but they may
be below the usually enabled log level. Nothing there should influence the
behavior of the cluster (you can't purge pending actions).
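
For reference, the relevant commands look roughly like this (a sketch; see
the stonith_admin man page for details):

stonith_admin --history '*'            # show the fencing history known cluster-wide
stonith_admin --history '*' --cleanup  # purge completed/failed entries from that history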

Klaus
>
> Regards,
> Ulrich
>
>
> >
> >>
> >> crm_mon found:
> >> Failed Fencing Actions:
> >>   * reboot of h18 failed: delegate=h16,
> client=stonith_admin.controld.22336,
> > origin=h18, last‑failed='2022‑04‑27 02:22:52 +02:00' (a later attempt
> succeeded)
> >>
> >> Regards,
> >> Ulrich
> >>
> >>
> >>
> >> ___
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >>
> >
> >
> > ‑‑
> > Regards,
> >
> > Reid Wahl (He/Him), RHCA
> > Senior Software Maintenance Engineer, Red Hat
> > CEE ‑ Platform Support Delivery ‑ ClusterHA
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Klaus Wenninger
On Thu, Apr 21, 2022 at 8:18 PM john tillman  wrote:
>
> > On 21.04.2022 18:26, john tillman wrote:
> >>> Dne 20. 04. 22 v 20:21 john tillman napsal(a):
> > On 20.04.2022 19:53, john tillman wrote:
> >> I have a two node cluster that won't start any resources if only one
> >> node
> >> is booted; the pacemaker service does not start.
> >>
> >> Once the second node boots up, the first node will start pacemaker
> >> and
> >> the
> >> resources are started.  All is well.  But I would like the resources
> >> to
> >> start when the first node boots by itself.
> >>
> >> I thought the problem was with the wait_for_all option but I have it
> >> set
> >> to "0".
> >>
> >> On the node that is booted by itself, when I run
> >> "corosync-quorumtool"
> >> I
> >> see:
> >>
> >> [root@test00 ~]# corosync-quorumtool
> >> Quorum information
> >> --
> >> Date: Wed Apr 20 16:05:07 2022
> >> Quorum provider:  corosync_votequorum
> >> Nodes:1
> >> Node ID:  1
> >> Ring ID:  1.2f
> >> Quorate:  Yes
> >>
> >> Votequorum information
> >> --
> >> Expected votes:   2
> >> Highest expected: 2
> >> Total votes:  1
> >> Quorum:   1
> >> Flags:2Node Quorate
> >>
> >> Membership information
> >> --
> >> Nodeid  Votes Name
> >>  1  1 test00 (local)
> >>
> >>
> >> My config file look like this:
> >> totem {
> >> version: 2
> >> cluster_name: testha
> >> transport: knet
> >> crypto_cipher: aes256
> >> crypto_hash: sha256
> >> }
> >>
> >> nodelist {
> >> node {
> >> ring0_addr: test00
> >> name: test00
> >> nodeid: 1
> >> }
> >>
> >> node {
> >> ring0_addr: test01
> >> name: test01
> >> nodeid: 2
> >> }
> >> }
> >>
> >> quorum {
> >> provider: corosync_votequorum
> >> two_node: 1
> >> wait_for_all: 0
> >> }
> >>
> >> logging {
> >> to_logfile: yes
> >> logfile: /var/log/cluster/corosync.log
> >> to_syslog: yes
> >> timestamp: on
> >> debug: on
> >> syslog_priority: debug
> >> logfile_priority: debug
> >> }
> >>
> >> Fencing is disabled.
> >>
> >
> > That won't work.
> >
> >> I've also looked in "corosync.log" but I don't know what to look for
> >> to
> >> diagnose this issue.  I mean there are many lines similar to:
> >> [QUORUM] This node is within the primary component and will provide
> >> service.
> >> and
> >> [VOTEQ ] Sending quorum callback, quorate = 1
> >> and
> >> [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
> >> Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins:
> >> No
> >>
> >> Is there something specific I should look for in the log?
> >>
> >> So can a two node cluster work after booting only one node?  Maybe
> >> it
> >> never will and I am wasting a lot of time, yours and mine.
> >>
> >> If it can, what else can I investigate further?
> >>
> >
> > Before node can start handling resources it needs to know status of
> > other node. Without successful fencing there is no way to accomplish
> > it.
> >
> > Yes, you can tell pacemaker to ignore unknown status. Depending on
> > your
> > resources this could simply prevent normal work or lead to data
> > corruption.
> 
> 
>  Makes sense.  Thank you.
> 
>  Perhaps some future enhancement could allow for this situation?  I
>  mean,
>  It might be desirable for some cases to allow for a single node to
>  boot,
>  determine quorum by two_node=1 and wait_for_all=0, and start resources
>  without ever seeing the other node.  Sure, there are dangers of split
>  brain but I can see special cases where I want the node to work alone
>  for
>  a period of time despite the danger.
> 
> >>>
> >>> Hi John,
> >>>
> >>> How about 'pcs quorum unblock'?
> >>>
> >>> Regards,
> >>> Tomas
> >>>
> >>
> >>
> >> Tomas,
> >>
> >> Thank you for the suggestion.  However it didn't work.  It returned:
> >> Error: unable to check quorum status
> >>   crm_mon: Error: cluster is not available on this node
> >> I checked pacemaker, just in case, and it still isn't running.
> >>
> >
> > Either pacemaker or some service it depends upon attempted to start and
> > failed or systemd 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in 2.1.3: node health monitoring improvements

2022-04-14 Thread Klaus Wenninger
On Thu, Apr 14, 2022 at 7:57 AM Ulrich Windl
 wrote:
>
> Ken,
>
> thanks for thje explanations! Maybe it would be best (next time) if you
> present the documentation for a new feature first (as a base for discussion),
> and _then_ implement it.
> I know: People first implement it, and later, if they have time or feel like
> it, they'll document.
> However, as I found out for myself, sometimes documentation is really useful
> when you review your code some time later and wonder: "What should have been
> the purpose of all that?" ;-)
What has proven to work quite well - also efficiency-wise - is an
iterative approach ;-)
Have a rough idea, implement something including a first round of
documentation, play with it, discuss it, and improve the
documentation/implementation through feedback ...

Klaus
>
> Regards,
> Ulrich
>
> >>> Ken Gaillot  schrieb am 13.04.2022 um 15:59 in
> Nachricht
> <3ad20a26a4623d2e7ff11eb0bdf822faae1a5114.ca...@redhat.com>:
> > On Wed, 2022-04-13 at 08:22 +0200, Ulrich Windl wrote:
> >> > > > Ken Gaillot  schrieb am 12.04.2022 um
> >> > > > 17:22 in
> >> Nachricht
> >> <33f4147d0f6a3e46581aaa46a4eca81dfa59ce15.ca...@redhat.com>:
> >> > Hi all,
> >> >
> >> > I'm hoping to have the first release candidate for 2.1.3 ready next
> >> > week.
> >> >
> >> > Pacemaker has long had a feature to monitor node health (CPU usage,
> >> > SMART drive errors, etc.) and move resources off degraded nodes:
> >> >
> >> >
> > https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/ind
>
> >> > ex.html#tracking‑node‑health
> >>
> >> Great, I wanted to ask a question on it anyway:
> >> Is the node health attribute stored in the CIB, or is it transient
> >> (i.e.:
> >> reset when the node is restarted)?
> >
> > They can be either, although transient makes more sense. As long as the
> > name starts with "#health" it will be treated as a health attribute.
> >
> >>
> >> Some comments on the docs:
> >>
> >> "yellow" state: could also mean node is becoming healthy (coming from
> >> red),
> >> right?
> >
> > True, I'll make a note to update that
> >
> >>
> >> The "Node Health Strategy" could benefit from  better explanation.
> >> E.g.: "Assign the value of ..." Assign to whom/what?
> >
> > The wording could definitely be improved.
> >
> > In this case, the idea is that "red", "yellow", and "green" are just
> > convenient names for particular integer scores. The actual values used
> > depend on the strategy, hence "assign ... to red" and so forth.
> >
> >> It's very hard to find out what "progressive" really does.
> >>
> >> I think an configuration example with a sample scenario (node health
> >> changes)
> >> would be very helpful.
> >
> > Yes progressive and custom are confusing without examples. I'll add it
> > to the to-do list ...
> >
> > The idea behind progressive is that you might want to give a negative
> > but not infinite preference to yellow and/or red. With the other
> > strategies, any red attribute will cause all resources to move off.
> > With progressive, you could set red to some number (say -100) and that
> > score would be used just as if you had configured a location constraint
> > with that score. If you had stickiness higher than that, that would
> > keep any existing resources running there, but prevent any new
> > resources from being moved to the node.
> >
> >>
> >> > The 2.1.3 release will add a couple of features to make this more
> >> > useful.
> >> >
> >> > First, you can now exempt particular resources from health‑related
> >> > bans, using the new "allow‑unhealthy‑nodes" resource
> >> > meta‑attribute.
> >>
> >> If that's  a resource attribute, then the name is poorly chosen
> >> (IMHO).
> >> In times like these I'd almost suggest to name it
> >> "immune-against-node-health=red" or so (OK, just a joke).
> >
> > I always agonize over the names :)
> >
> > What I really wanted was to use the existing "requires" meta-attribute.
> > It currently can be set to nothing, quorum, fencing, or unfencing, to
> > determine what conditions have to be in place for the resource to run
> > (the default of fencing means that the cluster partition must have
> > quorum and any unclean nodes must have been successfully fenced).
> >
> > It would have been nice to have requires="fencing,health" mean that the
> > resource can only run on a healthy node (as defined by the configured
> > strategy). Unfortunately that would not have been backward compatible
> > with existing explicit configurations.
> >
> >>
> >>
> >> > This is particularly helpful for the health monitoring agents
> >> > themselves. Without the new option, health agents get moved off
> >>
> >> Specifically if the health state can improve again.
> >>
> >> > degraded nodes, which means the cluster can't detect if the
> >> > degraded
> >> > condition goes away. Users had to manually clear the health
> >> > attributes
> >> > to allow resources to move back to the node. Now, you can set
> >> > allow‑
> >> > unhealthy‑nodes=true on your 

Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-29 Thread Klaus Wenninger
On Thu, Mar 24, 2022 at 4:12 PM Ken Gaillot  wrote:
>
> On Wed, 2022-03-23 at 05:30 +, Balotra, Priyanka wrote:
> > Hi All,
> >
> > We have a scenario on SLES 12 SP3 cluster.
> > The scenario is explained as follows in the order of events:
> >  There is a 2-node cluster (FILE-1, FILE-2)
> >  The cluster and the resources were up and running fine initially .
> >  Then fencing request from pacemaker got issued on both nodes
> > simultaneously
> >
> > Logs from 1st node:
> > 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ]
> > Failed to receive the leave message. failed: 2
> > .
> > .
> > 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]:
> > notice: Requesting that FILE-1 perform 'off' action targeting FILE-2
> >
> > Logs from 2nd node:
> > 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ]
> > Failed to receive the leave message. failed: 1
> > .
> > .
> > Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith)
> > notice: Requesting that FILE-2 perform 'off' action targeting FILE-1
> >
> >  When the nodes came up after unfencing, the DC got set after
> > election
> >  After that the resources which were expected to run on only one node
> > became active on both (all) nodes of the cluster.
> >
> >  27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes
> > (attempting recovery)
> > 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: notice: See
> > https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for
> > more information
> > 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: error: Resource FILE_Filesystem is active on 2
> > nodes (attem pting recovery)
> > 27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: notice: See
> > https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for
> > more information
> > 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes
> > (attemptin g recovery)
> > 27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: notice: See
> > https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for
> > more information
> > 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: error: Resource Service_Postgresql is active on 2
> > nodes (at tempting recovery)
> > 27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: notice: See
> > https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for
> > more information
> > 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: error: Resource Service_Postgrest is active on 2
> > nodes (att empting recovery)
> > 27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: notice: See
> > https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for
> > more information
> > 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: error: Resource Service_esm_primary is active on 2
> > nodes (a ttempting recovery)
> > 27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: notice: See
> > https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for
> > more information
> > 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-
> > schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on
> > 2 nodes (attempting recovery)
> >
> > Can you guys please help us understand if this is indeed a split-
> > brain scenario ? Under what circumstances can such a scenario be
> > observed?
>
> This does look like a split-brain, and the most likely cause is that
> the fence agent reported that fencing was successful, but it actually
> wasn't.
>
> What are you using as a fencing device?
>
> If you're using watchdog-based SBD, that won't work with only two
> nodes, because both nodes will assume they still have quorum, and not
> self-fence. You need either true quorum or a shared external drive to
> use SBD.

We see a fencing resource (stonith-sbd), so I would guess poison-pill
fencing is configured.
We should therefore also verify that stonith-watchdog-timeout is not set
to anything but 0 - just to be sure it would never fall back to
watchdog-fencing.
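A quick way to query that property (a sketch; pcs or crmsh can show it as well):

crm_attribute --type crm_config --name stonith-watchdog-timeout --query
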
Maybe you can try inserting the poison pill manually and see whether the
targeted node reboots. You can do that either with high-level tooling such
as crmsh or pcs, or with the sbd binary directly on the command line.
You can try that both from the node to be rebooted and from the other
node, e.g. to check whether both sides see the same disk(s).
Check that the disk(s) configured for the sbd service are the same as
those configured for the sbd fencing resource (and of course, when using
sbd on the command line to insert a poison pill, the same disks have to be
used as well); a sketch of the command-line variant follows below.
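
A sketch of the command-line variant (the device path is a placeholder;
use the disk(s) from your sbd configuration):

sbd -d /dev/disk/by-id/<your-sbd-disk> list                 # show the slots/messages on the device
sbd -d /dev/disk/by-id/<your-sbd-disk> message FILE-1 test  # harmless test message, should show up in FILE-1's sbd log
sbd -d /dev/disk/by-id/<your-sbd-disk> message FILE-1 reset # the actual poison pill - this should reboot FILE-1
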
Is the sbd service running without complaints?
Please check as 

Re: [ClusterLabs] Request for ideas: Cluster node summary in 14 characters

2022-03-17 Thread Klaus Wenninger
On Thu, Mar 17, 2022 at 4:16 PM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> Hi!
>
> I had the idea to display the status of a cluster node on the 14-character
> LCD display of a Dell PowerEdge server; preferably displaying the hostname
> at least partially, too ;-)
>
> Now, what would you display, and how would you display it?
>
> (Actually I already have something (e.g. "h18:T[3Q_]P8V"), but I'd like to
> get your ideas)
>

Hmm ... that isn't even enough for a bit.ly link ;-)


>
> BTW, I also wanted to write a pseudo fencing agent that displays
> "Fencing..." on the LCD when the node is being fenced (hopefully it will
> stay there during reboot), but I realized that the documentation is rather
> incomplete. Most of all I don't really have a fencing-test environment...
>

Something like
https://github.com/ClusterLabs/fence-agents/blob/main/agents/heuristics_ping/fence_heuristics_ping.py
used within a fencing topology?
With a simple client on the node to be fenced that always returns success ...
Unfortunately it probably won't work in most of the interesting cases, as
either the node to be fenced isn't going to be alive enough or
connectivity is gone.
Or is it possible to make the remote management (aka the fencing device)
talk to the display somehow, to show a configurable text?
Going with an alert agent or registering for the fence history (the
history is reported on all nodes, so no additional communication would be
needed) probably isn't going to be helpful either, as I think you can't
see partial success in a topology - you only see when the node is reported
as finally killed, which is too late to do anything on the node being
killed :-(

Klaus

>
> Regards,
> Ulrich
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Noticed oddity when DC is going to be fenced

2022-03-01 Thread Klaus Wenninger
On Tue, Mar 1, 2022 at 10:05 AM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> Hi!
>
> For current SLES15 SP3 I noticed an oddity when the node running the DC is
> going to be fenced:
> It seems that another node is performing recovery operations while the old
> DC is not confirmed to be fenced.
>
> Like this (116 is the DC):
> Mar 01 01:33:53 h18 corosync[6754]:   [TOTEM ] A new membership (
> 172.20.16.18:45612) was formed. Members left: 116
>
> Mar 01 01:33:53 h18 corosync[6754]:   [MAIN  ] Completed service
> synchronization, ready to provide service.
> Mar 01 01:33:53 h18 pacemaker-controld[6980]:  notice: Our peer on the DC
> (h16) is dead
> Mar 01 01:33:53 h18 pacemaker-controld[6980]:  notice: State transition
> S_NOT_DC -> S_ELECTION
>
> Mar 01 01:33:53 h18 dlm_controld[8544]: 394518 fence request 116 pid 16307
> nodedown time 1646094833 fence_all dlm_stonith
> Mar 01 01:33:53 h18 pacemaker-controld[6980]:  notice: State transition
> S_ELECTION -> S_INTEGRATION
> Mar 01 01:33:53 h18 dlm_stonith[16307]: stonith_api_time: Found 1 entries
> for 116/(null): 0 in progress, 0 completed
> Mar 01 01:33:53 h18 pacemaker-fenced[6973]:  notice: Client
> stonith-api.16307.4961743f wants to fence (reboot) '116' with device '(any)'
> Mar 01 01:33:53 h18 pacemaker-fenced[6973]:  notice: Requesting peer
> fencing (reboot) targeting h16
>
> Mar 01 01:33:53 h18 pacemaker-schedulerd[6978]:  warning: Cluster node h16
> will be fenced: peer is no longer part of the cluster
> Mar 01 01:33:53 h18 pacemaker-schedulerd[6978]:  warning: Node h16 is
> unclean
>
> (so far, so good)
> Mar 01 01:33:53 h18 pacemaker-schedulerd[6978]:  warning: Scheduling Node
> h16 for STONITH
> Mar 01 01:33:53 h18 pacemaker-schedulerd[6978]:  notice:  * Fence (reboot)
> h16 'peer is no longer part of the cluster'
>
> Mar 01 01:33:53 h18 pacemaker-controld[6980]:  notice: Initiating monitor
> operation prm_stonith_sbd_monitor_60 locally on h18
> Mar 01 01:33:53 h18 pacemaker-controld[6980]:  notice: Requesting local
> execution of monitor operation for prm_stonith_sbd on h18
> Mar 01 01:33:53 h18 pacemaker-controld[6980]:  notice: Initiating stop
> operation prm_cron_snap_v17_stop_0 on h19
> (isn't h18 playing DC already while h16 isn't fenced yet?)
>

Periodic monitors should happen autonomously.
As long as you don't see pacemaker-schedulerd on h18 calculate a new
transition recovering the resources, everything should be fine.
And yes, to a certain extent h18 is playing DC (it is elected to be the new DC)
- somebody has to schedule fencing.

Klaus

>
> Mar 01 01:35:23 h18 pacemaker-controld[6980]:  error: Node h18 did not
> send monitor result (via controller) within 9ms (action timeout plus
> cluster-delay)
> Mar 01 01:35:23 h18 pacemaker-controld[6980]:  error: [Action   26]:
> In-flight resource op prm_stonith_sbd_monitor_60 on h18 (priority:
> 9900, waiting: (null))
> Mar 01 01:35:23 h18 pacemaker-controld[6980]:  notice: Transition 0
> aborted: Action lost
> Mar 01 01:35:23 h18 pacemaker-controld[6980]:  warning: rsc_op 26:
> prm_stonith_sbd_monitor_60 on h18 timed out
> (whatever that means)
>
> (now the fencing confirmation follows)
> Mar 01 01:35:55 h18 pacemaker-fenced[6973]:  notice: Operation 'reboot'
> [16309] (call 2 from stonith-api.16307) for host 'h16' with device
> 'prm_stonith_sbd' returned: 0 (OK)
> Mar 01 01:35:55 h18 pacemaker-fenced[6973]:  notice: Operation 'reboot'
> targeting h16 on h18 for stonith-api.16307@h18.36b9a9bb: OK
> Mar 01 01:35:55 h18 stonith-api[16307]: stonith_api_kick: Node 116/(null)
> kicked: reboot
> Mar 01 01:35:55 h18 pacemaker-fenced[6973]:  notice: Operation 'reboot'
> targeting h16 on rksaph18 for pacemaker-controld.6980@h18.8ce2f33f
> (merged): OK
> Mar 01 01:35:55 h18 pacemaker-controld[6980]:  notice: Peer h16 was
> terminated (reboot) by h18 on behalf of stonith-api.16307: OK
> Mar 01 01:35:55 h18 pacemaker-controld[6980]:  notice: Stonith operation
> 2/1:0:0:a434124e-3e35-410d-8e17-ef9ae4e4e6eb: OK (0)
> Mar 01 01:35:55 h18 pacemaker-controld[6980]:  notice: Peer h16 was
> terminated (reboot) by h18 on behalf of pacemaker-controld.6980: OK
>
> (actual recovery happens)
> Mar 01 01:35:55 h18 kernel: ocfs2: Begin replay journal (node 116, slot 0)
> on device (9,10)
>
> Mar 01 01:35:55 h18 kernel: md: md10: resync done.
>
> (more actions follow)
> Mar 01 01:35:56 h18 pacemaker-schedulerd[6978]:  notice: Calculated
> transition 1, saving inputs in /var/lib/pacemaker/pengine/pe-input-87.bz2
>
> (actions completed)
> Mar 01 01:37:18 h18 pacemaker-controld[6980]:  notice: State transition
> S_TRANSITION_ENGINE -> S_IDLE
>
> (pacemaker-2.0.5+20201202.ba59be712-150300.4.16.1.x86_64)
>
> Did I misunderstand something, or does it look like a bug?
>
> Regards,
> Ulrich
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-28 Thread Klaus Wenninger
On Mon, Feb 28, 2022 at 2:46 PM Klaus Wenninger  wrote:

>
>
> On Sat, Feb 26, 2022 at 7:14 AM Strahil Nikolov via Users <
> users@clusterlabs.org> wrote:
>
>> I always used this one for triggering kdump when using sbd:
>> https://www.suse.com/support/kb/doc/?id=19873
>>
>> On Fri, Feb 25, 2022 at 21:34, Reid Wahl
>>  wrote:
>> On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov 
>> wrote:
>> >
>> > On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl  wrote:
>> > >
>> > > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl  wrote:
>> > > >
>> > ...
>> > > > >
>> > > > > So what happens most likely is that the watchdog terminates the
>> kdump.
>> > > > > In that case all the mess with fence_kdump won't help, right?
>> > > >
>> > > > You can configure extra_modules in your /etc/kdump.conf file to
>> > > > include the watchdog module, and then restart kdump.service. For
>> > > > example:
>> > > >
>> > > > # grep ^extra_modules /etc/kdump.conf
>> > > > extra_modules i6300esb
>> > > >
>> > > > If you're not sure of the name of your watchdog module, wdctl can
>> help
>> > > > you find it. sbd needs to be stopped first, because it keeps the
>> > > > watchdog device timer busy.
>> > > >
>> > > > # pcs cluster stop --all
>> > > > # wdctl | grep Identity
>> > > > Identity:  i6300ESB timer [version 0]
>> > > > # lsmod | grep -i i6300ESB
>> > > > i6300esb  13566  0
>> > > >
>> > > >
>> > > > If you're also using fence_sbd (poison-pill fencing via block
>> device),
>> > > > then you should be able to protect yourself from that during a dump
>> by
>> > > > configuring fencing levels so that fence_kdump is level 1 and
>> > > > fence_sbd is level 2.
>> > >
>> > > RHKB, for anyone interested:
>> > >  - sbd watchdog timeout causes node to reboot during crash kernel
>> > > execution (https://access.redhat.com/solutions/3552201)
>> >
>> > What is not clear from this KB (and quotes from it above) - what
>> > instance updates watchdog? Quoting (emphasis mine)
>> >
>> > --><--
>> > With the module loaded, the timer *CAN* be updated so that it does not
>> > expire and force a reboot in the middle of vmcore generation.
>> > --><--
>> >
>> > Sure it can, but what program exactly updates the watchdog during
>> > kdump execution? I am pretty sure that sbd does not run at this point.
>>
>> That's a valid question. I found this approach to work back in 2018
>> after a fair amount of frustration, and didn't question it too deeply
>> at the time.
>>
>> The answer seems to be that the kernel does it.
>>   - https://stackoverflow.com/a/2020717
>>   - https://stackoverflow.com/a/42589110
>>
> I think in most cases nobody would be triggering the running watchdog
> except maybe in case of the 2 drivers mentioned.
> Behavior is that if there is no watchdog-timeout defined for the
> crashdump-case
> sbd will (at least try to) disable the watchdog.
> If disabling isn't prohibited or not possible with a certain watchdog this
> should
> lead to the hardware-watchdog being really disabled without anything
> needing
> to trigger it anymore.
> If crashdump-watchdog-timeout is configured to the same value as
> watchdog-timeout engaged before sbd isn't gonna touch the watchdog
> (closing the device without stopping).
> That being said I'd suppose that the only somewhat production-safe
> configuration should be setting both watchdog-timeouts to the same
> value.
>
Unfortunately this setting isn't the default and thus contradicts
the usual paradigm that defaults should be safe settings.
Changing now - or even back when I fixed setting crashdump-timeout -
would unfortunately break existing setups.
So my suggestion is to stay with what we have and be aware of
the non-safe behavior.

> I doubt that we can assume that all io from the host  - that was initiated
> prior to triggering the transition to crashdump-kernel - being stopped
> immediately. All other nodes will assume that io will be stopped within
> watchdog-timeout though. When we disable the watchdog we can't
> be sure that subsequent transition to crashdump-kernel will even happen.
> So leaving watchdog-timeout at the previous value seems to be
> the only

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-28 Thread Klaus Wenninger
On Sat, Feb 26, 2022 at 7:14 AM Strahil Nikolov via Users <
users@clusterlabs.org> wrote:

> I always used this one for triggering kdump when using sbd:
> https://www.suse.com/support/kb/doc/?id=19873
>
> On Fri, Feb 25, 2022 at 21:34, Reid Wahl
>  wrote:
> On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov 
> wrote:
> >
> > On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl  wrote:
> > >
> > > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl  wrote:
> > > >
> > ...
> > > > >
> > > > > So what happens most likely is that the watchdog terminates the
> kdump.
> > > > > In that case all the mess with fence_kdump won't help, right?
> > > >
> > > > You can configure extra_modules in your /etc/kdump.conf file to
> > > > include the watchdog module, and then restart kdump.service. For
> > > > example:
> > > >
> > > > # grep ^extra_modules /etc/kdump.conf
> > > > extra_modules i6300esb
> > > >
> > > > If you're not sure of the name of your watchdog module, wdctl can
> help
> > > > you find it. sbd needs to be stopped first, because it keeps the
> > > > watchdog device timer busy.
> > > >
> > > > # pcs cluster stop --all
> > > > # wdctl | grep Identity
> > > > Identity:  i6300ESB timer [version 0]
> > > > # lsmod | grep -i i6300ESB
> > > > i6300esb  13566  0
> > > >
> > > >
> > > > If you're also using fence_sbd (poison-pill fencing via block
> device),
> > > > then you should be able to protect yourself from that during a dump
> by
> > > > configuring fencing levels so that fence_kdump is level 1 and
> > > > fence_sbd is level 2.
> > >
> > > RHKB, for anyone interested:
> > >  - sbd watchdog timeout causes node to reboot during crash kernel
> > > execution (https://access.redhat.com/solutions/3552201)
> >
> > What is not clear from this KB (and quotes from it above) - what
> > instance updates watchdog? Quoting (emphasis mine)
> >
> > --><--
> > With the module loaded, the timer *CAN* be updated so that it does not
> > expire and force a reboot in the middle of vmcore generation.
> > --><--
> >
> > Sure it can, but what program exactly updates the watchdog during
> > kdump execution? I am pretty sure that sbd does not run at this point.
>
> That's a valid question. I found this approach to work back in 2018
> after a fair amount of frustration, and didn't question it too deeply
> at the time.
>
> The answer seems to be that the kernel does it.
>   - https://stackoverflow.com/a/2020717
>   - https://stackoverflow.com/a/42589110
>
I think in most cases nobody would be triggering the running watchdog,
except maybe in case of the 2 drivers mentioned.
Behavior is that if there is no watchdog-timeout defined for the crashdump case,
sbd will (at least try to) disable the watchdog.
If disabling isn't prohibited or impossible with a certain watchdog, this should
lead to the hardware watchdog really being disabled without anything needing
to trigger it anymore.
If the crashdump watchdog-timeout is configured to the same value as the
watchdog-timeout engaged before, sbd isn't going to touch the watchdog
(closing the device without stopping it).
That being said, I'd suppose that the only somewhat production-safe
configuration is setting both watchdog-timeouts to the same value.
I doubt that we can assume that all io from the host - io that was initiated
prior to triggering the transition to the crashdump kernel - is stopped
immediately. All other nodes will assume that io will be stopped within
watchdog-timeout though. When we disable the watchdog we can't
be sure that the subsequent transition to the crashdump kernel will even happen.
So leaving watchdog-timeout at the previous value seems to be
the only way to really assure that the node is being silenced by a
hardware reset within the timeout assumed by the rest of the nodes.
In case the watchdog driver has this running-detection - mentioned
in the links above - the safe way would probably be to have the
module removed from the crash kernel.
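As an illustration only, a sketch of the sysconfig side of such a setup,
assuming -C is the sbd option for the crashdump watchdog-timeout and that
SBD_TIMEOUT_ACTION supports the crashdump action in the installed version;
the device path and values are placeholders:

# /etc/sysconfig/sbd (sketch)
SBD_DEVICE="/dev/disk/by-id/dm-name-SBD_A"
SBD_WATCHDOG_DEV="/dev/watchdog"
SBD_WATCHDOG_TIMEOUT="15"
# Trigger a crashdump instead of a plain reboot on self-fence ...
SBD_TIMEOUT_ACTION="flush,crashdump"
# ... and keep the crashdump watchdog-timeout identical to the normal one,
# so the hardware watchdog stays armed across the switch to the crash kernel
SBD_OPTS="-C 15"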

Klaus

>
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
>
> --
> Regards,
>
> Reid Wahl (He/Him), RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Klaus Wenninger
On Thu, Feb 17, 2022 at 12:38 PM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> Klaus Wenninger  schrieb am 17.02.2022 um 10:49
> in
> Nachricht
> :
> ...
> >> For completeness: Yes, sbd did recover:
> >> Feb 14 13:01:42 h18 sbd[6615]:  warning: cleanup_servant_by_pid: Servant
> >> for /dev/disk/by-id/dm-name-SBD_1-3P1 (pid: 6619) has terminated
> >> Feb 14 13:01:42 h18 sbd[6615]:  warning: cleanup_servant_by_pid: Servant
> >> for /dev/disk/by-id/dm-name-SBD_1-3P2 (pid: 6621) has terminated
> >> Feb 14 13:01:42 h18 sbd[31668]: /dev/disk/by-id/dm-name-SBD_1-3P1:
> >>  notice: servant_md: Monitoring slot 4 on disk
> >> /dev/disk/by-id/dm-name-SBD_1-3P1
> >> Feb 14 13:01:42 h18 sbd[31669]: /dev/disk/by-id/dm-name-SBD_1-3P2:
> >>  notice: servant_md: Monitoring slot 4 on disk
> >> /dev/disk/by-id/dm-name-SBD_1-3P2
> >> Feb 14 13:01:49 h18 sbd[6615]:   notice: inquisitor_child: Servant
> >> /dev/disk/by-id/dm-name-SBD_1-3P1 is healthy (age: 0)
> >> Feb 14 13:01:49 h18 sbd[6615]:   notice: inquisitor_child: Servant
> >> /dev/disk/by-id/dm-name-SBD_1-3P2 is healthy (age: 0)
> >>
> >
> > Good to see that!
> > Did you try several times?
>
> Well, we only have two fabrics, and the server is productive, so both
> fabrics were interrupted once each (to change the cabling).
> sbd survived.
>
Yup - sometimes the entities that would have to be failed are just too large
to have them as part of the playground/sandbox :-(

>
> Second fabric:
> Feb 14 13:03:51 h18 kernel: qla2xxx [:01:00.0]-500b:2: LOOP DOWN
> detected (2 7 0 0).
> Feb 14 13:03:57 h18 multipathd[5180]: SBD_1-3P2: remaining active paths: 3
> Feb 14 13:03:57 h18 multipathd[5180]: SBD_1-3P2: remaining active paths: 2
>
> Feb 14 13:05:18 h18 kernel: qla2xxx [:01:00.0]-500a:2: LOOP UP
> detected (8 Gbps).
> Feb 14 13:05:22 h18 multipathd[5180]: SBD_1-3P2: sdr - tur checker reports
> path is up
> Feb 14 13:05:22 h18 multipathd[5180]: SBD_1-3P2: remaining active paths: 3
> Feb 14 13:05:23 h18 multipathd[5180]: SBD_1-3P2: sdae - tur checker
> reports path is up
> Feb 14 13:05:23 h18 multipathd[5180]: SBD_1-3P2: remaining active paths: 4
> Feb 14 13:05:25 h18 multipathd[5180]: SBD_1-3P1: sdl - tur checker reports
> path is up
> Feb 14 13:05:25 h18 multipathd[5180]: SBD_1-3P1: remaining active paths: 3
> Feb 14 13:05:26 h18 multipathd[5180]: SBD_1-3P1: sdo - tur checker reports
> path is up
> Feb 14 13:05:26 h18 multipathd[5180]: SBD_1-3P1: remaining active paths: 4
>
> So this time multipath reacted before SBD noticed anything (the way it
> should have been anyway)
>
Depends on how you like it to behave.
You are free to configure the io-timeout in a way that sbd wouldn't see such an
event at all. Or, if you'd rather have some notice in the sbd logs, or the added
reliability of kicking off another try instead of waiting for a first - maybe
doomed - one to finish, you give it enough time to retry within your
msgwait-timeout.
Unfortunately it isn't possible to have one-fits-all defaults here.
But feedback is welcome so that we can do a little tweaking that makes them fit
for a larger audience.
Remember the case where devices stalled for 50s during a firmware update and that
shouldn't trigger fencing - definitely a case that can't be covered by defaults.
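Purely as an illustration of such a tweak (the numbers are placeholders, not
recommendations), a scheme that tolerates a storage stall of roughly 50s could
look like this, assuming -I is the io-timeout option:

# On-disk timeouts: watchdog 60s, msgwait 120s (msgwait >= 2 * watchdog-timeout).
# NOTE: 'create' re-initializes the SBD header - do this with the cluster down.
sbd -d /dev/disk/by-id/dm-name-SBD_A -1 60 -4 120 create

# Daemon side (/etc/sysconfig/sbd): let a single read hang for up to 30s,
# which still leaves room for a retry well within msgwait
SBD_OPTS="-I 30"

# Pacemaker has to wait at least msgwait for poison-pill fencing to complete
pcs property set stonith-timeout=144s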


> > I have some memory that when testing with the kernel mentioned before
> > behavior
> > changed after a couple of timeouts and it wasn't able to create the
> > read-request
> > anymore (without the fix mentioned) - assume some kind of resource
> depletion
> > due to previously hanging attempts not destroyed properly.
>
> That can be a nasty race condition, too, however. (I had my share of
> signal handlers, threads and race conditions).
> Of course more crude programming errors are possible, too.
>
It is one single-threaded process, and it was gone once the api was handled
properly.
I mean the different behavior after a couple of retries was gone; the basic
issue was persistent with that kernel.

> Debugging can be very hard, but dmsetup can create bad disks for testing
> for you ;-)
> DEV=bad_disk
> dmsetup create "$DEV" <<EOF
> 0 8 zero
> 8 1 error
> 9 7 zero
> 16 1 error
> 17 255 zero
> EOF
>
We need to impose the problem dynamically.
Otherwise sbd wouldn't come up in the first place - which is of course a useful
test in itself as well.
Atm regressions.sh is using wipe_table to impose an error dynamically,
but simultaneously on all blocks. The periodic reading is anyway done on just
a single block (more accurately the header as well), so we should be fine with that.
I saw that device-mapper offers a possibility to delay here as well.
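A rough sketch of how the delay target could be used to impose the problem
dynamically (device name and timings are made up):

# Plain pass-through mapping on top of the real device to start with
SIZE=$(blockdev --getsz /dev/sdX)
dmsetup create sbd_test --table "0 $SIZE linear /dev/sdX 0"

# Later, swap the table in-flight so every io is delayed by 5s ...
dmsetup suspend sbd_test
dmsetup reload sbd_test --table "0 $SIZE delay /dev/sdX 0 5000"
dmsetup resume sbd_test

# ... or so that all io fails outright
dmsetup suspend sbd_test
dmsetup reload sbd_test --table "0 $SIZE error"
dmsetup resume sbd_test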

Re: [ClusterLabs] Antw: [EXT] Re: Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Klaus Wenninger
On Thu, Feb 17, 2022 at 10:14 AM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> Klaus Wenninger  schrieb am 16.02.2022 um 16:59
> in
> Nachricht
> :
> > On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger 
> wrote:
> >
> >>
> >>
> >> On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl <
> >> ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>
> >>> Hi!
> >>>
> >>> When changing some FC cables I noticed that sbd complained 2 seconds
> >>> after the connection went down (event though the device is multi-pathed
> >>> with other paths being still up).
> >>> I don't know any sbd parameter being set so low that after 2 seconds
> sbd
> >>> would panic. Which parameter (if any) is responsible for that?
> >>>
> >>> In fact multipath takes up to 5 seconds to adjust paths.
> >>>
> >>> Here are some sample events (sbd-1.5.0+20210720.f4ca41f-3.6.1.x86_64
> from
> >>> SLES15 SP3):
> >>> Feb 14 13:01:36 h18 kernel: qla2xxx [:41:00.0]-500b:3: LOOP DOWN
> >>> detected (2 7 0 0).
> >>> Feb 14 13:01:38 h18 sbd[6621]: /dev/disk/by-id/dm-name-SBD_1-3P2:
> >>> error: servant_md: slot read failed in servant.
> >>> Feb 14 13:01:38 h18 sbd[6619]: /dev/disk/by-id/dm-name-SBD_1-3P1:
> >>> error: servant_md: mbox read failed in servant.
> >>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
> >>> /dev/disk/by-id/dm-name-SBD_1-3P1 is outdated (age: 11)
> >>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
> >>> /dev/disk/by-id/dm-name-SBD_1-3P2 is outdated (age: 11)
> >>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Majority of
> >>> devices lost - surviving on pacemaker
> >>> Feb 14 13:01:42 h18 kernel: sd 3:0:3:2: rejecting I/O to offline device
> >>> Feb 14 13:01:42 h18 kernel: blk_update_request: I/O error, dev sdbt,
> >>> sector 2048 op 0x0:(READ) flags 0x4200 phys_seg 1 prio class 1
> >>> Feb 14 13:01:42 h18 kernel: device-mapper: multipath: 254:17: Failing
> >>> path 68:112.
> >>> Feb 14 13:01:42 h18 kernel: sd 3:0:1:2: rejecting I/O to offline device
> >>>
> >> Sry forgotten to address the following.
> >
> > Guess your sbd-package predates
> >
> https://github.com/ClusterLabs/sbd/commit/9e6cbbad9e259de374cbf41b713419c342
> > 528db1
> > and thus doesn't properly destroy the io-context using the aio-api.
> > This flaw has been in kind of since ever and I actually found it due to a
> > kernel-issue that made
> > all block-io done the way sbd is doing it (aio + O_SYNC + O_DIRECT
> Actually
> > never successfully
> > tracked it down to the real kernel issue playing with kprobes. But it was
> > gone on the next kernel
> > update
> > ) timeout.
> > Without survival on pacemaker it would have suicided after
> > msgwait-timeout (10s in your case probably).
> > Would be interesting what happens if you raise msgwait-timeout to a value
> > that would allow
> > another read attempt.
> > Does your setup actually recover? Could be possible that it doesn't
> missing
> > the fix referenced above.
>
> For completeness: Yes, sbd did recover:
> Feb 14 13:01:42 h18 sbd[6615]:  warning: cleanup_servant_by_pid: Servant
> for /dev/disk/by-id/dm-name-SBD_1-3P1 (pid: 6619) has terminated
> Feb 14 13:01:42 h18 sbd[6615]:  warning: cleanup_servant_by_pid: Servant
> for /dev/disk/by-id/dm-name-SBD_1-3P2 (pid: 6621) has terminated
> Feb 14 13:01:42 h18 sbd[31668]: /dev/disk/by-id/dm-name-SBD_1-3P1:
>  notice: servant_md: Monitoring slot 4 on disk
> /dev/disk/by-id/dm-name-SBD_1-3P1
> Feb 14 13:01:42 h18 sbd[31669]: /dev/disk/by-id/dm-name-SBD_1-3P2:
>  notice: servant_md: Monitoring slot 4 on disk
> /dev/disk/by-id/dm-name-SBD_1-3P2
> Feb 14 13:01:49 h18 sbd[6615]:   notice: inquisitor_child: Servant
> /dev/disk/by-id/dm-name-SBD_1-3P1 is healthy (age: 0)
> Feb 14 13:01:49 h18 sbd[6615]:   notice: inquisitor_child: Servant
> /dev/disk/by-id/dm-name-SBD_1-3P2 is healthy (age: 0)
>

Good to see that!
Did you try several times?
I have some memory that, when testing with the kernel mentioned before, behavior
changed after a couple of timeouts and it wasn't able to create the read-request
anymore (without the fix mentioned) - assume some kind of resource depletion
due to previously hanging attempts not being destroyed properly.
But that behavior might heavily depend on the kernel-version, and as your attempts
do terminate with failure in the kernel some time later 

Re: [ClusterLabs] Antw: [EXT] Re: Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Klaus Wenninger
On Thu, Feb 17, 2022 at 9:27 AM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> Klaus Wenninger  schrieb am 16.02.2022 um 16:26
> in
> Nachricht
> :
> > On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl <
> > ulrich.wi...@rz.uni-regensburg.de> wrote:
> >
> >> Hi!
> >>
> >> When changing some FC cables I noticed that sbd complained 2 seconds
> after
> >> the connection went down (event though the device is multi-pathed with
> >> other paths being still up).
> >> I don't know any sbd parameter being set so low that after 2 seconds sbd
> >> would panic. Which parameter (if any) is responsible for that?
> >>
> >> In fact multipath takes up to 5 seconds to adjust paths.
> >>
> >> Here are some sample events (sbd-1.5.0+20210720.f4ca41f-3.6.1.x86_64
> from
> >> SLES15 SP3):
> >> Feb 14 13:01:36 h18 kernel: qla2xxx [:41:00.0]-500b:3: LOOP DOWN
> >> detected (2 7 0 0).
> >> Feb 14 13:01:38 h18 sbd[6621]: /dev/disk/by-id/dm-name-SBD_1-3P2:
> >> error: servant_md: slot read failed in servant.
> >> Feb 14 13:01:38 h18 sbd[6619]: /dev/disk/by-id/dm-name-SBD_1-3P1:
> >> error: servant_md: mbox read failed in servant.
> >> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
> >> /dev/disk/by-id/dm-name-SBD_1-3P1 is outdated (age: 11)
> >> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
> >> /dev/disk/by-id/dm-name-SBD_1-3P2 is outdated (age: 11)
> >> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Majority of
> >> devices lost - surviving on pacemaker
> >> Feb 14 13:01:42 h18 kernel: sd 3:0:3:2: rejecting I/O to offline device
> >> Feb 14 13:01:42 h18 kernel: blk_update_request: I/O error, dev sdbt,
> >> sector 2048 op 0x0:(READ) flags 0x4200 phys_seg 1 prio class 1
> >> Feb 14 13:01:42 h18 kernel: device-mapper: multipath: 254:17: Failing
> path
> >> 68:112.
> >> Feb 14 13:01:42 h18 kernel: sd 3:0:1:2: rejecting I/O to offline device
> >>
> >> Most puzzling is the fact that sbd reports a problem 4 seconds before
> the
> >> kernel reports an I/O error. I guess sbd "times out" the pending read.
> >>
> > Yep - that is timeout_io defaulting to 3s.
> > You can set it with -I daemon start parameter.
> > Together with the rest of the default-timeout-scheme the 3s do make
> sense.
> > Not sure but if you increase that significantly you might have to adapt
> > other timeouts.
>
> We extended the timeouts so that sbd would survive an online firmware
> update of the storage system which may cause it not to respond for up to 30
> seconds when the controllers restart.
>
> > There are a certain number of checks regarding relationship of timeouts
> but
> > they might not be exhaustive.
> >
> >>
> >> The thing is: Both SBD disks are on different storage systems, each
> being
> >> connected by two separate FC fabrics, but still when disconnecting one
> >> cable from the host sbd panics.
> >> My guess is if "surviving on pacemaker" would not have happened, the
> node
> >> would be fenced; is that right?
> >>
> >> The other thing I wonder is the "outdated age":
> >> How can the age be 11 (seconds) when the disk was disconnected 4 seconds
> >> ago?
> >> It seems here the age is "current time - time_of_last read" instead of
> >> "current_time - time_when read_attempt_started".
> >>
> > Exactly! And that is the correct way to do it as we need to record the
> time
> > passed since last successful read.
> > There is no value in starting the clock when we start the read attempt as
> > these attempts are not synced throughout
> > the cluster.
>
> I don't understand: There is no heartbeat written to SBD that has to be
> read; instead the device is polled for messages.
> So the important point is how much the polling is delayed by some problem
> (not by some deliberate sleep).
> And I don't see why the measurement has to be synced throughout the
> cluster: Local is enough.
>
> I'm afraid I'm not really getting your point below.

Anyway the important part is that the last time the mailbox was successfully read
as empty mustn't be older than basically msgwait (partly a result of the polls not
being synced throughout the cluster).
As on top we need to take into account signalling between the processes +
poll-cycle + x, the actual read io-timeout should be substantially shorter.
Purpose of the code is not to print any critical l

Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-16 Thread Klaus Wenninger
On Wed, Feb 16, 2022 at 4:59 PM Klaus Wenninger  wrote:

>
>
> On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger 
> wrote:
>
>>
>>
>> On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl <
>> ulrich.wi...@rz.uni-regensburg.de> wrote:
>>
>>> Hi!
>>>
>>> When changing some FC cables I noticed that sbd complained 2 seconds
>>> after the connection went down (event though the device is multi-pathed
>>> with other paths being still up).
>>> I don't know any sbd parameter being set so low that after 2 seconds sbd
>>> would panic. Which parameter (if any) is responsible for that?
>>>
>>> In fact multipath takes up to 5 seconds to adjust paths.
>>>
>>> Here are some sample events (sbd-1.5.0+20210720.f4ca41f-3.6.1.x86_64
>>> from SLES15 SP3):
>>> Feb 14 13:01:36 h18 kernel: qla2xxx [:41:00.0]-500b:3: LOOP DOWN
>>> detected (2 7 0 0).
>>> Feb 14 13:01:38 h18 sbd[6621]: /dev/disk/by-id/dm-name-SBD_1-3P2:
>>> error: servant_md: slot read failed in servant.
>>> Feb 14 13:01:38 h18 sbd[6619]: /dev/disk/by-id/dm-name-SBD_1-3P1:
>>> error: servant_md: mbox read failed in servant.
>>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
>>> /dev/disk/by-id/dm-name-SBD_1-3P1 is outdated (age: 11)
>>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
>>> /dev/disk/by-id/dm-name-SBD_1-3P2 is outdated (age: 11)
>>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Majority of
>>> devices lost - surviving on pacemaker
>>> Feb 14 13:01:42 h18 kernel: sd 3:0:3:2: rejecting I/O to offline device
>>> Feb 14 13:01:42 h18 kernel: blk_update_request: I/O error, dev sdbt,
>>> sector 2048 op 0x0:(READ) flags 0x4200 phys_seg 1 prio class 1
>>> Feb 14 13:01:42 h18 kernel: device-mapper: multipath: 254:17: Failing
>>> path 68:112.
>>> Feb 14 13:01:42 h18 kernel: sd 3:0:1:2: rejecting I/O to offline device
>>>
>> Sry forgotten to address the following.
>
> Guess your sbd-package predates
>
> https://github.com/ClusterLabs/sbd/commit/9e6cbbad9e259de374cbf41b713419c342528db1
> and thus doesn't properly destroy the io-context using the aio-api.
> This flaw has been in kind of since ever and I actually found it due to a
> kernel-issue that made
> all block-io done the way sbd is doing it (aio + O_SYNC + O_DIRECT
> Actually never successfully
> tracked it down to the real kernel issue playing with kprobes. But it was
> gone on the next kernel
> update
> ) timeout.
> Without survival on pacemaker it would have suicided after
> msgwait-timeout (10s in your case probably).
> Would be interesting what happens if you raise msgwait-timeout to a value
> that would allow
> another read attempt.
> Does your setup actually recover? Could be possible that it doesn't
> missing the fix referenced above.
>
One more thing:
Even if it looks as if it recovers, there might be a leak of kernel
resources (maybe per process),
so that issues only surface after the timeout has happened several times.


>
> Regards,
> Klaus
>
>>
>>> Most puzzling is the fact that sbd reports a problem 4 seconds before
>>> the kernel reports an I/O error. I guess sbd "times out" the pending read.
>>>
>> Yep - that is timeout_io defaulting to 3s.
>> You can set it with -I daemon start parameter.
>> Together with the rest of the default-timeout-scheme the 3s do make sense.
>> Not sure but if you increase that significantly you might have to adapt
>> other timeouts.
>> There are a certain number of checks regarding relationship of timeouts
>> but they might not be exhaustive.
>>
>>>
>>> The thing is: Both SBD disks are on different storage systems, each
>>> being connected by two separate FC fabrics, but still when disconnecting
>>> one cable from the host sbd panics.
>>> My guess is if "surviving on pacemaker" would not have happened, the
>>> node would be fenced; is that right?
>>>
>>> The other thing I wonder is the "outdated age":
>>> How can the age be 11 (seconds) when the disk was disconnected 4 seconds
>>> ago?
>>> It seems here the age is "current time - time_of_last read" instead of
>>> "current_time - time_when read_attempt_started".
>>>
>> Exactly! And that is the correct way to do it as we need to record the
>> time passed since last successful read.
>> There is no value in starting the clock when we start the read attempt as
>> these attempts are not synced throughout
>> the cluster.
>>
>> Regards,
>> Klaus
>>
>>>
>>> Regards,
>>> Ulrich
>>>
>>>
>>>
>>>
>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>>>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-16 Thread Klaus Wenninger
On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger  wrote:

>
>
> On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl <
> ulrich.wi...@rz.uni-regensburg.de> wrote:
>
>> Hi!
>>
>> When changing some FC cables I noticed that sbd complained 2 seconds
>> after the connection went down (event though the device is multi-pathed
>> with other paths being still up).
>> I don't know any sbd parameter being set so low that after 2 seconds sbd
>> would panic. Which parameter (if any) is responsible for that?
>>
>> In fact multipath takes up to 5 seconds to adjust paths.
>>
>> Here are some sample events (sbd-1.5.0+20210720.f4ca41f-3.6.1.x86_64 from
>> SLES15 SP3):
>> Feb 14 13:01:36 h18 kernel: qla2xxx [:41:00.0]-500b:3: LOOP DOWN
>> detected (2 7 0 0).
>> Feb 14 13:01:38 h18 sbd[6621]: /dev/disk/by-id/dm-name-SBD_1-3P2:
>> error: servant_md: slot read failed in servant.
>> Feb 14 13:01:38 h18 sbd[6619]: /dev/disk/by-id/dm-name-SBD_1-3P1:
>> error: servant_md: mbox read failed in servant.
>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
>> /dev/disk/by-id/dm-name-SBD_1-3P1 is outdated (age: 11)
>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
>> /dev/disk/by-id/dm-name-SBD_1-3P2 is outdated (age: 11)
>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Majority of
>> devices lost - surviving on pacemaker
>> Feb 14 13:01:42 h18 kernel: sd 3:0:3:2: rejecting I/O to offline device
>> Feb 14 13:01:42 h18 kernel: blk_update_request: I/O error, dev sdbt,
>> sector 2048 op 0x0:(READ) flags 0x4200 phys_seg 1 prio class 1
>> Feb 14 13:01:42 h18 kernel: device-mapper: multipath: 254:17: Failing
>> path 68:112.
>> Feb 14 13:01:42 h18 kernel: sd 3:0:1:2: rejecting I/O to offline device
>>
> Sry forgotten to address the following.

Guess your sbd-package predates
https://github.com/ClusterLabs/sbd/commit/9e6cbbad9e259de374cbf41b713419c342528db1
and thus doesn't properly destroy the io-context using the aio-api.
This flaw has been in kind of since ever and I actually found it due to a
kernel-issue that made
all block-io done the way sbd is doing it (aio + O_SYNC + O_DIRECT Actually
never successfully
tracked it down to the real kernel issue playing with kprobes. But it was
gone on the next kernel
update
) timeout.
Without survival on pacemaker it would have suicided after
msgwait-timeout (10s in your case probably).
Would be interesting what happens if you raise msgwait-timeout to a value
that would allow
another read attempt.
Does your setup actually recover? Could be possible that it doesn't missing
the fix referenced above.

Regards,
Klaus

>
>> Most puzzling is the fact that sbd reports a problem 4 seconds before the
>> kernel reports an I/O error. I guess sbd "times out" the pending read.
>>
> Yep - that is timeout_io defaulting to 3s.
> You can set it with -I daemon start parameter.
> Together with the rest of the default-timeout-scheme the 3s do make sense.
> Not sure but if you increase that significantly you might have to adapt
> other timeouts.
> There are a certain number of checks regarding relationship of timeouts
> but they might not be exhaustive.
>
>>
>> The thing is: Both SBD disks are on different storage systems, each being
>> connected by two separate FC fabrics, but still when disconnecting one
>> cable from the host sbd panics.
>> My guess is if "surviving on pacemaker" would not have happened, the node
>> would be fenced; is that right?
>>
>> The other thing I wonder is the "outdated age":
>> How can the age be 11 (seconds) when the disk was disconnected 4 seconds
>> ago?
>> It seems here the age is "current time - time_of_last read" instead of
>> "current_time - time_when read_attempt_started".
>>
> Exactly! And that is the correct way to do it as we need to record the
> time passed since last successful read.
> There is no value in starting the clock when we start the read attempt as
> these attempts are not synced throughout
> the cluster.
>
> Regards,
> Klaus
>
>>
>> Regards,
>> Ulrich
>>
>>
>>
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-16 Thread Klaus Wenninger
On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> Hi!
>
> When changing some FC cables I noticed that sbd complained 2 seconds after
> the connection went down (event though the device is multi-pathed with
> other paths being still up).
> I don't know any sbd parameter being set so low that after 2 seconds sbd
> would panic. Which parameter (if any) is responsible for that?
>
> In fact multipath takes up to 5 seconds to adjust paths.
>
> Here are some sample events (sbd-1.5.0+20210720.f4ca41f-3.6.1.x86_64 from
> SLES15 SP3):
> Feb 14 13:01:36 h18 kernel: qla2xxx [:41:00.0]-500b:3: LOOP DOWN
> detected (2 7 0 0).
> Feb 14 13:01:38 h18 sbd[6621]: /dev/disk/by-id/dm-name-SBD_1-3P2:
> error: servant_md: slot read failed in servant.
> Feb 14 13:01:38 h18 sbd[6619]: /dev/disk/by-id/dm-name-SBD_1-3P1:
> error: servant_md: mbox read failed in servant.
> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
> /dev/disk/by-id/dm-name-SBD_1-3P1 is outdated (age: 11)
> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
> /dev/disk/by-id/dm-name-SBD_1-3P2 is outdated (age: 11)
> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Majority of
> devices lost - surviving on pacemaker
> Feb 14 13:01:42 h18 kernel: sd 3:0:3:2: rejecting I/O to offline device
> Feb 14 13:01:42 h18 kernel: blk_update_request: I/O error, dev sdbt,
> sector 2048 op 0x0:(READ) flags 0x4200 phys_seg 1 prio class 1
> Feb 14 13:01:42 h18 kernel: device-mapper: multipath: 254:17: Failing path
> 68:112.
> Feb 14 13:01:42 h18 kernel: sd 3:0:1:2: rejecting I/O to offline device
>
> Most puzzling is the fact that sbd reports a problem 4 seconds before the
> kernel reports an I/O error. I guess sbd "times out" the pending read.
>
Yep - that is timeout_io defaulting to 3s.
You can set it with the -I daemon start parameter.
Together with the rest of the default timeout scheme the 3s do make sense.
Not sure, but if you increase that significantly you might have to adapt
other timeouts.
There are a certain number of checks regarding the relationship of the timeouts,
but they might not be exhaustive.
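For reference, a sketch of setting it persistently via the sysconfig file the
distribution ships (the value is only an example):

# /etc/sysconfig/sbd
# allow a single disk read to hang for up to 10s before sbd gives up on it
SBD_OPTS="-I 10"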

>
> The thing is: Both SBD disks are on different storage systems, each being
> connected by two separate FC fabrics, but still when disconnecting one
> cable from the host sbd panics.
> My guess is if "surviving on pacemaker" would not have happened, the node
> would be fenced; is that right?
>
> The other thing I wonder is the "outdated age":
> How can the age be 11 (seconds) when the disk was disconnected 4 seconds
> ago?
> It seems here the age is "current time - time_of_last read" instead of
> "current_time - time_when read_attempt_started".
>
Exactly! And that is the correct way to do it as we need to record the time
passed since last successful read.
There is no value in starting the clock when we start the read attempt as
these attempts are not synced throughout
the cluster.

Regards,
Klaus

>
> Regards,
> Ulrich
>
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ethernet link up/down - ?

2022-02-16 Thread Klaus Wenninger
On Tue, Feb 15, 2022 at 5:25 PM lejeczek via Users 
wrote:

>
>
> On 07/02/2022 19:21, Antony Stone wrote:
> > On Monday 07 February 2022 at 20:09:02, lejeczek via Users wrote:
> >
> >> Hi guys
> >>
> >> How do you guys go about doing link up/down as a resource?
> > I apply or remove addresses on the interface, using "IPaddr2" and
> "IPv6addr",
> > which I know is not the same thing.
> >
> > Why do you separately want to control link up/down?  I can't think what I
> > would use this for.
>
Just out of curiosity and as I haven't seen an answer in the thread yet -
maybe
I overlooked something ...
Is this to control some link-triggered redundancy setup with switches?

Klaus

> >
> >
> > Antony.
> >
> Kind of similar - tcp/ip and those layers configs are
> delivered by DHCP.
> I'd think it would have to be a clone resource with one
> master without any constraints where cluster freely decides
> where to put master(link up) on - which is when link gets
> dhcp-served.
> But I wonder if that would mean writing up a new resource -
> I don't think there is anything like that included in
> ready-made pcs/ocf packages.
>
> many thanks, L
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Cluster Removing VIP and Not Following Order Constraint

2022-02-11 Thread Klaus Wenninger
On Fri, Feb 11, 2022 at 9:13 AM Strahil Nikolov via Users <
users@clusterlabs.org> wrote:

> Shouldn't you use kind ' Mandatory' and simetrical TRUE ?
>
> If true, the reverse of the constraint applies for the opposite action
> (for example, if B starts after A starts, then B stops before A stops).
>
If the script should be run before any change, then it sounds as if an
asymmetric order would be desirable.
So you might create at least two order constraints explicitly listing the
actions.
But I doubt that this explains the unexpected behavior described.
As Ulrich said, a little bit more info about the config would be helpful.
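A rough sketch of what such constraints could look like with pcs, reusing the
resource names from the original post - whether these particular action pairs
give the desired behavior would have to be verified in a test cluster:

# Explicit actions instead of the implicit start/start ordering,
# and symmetrical=false so no reverse ordering is derived automatically
pcs constraint order start rsc_lsb_quiesce then stop rsc_cluster_vip \
    kind=Mandatory symmetrical=false
pcs constraint order start rsc_cluster_vip then promote msl_ABC \
    kind=Mandatory symmetrical=false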

Regards,
Klaus

>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Feb 11, 2022 at 9:11, Ulrich Windl
>  wrote:
> >>> Jonno  schrieb am 10.02.2022 um 20:43 in Nachricht
> :
> > Hello,
> >
> > I am having some trouble getting my 2 node active/passive cluster to do
> > what I want. More specifically, my cluster is removing the VIP from the
> > cluster whenever I attempt a failover with a command such as "crm
> resource
> > move rsc_cluster_vip node2".
> >
> > When running the command above, I am asking the cluster to migrate the
> VIP
> > to the standby node, but I am expecting the cluster to honour the order
> > constraint, by first running the script resource named "rsc_lsb_quiesce".
> > The order constraint looks like:
> >
> > "order order_ABC rsc_lsb_quiesce rsc_cluster_vip msl_ABC:promote
> > rsc_lsb_resume"
> >
> > But it doesn't seem to do what I expect. It always removes the VIP
> entirely
> > from the cluster first, then it starts to follow the order constraint.
> This
> > means my cluster is in a state where the VIP is completely gone for a
> > couple of minutes. I've also tried doing a "crm resource move
> > rsc_lsb_quiesce
> > node2" hoping to trigger the script resource first, but the cluster
> always
> > removes the VIP before doing anything.
> >
> > My question is: How can I make the cluster follow this order constraint?
> I
>
> I'm very sure you just made a configuration mistake.
> But nobody can help you unless you show your configuration and example
> execution of events, plus the expected order of execution.
>
> Regards,
> Ulrich
>
>
> > need the cluster to run the "rsc_lsb_quiesce" script against a remote
> > application server before any other action is taken. I especially need
> the
> > VIP to stay where it is. Should I be doing this another way?
> >
> > Regards,
> > Jonathan
>
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Is there a python package for pacemaker ?

2022-02-03 Thread Klaus Wenninger
On Wed, Feb 2, 2022 at 7:06 PM Ken Gaillot  wrote:

> On Wed, 2022-02-02 at 18:46 +0100, Lentes, Bernd wrote:
> > Hi,
> >
> > i need to write some scripts for our cluster. Until now i wrote bash
> > scripts.
> > But i like to learn python. Is there a package for pacemaker ?
> > What i found is: https://pypi.org/project/pacemaker/ and i'm not sure
> > what that is.
> >
> > Thanks.
> >
> > Bernd
>
> Not currently (that's an unrelated project I wasn't aware of). It is a
> goal to make one, but time hasn't been available.
>
> We're taking a big step towards it by creating a high-level C API for
> Pacemaker that's essentially equivalent to how the command-line tools
> work. It will be much easier to wrap this API in Python. There are
> already high-level API equivalents of crmadmin and crm_simulate, and
> the crm_mon equivalent is expected in the next release.
>
> In the meantime, the easiest approach is probably just to use the
> subprocess module to execute the Pacemaker command-line tools to do
> what you want.
>
You might as well use pcs, crmsh or cts as a source of
inspiration as they are all 3 written in python.
AFAIK none of those offers a (stable) python API (yet) though.

Klaus

> --
> Ken Gaillot 
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Removing a resource without stopping it

2022-01-31 Thread Klaus Wenninger
On Mon, Jan 31, 2022 at 2:43 PM Jehan-Guillaume de Rorthais 
wrote:

> On Mon, 31 Jan 2022 08:49:44 +0100
> Klaus Wenninger  wrote:
> ...
> > Depending on the environment it might make sense to think about
> > having the manual migration-step controlled by the cluster(s) using
> > booth. Just thinking - not a specialist on that topic ...
>
> Could you elaborate a bit on this?
>
> Boothd allows to start/stop a ressource in the cluster currently owning the
> associated ticket. In this regard, this could help to stop the resource on
> one
> side and start it on the other one.
>
> However, as far as I know, there's no action like migrate-to/migrate-from
> that
> could be executed across multiple clusters to deal with the migration steps
> between both clusters... or does it?
>
Guess I hadn't thought that far. 'controlled' above is probably not the
right
wording. 'safeguard' may be the better one. Was mainly thinking of a
mechanism
that would prevent the initial cluster from starting the vm again if
something
goes wrong with the sequence of actions described.
But of course the idea of using booth to actively trigger a migration sounds
appealing and generally useful.
Maybe something to put on the wishlist ;-)


> ++
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Removing a resource without stopping it

2022-01-30 Thread Klaus Wenninger
On Mon, Jan 31, 2022 at 8:19 AM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> Digimer  schrieb am 28.01.2022 um 22:38 in Nachricht
> :
> > Hi all,
> >
> >I'm trying to figure out how to move a running VM from one pacemaker
> > cluster to another. I've got the storage and VM live migration sorted,
> > but having trouble with pacemaker.
> >
> >I tried unmanaging the resource (the VM), then deleted the resource,
> > and the node got fenced. So I am assuming it thought it couldn't stop
> > the service so it self-fenced. In any case, can someone let me know what
> > the proper procedure is?
>
> I'd try:
>
> unmanage on source cluster, migrate the VM to dest cluster.
> The set the source cluster's resource role to stopped.
> Then ´manage the resource again. The cluster should find that the resource
> is
> stopped and thus be happy.
> Then delete the stopped resource.
> On the dest custer define the resource. Thecluster should find it running
> and
> be happy.
>
Isn't that exactly what digimer mentioned in her most recent comment?

Depending on the environment it might make sense to think about
having the manual migration-step controlled by the cluster(s) using
booth. Just thinking - not a specialist on that topic ...

Klaus

>
> Regards,
> Ulrich
>
> >
> >Said more directly;
> >
> >How to I delete a resource from pacemaker (via pcs on EL8) without
> > stopping the resource?
>
> Bad idea IMHO.
>
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.com/w/
> > "I am, somehow, less interested in the weight and convolutions of
> Einstein’s
>
> > brain than in the near certainty that people of equal talent have lived
> and
>
> > died in cotton fields and sweatshops." - Stephen Jay Gould
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] sbd v1.5.1

2021-11-15 Thread Klaus Wenninger
Hi sbd - developers & users!

Thanks to everybody for contributing to tests and
further development.


Changes since 1.5.0

- improve/fix cmdline handling
  - tell the actual watchdog device specified with -w
  - tolerate and strip any leading spaces of commandline option values
  - Sanitize numeric arguments
- if start-delay enabled, not explicitly given and msgwait can't be
  read from disk (diskless) use 2 * watchdog-timeout
- avoid using deprecated valloc for disk-io-buffers
- avoid frequent alloc/free of aligned buffers to prevent fragmentation
- fix memory-leak in one-time-allocations of sector-buffers
- fix AIO-API usage: properly destroy io-context
- improve/fix build environment
  - validate configure options for paths
  - remove unneeded complexity of configure.ac hierarchy
  - correctly derive package version from git (regression since 1.5.0)
  - make runstatedir configurable and derive from distribution


Regards,
Klaus
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Klaus Wenninger
On Mon, Nov 15, 2021 at 12:19 PM Andrei Borzenkov 
wrote:

> On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger 
> wrote:
> >
> >
> >
> > On Mon, Nov 15, 2021 at 10:37 AM S Rogers 
> wrote:
> >>
> >> I had thought about doing that, but the cluster is then dependent on the
> >> external system, and if that external system was to go down or become
> >> unreachable for any reason then it would falsely cause the cluster to
> >> failover or worse it could even take the cluster down completely, if the
> >> external system goes down and both nodes cannot ping it.
> >
> > You wouldn't necessarily have to ban resources from nodes that can't
> > reach the external network. It would be enough to make them prefer
> > the location that has connection. So if both lose connection  one side
> > would still stay up.
> > Not to depend on something really external you might use the
> > router to your external network as ping target.
> > In case of fencing - triggered by whatever - and a potential fence-race
>
> The problem here is that nothing really triggers fencing. What happens, is
>

Got that! Which is why I first gave the hint on how to prevent shutting down
services with ping.
Taking care of what happens when nodes are fenced still makes sense.
Imagine a fence-race where the node running the services loses, just
to get the services moved back afterwards when it comes up again.

Klaus


>
> - two postgres lose connection over external network, but cluster
> nodes retain connectivity over another network
> - postgres RA compares "latest timestamp" when selecting the best node
> to fail over to
> - primary postgres has better timestamp, so RA simply does not
> consider secondary as suitable for (atomatic) failover
>
> The only solution here - as long as fencing node on external
> connectivity loss is acceptable - is modifying ethmonitor RA to fail
> monitor operation in this case.
>
> > you might use the rather new feature priority-fencing-delay (give the
> node
> > that is running valuable resources a benefit in the race) or go for
> > fence_heuristics_ping (pseudo fence-resource that together with a
> > fencing-topology prevents the node without access to a certain IP
> > from fencing the other node).
> >
> https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
> >
> https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
> >
> > Klaus
> > ___
> >>
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >>
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Klaus Wenninger
On Mon, Nov 15, 2021 at 10:37 AM S Rogers  wrote:

> I had thought about doing that, but the cluster is then dependent on the
> external system, and if that external system was to go down or become
> unreachable for any reason then it would falsely cause the cluster to
> failover or worse it could even take the cluster down completely, if the
> external system goes down and both nodes cannot ping it.
>
You wouldn't necessarily have to ban resources from nodes that can't
reach the external network. It would be enough to make them prefer
the location that has connection. So if both lose connection  one side
would still stay up.
Not to depend on something really external you might use the
router to your external network as ping target.
In case of fencing - triggered by whatever - and a potential fence-race
you might use the rather new feature priority-fencing-delay (give the node
that is running valuable resources a benefit in the race) or go for
fence_heuristics_ping (pseudo fence-resource that together with a
fencing-topology prevents the node without access to a certain IP
from fencing the other node).
https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
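A rough sketch of both ideas with made-up names and example values (the group
my_services, the gateway address and the numbers are placeholders):

# Prefer - not require - nodes that can still reach the router
pcs resource create gw-ping ocf:pacemaker:ping \
    host_list=192.168.1.1 multiplier=1000 dampen=5s \
    op monitor interval=10s clone
pcs constraint location my_services rule score=1000 \
    defined pingd and pingd gt 0

# Give the node currently running the valuable resources a head start
# in a potential fence race
pcs resource meta my_services priority=10
pcs property set priority-fencing-delay=15s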

Klaus
___

> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: VirtualDomain & "deeper" monitors - what/how?

2021-10-26 Thread Klaus Wenninger
On Mon, Oct 25, 2021 at 9:34 PM Kyle O'Donnell  wrote:

> Finally got around to working on this.
>
> I spoke with someone on the #clusterlabs IRC channel who mentioned that
> the monitor_scripts param does indeed run at some frequency (op monitor
> timeout=? interval=?), not just during the "start" and "migrate_from"
> actions.
>
> The monitor_scripts param does not support scripts with command line args,
> just a space delimited list for running multiple scripts. This means that
> each VirtualDomain resource needs its own script to be able to define the
> ${DOMAIN_NAME}.  I found that a bit annoying so I created a symlink to a
> wrapper script using the ${DOMAIN_NAME} as the first part of the filename
> and a separator for awk:
>
The scripts called by the monitor operation should inherit the monitor's
environment, so you should be able to use those variables directly.
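For illustration - assuming the RA really does pass its environment (e.g.
OCF_RESKEY_config) down to the monitor scripts, which is worth verifying on
your resource-agents version - a single shared wrapper could replace the
per-VM symlinks; the paths are made up:

#!/bin/bash
# hypothetical shared wrapper for monitor_scripts: derive the domain name
# from the libvirt XML the resource's config parameter points at
DOMAIN_NAME=$(sed -n 's/.*<name>\(.*\)<\/name>.*/\1/p' "${OCF_RESKEY_config}" | head -n1)
exec /path/to/myscript.sh -H "${DOMAIN_NAME}" -C guest-get-time -l 25 -w 1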

Klaus

> ln -s /path/to/wrapper_script.sh
> /path/to/wrapper/myvmhostname_wrapper_script.sh
>
> and in my wrapper_script.sh:
> #!/bin/bash
> > DOMAIN_NAME=$(basename "$0" | awk -F'_' '{print $1}')
> /path/to/myscript.sh -H ${DOMAIN_NAME} -C guest-get-time -l 25 -w 1
>
> (a bit hack-y but better than creating 1 script per vm resource and
> modifying it with the ${DOMAIN_NAME})
>
> Then creating the cluster resource:
> pcs resource create myvmhostname VirtualDomain
> config="/path/to/myvmhostname/myvmhostname.xml" hypervisor="qemu:///system"
> migration_transport="ssh" force_stop="false"
> monitor_scripts="/path/to/wrapper/myvmhostname_wrapper_script.sh" meta
> allow-migrate="true" target-role="Stopped" op migrate_from timeout=90s
> interval=0s op migrate_to timeout=120s interval=0s op monitor timeout=40s
> interval=10s op start timeout=90s interval=0s op stop timeout=90s
> interval=0s
>
> ‐‐‐ Original Message ‐‐‐
>
> On Sunday, June 6th, 2021 at 16:56, Kyle O'Donnell  wrote:
>
> > Let me know if there is a better approach to the following problem: when
> the virtual machine does not respond to a state query, I want the cluster to
> kick it.
> >
> > I could not find any useful docs for using the nagios plugins. After
> reading the documentation about running a custom script via the "monitor"
> function in the RA I determined that would not meet my requirements, as it's
> only run on start and migrate (unless I read it incorrectly?).
> >
> > Here is what I did (im on ubuntu 20.04):
> >
> > cp /usr/lib/ocf/resource.d/heartbeat/VirtualDomain
> /usr/lib/ocf/resource.d/heartbeat/MyVirtDomain
> >
> > cp /usr/share/resource-agents/ocft/configs/VirtualDomain
> /usr/share/resource-agents/ocft/configs/MyVirtDomain
> >
> > sed -i 's/VirtualDomain/MyVirtDomain/g'
> /usr/lib/ocf/resource.d/heartbeat/MyVirtDomain
> >
> > sed -i 's/VirtualDomain/MyVirtDomain/g'
> /usr/share/resource-agents/ocft/configs/MyVirtDomain
> >
> > edited function MyVirtDomain_status in
> /usr/lib/ocf/resource.d/heartbeat/MyVirtDomain, adding the following to the
> status case running|paused|idle|blocked|"in shutdown")
> >
> > FROM
> >
> > running|paused|idle|blocked|"in shutdown")
> >   # running: domain is currently actively consuming cycles
> >   # paused: domain is paused (suspended)
> >   # idle: domain is running but idle
> >   # blocked: synonym for idle used by legacy Xen versions
> >   # in shutdown: the domain is in process of shutting down, but has not completely shutdown or crashed.
> >   ocf_log debug "Virtual domain $DOMAIN_NAME is currently $status."
> >   rc=$OCF_SUCCESS
> >
> > TO
> >
> > running|paused|idle|blocked|"in shutdown")
> >   # running: domain is currently actively consuming cycles
> >   # paused: domain is paused (suspended)
> >   # idle: domain is running but idle
> >   # blocked: synonym for idle used by legacy Xen versions
> >   # in shutdown: the domain is in process of shutting down, but has not completely shutdown or crashed.
> >   custom_chk=$(/path/to/myscript.sh -H $DOMAIN_NAME -C guest-get-time -l 25 -w 1)
> >   custom_rc=$?
> >   if [ ${custom_rc} -eq 0 ]; then
> >     ocf_log debug "Virtual domain $DOMAIN_NAME is currently $status."
> >     rc=$OCF_SUCCESS
> >   else
> >     ocf_log debug "Virtual domain $DOMAIN_NAME is currently ${custom_chk}."
> >     rc=$OCF_ERR_GENERIC
> >   fi
> >
> > The custom script uses the qemu-guest-agent in my guest, passing the
> > parameter to grab the guest's time (that seems to be the most universal
> > option [windows, centos6, ubuntu, centos 7]). It runs 25 loops, sleeping 1
> > second between iterations, and exits 0 as soon as the agent responds with
> > the time, or 1 after the 25th loop - which map to OCF_SUCCESS and
> > OCF_ERR_GENERIC as per the docs.
> >
> > /path/to/myscript.sh -H myvm -C guest-get-time -l 25 -w 1
> > =
> >
> > [GOOD] - myvm virsh qemu-agent-command guest-get-time output:
> {"return":1623011582178375000}
> >
> > or when its not responding:
> >
> > /path/to/myscript.sh -H myvm -C guest-get-time -l 25 

Re: [ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Klaus Wenninger
On Wed, Oct 20, 2021 at 12:06 PM Ian Diddams via Users <
users@clusterlabs.org> wrote:

> FWIW here is the basis for my implementation, being the "best" and most
> easily followed drbd/clustering guide/explanation I could find when I searched
>
> Lisenet.com :: Linux | Security | Networking | Admin Blog
> 
>
>
After all it is still a Pacemaker cluster, so you can get the basics from
https://clusterlabs.org/pacemaker/doc/.
Pick "Clusters from Scratch" - the version matching your pacemaker version -
for an intuitive walkthrough.
It even has a section going through a DRBD setup.

Klaus

> cheers
>
> ian
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Klaus Wenninger
On Fri, Oct 15, 2021 at 12:01 PM Andrei Borzenkov 
wrote:

> On Fri, Oct 15, 2021 at 9:25 AM Klaus Wenninger 
> wrote:
>
> > Main pain-point here is that ping-RA allows us to configure the count of
> pings sent, but it
> > is just using the exit-value from ping that becomes negative already
> when one of the
> > answers is missing.
>
> Use fping instead? Which is supported by ping RA and should behave
> exactly as needed - report host alive if at least one reply was
> received.
>
I like fping, but given its reputation as a DoS tool not everybody might
be fine with installing it.
And we would still have something that is satisfied with up to 50% packet
loss, which might not be acceptable for qualifying a host as reachable.
But of course we can tweak that even with the current implementation to,
let's say, a loss < 20% by listing the same host 5 times and setting
the limit to 4.
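A rough sketch of that tweak - the IP, resource name and attribute threshold
are just examples, and the score math assumes multiplier=1000:

# list the same target 5 times; each answered entry adds 'multiplier' to pingd
pcs resource create ping-ext ocf:pacemaker:ping \
    host_list="192.0.2.1 192.0.2.1 192.0.2.1 192.0.2.1 192.0.2.1" \
    multiplier=1000 dampen=5s op monitor interval=10s clone
# require at least 4 of the 5 samples, i.e. tolerate < 20% loss
pcs constraint location my-resource rule score=-INFINITY not_defined pingd or pingd lt 4000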

>
> Maybe when using ping the RA could also parse the ping output instead of
> relying on the exit status.
>
as the fence-agent referenced is doing ;-)

> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Klaus Wenninger
On Thu, Oct 14, 2021 at 10:51 PM martin doc  wrote:

>
>
> --
> From: Andrei Borzenkov, Friday, 15 October 2021 4:59 AM
> [...]
> > Dampening defines delay before attributes are committed to CIB.
> > Private attributes are never ever written into CIB, so dampening
> > makes no sense here. Private attributes are managed by attrd
> > itself and you see the latest value.
>
> > If you change transient attribute (without -p option) value you
> > will see different values reported by
>
> > attrd_updater -n my_ping -Q
>
> > and
>
> > cibadmin -Q -A "//nvpair[@name='my_ping']"
>
> > until dampening timeout expires.
>
> > This applies even to deleting attribute.
>
> Ok, now I understand what the dampen function does.
>
> If I understand this correctly then this probably makes every documented
> example of using ocf:pacemaker:ping with a colocation statement wrong
> because the only way to see the effect of dampen is to use a rule that
> references the value of pingd directly. That or the script for ping has a
> major flaw with respect to dampen.
>

As we've already tried to explain, the purpose of dampening is not to
implement any kind of resilience against loss of a certain percentage of
packets or anything similar.

The basic idea is to have more than one ping host so that - given
failure_score is low enough - there is a certain resilience against packet
loss.
If your number of ping hosts isn't large enough you might play with adding
them multiple times to get some kind of resilience.
But I agree that this one-out-of-two behavior is probably too forgiving for
most cases and thus there might be room for improvement.
The main pain point here is that the ping RA allows us to configure the count
of pings sent, but it just uses the exit value from ping, which becomes
non-zero as soon as one of the answers is missing.
This is why with
https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
I chose to make both configurable: the number of packets sent and the number
that must be received for the target to be assumed alive. If the latter, when
not given at all, defaults to the number of packets sent, we preserve
unchanged behavior for existing configurations.
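To make that concrete, a sketch of how it could be wired in - node and target
names are invented, and the parameter names are from memory, so please check
them against the agent's metadata before use:

# pseudo fence device that only "succeeds" if the local node still reaches the target
pcs stonith create fence-heuristic fence_heuristics_ping \
    ping_targets=192.0.2.1 ping_count=10 ping_good_count=8
# put it in front of the real fence device in the topology of each node
pcs stonith level add 1 node1 fence-heuristic,fence-ipmi-node1
pcs stonith level add 1 node2 fence-heuristic,fence-ipmi-node2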

Klaus


>
> That is when I do this:
>
> pcs resource create myPing ocf:pacemaker:ping host_list=192.168.1.1
> failure_score=1
> pcs resource create database ocf:heartbeat:pgsql
> pcs group add pgrp myPing database
>
> PCS will move everything to a new node if there is even 1 ping failure,
> because the monitor in ping doesn't look at the dampened value, only at the
> immediately returned value.
>
> The same is true with colocation statements - if a constraint is made with
> a ping resource without using a rule that references pingd, then the dampen
> behaviour is ignored completely.
>
> Is the ping'er missing something that does this:
>
> score=`cibadmin -Q -A "//nvpair[@name='ping']" | sed -e
> 's/.*value="\([^"]*\)".*/\1/'`
>
> before it checks if $score is less than $OCF_RESKEY_failure_score?
>
> Thanks
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Trying to understand dampening (ping)

2021-10-14 Thread Klaus Wenninger
AFAIR the idea of dampening isn't to configure the behavior of the cluster -
like being robust against some kind of glitches.
It is rather there to keep the resources used for writing content to the CIB
under control.
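A quick way to see that effect (the attribute name is arbitrary):

# rapid updates within the dampening window are coalesced into one CIB write
attrd_updater -n my_ping -U 1000 -d 30s
attrd_updater -n my_ping -U 0 -d 30s
attrd_updater -n my_ping -Q                   # attrd shows the latest value right away
cibadmin -Q -A "//nvpair[@name='my_ping']"    # the CIB only catches up once the 30s expire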

Klaus

On Thu, Oct 14, 2021 at 8:29 AM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> Hi!
>
> IMHO "dampening" was a very bad term, and it had confused me right from the
> start.
> Maybe "change_ignore_time" would have been better.
> But actually a true moving average (over a fixed window) would be much
> preferable.
> Maybe exponential averaging, too.
>
> And the description in pingd is very poor, also:
> dampen (integer, [1s]): Dampening interval
> The time to wait (dampening) further changes occur
>
> Regards,
> Ulrich
>
> >>> martin doc  wrote on 13.10.2021 at 17:01 in
> message
> <
> ps2p216mb0546168e8f60ee9d89131efcc2...@ps2p216mb0546.korp216.prod.outlook.com
> >:
>
> > In the ping resource script, there's support for "dampen" in the use of
> > attrd_updater.
> >
> > My expectation is that it will cause "ping", "no‑ping", "ping" to result
> in
>
> > the service being continually presented as up rather than to flap about.
> >
> > In testing I can't demonstrate this, even using attrd_updater directly.
> >
> > To test out how attrd_updater works, I wrote a small script to do this:
> >
> > attrd_updater -n my_ping -D
> > attrd_updater -n my_ping -p -B 1000 -d 3s
> > sleep 1
> > for i in 0 1 2 3 4 5 6 7 8 9; do
> >   attrd_updater -n my_ping -Q
> >   sleep 1
> >   attrd_updater -n my_ping -p -U 0 -d 3s
> > done
> >
> > The output always has the first line as 1000 and every other line with a
> > value of "0" - as if there was no dampening actually happening.
> >
> > Even if I modify the above to do -U 1000, -U 0, -U 1000, doing -Q at any
> > point always shows the last value supplied, with no evidence of any
> > smoothing as a result of dampening.
> >
> > Is the problem here that the -Q doesn't retrieve the value for my_ping
> > using the same method as is used for resource scripts?
> >
> > Am I totally misunderstanding how dampening works?
> >
> > Thanks.
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Problem with high load (IO)

2021-09-28 Thread Klaus Wenninger
On Tue, Sep 28, 2021 at 11:23 AM Lentes, Bernd <
bernd.len...@helmholtz-muenchen.de> wrote:

>
>
> - On Sep 27, 2021, at 2:51 PM, Pacemaker ML users@clusterlabs.org
> wrote:
>
> > I would use something liek this:
> >
> > ionice -c 2 -n 7 nice cp XXX YYY
> >
> > Best Regards,
> > Strahil Nikolov
>
> Just for a better understanding:
>
> ionice does not relate to the copy procedure in this commandline, but to
> the nice program.
> What is the advantage if nice does treat IO a bit more carefully ?
>

ionice, just like nice, does some configuration - in the mode where an
executable is given as argument - and then replaces itself (via exec) with
that executable, so it runs in the configured context.
You can do that in a cascaded manner.
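For example (paths made up):

# ionice sets the I/O class/priority and execs nice, which lowers the CPU
# priority and in turn execs cp - so both settings end up applying to cp:
ionice -c 2 -n 7 nice -n 19 cp /some/big/file /backup/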

Klaus


> Is there a way in this commandline that ionice relates to the copy program
> ?
>
> What is with ionice -c 2 -n 7 (nice cp XXX YYY) ? With the brackets both
> programs are executed in the same shell.
> Would that help ?
>
> Bernd
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Qemu VM resources - cannot acquire state change lock

2021-08-26 Thread Klaus Wenninger
On Thu, Aug 26, 2021 at 11:13 AM lejeczek via Users 
wrote:

> Hi guys.
>
> I sometimes - and I think I know when, in terms of a pattern -
> get resources stuck on one node (two-node cluster) with
> these in libvirtd's logs:
> ...
> Cannot start job (query, none, none) for domain
> c8kubermaster1; current job is (modify, none, none) owned by
> (192261 qemuProcessReconnect, 0 , 0 
> (flags=0x0)) for (1093s, 0s, 0s)
> Cannot start job (query, none, none) for domain ubuntu-tor;
> current job is (modify, none, none) owned by (192263
> qemuProcessReconnect, 0 , 0  (flags=0x0)) for
> (1093s, 0s, 0s)
> Timed out during operation: cannot acquire state change lock
> (held by monitor=qemuProcessReconnect)
> Timed out during operation: cannot acquire state change lock
> (held by monitor=qemuProcessReconnect)
> ...
>
> when this happens, and if the resource is meant to be on the other node, I
> have to disable the resource first, then the node on which the resources are
> stuck will shut down the VM, and only then can I re-enable that resource so
> that it starts on the other, second node.
>
> I think this problem occurs if I restart 'libvirtd' via systemd.
>
> Any thoughts on this guys?
>

What are the logs on the pacemaker-side saying?
An issue with migration?

Klaus

> many thanks, L.
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker problems with pingd

2021-08-05 Thread Klaus Wenninger
On Wed, Aug 4, 2021 at 5:30 PM Janusz Jaskiewicz <
janusz.jaskiew...@gmail.com> wrote:

> Hello.
>
> Please forgive the length of this email but I wanted to provide as much
> details as possible.
>
> I'm trying to set up a cluster of two nodes for my service.
> I have a problem with a scenario where the network between two nodes gets
> broken and they can no longer see each other.
> This causes split-brain.
> I know that the proper way of implementing this would be to employ STONITH,
> but it is not feasible for me now (I don't have necessary hardware support
> and I don't want to introduce another point of failure by introducing
> shared storage based STONITH).
>
> In order to work around the split-brain scenario I introduced pingd to my
> cluster, which in theory should do what I expect.
> pingd pings a network device, so when the NIC is broken on one of my
> nodes, this node should not run the resources because pingd would fail for
> it.
>
As we've discussed on this list in multiple previous threads already, there
are lots of failure scenarios where cluster nodes don't see each other but
both can still ping something else on the network.
Important cases where your approach wouldn't work are also those where a
node is just partially alive - which leads to corosync membership being lost
and the node no longer being able to stop resources properly.
Thus it is highly recommended that all setups which rely on some kind of
self-fencing, or on bringing down resources within some timeout, are guarded
by a (hardware) watchdog.
Earlier you were probably referring to SBD, which implements such a
watchdog-guarded approach. As you've probably figured out, you can't
directly use SBD in a 2-node setup without a shared disk: pure
watchdog-fencing needs a quorum decision made by at least 3 instances.
If you don't want a full-blown 3rd node you can consider qdevice - a single
qnetd instance can even serve multiple 2-node clusters for quorum evaluation.
Otherwise you can use SBD with a shared disk.
You are right that both a shared disk and any kind of 3rd node are an
additional point of failure. The important part is that in both cases we are
talking about a point of failure but not a single point of failure - meaning
its failure would not necessarily force services to be shut down.
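As a rough sketch of the qdevice-plus-watchdog route (host name and timeouts
are only examples):

# point both cluster nodes at a corosync-qnetd instance running outside the cluster
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit
# diskless SBD: enable the sbd service with a (hardware or softdog) watchdog,
# e.g. SBD_WATCHDOG_DEV=/dev/watchdog and SBD_WATCHDOG_TIMEOUT=5 in /etc/sysconfig/sbd,
# then tell pacemaker to rely on watchdog self-fencing:
pcs property set stonith-watchdog-timeout=10s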

Klaus

>
> pingd resource is configured to update the value of variable 'pingd'
> (interval: 5s, dampen: 3s, multiplier:1000).
> Based on the value of pingd I have a location constraint which sets score
> to -INFINITY for resource DimProdClusterIP when 'pingd' is not 1000.
> All other resources are colocated with DimProdClusterIP, and
> DimProdClusterIP should start before all other resources.
>
> Based on that setup I would expect that when the resources run on
> dimprod01 and I disconnect dimprod02 from the network, the resources will
> not start on dimprod02.
> Unfortunately I see that after a token interval + consensus interval my
> resources are brought up for a moment and then go down again.
> This is undesirable, as it causes DRBD split-brain inconsistency and
> cluster IP may also be taken over by the node which is down.
>
> I tried to debug it, but I can't figure out why it doesn't work.
> I would appreciate any help/pointers.
>
>
> Following are some details of my setup and snippet of pacemaker logs with
> comments:
>
> Setup details:
>
> pcs status:
> Cluster name: dimprodcluster
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: dimprod02 (version 2.0.5-9.el8_4.1-ba59be7122) - partition
> with quorum
>   * Last updated: Tue Aug  3 08:20:32 2021
>   * Last change:  Mon Aug  2 18:24:39 2021 by root via cibadmin on
> dimprod01
>   * 2 nodes configured
>   * 8 resource instances configured
>
> Node List:
>   * Online: [ dimprod01 dimprod02 ]
>
> Full List of Resources:
>   * DimProdClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
>   * WyrDimProdServer (systemd:wyr-dim): Started dimprod01
>   * Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData]
> (promotable):
> * Masters: [ dimprod01 ]
> * Slaves: [ dimprod02 ]
>   * WyrDimProdFS (ocf::heartbeat:Filesystem): Started dimprod01
>   * DimTestClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
>   * Clone Set: ping-clone [ping]:
> * Started: [ dimprod01 dimprod02 ]
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
>
> pcs constraint
> Location Constraints:
>   Resource: DimProdClusterIP
> Constraint: location-DimProdClusterIP
>   Rule: score=-INFINITY
> Expression: pingd ne 1000
> Ordering Constraints:
>   start DimProdClusterIP then promote WyrDimProdServerData-clone
> (kind:Mandatory)
>   promote WyrDimProdServerData-clone then start WyrDimProdFS
> (kind:Mandatory)
>   start WyrDimProdFS then start WyrDimProdServer (kind:Mandatory)
>   start WyrDimProdServer then start DimTestClusterIP (kind:Mandatory)
> Colocation Constraints:
>   WyrDimProdServer with DimProdClusterIP (score:INFINITY)
>   DimTestClusterIP with 

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-03 Thread Klaus Wenninger
On Tue, Aug 3, 2021 at 10:41 AM Antony Stone 
wrote:

> On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote:
>
> > Here is the example I had promised:
> >
> > pcs node attribute server1 city=LA
> > pcs node attribute server2 city=NY
> >
> > # Don't run on any node that is not in LA
> > pcs constraint location DummyRes1 rule score=-INFINITY city ne LA
> >
> > #Don't run on any node that is not in NY
> > pcs constraint location DummyRes2 rule score=-INFINITY city ne NY
> >
> > The idea is that if you add a node and you forget to specify the
> attribute
> > with the name 'city' , DummyRes1 & DummyRes2 won't be started on it.
> >
> > For resources that do not have a constraint based on the city -> they
> will
> > run everywhere unless you specify a colocation constraint between the
> > resources.
>
> Excellent - thanks.  I happen to use crmsh rather than pcs, but I've
> adapted
> the above and got it working.
>
> Unfortunately, there is a problem.
>
> My current setup is:
>
> One 3-machine cluster in city A running a bunch of resources between them,
> the
> most important of which for this discussion is Asterisk telephony.
>
> One 3-machine cluster in city B doing exactly the same thing.
>
> The two clusters have no knowledge of each other.
>
> I have high-availability routing between my clusters and my upstream
> telephony
> provider, such that a call can be handled by Cluster A or Cluster B, and
> if
> one is unavailable, the call gets routed to the other.
>
> Thus, a total failure of Cluster A means I still get phone calls, via
> Cluster
> B.
>
>
> To implement the above "one resource which can run anywhere, but only a
> single
> instance", I joined together clusters A and B, and placed the
> corresponding
> location constraints on the resources I want only at A and the ones I want
> only at B.  I then added the resource with no location constraint, and it
> runs
> anywhere, just once.
>
> So far, so good.
>
>
> The problem is:
>
> With the two independent clusters, if two machines in city A fail, then
> Cluster A fails completely (no quorum), and Cluster B continues working.
> That
> means I still get phone calls.
>
> With the new setup, if two machines in city A fail, then _both_ clusters
> stop
> working and I have no functional resources anywhere.
>
Why that? If you are talking about quorum, a 4-node partition in a 6-node
cluster should be quorate.
Not saying the config is ideal though - even node count and so on.
And when city A doesn't see city B you end up with two 3-node partitions
that aren't quorate without additional measures.
Did you consider booth? It might really be a better match for your problem.
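As a sketch of the booth idea - addresses and names are invented - each site
keeps its own 3-node cluster, a small arbitrator sits somewhere third, and the
"run anywhere, but only once" resource is tied to a ticket:

# /etc/booth/booth.conf (same file on both sites and on the arbitrator)
transport = UDP
port = 9929
arbitrator = 203.0.113.10
site = 192.0.2.10
site = 198.51.100.10
ticket = "ticket-asterisk"

# in each cluster (plus a booth-site resource so the booth daemon runs per site):
pcs constraint ticket add ticket-asterisk asterisk-resource loss-policy=stop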

Klaus

>
>
> So, my question now is:
>
> How can I have a 3-machine Cluster A running local resources, and a
> 3-machine
> Cluster B running local resources, plus one resource running on either
> Cluster
> A or Cluster B, but without a failure of one cluster causing _everything_
> to
> stop?
>
>
> Thanks,
>
>
> Antony.
>
> --
> One tequila, two tequila, three tequila, floor.
>
>Please reply to the
> list;
>  please *don't* CC
> me.
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-23 Thread Klaus Wenninger
On Fri, Jul 23, 2021 at 8:55 AM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> "john tillman"  schrieb am 22.07.2021 um 16:48 in
> Nachricht
> <1175ffcec0033015e13d11d7821d5acb.squir...@mail.panix.com>:
> > There was a lot of discussion on this topic which might have overshadowed
> > this question so I will ask it again in case someone missed it.
> >
> > It comes from a post (see below) that we were pointed to here by Andrei:
> >
> > Is there something like the described "ping tiebreaker" in the current
> > world of pacemaker/corosync?
>
> Maybe explain how it should work:
> If the two nodes cannot reach each other, but each can reach the ping node,
> which node has the quorum then?
>

Guess both - which is what is played down as 'disadvantage' in the
description
below ;-)


>
> >
> > Best Regards,
> > ‑John
> >
> >> Interesting read.  Thank you for providing it!
> >>
> >> In this follow up post
> >>
> >
> https://techthoughts.typepad.com/managing_computers/2007/10/more
> ‑about‑quor.htm
>
> > l
> >> the author mentions the following:
> >>
> >> Ping tiebreaker
> >>
> >> Some HA systems provide  a ping tiebreaker.  To make this work, you
> pick a
> >> address outside the cluster to ping, and any partition that can ping
> that
> >> address has quorum.  The obvious advantage is that it's very simple to
> set
> >> up ‑ doesn't require any additional servers or shared disk.  The
> >> disadvantage (and it's a big one) is that it's very possible for
> multiple
> >> partitions to think they have quorum.  In the case of split‑site
> (disaster
> >> recovery) type clusters, it's going to happen fairly often.  If you can
> >> use this method for a single site in conjunction with fencing, then it
> >> will likely work out quite well.  It's a lot better than no tiebreaker,
> or
> >> one that always says "you have quorum".  Having said that, it's
> >> significantly inferior to any of the other methods.
> >>
> >> The quote "It's a lot better than no tiebreaker..." is what I am looking
> >> for.  Is there something like a "ping tiebreaker" in the current world
> of
> >> pacemaker/corosync?
> >>
> >> Thanks to all those who have already commented on my question.  I
> >> appreciate the input/education.
> >>
> >> Best Regards,
> >> ‑John
> >>
> >>
> >>
> >>> On Wed, Jul 21, 2021 at 3:55 PM Ulrich Windl
> >>>  wrote:
> 
>  Hi!
> 
>  Maybe someone feels motivated to write some article comparing the
>  concepts
>  * split brain
>  * quorum
>  * fencing
> 
> >>>
> >>> Yet another one? Using your own reply "search is free".
> >>>
> >>>
> >
> https://techthoughts.typepad.com/managing_computers/2007/10/split
> ‑brain‑quo.htm
>
> > l
> >>>
>  There are eight possible states that I tried to illustrate on the
>  attached sketch (S="Split Brain", "Q=Quorum, F=Fencing).
> 
>  ;‑)
> 
>  Regards,
>  Ulrich
> 
> 
>  >>> Andrei Borzenkov 21.07.2021, 07:52 >>>
> 
>  On 21.07.2021 07:28, Strahil Nikolov via Users wrote:
>  > Hi,
>  > consider using a 3rd system as a Q disk.
> 
>  What was not clear in "Quorum is a different concept and doesn't
> remove
>  the need for fencing"?
> 
>  > Also, you can use iscsi from that node as a SBD device, so you will
>  have proper fencing .If you don't have a hardware watchdog device, you
>  can use softdog kernel module for that.
>  > Best Regards,Strahil Nikolov
>  >
>  >
>  > On Wed, Jul 21, 2021 at 1:45, Digimer wrote: On
>  2021‑07‑20 6:04 p.m., john tillman wrote:
>  >> Greetings,
>  >>
>  >> Is it possible to configure a two node cluster (pacemaker 2.0)
>  without
>  >> fencing and avoid split brain?
>  >
>  > No.
>  >
>  >> I was hoping there was a way to use a 3rd node's ip address, like
>  from a
>  >> network switch, as a tie breaker to provide quorum. A simple
>  successful
>  >> ping would do it.
>  >
>  > Quorum is a different concept and doesn't remove the need for
>  fencing.
>  >
>  >> I realize that this 'ping' approach is not the bullet proof
> solution
>  that
>  >> fencing would provide. However, it may be an improvement over two
>  nodes
>  >> alone.
>  >
>  > It would be, at best, a false sense of security.
>  >
>  >> Is there a configuration like that already? Any other ideas?
>  >>
>  >> Pointers to useful documents/discussions on avoiding split brain
>  with
>  two
>  >> node clusters would be welcome.
>  >
>  > https://www.alteeve.com/w/The_2‑Node_Myth
>  >
>  > (note: currently throwing a cert error related to the let's encrypt
>  > issue, should be cleared up soon).
>  >
>  >
>  > ___
>  > Manage your subscription:
>  > https://lists.clusterlabs.org/mailman/listinfo/users
>  >
>  > ClusterLabs home: 

Re: [ClusterLabs] Antw: [EXT] Re: unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-07-14 Thread Klaus Wenninger
ed-storage based
> > fencing daemon...
> > Jul 13 20:42:14 ltaoperdbs02 sbd[185352]:   notice: main: Doing flush +
> > writing 'b' to sysrq on timeout
> > Jul 13 20:42:14 ltaoperdbs02 sbd[185362]:   pcmk:   notice:
> > servant_pcmk: Monitoring Pacemaker health
> > Jul 13 20:42:14 ltaoperdbs02 sbd[185363]:cluster:   notice:
> > servant_cluster: Monitoring unknown cluster health
> > Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:   notice: inquisitor_child:
> > Servant cluster is healthy (age: 0)
> > Jul 13 20:42:15 ltaoperdbs02 sbd[185357]:   notice: watchdog_init: Using
> > watchdog device '/dev/watchdog'
> > Jul 13 20:42:15 ltaoperdbs02 systemd[1]: Started Shared-storage based
> > fencing daemon.
> > Jul 13 20:42:19 ltaoperdbs02 sbd[185357]:   notice: inquisitor_child:
> > Servant pcmk is healthy (age: 0)
> >
> > this is happening to all 3 nodes, any toughts?
>
> Bad watchdog?
>
> >
> > Thanks for helping, have as good day
> >
> > Damiano
> >
> >
> > Il giorno mer 14 lug 2021 alle ore 10:08 Klaus Wenninger <
> > kwenn...@redhat.com> ha scritto:
> >
> >>
> >>
> >> On Wed, Jul 14, 2021 at 6:40 AM Andrei Borzenkov 
> >> wrote:
> >>
> >>> On 13.07.2021 23:09, damiano giuliani wrote:
> >>> > Hi Klaus, thanks for helping, im quite lost because cant find out the
> >>> > causes.
> >>> > i attached the corosync logs of all three nodes hoping you guys can
> find
> >>> > and hint me  something i cant see. i really appreciate the effort.
> >>> > the old master log seems cutted at 00:38. so nothing interessing.
> >>> > the new master and the third slave logged what its happened. but i
> cant
> >>> > figure out the cause the old master went lost.
> >>> >
> >>>
> >>> The reason it was lost is most likely outside of pacemaker. You need to
> >>> check other logs on the node that was lost, may be BMC if this is bare
> >>> metal or hypervisor if it is virtualized system.
> >>>
> >>> All that these logs say is that ltaoperdbs02 was lost from the point of
> >>> view of two other nodes. It happened at the same time (around Jul 13
> >>> 00:40) which suggests ltaoperdbs02 had some problem indeed. Whether it
> >>> was software crash, hardware failure or network outage cannot be
> >>> determined from these logs.
> >>>
> >>> What speaks against a pure network outage is that we don't see
> >> the corosync membership messages on the node that died.
> >> Of course it is possible that the log wasn't flushed out before reboot
> >> but usually I'd expect that there would be enough time.
> >> If something kept corosync or sbd from being scheduled that would
> >> explain why we don't see messages from these instances.
> >> And that was why I was asking to check if in the setup corosync and
> >> sbd are able to switch to rt-scheduling.
> >> But of course that is all speculation, and from what we know it could
> >> be anything from an administrative hard shutdown via
> >> some BMC to whatever.
> >>
> >>>
> >>> > something interessing could be the stonith logs of the new master and
> >>> the
> >>> > third slave:
> >>> >
> >>> > NEW MASTER:
> >>> > grep stonith-ng /var/log/messages
> >>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Node
> >>> ltaoperdbs02
> >>> > state is now lost
> >>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Purged 1
> peer
> >>> > with id=1 and/or uname=ltaoperdbs02 from the membership cache
> >>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Client
> >>> > crmd.228700.154a9e50 wants to fence (reboot) 'ltaoperdbs02' with
> device
> >>> > '(any)'
> >>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Requesting
> >>> peer
> >>> > fencing (reboot) targeting ltaoperdbs02
> >>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Couldn't
> find
> >>> > anyone to fence (reboot) ltaoperdbs02 with any device
> >>> > Jul 13 00:40:37 ltaoperdbs03 stonith-ng[228696]:  notice: Waiting 10s
> >>> for
> >>> > ltaoperdbs02 to self-fence (reboot) for client crmd.228700.f5d882d5
> >>> > Jul 13 00:40:47 ltaoperdbs03 stonith-ng[
