Re: [ClusterLabs] Bug pacemaker with multiple IP

2022-12-20 Reid Wahl
On Tue, Dec 20, 2022 at 6:25 AM Thomas CAS  wrote:
>
> Hello Ken,
>
> Thanks for your answer.
> There was no update running at the time of the bug, which is why I thought 
> that having too many IPs caused this type of error.
> The /usr/sbin/ip executable was not being modified either.
>
> We have many clusters, and only this one has so many IPs and this problem.

How often does this happen, and is it reliably reproducible under any
circumstances? Any antivirus software running? It'd be nice to check
something like lsof or strace while it's happening, but that may not
be feasible if it's sporadic; running those at every monitor would
generate lots of logs.
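One low-overhead way to answer the "how often" question, without running strace on every monitor, is to count the "Text file busy" entries that are already in the Pacemaker log. A minimal Python sketch follows. The regular expression is fitted to the entries quoted in this thread and assumes each entry occupies one line in the log file. The log path in the usage comment is an assumption, since it varies by distribution and configuration. Several stderr lines from the same monitor PID are counted as one failed monitor.

```python
import re
import sys
from collections import Counter

# Fitted to the entries quoted in this thread (an assumption, not a documented format):
#   Dec 18 21:07:51 <host> pacemaker-execd [5079] (operation_finished) notice:
#   NGINX-VIP-232_monitor_1:28835:stderr [ ...: ip: Text file busy ]
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d+)\s+(?P<hm>\d\d:\d\d):\d\d\b.*?"
    r"notice:\s+(?P<rsc>\S+?)_monitor_[^:\s]*:(?P<pid>\d+):stderr \[.*Text file busy"
)


def count_busy_errors(lines):
    """Count monitor runs that logged 'Text file busy', per resource and per minute.

    Several stderr lines from the same monitor process (same PID) count once.
    """
    seen = set()
    per_resource = Counter()
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or (m["rsc"], m["pid"]) in seen:
            continue
        seen.add((m["rsc"], m["pid"]))
        per_resource[m["rsc"]] += 1
        per_minute[f"{m['mon']} {m['day']} {m['hm']}"] += 1
    return per_resource, per_minute


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (path is an assumption): python3 count_busy.py /var/log/pacemaker/pacemaker.log
    with open(sys.argv[1], errors="replace") as log:
        by_rsc, by_min = count_busy_errors(log)
    for rsc, n in by_rsc.most_common():
        print(f"{rsc}: {n}")
    # Text sort: chronological only within a single month.
    for minute, n in sorted(by_min.items()):
        print(f"{minute}: {n}")
```

If the per-minute counts cluster at the same timestamps across many resources, that points toward the monitors running concurrently; if they are scattered, something else on the host is more likely involved.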

AFAICT, having multiple processes execute (or read) the `ip` binary
simultaneously *shouldn't* cause problems, as long as nothing opens it
for write.
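"Text file busy" is the text for ETXTBSY, which execve() returns when the file being executed is open for writing by some process. The thing to catch while the error is happening is therefore a writer of the ip binary. `lsof /usr/sbin/ip` is the usual tool. As a lighter alternative that could run in a loop or cron job, here is a Linux-only Python sketch that walks /proc and reports processes holding a path open for writing. It assumes /proc/<pid>/fdinfo exposes the octal open flags (as modern Linux kernels do), and it has to run as root to inspect other users' processes.

```python
import os
import sys


def writers_of(path):
    """Return (pid, fd) pairs for processes that have `path` open for writing.

    Linux-only sketch: matches /proc/<pid>/fd/<fd> symlinks against the resolved
    path, then reads the octal "flags:" line from /proc/<pid>/fdinfo/<fd>.
    Processes that exit mid-scan or that we may not inspect are skipped.
    """
    target = os.path.realpath(path)
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited or permission denied
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != target:
                    continue
                with open(f"/proc/{entry}/fdinfo/{fd}") as info:
                    flags_line = next(ln for ln in info if ln.startswith("flags:"))
            except (OSError, StopIteration):
                continue
            flags = int(flags_line.split()[1], 8)
            # The two low bits hold the access mode: O_RDONLY, O_WRONLY or O_RDWR.
            access = flags & (os.O_WRONLY | os.O_RDWR)
            if access in (os.O_WRONLY, os.O_RDWR):
                found.append((int(entry), int(fd)))
    return found


if __name__ == "__main__":
    target_path = sys.argv[1] if len(sys.argv) > 1 else "/usr/sbin/ip"
    for pid, fd in writers_of(target_path):
        print(f"pid {pid} has {target_path} open for writing on fd {fd}")
```

An empty result while the monitors are failing would support the view that nothing is writing to the binary, which would make the cause more unusual.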

>
> Best regards,
>
> Thomas Cas  |  Managed Services Support Technician
> PHONE : +33 3 51 25 23 26   WEB : www.ikoula.com/en
> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> Before printing this letter, think about the impact on the environment!
>
> -----Original Message-----
> From: Ken Gaillot
> Sent: Monday, December 19, 2022 22:08
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Cc: Service Infogérance
> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
>
> On Mon, 2022-12-19 at 09:48 +, Thomas CAS wrote:
> > Hello Clusterlabs,
> >
> > I would like to report a bug in Pacemaker with the "IPaddr2"
> > resource agent:
> >
> > OS: Debian 10
> > Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1
> > (2021-09-29) x86_64 GNU/Linux
> > Pacemaker version: 2.0.1-5+deb10u2
> >
> > You will find the configuration of our cluster with 2 nodes attached.
> >
> > Bug :
> >
> > We have 12 IP addresses configured in the cluster. Sometimes the
> > cluster becomes unstable, with the following errors in the
> > Pacemaker logs:
> >
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-232_monitor_1:28835:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
>
> This doesn't sound like a bug in the agent; "Text file busy" suggests that 
> the system "ip" command is being modified while the command is running. Is a 
> software update happening when the problem occurs?
>
> I'm not sure whether there's some other situation that could cause that 
> error, but simply executing the command a bunch of times simultaneously 
> shouldn't cause it as far as I know.
>
> If simultaneous monitors are somehow causing the problem, you should be able 
> to work around it by using different intervals for different monitors.
>
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-239_monitor_1:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-239_monitor_1:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-234_monitor_1:28830:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-231_monitor_1:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-231_monitor_1:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-235_monitor_1:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-235_monitor_1:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-237_monitor_1:28890:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/oc

Re: [ClusterLabs] Antw: [EXT] Re: Stonith

2022-12-20 Ken Gaillot
On Tue, 2022-12-20 at 11:33 +0300, Andrei Borzenkov wrote:
> On Tue, Dec 20, 2022 at 10:07 AM Ulrich Windl
>  wrote:
> > > But keep in mind that if the whole site is down (or inaccessible) you
> > > will not have access to IPMI/PDU/whatever on this site, so your stonith
> > > agents will fail ...
> > 
> > But, considering the design, such a site won't have quorum and
> > should commit suicide, right?
> > 
> 
> Not by default.

And even if it does, the rest of the cluster can't assume that it did,
so resources can't be recovered. It could work with sbd, but the poster
said that the physical hosts aren't accessible.
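Andrei's "not by default" refers to Pacemaker's `no-quorum-policy` cluster property, whose default is `stop` (stop resources in the partition without quorum) rather than `suicide`. Ken's point is about what the other side may conclude, and it can be written as a toy decision rule: the survivors may recover a lost node's resources only once its death is confirmed, either by a fence device reporting success or by an sbd watchdog that guarantees a reset within a known time (with watchdog-only sbd, Pacemaker relies on the `stonith-watchdog-timeout` property for this). The Python sketch below is a simplified model for illustration only; the class and field names are hypothetical, and it is not how Pacemaker's scheduler is implemented.

```python
from dataclasses import dataclass


@dataclass
class LostPeer:
    """What the surviving partition knows about a node it can no longer see (hypothetical model)."""
    fencing_confirmed: bool      # a fence device (IPMI, PDU, ...) reported success
    sbd_watchdog: bool           # the peer runs sbd backed by a working watchdog
    seconds_since_loss: float    # time since the peer disappeared
    watchdog_timeout: float      # time after which a watchdog reset is guaranteed


def survivors_may_recover(peer: LostPeer) -> bool:
    """Toy version of the reasoning above; not Pacemaker's actual scheduler logic."""
    if peer.fencing_confirmed:
        return True
    # "It should have committed suicide" is not proof: the peer may be hung or still
    # running resources. Only a watchdog that guarantees a reset within a known time
    # lets the survivors assume it is gone once that time has passed.
    return peer.sbd_watchdog and peer.seconds_since_loss >= peer.watchdog_timeout


if __name__ == "__main__":
    # Whole site unreachable: IPMI/PDU fencing cannot succeed, no sbd watchdog.
    print(survivors_may_recover(LostPeer(False, False, 3600, 60)))  # False
    # Same outage, but the peer runs sbd with a 60 s watchdog guarantee.
    print(survivors_may_recover(LostPeer(False, True, 3600, 60)))   # True
```

The first case is the situation described in the thread: with the fence devices on the unreachable site and no sbd, nothing the survivors observe is enough to recover safely.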
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Bug pacemaker with multiple IP

2022-12-20 Thomas CAS
Hello Ken,

Thanks for your answer.
There was no update running at the time of the bug, which is why I thought that 
having too many IPs caused this type of error.
The /usr/sbin/ip executable was not being modified either.

We have many clusters, and only this one has so many IPs and this problem.

Best regards,

Thomas Cas  |  Managed Services Support Technician

-----Original Message-----
From: Ken Gaillot
Sent: Monday, December 19, 2022 22:08
To: Cluster Labs - All topics related to open-source clustering welcomed
Cc: Service Infogérance
Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP

On Mon, 2022-12-19 at 09:48 +, Thomas CAS wrote:
> Hello Clusterlabs,
>
> I would like to report a bug in Pacemaker with the "IPaddr2"
> resource agent:
>
> OS: Debian 10
> Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1
> (2021-09-29) x86_64 GNU/Linux
> Pacemaker version: 2.0.1-5+deb10u2
>
> You will find the configuration of our cluster with 2 nodes attached.
>
> Bug :
>
> We have 12 IP addresses configured in the cluster. Sometimes the
> cluster becomes unstable, with the following errors in the
> Pacemaker logs:
>
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-232_monitor_1:28835:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]

This doesn't sound like a bug in the agent; "Text file busy" suggests that the 
system "ip" command is being modified while the command is running. Is a 
software update happening when the problem occurs?

I'm not sure whether there's some other situation that could cause that error, 
but simply executing the command a bunch of times simultaneously shouldn't 
cause it as far as I know.
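The expectation that simultaneous executions are harmless can be checked directly on the affected node. A minimal Python sketch, assuming that a burst of twelve parallel launches roughly approximates the twelve IPaddr2 monitors firing at once (it does not reproduce pacemaker-execd itself): it starts a command many times in parallel and collects any failures, recording exec errors such as ETXTBSY separately from non-zero exit codes. By default it runs the Python interpreter so the sketch works anywhere; on the cluster node one could pass something like `/usr/sbin/ip -o addr show` instead.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(argv, workers=12, rounds=10):
    """Launch `argv` `workers` at a time, `rounds` times; return the failures seen.

    Each failure is (returncode, stderr). An OSError raised while starting the
    process (for example ETXTBSY, "Text file busy") is recorded as returncode None.
    """
    def one_run(_):
        try:
            proc = subprocess.run(argv, capture_output=True, text=True)
        except OSError as exc:
            return (None, str(exc))
        return (proc.returncode, proc.stderr)

    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            for rc, err in pool.map(one_run, range(workers)):
                if rc != 0:
                    failures.append((rc, err))
    return failures


if __name__ == "__main__":
    target = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"{len(run_concurrently(target))} failures")
```

If this reports no failures on the affected node even with many rounds, concurrency alone is unlikely to explain the error, which fits the advice above.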

If simultaneous monitors are somehow causing the problem, you should be able to 
work around it by using different intervals for different monitors.
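A rough illustration of why different intervals help: if all twelve monitors run every 10 seconds and happen to be in phase, all twelve fire in the same second every 10 seconds, whereas distinct intervals only line up at common multiples. The Python sketch below models this under simplifying assumptions (all monitors in phase at t=0, exact periods, no drift). It also builds `pcs resource update <rsc> op monitor interval=<N>s` commands, assuming the cluster is managed with pcs. The 10 to 21 second spread and the resource list are hypothetical. Updating an operation this way may also reset other options of that monitor, such as its timeout, so check the current definition first.

```python
from collections import Counter


def peak_overlap(intervals, horizon):
    """Most monitors firing in the same second during (0, horizon) seconds.

    Simplifying assumptions: every monitor is in phase at t=0 and then fires exactly
    every `interval` seconds; t=0 itself is ignored. Real start times and run
    durations drift, so this is only a rough model of contention.
    """
    fires = Counter()
    for iv in intervals:
        for t in range(iv, horizon, iv):
            fires[t] += 1
    return max(fires.values(), default=0)


def pcs_commands(resources, intervals):
    """One `pcs resource update` command per resource (assumes pcs manages the cluster)."""
    return [
        f"pcs resource update {rsc} op monitor interval={iv}s"
        for rsc, iv in zip(resources, intervals)
    ]


if __name__ == "__main__":
    same = [10] * 12                 # all twelve VIPs monitored every 10 s
    staggered = list(range(10, 22))  # hypothetical spread: 10 s, 11 s, ..., 21 s
    print("same interval, peak per second:", peak_overlap(same, 600))
    print("staggered intervals, peak per second:", peak_overlap(staggered, 600))
    # Resource IDs taken from the log excerpts; the full list of twelve is not in the thread.
    for cmd in pcs_commands(["NGINX-VIP-231", "NGINX-VIP-232"], staggered):
        print(cmd)
```

Longer or prime-valued intervals spread the monitors further apart, at the cost of detecting a lost IP address more slowly.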

> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-239_monitor_1:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-239_monitor_1:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-234_monitor_1:28830:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-231_monitor_1:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-231_monitor_1:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-235_monitor_1:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-235_monitor_1:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-237_monitor_1:28890:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-237_monitor_1:28890:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-238_monitor_1:28876:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished)   notice: NGINX-VIP-238_monitor_1:28

Re: [ClusterLabs] Antw: [EXT] Re: Stonith

2022-12-20 Andrei Borzenkov
On Tue, Dec 20, 2022 at 10:07 AM Ulrich Windl
 wrote:
> >
> > But keep in mind that if the whole site is down (or inaccessible) you
> > will not have access to IPMI/PDU/whatever on this site, so your stonith
> > agents will fail ...
>
> But, considering the design, such a site won't have quorum and should commit 
> suicide, right?
>

Not by default.