Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Strahil Nikolov via Users
I always used this one for triggering kdump when using sbd:
https://www.suse.com/support/kb/doc/?id=19873
 
 
On Fri, Feb 25, 2022 at 21:34, Reid Wahl wrote:
On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov  wrote:
>
> On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl  wrote:
> >
> > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl  wrote:
> > >
> ...
> > > >
> > > > So what happens most likely is that the watchdog terminates the kdump.
> > > > In that case all the mess with fence_kdump won't help, right?
> > >
> > > You can configure extra_modules in your /etc/kdump.conf file to
> > > include the watchdog module, and then restart kdump.service. For
> > > example:
> > >
> > > # grep ^extra_modules /etc/kdump.conf
> > > extra_modules i6300esb
> > >
> > > If you're not sure of the name of your watchdog module, wdctl can help
> > > you find it. sbd needs to be stopped first, because it keeps the
> > > watchdog device timer busy.
> > >
> > > # pcs cluster stop --all
> > > # wdctl | grep Identity
> > > Identity:      i6300ESB timer [version 0]
> > > # lsmod | grep -i i6300ESB
> > > i6300esb              13566  0
> > >
> > >
> > > If you're also using fence_sbd (poison-pill fencing via block device),
> > > then you should be able to protect yourself from that during a dump by
> > > configuring fencing levels so that fence_kdump is level 1 and
> > > fence_sbd is level 2.
> >
> > RHKB, for anyone interested:
> >  - sbd watchdog timeout causes node to reboot during crash kernel
> > execution (https://access.redhat.com/solutions/3552201)
>
> What is not clear from this KB (and quotes from it above) - what
> instance updates watchdog? Quoting (emphasis mine)
>
> --><--
> With the module loaded, the timer *CAN* be updated so that it does not
> expire and force a reboot in the middle of vmcore generation.
> --><--
>
> Sure it can, but what program exactly updates the watchdog during
> kdump execution? I am pretty sure that sbd does not run at this point.

That's a valid question. I found this approach to work back in 2018
after a fair amount of frustration, and didn't question it too deeply
at the time.

The answer seems to be that the kernel does it.
  - https://stackoverflow.com/a/2020717
  - https://stackoverflow.com/a/42589110
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  


Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-25 Thread Strahil Nikolov via Users
man votequorum
auto_tie_breaker: 1 allows you to have quorum with 50%, yet if for example 
Aside (node with lowest id) dies, B side is 50% but won't be able to bring back 
the resources as the node with lowest id is in A side.If you want to avoid 
that, you can bring a qdevice on a VM in third location (even in a cloud 
nearby).
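
For illustration only, a rough corosync.conf quorum sketch of the two variants
(the qnetd host name below is a made-up example, not something from this thread):

quorum {
    provider: corosync_votequorum
    # Variant 1: tie-breaker only. In an even split, the partition holding the
    # tie-breaker node (by default the lowest nodeid) stays quorate.
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}

quorum {
    provider: corosync_votequorum
    # Variant 2: a qdevice/qnetd on a third site casts the extra vote instead.
    device {
        model: net
        net {
            host: qnetd.example.com
            algorithm: ffsplit
        }
    }
}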


Best Regards,
Strahil Nikolov
 
 
  On Fri, Feb 25, 2022 at 20:10, Viet Nguyen wrote:   
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  


Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Reid Wahl
On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov  wrote:
>
> On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl  wrote:
> >
> > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl  wrote:
> > >
> ...
> > > >
> > > > So what happens most likely is that the watchdog terminates the kdump.
> > > > In that case all the mess with fence_kdump won't help, right?
> > >
> > > You can configure extra_modules in your /etc/kdump.conf file to
> > > include the watchdog module, and then restart kdump.service. For
> > > example:
> > >
> > > # grep ^extra_modules /etc/kdump.conf
> > > extra_modules i6300esb
> > >
> > > If you're not sure of the name of your watchdog module, wdctl can help
> > > you find it. sbd needs to be stopped first, because it keeps the
> > > watchdog device timer busy.
> > >
> > > # pcs cluster stop --all
> > > # wdctl | grep Identity
> > > Identity:  i6300ESB timer [version 0]
> > > # lsmod | grep -i i6300ESB
> > > i6300esb   13566  0
> > >
> > >
> > > If you're also using fence_sbd (poison-pill fencing via block device),
> > > then you should be able to protect yourself from that during a dump by
> > > configuring fencing levels so that fence_kdump is level 1 and
> > > fence_sbd is level 2.
> >
> > RHKB, for anyone interested:
> >   - sbd watchdog timeout causes node to reboot during crash kernel
> > execution (https://access.redhat.com/solutions/3552201)
>
> What is not clear from this KB (and quotes from it above) - what
> instance updates watchdog? Quoting (emphasis mine)
>
> --><--
> With the module loaded, the timer *CAN* be updated so that it does not
> expire and force a reboot in the middle of vmcore generation.
> --><--
>
> Sure it can, but what program exactly updates the watchdog during
> kdump execution? I am pretty sure that sbd does not run at this point.

That's a valid question. I found this approach to work back in 2018
after a fair amount of frustration, and didn't question it too deeply
at the time.

The answer seems to be that the kernel does it.
  - https://stackoverflow.com/a/2020717
  - https://stackoverflow.com/a/42589110
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Odd result from ping RA

2022-02-25 Thread Reid Wahl
On Fri, Feb 25, 2022 at 4:31 AM Ulrich Windl
 wrote:
>
> >>> Reid Wahl  schrieb am 25.02.2022 um 12:31 in Nachricht
> :
> > On Thu, Feb 24, 2022 at 2:28 AM Ulrich Windl
> >  wrote:
> >>
> >> Hi!
> >>
> >> I just discovered this oddity for a SLES15 SP3 cluster:
> >> Feb 24 11:16:17 h16 pacemaker-attrd[7274]:  notice: Setting val_net_gw1[h18]: 1000 -> 139000
> >>
> >> That surprised me, because usually the value is 1000 or 0.
> >>
> >> Digging a bit further I found:
> >> Migration Summary:
> >>   * Node: h18:
> >> * prm_ping_gw1: migration-threshold=100 fail-count=1 last-failure='Thu Feb 24 11:17:18 2022'
> >>
> >> Failed Resource Actions:
> >>   * prm_ping_gw1_monitor_6 on h18 'error' (1): call=200, status='Error', exitreason='', last-rc-change='2022-02-24 11:17:18 +01:00', queued=0ms, exec=0ms
> >>
> >> Digging further:
> >> Feb 24 11:16:17 h18 kernel: BUG: Bad rss-counter state mm:c620b5fe idx:1 val:17
> >> Feb 24 11:16:17 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: 1000 -> 139000
> >> Feb 24 11:17:17 h18 kernel: traps: pacemaker-execd[38950] general protection fault ip:7f610e71cbcf sp:77c25100 error:0 in libc-2.31.so[7f610e63b000+1e6000]
> >>
> >> (that rss-counter causing series of core dumps seems to be a new "feature" of SLES15 SP3 kernels that is being investigated by support)
> >>
> >> Somewhat later:
> >> Feb 24 11:17:18 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: 139000 -> (unset)
> >> (restarted RA)
> >> Feb 24 11:17:21 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: (unset) -> 1000
> >>
> >> Another node:
> >> Feb 24 11:16:17 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: 1000 -> 139000
> >> Feb 24 11:17:18 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: 139000 -> (unset)
> >> Feb 24 11:17:21 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: (unset) -> 1000
> >>
> >> So it seems the ping RA sets some garbage value when failing. Is that correct?
> >
> > This is ocf:pacemaker:ping, right? And is use_fping enabled?
>
> Correct. use_fping is not set (default value). I found no fping on the host.
>
>
> >
> > Looks like it uses ($active * $multiplier) ‑‑ see ping_update(). I'm
> > assuming your multiplier is 1000.
>
> Correct: multiplier=1000, and host_list has just one address.
>
> >
> > $active is set by either fping_check() or ping_check(), depending on
> > your configuration. You can see what they're doing here. I'd assume
> > $active is getting set to 139 and then is multiplied by 1000 to set
> > $score later.
>
> But wouldn't that mean 139 hosts were pinged successfully?
> (${HA_BIN}/pingd is being used)

Yeah, that seems to be the intent. Hence my saying "It could also be a
side effect of the fault though, since I don't see anything in
fping_check() or ping_check() that's an obvious candidate for setting
active=139 unless you have a massive host list."

>
> >   - 
> > https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/extra/resources/ping#L220-L277
>
>
> Regards,
> Ulrich
>
> >>
> >> resource‑agents‑4.8.0+git30.d0077df0‑150300.8.20.1.x86_64
> >> pacemaker‑2.0.5+20201202.ba59be712‑150300.4.16.1.x86_64
> >>
> >> Regards,
> >> Ulrich
> >>
> >>
> >> ___
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >>
> >
> >
> > ‑‑
> > Regards,
> >
> > Reid Wahl (He/Him), RHCA
> > Senior Software Maintenance Engineer, Red Hat
> > CEE ‑ Platform Support Delivery ‑ ClusterHA
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/



-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-25 Thread Viet Nguyen
Hi,

Thank you so much for the answer. It seems to me that the one option I have
is one big cluster with 4 nodes.

However, I still cannot understand how I could solve the issue when one
site with 2 nodes is down: the other site alone does not have quorum,
so it does not work...

Can you please explain more about the approach of one big cluster? I am
open to any other solutions, either commercial or open-source, if
available.

Regards,
Viet

On Thu, 24 Feb 2022 at 18:22, Jan Friesse  wrote:

> Hi,
>
> On 24/02/2022 14:19, Viet Nguyen wrote:
> > Hi,
> >
> > Thank you so much! Would you please advise more on this following case:
> >
> > The cluster I am trying to setup is Postgresql with replication streaming
> > with PAF. So, it will decide one node as a master and 3 standby nodes.
> >
> > So, with this, from what I understand from Postgresql, having 2
> independent
> > clusters (one in site A, one in site B) is not possible. I have to go
> with
> > one single cluster with 4 notes located in 2 different locations (site A
> > and site B).
> >
> > Then, my question is:
> >
> > 1. Does the booth ticket work in this setup?
>
> no, not really. booth basically creates cluster on top of 2+ clusters
> and arbitrator.
>
> > 2. Is Qnetd a better option than booth ticket?
>
> It's neither better nor worse. Qdevice (qnetd) adds a vote(s) to the
> quorum (corosync level). Booth is able to fulfill pacemaker constrain
> for ticket given only to one site in automated way.
>
>
> > 3. Is there any better way to manage this?
>
> If you can really use only one big cluster then probably none of booth
> or qdevice is needed.
>
> > 4. Since we have a distributed site and arbitrator, does fencing
> make it
> > even more complicated? How I could solve this problem?
>
> fencing is "must", it doesn't make it more complicated. Probably sbd but
> I have virtually no knowledge about that.
>
>
> >
> > Sorry if my questions sound silly as I am very new to this and thank
> > you so much for your help.
>
> yw
>
> Regards,
>Honza
>
> >
> > Regards,
> > Viet
> >
> > On Thu, 24 Feb 2022 at 12:17, Jan Friesse  wrote:
> >
> >> On 24/02/2022 10:28, Viet Nguyen wrote:
> >>> Hi,
> >>>
> >>> Thank you so so much for your help. May i ask a following up question:
> >>>
> >>> For the option of having one big cluster with 4 nodes without booth,
> >> then,
> >>> if one site (having 2 nodes) is down, then the other site does not work
> >> as
> >>> it does not have quorum, am I right? Even if we have a quorum voter in
> >>
> >> Yup, you are right
> >>
> >>> either site A or B, then, if the site with quorum down, then, the other
> >>> site does not work.  So, how can we avoid this situation as I want
> >>> that if one site is down, the other site still services?
> >>
> >> probably only with qnetd - so basically yet again site C.
> >>
> >> Regards,
> >> Honza
> >>
> >>>
> >>> Regards,
> >>> Viet
> >>>
> >>> On Wed, 23 Feb 2022 at 17:08, Jan Friesse  wrote:
> >>>
>  Viet,
> 
>  On 22/02/2022 22:37, Viet Nguyen wrote:
> > Hi,
> >
> > Could you please help me out with this question?
> >
> > I have 4 nodes cluster running in the same network but in 2 different
>  sites
> > (building A - 2 nodes and building B - 2 nodes). My objective is to
> > setup HA for this cluster with pacemaker. The expectation is if a
> site
> >> is
> > down, the other site still services.
> >
> > From what I could understand so far, in order to make it work, it
> >> needs
>  to
> > have booth ticket manager installed in a different location, let's
> say
> > building C which connects to both sites A and B.
> >
> > With this assumption, i would like to ask few questions:
> >
> >   1. Am i right that I need to setup the booth ticket manager as
> a
>  quorum
> >   voter as well?
> 
>  Yes, booth (arbitrator) has to be installed on "site" C if you want to
>  use booth. Just keep in mind booth has nothing to do with quorum.
> 
> >   2. What happens if  the connection between site A and B is
> down,
> >> but
>  the
> >   connection between A and C, B and C still up? In this case,
> both
>  site A and
> >   B still have the quorum as it can connect to C, but not between
> >> each
>  other?
> 
>  If you use booth then it's not required site A to see site B. It's
> then
>  "site" C problem to decide which site gets ticket.
> 
> 
> >   3. Or is there any better way to manage 2 sites cluster, each
> has
> >> 2
> >   nodes? And if one site is down like environmental disaster,
> then,
>  the other
> >   site still services.
> 
>  Basically there are (at least) two possible solutions:
>  - Have one big cluster without booth and use pcmk constraints
>  - Have two 2 node clusters and use booth. Then each of the two node
>  clusters is "independent" (have its own quorum) and each of the cluster
>  runs booth (site) as a cluster resource + "site" C running booth
>  (arbitrator)

[ClusterLabs] Antw: [EXT] Re: Odd result from ping RA

2022-02-25 Thread Ulrich Windl
>>> Reid Wahl  schrieb am 25.02.2022 um 12:31 in Nachricht
:
> On Thu, Feb 24, 2022 at 2:28 AM Ulrich Windl
>  wrote:
>>
>> Hi!
>>
>> I just discovered this oddity for a SLES15 SP3 cluster:
>> Feb 24 11:16:17 h16 pacemaker-attrd[7274]:  notice: Setting val_net_gw1[h18]: 1000 -> 139000
>>
>> That surprised me, because usually the value is 1000 or 0.
>>
>> Digging a bit further I found:
>> Migration Summary:
>>   * Node: h18:
>> * prm_ping_gw1: migration-threshold=100 fail-count=1 last-failure='Thu Feb 24 11:17:18 2022'
>>
>> Failed Resource Actions:
>>   * prm_ping_gw1_monitor_6 on h18 'error' (1): call=200, status='Error', exitreason='', last-rc-change='2022-02-24 11:17:18 +01:00', queued=0ms, exec=0ms
>>
>> Digging further:
>> Feb 24 11:16:17 h18 kernel: BUG: Bad rss-counter state mm:c620b5fe idx:1 val:17
>> Feb 24 11:16:17 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: 1000 -> 139000
>> Feb 24 11:17:17 h18 kernel: traps: pacemaker-execd[38950] general protection fault ip:7f610e71cbcf sp:77c25100 error:0 in libc-2.31.so[7f610e63b000+1e6000]
>>
>> (that rss-counter causing series of core dumps seems to be a new "feature" of SLES15 SP3 kernels that is being investigated by support)
>>
>> Somewhat later:
>> Feb 24 11:17:18 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: 139000 -> (unset)
>> (restarted RA)
>> Feb 24 11:17:21 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: (unset) -> 1000
>>
>> Another node:
>> Feb 24 11:16:17 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: 1000 -> 139000
>> Feb 24 11:17:18 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: 139000 -> (unset)
>> Feb 24 11:17:21 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: (unset) -> 1000
>>
>> So it seems the ping RA sets some garbage value when failing. Is that correct?
> 
> This is ocf:pacemaker:ping, right? And is use_fping enabled?

Correct. use_fping is not set (default value). I found no fping on the host.


> 
> Looks like it uses ($active * $multiplier) ‑‑ see ping_update(). I'm
> assuming your multiplier is 1000.

Correct: multiplier=1000, and host_list has just one address.

> 
> $active is set by either fping_check() or ping_check(), depending on
> your configuration. You can see what they're doing here. I'd assume
> $active is getting set to 139 and then is multiplied by 1000 to set
> $score later.

But wouldn't that mean 139 hosts were pinged successfully?
(${HA_BIN}/pingd is being used)

>   - 
> https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/extra/resources/ping#L220-L277


Regards,
Ulrich

>>
>> resource‑agents‑4.8.0+git30.d0077df0‑150300.8.20.1.x86_64
>> pacemaker‑2.0.5+20201202.ba59be712‑150300.4.16.1.x86_64
>>
>> Regards,
>> Ulrich
>>
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: https://www.clusterlabs.org/ 
>>
> 
> 
> ‑‑ 
> Regards,
> 
> Reid Wahl (He/Him), RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE ‑ Platform Support Delivery ‑ ClusterHA
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Andrei Borzenkov
On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl  wrote:
>
> On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl  wrote:
> >
...
> > >
> > > So what happens most likely is that the watchdog terminates the kdump.
> > > In that case all the mess with fence_kdump won't help, right?
> >
> > You can configure extra_modules in your /etc/kdump.conf file to
> > include the watchdog module, and then restart kdump.service. For
> > example:
> >
> > # grep ^extra_modules /etc/kdump.conf
> > extra_modules i6300esb
> >
> > If you're not sure of the name of your watchdog module, wdctl can help
> > you find it. sbd needs to be stopped first, because it keeps the
> > watchdog device timer busy.
> >
> > # pcs cluster stop --all
> > # wdctl | grep Identity
> > Identity:  i6300ESB timer [version 0]
> > # lsmod | grep -i i6300ESB
> > i6300esb   13566  0
> >
> >
> > If you're also using fence_sbd (poison-pill fencing via block device),
> > then you should be able to protect yourself from that during a dump by
> > configuring fencing levels so that fence_kdump is level 1 and
> > fence_sbd is level 2.
>
> RHKB, for anyone interested:
>   - sbd watchdog timeout causes node to reboot during crash kernel
> execution (https://access.redhat.com/solutions/3552201)

What is not clear from this KB (and quotes from it above) - what
instance updates watchdog? Quoting (emphasis mine)

--><--
With the module loaded, the timer *CAN* be updated so that it does not
expire and force a reboot in the middle of vmcore generation.
--><--

Sure it can, but what program exactly updates the watchdog during
kdump execution? I am pretty sure that sbd does not run at this point.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Odd result from ping RA

2022-02-25 Thread Reid Wahl
On Fri, Feb 25, 2022 at 3:31 AM Reid Wahl  wrote:
>
> On Thu, Feb 24, 2022 at 2:28 AM Ulrich Windl
>  wrote:
> >
> > Hi!
> >
> > I just discovered this oddity for a SLES15 SP3 cluster:
> > Feb 24 11:16:17 h16 pacemaker-attrd[7274]:  notice: Setting 
> > val_net_gw1[h18]: 1000 -> 139000
> >
> > That surprised me, because usually the value is 1000 or 0.
> >
> > Digging a bit further I found:
> > Migration Summary:
> >   * Node: h18:
> > * prm_ping_gw1: migration-threshold=100 fail-count=1 
> > last-failure='Thu Feb 24 11:17:18 2022'
> >
> > Failed Resource Actions:
> >   * prm_ping_gw1_monitor_6 on h18 'error' (1): call=200, 
> > status='Error', exitreason='', last-rc-change='2022-02-24 11:17:18 +01:00', 
> > queued=0ms, exec=0ms
> >
> > Digging further:
> > Feb 24 11:16:17 h18 kernel: BUG: Bad rss-counter state mm:c620b5fe 
> > idx:1 val:17
> > Feb 24 11:16:17 h18 pacemaker-attrd[6946]:  notice: Setting 
> > val_net_gw1[h18]: 1000 -> 139000
> > Feb 24 11:17:17 h18 kernel: traps: pacemaker-execd[38950] general 
> > protection fault ip:7f610e71cbcf sp:77c25100 error:0 in 
> > libc-2.31.so[7f610e63b000+1e6000]
> >
> > (that rss-counter causing series of core dumps seems to be a new "feature" 
> > of SLES15 SP3 kernels that is being investigated by support)
> >
> > Somewhat later:
> > Feb 24 11:17:18 h18 pacemaker-attrd[6946]:  notice: Setting 
> > val_net_gw1[h18]: 139000 -> (unset)
> > (restarted RA)
> > Feb 24 11:17:21 h18 pacemaker-attrd[6946]:  notice: Setting 
> > val_net_gw1[h18]: (unset) -> 1000
> >
> > Another node:
> > Feb 24 11:16:17 h19 pacemaker-attrd[7435]:  notice: Setting 
> > val_net_gw1[h18]: 1000 -> 139000
> > Feb 24 11:17:18 h19 pacemaker-attrd[7435]:  notice: Setting 
> > val_net_gw1[h18]: 139000 -> (unset)
> > Feb 24 11:17:21 h19 pacemaker-attrd[7435]:  notice: Setting 
> > val_net_gw1[h18]: (unset) -> 1000
> >
> > So it seems the ping RA sets some garbage value when failing. Is that 
> > correct?
>
> This is ocf:pacemaker:ping, right? And is use_fping enabled?
>
> Looks like it uses ($active * $multiplier) -- see ping_update(). I'm
> assuming your multiplier is 1000.
>
> $active is set by either fping_check() or ping_check(), depending on
> your configuration. You can see what they're doing here. I'd assume
> $active is getting set to 139 and then is multiplied by 1000 to set
> $score later.
>   - 
> https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/extra/resources/ping#L220-L277

It could also be a side effect of the fault though, since I don't see
anything in fping_check() or ping_check() that's an obvious candidate
for setting active=139 unless you have a massive host list.
> >
> > resource-agents-4.8.0+git30.d0077df0-150300.8.20.1.x86_64
> > pacemaker-2.0.5+20201202.ba59be712-150300.4.16.1.x86_64
> >
> > Regards,
> > Ulrich
> >
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
>
> --
> Regards,
>
> Reid Wahl (He/Him), RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA



-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Odd result from ping RA

2022-02-25 Thread Reid Wahl
On Thu, Feb 24, 2022 at 2:28 AM Ulrich Windl
 wrote:
>
> Hi!
>
> I just discovered this oddity for a SLES15 SP3 cluster:
> Feb 24 11:16:17 h16 pacemaker-attrd[7274]:  notice: Setting val_net_gw1[h18]: 
> 1000 -> 139000
>
> That surprised me, because usually the value is 1000 or 0.
>
> Digging a bit further I found:
> Migration Summary:
>   * Node: h18:
> * prm_ping_gw1: migration-threshold=100 fail-count=1 
> last-failure='Thu Feb 24 11:17:18 2022'
>
> Failed Resource Actions:
>   * prm_ping_gw1_monitor_6 on h18 'error' (1): call=200, status='Error', 
> exitreason='', last-rc-change='2022-02-24 11:17:18 +01:00', queued=0ms, 
> exec=0ms
>
> Digging further:
> Feb 24 11:16:17 h18 kernel: BUG: Bad rss-counter state mm:c620b5fe 
> idx:1 val:17
> Feb 24 11:16:17 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: 
> 1000 -> 139000
> Feb 24 11:17:17 h18 kernel: traps: pacemaker-execd[38950] general protection 
> fault ip:7f610e71cbcf sp:77c25100 error:0 in 
> libc-2.31.so[7f610e63b000+1e6000]
>
> (that rss-counter causing series of core dumps seems to be a new "feature" of 
> SLES15 SP3 kernels that is being investigated by support)
>
> Somewhat later:
> Feb 24 11:17:18 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: 
> 139000 -> (unset)
> (restarted RA)
> Feb 24 11:17:21 h18 pacemaker-attrd[6946]:  notice: Setting val_net_gw1[h18]: 
> (unset) -> 1000
>
> Another node:
> Feb 24 11:16:17 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: 
> 1000 -> 139000
> Feb 24 11:17:18 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: 
> 139000 -> (unset)
> Feb 24 11:17:21 h19 pacemaker-attrd[7435]:  notice: Setting val_net_gw1[h18]: 
> (unset) -> 1000
>
> So it seems the ping RA sets some garbage value when failing. Is that correct?

This is ocf:pacemaker:ping, right? And is use_fping enabled?

Looks like it uses ($active * $multiplier) -- see ping_update(). I'm
assuming your multiplier is 1000.

$active is set by either fping_check() or ping_check(), depending on
your configuration. You can see what they're doing here. I'd assume
$active is getting set to 139 and then is multiplied by 1000 to set
$score later.
  - 
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/extra/resources/ping#L220-L277
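
(A rough, simplified sketch of that scoring path -- not the actual agent code,
parameter handling trimmed -- just to show why an attribute value of 139000
would imply active=139:)

active=0
for host in $OCF_RESKEY_host_list; do
    # each host that answers adds 1 to $active
    ping -n -q -W 2 -c "${OCF_RESKEY_attempts:-3}" "$host" >/dev/null 2>&1 && active=$((active + 1))
done
# score = active * multiplier, pushed into the node attribute
attrd_updater -n "$OCF_RESKEY_name" -v "$((active * ${OCF_RESKEY_multiplier:-1}))"

With a single host in host_list and multiplier=1000, that loop can only ever
produce 0 or 1000, which matches the later observation in this thread that the
139000 was probably a side effect of the fault rather than something the script
logic computed.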
>
> resource-agents-4.8.0+git30.d0077df0-150300.8.20.1.x86_64
> pacemaker-2.0.5+20201202.ba59be712-150300.4.16.1.x86_64
>
> Regards,
> Ulrich
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Reid Wahl
On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl  wrote:
>
> On Thu, Feb 24, 2022 at 4:22 AM Ulrich Windl
>  wrote:
> >
> > Hi!
> >
> > After reading about fence_kdump and fence_kdump_send I wonder:
> > Does anybody use that in production?
>
> Quite a lot of people, in fact.
>
> > Having the networking and bonding in initrd does not sound like a good idea 
> > to me.
> > Wouldn't it be easier to integrate that functionality into sbd?
> > I mean: Let sbd wait for a "kdump-ed" message that initrd could send when 
> > kdump is complete.
> > Basically that would be the same mechanism, but using storage instead of 
> > networking.
> >
> > If I get it right, the original fence_kdump would also introduce an extra 
> > fencing delay, and I wonder what happens with a hardware watchdog while a 
> > kdump is in progress...
> >
> > The background of all this is that our nodes kernel-panic, and support says 
> > the kdumps are all incomplete.
> > The events are most likely:
> > node1: panics (kdump)
> > other_node: sees node1 has failed and fences it (via sbd).
> >
> > However sbd fencing won't work while kdump is executing (IMHO)
> >
> > So what happens most likely is that the watchdog terminates the kdump.
> > In that case all the mess with fence_kdump won't help, right?
>
> You can configure extra_modules in your /etc/kdump.conf file to
> include the watchdog module, and then restart kdump.service. For
> example:
>
> # grep ^extra_modules /etc/kdump.conf
> extra_modules i6300esb
>
> If you're not sure of the name of your watchdog module, wdctl can help
> you find it. sbd needs to be stopped first, because it keeps the
> watchdog device timer busy.
>
> # pcs cluster stop --all
> # wdctl | grep Identity
> Identity:  i6300ESB timer [version 0]
> # lsmod | grep -i i6300ESB
> i6300esb   13566  0
>
>
> If you're also using fence_sbd (poison-pill fencing via block device),
> then you should be able to protect yourself from that during a dump by
> configuring fencing levels so that fence_kdump is level 1 and
> fence_sbd is level 2.

RHKB, for anyone interested:
  - sbd watchdog timeout causes node to reboot during crash kernel
execution (https://access.redhat.com/solutions/3552201)
>
>
> >
> > Regards,
> > Ulrich
> >
> >
> >
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
>
> --
> Regards,
>
> Reid Wahl (He/Him), RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA



-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Reid Wahl
On Thu, Feb 24, 2022 at 4:22 AM Ulrich Windl
 wrote:
>
> Hi!
>
> After reading about fence_kdump and fence_kdump_send I wonder:
> Does anybody use that in production?

Quite a lot of people, in fact.

> Having the networking and bonding in initrd does not sound like a good idea 
> to me.
> Wouldn't it be easier to integrate that functionality into sbd?
> I mean: Let sbd wait for a "kdump-ed" message that initrd could send when 
> kdump is complete.
> Basically that would be the same mechanism, but using storage instead of 
> networking.
>
> If I get it right, the original fence_kdump would also introduce an extra 
> fencing delay, and I wonder what happens with a hardware watchdog while a 
> kdump is in progress...
>
> The background of all this is that our nodes kernel-panic, and support says 
> the kdumps are all incomplete.
> The events are most likely:
> node1: panics (kdump)
> other_node: sees node1 has failed and fences it (via sbd).
>
> However sbd fencing won't work while kdump is executing (IMHO)
>
> So what happens most likely is that the watchdog terminates the kdump.
> In that case all the mess with fence_kdump won't help, right?

You can configure extra_modules in your /etc/kdump.conf file to
include the watchdog module, and then restart kdump.service. For
example:

# grep ^extra_modules /etc/kdump.conf
extra_modules i6300esb

If you're not sure of the name of your watchdog module, wdctl can help
you find it. sbd needs to be stopped first, because it keeps the
watchdog device timer busy.

# pcs cluster stop --all
# wdctl | grep Identity
Identity:  i6300ESB timer [version 0]
# lsmod | grep -i i6300ESB
i6300esb   13566  0


If you're also using fence_sbd (poison-pill fencing via block device),
then you should be able to protect yourself from that during a dump by
configuring fencing levels so that fence_kdump is level 1 and
fence_sbd is level 2.
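
For example (device and node names here are hypothetical, adjust to your
environment), the resulting topology could look roughly like this:

# pcs stonith create kdump-fence fence_kdump pcmk_host_list="node1 node2"
# pcs stonith create sbd-fence fence_sbd devices=/dev/disk/by-id/my-shared-lun
# pcs stonith level add 1 node1 kdump-fence
# pcs stonith level add 2 node1 sbd-fence
# pcs stonith level add 1 node2 kdump-fence
# pcs stonith level add 2 node2 sbd-fence

That way a panicking node is "fenced" by the fence_kdump acknowledgement first,
and the poison pill is only written if no kdump message arrives in time.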


>
> Regards,
> Ulrich
>
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Roger Zhou via Users



On 2/24/22 20:21, Ulrich Windl wrote:

Hi!

After reading about fence_kdump and fence_kdump_send I wonder:
Does anybody use that in production?
Having the networking and bonding in initrd does not sound like a good idea to 
me.


I assume one of the motivations for fence_kdump is to reduce the dependency on
the shared disk, which is the fundamental infrastructure for SBD.



Wouldn't it be easier to integrate that functionality into sbd?


sbd does support "crashdump", though you may want some further improvements.



I mean: Let sbd wait for a "kdump-ed" message that initrd could send when kdump 
is complete.
Basically that would be the same mechanism, but using storage instead of 
networking.

If I get it right, the original fence_kdump would also introduce an extra 
fencing delay, and I wonder what happens with a hardware watchdog while a kdump 
is in progress...

The background of all this is that our nodes kernel-panic, and support says the 
kdumps are all incomplete.
The events are most likely:
node1: panics (kdump)
other_node: sees node1 has failed and fences it (via sbd).

However sbd fencing won't work while kdump is executing (IMHO)



Setting up both sbd + fence_kdump does not sound like good practice.
I understand the sbd watchdog is tricky in this combination.


So what happens most likely is that the watchdog terminates the kdump.
In that case all the mess with fence_kdump won't help, right?



With the sbd crashdump functionality, the watchdog is handled properly.

Here is a knowledge page as well
https://www.suse.com/support/kb/doc/?id=19873
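
If I read that KB correctly, the relevant knob is sbd's timeout-action.
Roughly, as a sketch rather than a verified recipe:

# /etc/sysconfig/sbd
SBD_TIMEOUT_ACTION=flush,crashdump

That makes sbd panic the node (and thus enter kdump) instead of doing a plain
reboot on a watchdog/sbd timeout; it still needs a working kdump setup
(crashkernel= on the kernel command line, kdump service enabled) to actually
produce a vmcore.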



Regards,
Ulrich




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





[ClusterLabs] Antw: [EXT] Re: Booth ticket multi‑site and quorum /Pacemaker

2022-02-25 Thread Ulrich Windl
>>> Viet Nguyen  schrieb am 24.02.2022 um 10:28 in
Nachricht
:
> Hi,
> 
> Thank you so so much for your help. May i ask a following up question:
> 
> For the option of having one big cluster with 4 nodes without booth, then,
> if one site (having 2 nodes) is down, then the other site does not work as
> it does not have quorum, am I right? Even if we have a quorum voter in
> either site A or B, then, if the site with quorum down, then, the other
> site does not work.  So, how can we avoid this situation as I want
> that if one site is down, the other site still services?

Obviously you need a third location (or other tie-breaker).

> 
> Regards,
> Viet
> 
> On Wed, 23 Feb 2022 at 17:08, Jan Friesse  wrote:
> 
>> Viet,
>>
>> On 22/02/2022 22:37, Viet Nguyen wrote:
>> > Hi,
>> >
>> > Could you please help me out with this question?
>> >
>> > I have 4 nodes cluster running in the same network but in 2 different
>> sites
>> > (building A - 2 nodes and building B - 2 nodes). My objective is to
>> > setup HA for this cluster with pacemaker. The expectation is if a site is
>> > down, the other site still services.
>> >
>> >  From what I could understand so far, in order to make it work, it needs
>> to
>> > have booth ticket manager installed in a different location, let's say
>> > building C which connects to both sites A and B.
>> >
>> > With this assumption, i would like to ask few questions:
>> >
>> > 1. Am i right that I need to setup the booth ticket manager as a
>> quorum
>> > voter as well?
>>
>> Yes, booth (arbitrator) has to be installed on "site" C if you want to
>> use booth. Just keep in mind booth has nothing to do with quorum.
>>
>> > 2. What happens if  the connection between site A and B is down, but
>> the
>> > connection between A and C, B and C still up? In this case, both
>> site A and
>> > B still have the quorum as it can connect to C, but not between each
>> other?
>>
>> If you use booth then it's not required site A to see site B. It's then
>> "site" C problem to decide which site gets ticket.
>>
>>
>> > 3. Or is there any better way to manage 2 sites cluster, each has 2
>> > nodes? And if one site is down like environmental disaster, then,
>> the other
>> > site still services.
>>
>> Basically there are (at least) two possible solutions:
>> - Have one big cluster without booth and use pcmk constraints
>> - Have two 2 node clusters and use booth. Then each of the two node
>> clusters is "independent" (have its own quorum) and each of the cluster
>> runs booth (site) as a cluster resource + "site" C running booth
>> (arbitrator)
>>
>> Regards,
>>Honza
>>
>> >
>> >
>> > Thank you so much for your help!
>> > Regards,
>> > Viet
>> >
>> >
>> > ___
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users 
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/ 
>> >
>>
>>




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Re: Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Ulrich Windl
>>> "Walker, Chris"  schrieb am 24.02.2022 um 17:26
in
Nachricht


> We use the fence_kdump* code extensively in production and have never had any
> problems with it (other than the normal initial configuration challenges).
> Kernel panic + kdump is our most common failure mode, so we exercise this 
> code quite a bit.

Hi!

Would you like to share your configuration, specifically the fencing
mechanisms and watchdogs you are using?

Regards,
Ulrich

> Thanks,
> Chris
> 
> From: Users 
> Date: Thursday, February 24, 2022 at 7:22 AM
> To: users@clusterlabs.org 
> Subject: [ClusterLabs] Q: fence_kdump and fence_kdump_send
> Hi!
> 
> After reading about fence_kdump and fence_kdump_send I wonder:
> Does anybody use that in production?
> Having the networking and bonding in initrd does not sound like a good idea
> to me.
> Wouldn't it be easier to integrate that functionality into sbd?
> I mean: Let sbd wait for a "kdump‑ed" message that initrd could send when 
> kdump is complete.
> Basically that would be the same mechanism, but using storage instead of 
> networking.
> 
> If I get it right, the original fence_kdump would also introduce an extra 
> fencing delay, and I wonder what happens with a hardware watchdog while a 
> kdump is in progress...
> 
> The background of all this is that our nodes kernel-panic, and support says
> the kdumps are all incomplete.
> The events are most likely:
> node1: panics (kdump)
> other_node: sees node1 has failed and fences it (via sbd).
> 
> However sbd fencing won't work while kdump is executing (IMHO)
> 
> So what happens most likely is that the watchdog terminates the kdump.
> In that case all the mess with fence_kdump won't help, right?
> 
> Regards,
> Ulrich
> 
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/