Re: [ClusterLabs] Some unexpected DLM messages; OCFS2 related? "send_repeat_remove dir" / "send_repeat_remove dir"

2021-10-08 Thread Gang He via Users
Hello Ulrich, See my comments inline. On 2021/10/8 16:38, Ulrich Windl wrote: Hi! I just noticed these two messages on two nodes of a 3-node cluster: Oct 08 10:00:14 h18 kernel: dlm: 790F9C237C2A45758135FE4945B7A744: send_repeat_remove dir 119 O09d835 Oct 08 10:00:14

Re: [ClusterLabs] Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-10-08 Thread Strahil Nikolov via Users
What do you mean by 1s default timeout? Best Regards, Strahil Nikolov On Fri, Oct 8, 2021 at 16:02, damiano giuliani wrote:

Re: [ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.1.2: new fencing configuration options

2021-10-08 Thread Ken Gaillot
On Fri, 2021-10-08 at 08:18 +0200, Ulrich Windl wrote: > >>> Ken Gaillot wrote on 07.10.2021 at 22:53 in message > <8bec6dc04c52d4ac5c2a8055eb7bae455f5a449d.ca...@redhat.com>: > > Hi all, > > > > We're looking ahead to the next Pacemaker release already. Even though > > we

Re: [ClusterLabs] Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-10-08 Thread Jehan-Guillaume de Rorthais
On Fri, 8 Oct 2021 15:00:30 +0200 damiano giuliani wrote: > Hi Guys, Hi, Good to hear from you, thanks for the follow-up! My answer below. > ... > So it turns out that a little bit of swap was used and I suspect the corosync > processes were swapped to disk, creating lag where 1s default corosync >

Re: [ClusterLabs] Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-10-08 Thread damiano giuliani
Hi Guys, after months of sudden unexpected failovers, checking every corner and every type of log without any luck, because no logs, reasons, or problems were shown anywhere, I was on the edge of madness. I finally managed to find out what the problem behind these sudden switches was. Was a tough

[ClusterLabs] Some unexpected DLM messages; OCFS2 related? "send_repeat_remove dir" / "send_repeat_remove dir"

2021-10-08 Thread Ulrich Windl
Hi! I just noticed these two messages on two nodes of a 3-node cluster: Oct 08 10:00:14 h18 kernel: dlm: 790F9C237C2A45758135FE4945B7A744: send_repeat_remove dir 119 O09d835 Oct 08 10:00:14 h19 kernel: dlm: 790F9C237C2A45758135FE4945B7A744: receive_remove from 118 not

[ClusterLabs] Antw: Re: Antw: [EXT] Move a resource only where another has Started

2021-10-08 Thread Ulrich Windl
>>> martin doc wrote on 08.10.2021 at 09:24 in message : > Hi, > > Yes, the suggestion to use a rule helped some. I had tried that but what I > got wrong is that the name for the score stored by ping is not ping but pingd > (yay backwards compat.) Thanks Ken for the pointer and getting me

Re: [ClusterLabs] Antw: [EXT] Move a resource only where another has Started

2021-10-08 Thread martin doc
Hi, Yes, the suggestion to use a rule helped some. I had tried that but what I got wrong is that the name for the score stored by ping is not ping but pingd (yay backwards compat.) Thanks Ken for the pointer and getting me to go back to that. Now I'm stuck with the problem of getting resources

[ClusterLabs] Antw: Antw: [EXT] Move a resource only where another has Started

2021-10-08 Thread Ulrich Windl
>>> "Ulrich Windl" wrote on 08.10.2021 at 08:01 in message <615fdec902a100044...@gwsmtp.uni-regensburg.de>: > test is running, but where the test was sussessul. Ouch: s/sussessul/successful/ Time for the week-end... ;-)

[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.1.2: new fencing configuration options

2021-10-08 Thread Ulrich Windl
>>> Ken Gaillot wrote on 07.10.2021 at 22:53 in message <8bec6dc04c52d4ac5c2a8055eb7bae455f5a449d.ca...@redhat.com>: > Hi all, > > We're looking ahead to the next Pacemaker release already. Even though > we had a recent release for a regression fix, I want to get back to the > goal of having

[ClusterLabs] Antw: [EXT] Re: Q: "(bnxt_en): transmit queue 2 timed out"

2021-10-08 Thread Ulrich Windl
>>> Marek Marcola wrote on 07.10.2021 at 18:55 in message : > Hello, > > > Oct 05 20:13:25 h19 kernel: NETDEV WATCHDOG: p4p1 (bnxt_en): transmit > queue 2 timed out > I have seen this error a few times on HP DL360 with CentOS 7 and on HP ML350 > with ESXi 6. > > Disabling offloading on

[ClusterLabs] Antw: [EXT] Re: what is the point of pcs status error messages while the VIP is still up and service is retained?

2021-10-08 Thread Ulrich Windl
>>> Ken Gaillot wrote on 07.10.2021 at 22:28 in message <597cd05761a31365b34c6b349539478a5a8b8ced.ca...@redhat.com>: > On Thu, 2021-10-07 at 12:18 +, Ian Diddams wrote: >> I'm trying to find out exactly what the impact/point of such cluster >> error messages as from running “pcs status” >>

[ClusterLabs] Antw: [EXT] Move a resource only where another has Started

2021-10-08 Thread Ulrich Windl
>>> martin doc wrote on 07.10.2021 at 17:45 in message : > Hi, > > I've been trying to work out if it is possible to leave a resource on the > cluster node that it is on and only move it to another node if a dependent > resource is started. This is all using Red Hat's presentation in