[ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-18 Thread Artem
o lower score and pingd DOESN'T help at all! 4) Can I make a reliable HA failover without pingd, to keep things as simple as possible? 5) Pings might help influence cluster decisions in case the GW is lost, but it's not working the way all the guides say it should. Why? T

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-18 Thread Artem
.ntslab.ru pacemaker-schedulerd[785107] (pcmk__unassign_resource) info: Unassigning OST4 Sorry for so many log lines, but I don't understand what's going on. best regards, Artem On Tue, 19 Dec 2023 at 00:13, Ken Gaillot wrote: > On Mon, 2023-12-18 at 23:39 +0300, Artem wrote: > > Hell

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Artem
On Tue, 12 Dec 2023 at 16:17, Andrei Borzenkov wrote:
> On Fri, Dec 8, 2023 at 5:44 PM Artem wrote:
> > pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
> > pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
> > pc
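The rules above carry `score=0`. As background on why such a rule may change nothing, here is a simplified sketch of how location scores for one node are summed, following the "infinity math" described in the Pacemaker documentation (INFINITY is 1,000,000, and adding -INFINITY to anything, even +INFINITY, yields -INFINITY). The helper names and example scores are hypothetical, and the model deliberately ignores stickiness, colocation, and utilization. For comparison, the ping examples in the Pacemaker documentation use `score=-INFINITY` on this kind of rule.

```python
# Simplified model of Pacemaker location-score arithmetic.
# Hypothetical helpers for illustration only; this is not Pacemaker code.
INFINITY = 1_000_000


def add_scores(a: int, b: int) -> int:
    """Add two scores using Pacemaker-style infinity math."""
    # -INFINITY dominates everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite sums are clamped to the [-INFINITY, INFINITY] range.
    return max(-INFINITY, min(INFINITY, a + b))


def node_score(constraint_scores: list[int]) -> int:
    """Sum every location score that applies to one node."""
    total = 0
    for s in constraint_scores:
        total = add_scores(total, s)
    return total


# Example: "prefers lustre3=100" plus a matching pingd rule.
print(node_score([100, 0]))          # rule with score=0 matched: still 100
print(node_score([100, -INFINITY]))  # rule with score=-INFINITY matched: banned
```

Under this model a matching `score=0` rule leaves the node's total untouched, while a matching `-INFINITY` rule removes the node from consideration regardless of the preference.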

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Artem
Dear Ken and other experts. How can I leverage pingd to speed up failover? Or maybe it is useless and we should rely on monitor/start timeouts and migration-threshold/failure-timeout? For normal operations I have preferences like this:
> pcs constraint location FAKE3 prefers lustre3=100
> pcs
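To illustrate how migration-threshold and failure-timeout interact, here is a toy model of a per-node fail count: each failure increments the count, the resource is banned from that node once the count reaches migration-threshold, and the count is treated as zero once failure-timeout has elapsed since the most recent failure. The `FailCounter` class and the numbers are hypothetical; in a real cluster, exactly when an expiry is noticed may also depend on the `cluster-recheck-interval` property.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailCounter:
    """Toy model of one resource's fail count on one node (hypothetical, not Pacemaker code)."""
    migration_threshold: int          # e.g. meta migration-threshold=2
    failure_timeout: Optional[float]  # seconds; None means failures never expire
    count: int = 0
    last_failure: Optional[float] = None

    def effective_count(self, now: float) -> int:
        # Treat the count as zero once failure-timeout has passed since the last failure.
        if (self.failure_timeout is not None and self.last_failure is not None
                and now - self.last_failure >= self.failure_timeout):
            return 0
        return self.count

    def record_failure(self, now: float) -> None:
        # Forget expired failures before counting the new one.
        self.count = self.effective_count(now) + 1
        self.last_failure = now

    def banned(self, now: float) -> bool:
        # At or above the threshold, the resource may no longer run on this node.
        return self.effective_count(now) >= self.migration_threshold


fc = FailCounter(migration_threshold=2, failure_timeout=60.0)
fc.record_failure(now=0.0)
print(fc.banned(now=1.0))   # False: 1 failure is below the threshold of 2
fc.record_failure(now=10.0)
print(fc.banned(now=11.0))  # True: threshold reached
print(fc.banned(now=75.0))  # False: 65 s since the last failure exceeds failure-timeout
```

The design point the model captures is that failure-timeout does not speed up failover; it only controls when a node becomes eligible again after migration-threshold has pushed the resource away.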

[ClusterLabs] resource fails manual failover

2023-12-12 Thread Artem
Is there a detailed explanation, with examples, of resource monitor and start timeouts and intervals, for dummies? My resource is configured as follows:
[root@lustre-mds1 ~]# pcs resource show MDT00
Warning: This command is deprecated and will be removed. Please use 'pcs resource config' instead.
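As a rough back-of-the-envelope illustration of how monitor interval and timeout bound detection time, consider this sketch. It assumes the worst case for a monitor-detected failure: the resource fails just after a successful monitor, so the next monitor runs one full interval later, and that monitor hangs until its timeout instead of returning an error promptly. The function names and example values are hypothetical. The estimate does not cover a whole node going down, which is detected through cluster membership and fencing rather than the resource monitor.

```python
def worst_case_detection_s(monitor_interval_s: float, monitor_timeout_s: float) -> float:
    """Rough upper bound from a resource failure to the cluster noticing it.

    Assumes the failure happens right after a successful monitor and that the
    next monitor hangs until its timeout (hypothetical worst case).
    """
    return monitor_interval_s + monitor_timeout_s


def rough_recovery_s(monitor_interval_s: float, monitor_timeout_s: float,
                     stop_s: float, start_s: float) -> float:
    """Detection plus one stop and one start (hypothetical estimate).

    Ignores fencing, retries before migration-threshold is reached, and
    any scheduling delays.
    """
    return (worst_case_detection_s(monitor_interval_s, monitor_timeout_s)
            + stop_s + start_s)


# Hypothetical values, e.g. monitor interval=10s timeout=20s, 5 s to stop, 5 s to start.
print(worst_case_detection_s(10, 20))   # 30
print(rough_recovery_s(10, 20, 5, 5))   # 40
```

Shortening the monitor interval tightens the first term; the timeout should still be long enough that a slow but healthy monitor is not mistaken for a failure.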

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Artem
Hi Andrei. pingd==0 won't satisfy both statements. It would if I used GTE, but I used GT:
pingd lt 1 --> [0]
pingd gt 0 --> [1,2,3,...]
On Tue, 12 Dec 2023 at 17:21, Andrei Borzenkov wrote:
> On Tue, Dec 12, 2023 at 4:47 PM Artem wrote:
> >> > pcs constraint location FAK
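The argument about `lt 1` versus `gt 0` can be checked mechanically. This sketch models the two rule expressions as plain predicates over an integer `pingd` value, with `None` standing in for an undefined attribute; the helper names are hypothetical, and the assumption that a `gt`/`gte` comparison on an undefined attribute does not match is labeled in the comments. A `gte 0` variant is included to show the overlap Artem mentions.

```python
from typing import Optional


def rule_down(pingd: Optional[int]) -> bool:
    """Models 'pingd lt 1 or not_defined pingd' (hypothetical helper)."""
    return pingd is None or pingd < 1


def rule_up(pingd: Optional[int]) -> bool:
    """Models 'pingd gt 0'; assumes an undefined attribute never matches."""
    return pingd is not None and pingd > 0


def rule_up_gte(pingd: Optional[int]) -> bool:
    """Models 'pingd gte 0' for comparison (same undefined-attribute assumption)."""
    return pingd is not None and pingd >= 0


for value in [None, 0, 1, 2, 3]:
    # With 'gt 0' exactly one rule matches; with 'gte 0' the value 0 matches both.
    print(value, rule_down(value), rule_up(value), rule_up_gte(value))
```

Running it shows that with `gt 0` no value makes both expressions true, while with `gte 0` the value 0 matches both rules.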

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Artem
on-fail=ignore break manual failover logic (a stopped resource will be considered failed and thus ignored)? best regards, Artem On Tue, 19 Dec 2023 at 17:03, Klaus Wenninger wrote: > > > On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov wrote: >> On Tue, Dec 19, 2023 at

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-11 Thread Artem
Hi Ken, On Mon, 11 Dec 2023 at 19:00, Ken Gaillot wrote: > > Question #2) I shut lustre3 VM down and leave it like that > How did you shut it down? Outside cluster control, or with something like pcs resource disable? > > I did it outside of the cluster to simulate a failure. I turned off

[ClusterLabs] RemoteOFFLINE status, permanently

2023-11-29 Thread Artem
Hello, I deployed a Lustre cluster with 3 nodes (metadata) running pacemaker/corosync and 4 nodes as Remote Agents (for data). Initially all went well: I set up the MGS and MDS resources, checked failover and failback, and the remote agents were online. Then I tried to create a resource for OST on two nodes

Re: [ClusterLabs] RemoteOFFLINE status, permanently

2023-12-04 Thread Artem
Thank you very much Ken! I missed this step. Now I clearly see it in Morrone_LUG2017.pdf. I added the constraint and the RA came online. What bugs me is the following. I destroyed and recreated the cluster with the same settings on the designated hosts and nothing worked - always RemoteOFFLINE. But when

[ClusterLabs] ocf:pacemaker:ping works strange

2023-12-08 Thread Artem
Hello experts. I use pacemaker for a Lustre cluster, but for simplicity and exploration I use a Dummy resource. I didn't like how the resource performed failover and failback. When I shut down the VM with the remote agent, pacemaker tries to restart it. According to pcs status it marks the resource (not RA)