Re: [ClusterLabs] Question about automating cluster unfencing.

2021-08-29 Thread Strahil Nikolov via Users
You can set up the system so that on a fabric fence the node is rebooted, which will allow it to 'unfence' itself afterwards. For details check https://access.redhat.com/solutions/3367151 or https://access.redhat.com/node/65187 (You may use RH developer subscription in order to access

Re: [ClusterLabs] Qemu VM resources - cannot acquire state change lock

2021-08-29 Thread Strahil Nikolov via Users
Are you using sharding for glusterfs? I would put the libvirt service and the glusterfs service in a systemd dependency, as your libvirt relies on gluster being available. Also, check if you have the 'backup-volfile-servers' mount option if using FUSE. With libgfapi, I have no clue how to configure that. Your

Re: [ClusterLabs] 8 node cluster

2021-09-08 Thread Strahil Nikolov via Users
I would go with a VM hosting all resources and set up a 3-node virtualization cluster. The concept that the cluster should keep your resources up even if another 7 nodes died is not good -> there could be a network issue or other cases where this approach won't (and should not) work. As Antony

Re: [ClusterLabs] (no subject)

2021-09-02 Thread Strahil Nikolov via Users
In order to test properly, use firewall rules to drop the corosync traffic. I remember that this test (ifdown NIC) was inefficient in previous versions of corosync. If you wish to be safer, try to set up ocf:pacemaker:ping. Best Regards, Strahil Nikolov On Fri, Sep 3, 2021 at 5:09, 重力加速度

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-03 Thread Strahil Nikolov via Users
Won't something like this work? Each node in LA will have the same score of 5000, while other cities will be -5000.
pcs constraint location DummyRes1 rule score=5000 city eq LA
pcs constraint location DummyRes1 rule score=-5000 city ne LA
stickiness -> 1
Best Regards, Strahil Nikolov Out of

Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-03 Thread Strahil Nikolov via Users
Yes. INFINITY = 1000000 (one million), -INFINITY = -1000000 (negative one million). Set stickiness > 100 . Best Regards, Strahil Nikolov  > The `location` section overwrites the stickiness? ___ Manage your subscription:

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Strahil Nikolov via Users
I've setup something similar with VIP that is everywhere using the globally-unique=true (where cluster controls which node to be passive and which active). This allows that the VIP is everywhere but only 1 node answers the requests , while the WEB server was running everywhere with config and

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Strahil Nikolov via Users
> name="statusurl" value="http://localhost/server-status"/> Can you show the apache config for the status page ? It must be accessible only from localhost (127.0.0.1) and should not be reachable from the other nodes. Best Regards, Strahil Nikolov ___

Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-07 Thread Strahil Nikolov via Users
> Because Asterisk at cityA is bound to a floating IP address, which is held > on one of the three machines in cityA. I can't run Asterisk on all > three machines there because only one of them has the IP address. That's not true. You can use a cloned IP resource with 'globally-unique=true' which

Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-28 Thread Strahil Nikolov via Users
So far, I never had a cluster with nodes directly connected to the same switches. Usually it's a nodeA -> switchA -> switchB -> nodeB and sometimes connectivity between switches goes down (for example a firewall rule). Best Regards, Strahil Nikolov On Wednesday, 28 July 2021, 15:51:36

Re: [ClusterLabs] pcs add node command is success but node is not configured to existing cluster

2021-07-28 Thread Strahil Nikolov via Users
Firewall issue ? Did you check on corosync level if all nodes reach each other ? Best Regards, Strahil Nikolov On Wednesday, 28 July 2021, 16:32:51 GMT+3, S Sathish S via Users wrote: Hi Team, we are trying to add node03 to existing cluster after adding we could see

Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-04 Thread Strahil Nikolov via Users
1/html/Clusters_from_Scratch/_move_resources_manually.html Best Regards, Strahil Nikolov On Tue, Aug 3, 2021 at 22:16, Ervin Hegedüs wrote: Hi, On Tue, Aug 03, 2021 at 05:46:51PM +, Strahil Nikolov via Users wrote: > Yes. INFINITY = 1000000 (one million), -INFINITY = -1000000 (negative one mi

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Strahil Nikolov via Users
, Andrei Borzenkov wrote: > On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote: > > On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote: > > > On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote: > > > > On Tuesday 03 August 2021 at 12:12:03, Strahil N

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Strahil Nikolov via Users
I still can't understand why the whole cluster will fail when only 3 nodes are down and a qdisk is used.
CityA -> 3 nodes to run packageA -> 3 votes
CityB -> 3 nodes to run packageB -> 3 votes
CityC -> 1 node which cannot run any package (qdisk) -> 1 vote
Max votes: 7
Quorum: 4
As long as one city
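The vote arithmetic in this message can be checked with a few lines of Python. This is only a sketch: it assumes the usual strict-majority rule (floor(votes/2) + 1), and the names `votes`, `quorum` and `is_quorate` are made up for illustration, not part of corosync or pcs.

```python
# Votes per site, as listed in the message above.
votes = {"CityA": 3, "CityB": 3, "CityC": 1}


def quorum(total_votes: int) -> int:
    """Smallest strict majority of the expected votes."""
    return total_votes // 2 + 1


total = sum(votes.values())
needed = quorum(total)


def is_quorate(surviving_sites) -> bool:
    """True if the surviving sites together hold at least `needed` votes."""
    return sum(votes[site] for site in surviving_sites) >= needed


print(f"Max votes: {total}, quorum: {needed}")
print("CityA + CityC quorate:", is_quorate(["CityA", "CityC"]))
print("CityA alone quorate:", is_quorate(["CityA"]))
```

Under that assumption, losing the three nodes of one city still leaves 4 of 7 votes, provided the qdisk site stays reachable.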

Re: [ClusterLabs] Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-10-09 Thread Strahil Nikolov via Users
Ah... That's the first thing I change. In SLES, that is defaulted to 10s and so far I have never seen an environment that is stable enough for the default 1s timeout. Best Regards, Strahil Nikolov On Sat, Oct 9, 2021 at 9:59, Jehan-Guillaume de Rorthais wrote: On 9 October 2021 00:11:27

Re: [ClusterLabs] Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-10-08 Thread Strahil Nikolov via Users
What do you mean by 1s default timeout ? Best Regards,Strahil Nikolov On Fri, Oct 8, 2021 at 16:02, damiano giuliani wrote: ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home:

Re: [ClusterLabs] corosync/pacemaker resources start after reboot - incorrect node ID calculated

2021-09-28 Thread Strahil Nikolov via Users
Erm, in my corosync.conf I got also 'name: the-name-of-the-host' and 'nodeid: ' . I don't see these 2 in your config . Best Regards, Strahil Nikolov On Tuesday, 28 September 2021, 02:39:20 GMT+3, Neitzert, Greg A wrote: Hello, We have an issue with a 2 node cluster

Re: [ClusterLabs] corosync/pacemaker resources start after reboot - incorrect node ID calculated

2021-09-28 Thread Strahil Nikolov via Users
Yeah, it seems I missed the nodeid, so can you try setting the "name: hostname" in the corosync.conf ? Best Regards, Strahil Nikolov On Tuesday, 28 September 2021, 10:34:41 GMT+3, Strahil Nikolov via Users wrote: Erm, in my corosync.conf I got also 'name

Re: [ClusterLabs] Problem with high load (IO)

2021-09-30 Thread Strahil Nikolov via Users
Did you try the 'ionice -c 2 -n 7 nice cp ' ? Best Regards, Strahil Nikolov On Thu, Sep 30, 2021 at 14:58, Lentes, Bernd wrote: - On Sep 30, 2021, at 3:55 AM, Gang He g...@suse.com wrote: >> >> 1) No problems during this step, the procedure just needs a few seconds. >> reflink

Re: [ClusterLabs] Antw: [EXT] Re: Problem with high load (IO)

2021-10-05 Thread Strahil Nikolov via Users
These 'dirty' sysctl settings are configurable. For large sequential I/O it's desirable for the 'dirty' ratio/bytes to be bigger, while for small files/random I/O it's better to keep them low. Best Regards, Strahil Nikolov On Tuesday, 5 October 2021, 08:52:20 GMT+3, Ulrich Windl

Re: [ClusterLabs] Problem with high load (IO)

2021-09-27 Thread Strahil Nikolov via Users
I would use something like this: ionice -c 2 -n 7 nice cp XXX YYY Best Regards, Strahil Nikolov On Monday, 27 September 2021, 13:37:31 GMT+3, Lentes, Bernd wrote: Hi, i have a two-node cluster running on SLES 12SP5 with two HP servers and a common FC SAN. Most of my

Re: [ClusterLabs] Problem with high load (IO)

2021-09-27 Thread Strahil Nikolov via Users
Hey Ken, how should someone set the maintenance via pcs ? Best Regards, Strahil Nikolov On Mon, Sep 27, 2021 at 19:56, Ken Gaillot wrote: On Mon, 2021-09-27 at 12:37 +0200, Lentes, Bernd wrote: > Hi, > > i have a two-node cluster running on SLES 12SP5 with two HP servers > and a common FC

Re: [ClusterLabs] resource start after network reconnected

2021-11-21 Thread Strahil Nikolov via Users
, Nov 21, 2021 at 8:47, Andrei Borzenkov wrote: On 21.11.2021 00:39, Strahil Nikolov via Users wrote: > Nope, as long as you use SBD's integration with pacemaker. As the 2 nodes can > communicate between each other sbd won't act. I think it was an entry like > this in the /etc/sysc

Re: [ClusterLabs] resource start after network reconnected

2021-11-20 Thread Strahil Nikolov via Users
You can also use this 3rd node to provide iSCSI and then the SBD will be disk-full :D . The good thing about this type of setup is that you won't need to put location constraints for the 3rd node. Also, check the ping resource -> you can set it up to "kick-out" all resources on failure of

Re: [ClusterLabs] resource start after network reconnected

2021-11-20 Thread Strahil Nikolov via Users
, 2021 at 08:33:26PM +, Strahil Nikolov via Users wrote: > You can also use this 3rd node to provide iSCSI and then the SBD will > be disk-full :D . The good thing about this type of setup is that you > do won't need to put location constraints for the 3rd node. Wouldn't that make the i

Re: [ClusterLabs] Which verson of pacemaker/corosync provides crm_feature_set 3.0.10?

2021-11-23 Thread Strahil Nikolov via Users
Have you tried with a Fedora package from the archives? I found https://dl.fedoraproject.org/pub/archive/fedora/linux/releases/23/Everything/x86_64/os/Packages/p/pacemaker-1.1.13-3.fc23.x86_64.rpm &

Re: [ClusterLabs] drbd nfs slave not working

2021-11-14 Thread Strahil Nikolov via Users
Also, check what 'drbdadm' has to tell you. Both nodes should be in sync, otherwise pacemaker will prevent the failover. Best Regards,Strahil Nikolov On Sun, Nov 14, 2021 at 20:09, Andrei Borzenkov wrote: On 14.11.2021 19:47, Neil McFadyen wrote: > I have a Ubuntu 20.04 drbd nfs

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Strahil Nikolov via Users
Have you tried with ping and a location constraint for avoiding hosts that cannot ping an external system? Best Regards, Strahil Nikolov On Mon, Nov 15, 2021 at 0:07, S Rogers wrote: Using on-fail=fence is what I initially tried, but it doesn't work unfortunately. It looks like this is

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Strahil Nikolov via Users
Have you checked the options in /etc/sysconfig/pacemaker as recommended in https://documentation.suse.com/sle-ha/15-SP3/html/SLE-HA-all/app-ha-troubleshooting.html#sec-ha-troubleshooting-log ? Best Regards, Strahil Nikolov On Sunday, 31 October 2021, 13:33:43 GMT+2, Andrei

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Strahil Nikolov via Users
At least it's worth trying (/etc/sysconfig/pacemaker):PCMK_trace_files=* Best Regards,Strahil Nikolov On Sun, Oct 31, 2021 at 18:10, Vladislav Bogdanov wrote:

Re: [ClusterLabs] Antw: Re: Antw: [EXT] VIP monitor Timed Out

2021-07-20 Thread Strahil Nikolov via Users
I think Ulrich meant the "dirty" buffers like the ones described at https://www.suse.com/support/kb/doc/?id=17857 Based on my experience, you should set the background dirty tunable as low as possible (let's say 500-600MB) and increase the other tunable to at least double that. Keep in

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-20 Thread Strahil Nikolov via Users
Hi, consider using a 3rd system as a Q disk. Also, you can use iSCSI from that node as an SBD device, so you will have proper fencing. If you don't have a hardware watchdog device, you can use the softdog kernel module for that. Best Regards, Strahil Nikolov On Wed, Jul 21, 2021 at 1:45, Digimer

Re: [ClusterLabs] Moving resource only one way

2021-07-16 Thread Strahil Nikolov via Users
Yep, just set the stickiness to something bigger than '0' (max is INFINITY -> 1000000) Best Regards, Strahil Nikolov On Thu, Jul 15, 2021 at 15:02, Ervin Hegedüs wrote: Hi there, I have to build a very simple cluster with only one resource: a virtual IP. The "challenge":* there are two

Re: [ClusterLabs] Antw: [EXT] Re: Feedback wanted: Native language support for Pacemaker help output

2022-01-13 Thread Strahil Nikolov via Users
you're talking about, but less-experienced admins need to run cluster commands occasionally. On Tue, 2022-01-11 at 12:17 +, Strahil Nikolov via Users wrote: > To be honest, I don't see any benefit. > Even if you have the stack translated, when a more complex setup is > needed -> you

Re: [ClusterLabs] Antw: [EXT] Re: Feedback wanted: Native language support for Pacemaker help output

2022-01-11 Thread Strahil Nikolov via Users
To be honest, I don't see any benefit.Even if you have the stack translated, when a more complex setup is needed -> you will always have to search in the source/github issues/documentation/mailing list history and rely on English. Best Regards,Strahil Nikolov On Tue, Jan 11, 2022 at 9:23,

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-10 Thread Strahil Nikolov via Users
, 2022 at 11:15, Jehan-Guillaume de Rorthais wrote: On Wed, 9 Feb 2022 17:42:35 + (UTC) Strahil Nikolov via Users wrote: > If you gracefully shutdown a node - pacemaker will migrate all resources away >  so you need to shut them down simultaneously and all resources should be >

Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Strahil Nikolov via Users
You can try creating a dummy resource and colocate all clones with it. Best Regards,Strahil Nikolov On Tue, Mar 15, 2022 at 20:53, john tillman wrote: > On 15.03.2022 19:35, john tillman wrote: >> Hello, >> >> I'm trying to guarantee that all my cloned drbd resources start on the >> same

Re: [ClusterLabs] Cluster timeout

2022-03-09 Thread Strahil Nikolov via Users
You can bump the 'token' to a higher value (for example 10s ) and adjust the consensus based on that value. See man 5 corosync.conf Don't forget to sync the nodes and reload the corosync stack. Of course proper testing on non-Prod is highly recommended. Note: Both parameters use milliseconds (at
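As a rough illustration of "adjust the consensus based on that value": as I read corosync.conf(5), consensus has to be at least 1.2 * token and is derived that way when it is not set explicitly. The helper below is a hypothetical sketch of that arithmetic only; `min_consensus_ms` is an invented name, and the result should be checked against the man page of the installed corosync version.

```python
def min_consensus_ms(token_ms: int) -> int:
    """Lower bound for 'consensus' given 'token', both in milliseconds.

    Assumes the 1.2 * token rule described in corosync.conf(5).
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return token_ms * 12 // 10


token_ms = 10_000  # the 10s example from the message, in milliseconds
consensus_ms = min_consensus_ms(token_ms)

# Values as they would go into the totem section of corosync.conf.
print(f"token: {token_ms}")
print(f"consensus: {consensus_ms}")
```

The printed values are plain millisecond integers, matching the note in the message that both parameters use milliseconds.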

Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Strahil Nikolov via Users
To be honest, I always check  https://documentation.suse.com/sle-ha/15-SP3/html/SLE-HA-all/cha-ha-storage-protect.html#sec-ha-storage-protect-watchdog-timings for sbd and timings. Best Regards,Strahil Nikolov On Wed, Feb 16, 2022 at 19:31, Klaus Wenninger wrote:

Re: [ClusterLabs] Help with PostgreSQL Automatic Failover demotion

2022-02-18 Thread Strahil Nikolov via Users
Also,there is a way to tell the cluster to cleanup failures -> failure-timeout  Best Regards,Strahil Nikolov On Sat, Feb 19, 2022 at 1:52, Jehan-Guillaume de Rorthais wrote: Hello, On Fri, 18 Feb 2022 21:44:58 + "Larry G. Mills" wrote: > ... This happened again recently, and the

Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-25 Thread Strahil Nikolov via Users
man votequorum. auto_tie_breaker: 1 allows you to have quorum with 50%, yet if for example A side (node with lowest id) dies, B side is 50% but won't be able to bring back the resources as the node with lowest id is in A side. If you want to avoid that, you can bring a qdevice on a VM in third
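The 50% case described here can be modelled in a few lines. This is a simplified sketch, not votequorum's actual implementation: it assumes one vote per node, the default tie-breaker (lowest node id), and no qdevice; `partition_is_quorate` and the node ids are invented for the example.

```python
def partition_is_quorate(partition, all_nodes, auto_tie_breaker=True):
    """Simplified model of quorum with an even split.

    partition: set of node ids that can still see each other.
    all_nodes: set of all node ids in the cluster (one vote each).
    """
    total = len(all_nodes)
    have = len(partition)
    if have * 2 > total:
        return True  # strict majority
    if auto_tie_breaker and have * 2 == total:
        # Exactly 50%: only the half holding the lowest node id stays quorate.
        return min(all_nodes) in partition
    return False


all_nodes = {1, 2, 3, 4}
side_a = {1, 2}  # contains the lowest node id
side_b = {3, 4}

print("A side quorate:", partition_is_quorate(side_a, all_nodes))
print("B side quorate:", partition_is_quorate(side_b, all_nodes))
```

In this model the B side never becomes quorate on its own, which is the limitation the message points out and the reason a qdevice at a third location is suggested.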

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Strahil Nikolov via Users
I always used this one for triggering kdump when using sbd:https://www.suse.com/support/kb/doc/?id=19873 On Fri, Feb 25, 2022 at 21:34, Reid Wahl wrote: On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov wrote: > > On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl wrote: > > > > On Fri, Feb

Re: [ClusterLabs] The 2 servers of the cluster randomly reboot almost together

2022-02-17 Thread Strahil Nikolov via Users
Token timeout -> network issue ? Just run a continuous ping (with timestamp) and log it into a file (from each host to other host + qdevice ip). Best Regards, Strahil Nikolov On Thu, Feb 17, 2022 at 11:38, Sebastien BASTARD wrote: Hello CoroSync's team ! We currently have a proxmox

Re: [ClusterLabs] VirtualDomain + GlusterFS - troubles coming with CentOS 9

2022-02-15 Thread Strahil Nikolov via Users
I haven't heard about removal of the libgfapi, so most probably it's a packaging issue. The FUSE mount point can be set up via a cloned Filesystem resource and there should be no problems with it and live migration should work. Best Regards, Strahil Nikolov On Tue, Feb 15, 2022 at 19:16,

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
Strangely I can't see any timeouts set in   at the example in https://pve.proxmox.com/wiki/Fencing ? Best Regards,Strahil Nikolov On Tue, Feb 22, 2022 at 18:54, Sebastien BASTARD wrote: Hello Strahil, I don't have pcs software (corosync is embedded in proxmox), but I have "pvecm

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
Is the qdevice on a VM ? Best Regards,Strahil Nikolov On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD wrote: ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
What kind of fencing are you using ? Best Regards, Strahil Nikolov On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD wrote: Hello Strahil Nikolov, Qdevice is not a vm. It is a Linux Debian, physical server. Best regards. On Tue, 22 Feb 2022 at 14:20, Strahil Nikolov wrote: Is the

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
fencing is the reboot mechanism pcs status Best Regards,Strahil Nikolov On Tue, Feb 22, 2022 at 16:44, Sebastien BASTARD wrote: Hello Strahil, As I don't know the kind of fencing, here is the current configuration of corosync : logging {   debug: off   to_syslog: yes} nodelist {   node

Re: [ClusterLabs] Antw: [EXT] Re: Parsing the output of crm_mon

2022-03-24 Thread Strahil Nikolov via Users
Also xmllint has '--xpath' (unless you are running something as old as RHEL6) and is available on every linux distro. Best Regards,Strahil Nikolov On Mon, Mar 21, 2022 at 15:41, Ken Gaillot wrote: On Mon, 2022-03-21 at 08:27 +0100, Ulrich Windl wrote: > > > > Ken Gaillot schrieb am

Re: [ClusterLabs] Antw: [EXT] Re: Corosync Transport‑ Knet Vs UDPU

2022-03-28 Thread Strahil Nikolov via Users
Corosync rings are never enough , especially when the network team has such naughty hands. Best Regards, Strahil Nikolov On Mon, Mar 28, 2022 at 16:55, Ulrich Windl wrote: >>> Strahil Nikolov via Users wrote on 28.03.2022 at 15:49 in message <1758982440.55908

Re: [ClusterLabs] Corosync Transport- Knet Vs UDPU

2022-03-28 Thread Strahil Nikolov via Users
One huge benefit of the new stack is that you can have 8 corosync rings, which is really powerful. Best Regards,Strahil Nikolov On Mon, Mar 28, 2022 at 9:27, Christine caulfield wrote: On 28/03/2022 03:30, Somanath Jeeva via Users wrote: > Hi , > > I am upgrading from corosync

Re: [ClusterLabs] Antw: [EXT] Re: Failed migration causing fencing loop

2022-03-31 Thread Strahil Nikolov via Users
What about if you disable the enable-startup-probes at fencing (custom fencing  that sets it to false and fails, so the next fencing device in the topology kicks in)? When the node joins , it will skip startup probes and later a systemd service or some script check if all nodes were up for at

Re: [ClusterLabs] Removing a resource without stopping it

2022-01-29 Thread Strahil Nikolov via Users
I think there is pcs cluster edit --scope=resources (based on memory).Can you try to delete it from there ? Best Regards,Strahil Nikolov On Sat, Jan 29, 2022 at 7:12, Digimer wrote:

Re: [ClusterLabs] Removing a resource without stopping it

2022-01-29 Thread Strahil Nikolov via Users
I know... and the editor stuff can be bypassed, if the approach works. Best Regards,Strahil Nikolov On Sat, Jan 29, 2022 at 15:43, Digimer wrote:On 2022-01-29 03:16, Strahil Nikolov wrote: I think there is pcs cluster edit --scope=resources (based on memory). Can you try to

Re: [ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Strahil Nikolov via Users
Are you using HA-LVM or CLVM ? Best Regards,Strahil Nikolov On Thu, Jan 27, 2022 at 16:10, Ulrich Windl wrote: Hi! I know this is semi-offtopic, but I think it's important: I've upgraded one cluster node being a Xen host from SLES15 SP2 to SLES15 SP3 using virtual DVD boot (i.e. the

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-09 Thread Strahil Nikolov via Users
If you gracefully shutdown a node - pacemaker will migrate all resources away  so you need to shut them down simultaneously and all resources should be stopped by the cluster. Shutting down the nodes would be my choice. Best Regards,Strahil Nikolov On Wed, Feb 9, 2022 at 12:52, Lentes,

Re: [ClusterLabs] Antw: [EXT] Cluster Removing VIP and Not Following Order Constraint

2022-02-11 Thread Strahil Nikolov via Users
Shouldn't you use kind 'Mandatory' and symmetrical TRUE ? If true, the reverse of the constraint applies for the opposite action (for example, if B starts after A starts, then B stops before A stops). Best Regards, Strahil Nikolov On Fri, Feb 11, 2022 at 9:11, Ulrich Windl wrote: >>>

Re: [ClusterLabs] Antw: [EXT] Cluster Removing VIP and Not Following Order Constraint

2022-02-11 Thread Strahil Nikolov via Users
efaults rsc-options: \         resource-stickiness=1000 \         migration-threshold=5000 op_defaults op-options: \         timeout=600 \         record-pending=true \         no-quorum-policy=ignore On Fri, 11 Feb 2022 at 21:29, Klaus Wenninger wrote: On Fri, Feb 11, 2022 at 9:13 AM Strahi

Re: [ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster

2022-04-08 Thread Strahil Nikolov via Users
You can use 'kind' and 'symmetrical' to control order constraints. The default value for symmetrical is 'true' which means that in order to stop dummy1 , the cluster has to stop dummy1 & dummy2. Best Regards,Strahil Nikolov On Fri, Apr 8, 2022 at 15:29, ChittaNagaraj, Raghav wrote:

Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-10 Thread Strahil Nikolov via Users
debug start is doing the described in  https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures Best Regards,Strahil Nikolov On Mon, Apr 11, 2022 at 7:21, Aj Revelino wrote: Hi Strahil, Yes I went through the documentation from Azure. In fact, we have 6 production clusters running

Re: [ClusterLabs] Antw: [EXT] Re: SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-11 Thread Strahil Nikolov via Users
It's not like that. Let's assume you have a resource Dummy1 with preference to nodeA (score 10). If your stickiness is 20 - the resource will not fall back to nodeA (after a failure) when it returns as nodeA = 10, current node = 20 (due to stickiness). If your stickiness is '1' , while nodeA has
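The comparison in this message can be written down as a tiny model. It is a deliberate simplification, not Pacemaker's scheduler: it assumes the current node has no location score of its own, so its score equals the stickiness, and it does not model ties. `falls_back` is a hypothetical helper name.

```python
def falls_back(preference_score: int, stickiness: int) -> bool:
    """True if the resource would move back to the preferred node.

    preference_score: location score of the preferred node (nodeA).
    stickiness: resource-stickiness, i.e. the score for staying put.
    """
    return preference_score > stickiness


# Dummy1 prefers nodeA with score 10, as in the message.
print("stickiness 20 -> falls back:", falls_back(10, 20))
print("stickiness 1  -> falls back:", falls_back(10, 1))
```

With stickiness 20 the current node wins (20 vs 10) and the resource stays; with stickiness 1 the preference for nodeA outweighs it.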

Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-09 Thread Strahil Nikolov via Users
You can use pcs resource debug-start, but you have to shut it down before that. Have you used some documentation for the setup ? Usually I refer to the vendor's documentation. Go over it and check for a step that was not implemented. RH's latest version is:

Re: [ClusterLabs] I_DC_TIMEOUT and node fenced when it joins the cluster

2022-04-15 Thread Strahil Nikolov via Users
Set the corosync token to 1 milliseconds and adjust the consensus as per the man 5 corosync.conf and give it a try. Don't forget to sync the corosync settings among the cluster. Best Regards, Strahil Nikolov On Fri, Apr 15, 2022 at 15:27, vitaly wrote: Hello Everybody. I am seeing

Re: [ClusterLabs] Fencing for quorum device?

2022-07-16 Thread Strahil Nikolov via Users
Well, you can always make a single-node cluster with the quorum device's host and setup  systemd resource to keep the service up and running.With SBD, that single-node cluster will suicide in case the machine ends in a unresponsive state. Best Regards,Strahil Nikolov  On Fri, Jul 15, 2022

Re: [ClusterLabs] OCF_TIMEOUT - Does it recover by itself?

2022-04-27 Thread Strahil Nikolov via Users
You can use a meta attribute to expire failures. The attribute name is 'failure-timeout'. I have used it for my fencing devices as during the night the network was quite busy. Best Regards, Strahil Nikolov On Tue, Apr 26, 2022 at 23:54, Hayden, Robert via Users wrote: Robert Hayden |

Re: [ClusterLabs] How many nodes redhat cluster does supports

2022-04-27 Thread Strahil Nikolov via Users
What is the output of 'gfs2_edit -p jindex /dev/shared_vg1/shared_lv1 |grep journal' ? Source: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_gfs2_file_systems#proc_adding-gfs2-journal-creating-mounting-gfs2 Best Regards, Strahil Nikolov On

Re: [ClusterLabs] OT: Linstor/DRBD Problem

2022-04-27 Thread Strahil Nikolov via Users
Why do you use Linstor and not DRBD ? As far as I know Linstor is more suitable for Kubernetes/Openshift . Best Regards, Strahil Nikolov On Thu, Apr 28, 2022 at 8:19, Eric Robinson wrote: This is probably off-topic but I’ll try anyway. Do we have any Linstor gurus around here? I’ve read

Re: [ClusterLabs] Help understanding recover of promotable resource after a "pcs cluster stop --all"

2022-05-02 Thread Strahil Nikolov via Users
Have you checked with drbd commands if the 2 nodes were in sync? Also consider adding the shared dir, lvm,etc into a single group -> see  https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/s1-resourcegroupcreatenfs-haaa Best

Re: [ClusterLabs] IPaddr2 resource times out and cant be killed

2022-08-01 Thread Strahil Nikolov via Users
In clouds you can't just use VIPs. Use azure-lb resource instead. Best Regards, Strahil Nikolov On Fri, Jul 29, 2022 at 23:21, Reid Wahl wrote: On Fri, Jul 29, 2022 at 1:02 PM Reid Wahl wrote: > > On Fri, Jul 29, 2022 at 12:52 PM Ross Sponholtz > wrote: > > > > I’m running a RHEL

Re: [ClusterLabs] [ClusterLabs Developers] How do I install and configure Pacemaker high-availability cluster resource manager?

2022-08-21 Thread Strahil Nikolov via Users
Also both SuSE and Red Hat documentation is quite extensive and can be considered as a good start. Best Regards,Strahil Nikolov  On Wed, Aug 10, 2022 at 18:41, Turritopsis Dohrnii Teo En Ming wrote: On Wed, 10 Aug 2022 at 23:37, Reid Wahl wrote: > > On Wed, Aug 10, 2022 at 8:13 AM

Re: [ClusterLabs] Antw: [EXT] Heads up for ldirectord in SLES12 SP5 "Use of uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord line 1830"

2022-08-06 Thread Strahil Nikolov via Users
My rule number 1 is to put all DNS entries in /etc/hosts or use dnsmasq for local DNS caching. Rule number 2: add cluster nodes as ntp/chrony peers (with 'prefer' for the ntp servers) to avoid node drift if time source is down for a long time. Should the cluster take care of unstable infra ->

[ClusterLabs] Questionsabout GCP VIP setup

2024-02-07 Thread Strahil Nikolov via Users
Hi All, This is my first cluster in the cloud and I have 2 questions that I'm hoping to get a clue. 1. Where I can find the 'gcloud-ra' binary on EL9 system ? I have installed resource-agents-cloud but I can't find it. 2. Is gcp-vpc-move-vip a good approach to setup the VIP ? Best

Re: [ClusterLabs] pcsd web interface not working on EL 9.3

2024-02-21 Thread Strahil Nikolov via Users
On Mon, Feb 19, 2024 at 10:16 AM lejeczek via Users wrote: > > > > On 19/02/2024 09:06, Strahil Nikolov via Users wrote: > > Hi All, > > > > Is there a specific setup I missed in order to setup the > > web interface ? > > > > Usually, you just login

[ClusterLabs] pcsd web interface not working on EL 9.3

2024-02-19 Thread Strahil Nikolov via Users
Hi All, Is there a specific setup I missed in order to setup the web interface ? Usually, you just login with the hacluster user on https://fqdn:2224 but when I do a curl, I get an empty response. Best Regards,Strahil Nikolov

[ClusterLabs] GCP and IP address question

2024-01-26 Thread Strahil Nikolov via Users
Hello All, I will soon build my first cluster in the cloud and I was wondering if I can still use IPAddr2 resource in GCP or I really have to use ocf:heartbeat:gcp-vpc-move-route & ocf:heartbeat:gcp-vpc-move-vip ? I'm still trying to find a guide, so I can understand the idea behind those

Re: [ClusterLabs] Questionsabout GCP VIP setup

2024-02-28 Thread Strahil Nikolov via Users
Hi Oyvind, I found your e-mail in my spam folder. It seems 'gcloud-ra' doesn't exist and it's not needed for the fence agent or the gcp-vpc-move-vip. Best Regards, Strahil Nikolov On Wed, Feb 7, 2024 at 13:26, Oyvind Albrigtsen wrote: On 07/02/24 11:15 +, Strahil Nikolov via Users wrote

[ClusterLabs] Fencing doesn't work with google-cloud-cli

2024-03-27 Thread Strahil Nikolov via Users
Hi All, I'm starting this thread in order to warn you that if you updated recently and the 'google-cloud-cli' rpm was deployed (obsoletes 'google-cloud-sdk'), fencing won't work for you even though fence_gce and 'pcs stonith fence' report success. The VM stays in an odd status (right now I don't

Re: [ClusterLabs] Fencing doesn't work with google-cloud-cli

2024-03-27 Thread Strahil Nikolov via Users
Hi All, I'm sorry for the previous post. Most probably it's not google-cloud-cli as even after downgrading, fencing still doesn't work all the time. Best Regards, Strahil Nikolov On Wednesday, 27 March 2024, 15:39:06 GMT+2, Strahil Nikolov via Users wrote: Hi All, I'm

Re: [ClusterLabs] Pcsd port change after cluster setup

2024-04-15 Thread Strahil Nikolov via Users
The interesting part is that after repeating the process (update the file, stop & start pcsd and pcs host auth ) everything is working fine including the web UI. Best Regards, Strahil Nikolov On Mon, Apr 15, 2024 at 17:20, Strahil Nikolov via Users wrote: Hi All, I need your

[ClusterLabs] Pcsd port change after cluster setup

2024-04-15 Thread Strahil Nikolov via Users
Hi All, I need your help to change the pcsd port. I set the port in /etc/sysconfig/pcsd on all nodes: PCSD_PORT=3500 Yet, the daemon is not listening on it. Best Regards, Strahil Nikolov