Re: [ClusterLabs] Remote nodes in an opt-in cluster

2024-04-23 Thread Andrei Borzenkov
On 23.04.2024 19:40, Jochen wrote: On 23. Apr 2024, at 17:41, Andrei Borzenkov wrote: On 23.04.2024 10:02, Jochen wrote: When trying to add a remote node to an opt-in cluster, the cluster does not start the remote resource. When I change the cluster to opt-out the remote resource

Re: [ClusterLabs] Remote nodes in an opt-in cluster

2024-04-23 Thread Andrei Borzenkov
On 23.04.2024 10:02, Jochen wrote: When trying to add a remote node to an opt-in cluster, the cluster does not start the remote resource. When I change the cluster to opt-out the remote resource is started. It's not clear what you mean. Is "remote resource" the resource used to

Re: [ClusterLabs] colocation constraint - do I get it all wrong?

2024-02-05 Thread Andrei Borzenkov
On Mon, Feb 5, 2024 at 12:44 PM lejeczek via Users wrote: > > > > On 01/01/2024 18:28, Ken Gaillot wrote: > > On Fri, 2023-12-22 at 17:02 +0100, lejeczek via Users wrote: > >> hi guys. > >> > >> I have a colocation constraint: > >> > >> -> $ pcs constraint ref DHCPD > >> Resource: DHCPD > >>

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Andrei Borzenkov
on-fail=ignore break manual failover logic (stopped will be considered as failed and thus ignored)? best regards, Artem On Tue, 19 Dec 2023 at 17:03, Klaus Wenninger wrote: On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov wrote: On Tue, Dec 19, 2023 at 10:41 AM Artem wrote: ... Dec 19

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Andrei Borzenkov
On Tue, Dec 19, 2023 at 10:41 AM Artem wrote: ... > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] > (update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is > unrunnable (node is offline) > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] >

Re: [ClusterLabs] resource-agents and VMs

2023-12-15 Thread Andrei Borzenkov
On Fri, Dec 15, 2023 at 2:23 PM lejeczek via Users wrote: > > Hi guys. > > my resources-agents depend like so: > > resource-agents-deps.target > ○ ├─00\\x2dVMsy.mount > ● └─virt-guest-shutdown.target > If this is output of "systemctl list-dependencies" - it has a lot of flags that completely

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Andrei Borzenkov
On Tue, Dec 12, 2023 at 4:47 PM Artem wrote: > > > > On Tue, 12 Dec 2023 at 16:17, Andrei Borzenkov wrote: >> >> On Fri, Dec 8, 2023 at 5:44 PM Artem wrote: >> > pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd >> > pcs constra

Re: [ClusterLabs] resource fails manual failover

2023-12-12 Thread Andrei Borzenkov
On Tue, Dec 12, 2023 at 4:50 PM Artem wrote: > > Is there a detailed explanation for resource monitor and start timeouts and > intervals with examples, for dummies? > > my resource configured as follows: > [root@lustre-mds1 ~]# pcs resource show MDT00 > Warning: This command is deprecated and

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Andrei Borzenkov
On Fri, Dec 8, 2023 at 5:44 PM Artem wrote: > > Hello experts. > > I use pacemaker for a Lustre cluster. But for simplicity and exploration I > use a Dummy resource. I didn't like how resource performed failover and > failback. When I shut down VM with remote agent, pacemaker tries to restart

Re: [ClusterLabs] make promoted follow promoted resource ?

2023-11-26 Thread Andrei Borzenkov
On 26.11.2023 12:32, lejeczek via Users wrote: Hi guys. With these: -> $ pcs resource status REDIS-6381-clone   * Clone Set: REDIS-6381-clone [REDIS-6381] (promotable):     * Promoted: [ ubusrv2 ]     * Unpromoted: [ ubusrv1 ubusrv3 ] -> $ pcs resource status PGSQL-PAF-5433-clone   *

Re: [ClusterLabs] Using cluster without fencing

2023-10-16 Thread Andrei Borzenkov
On Mon, Oct 16, 2023 at 9:28 AM Sergey Cherukhin wrote: > > Hello! > > I use Postgresql+Pacemaker+Corosync 3 nodes cluster with 2 Postgresql > instances in synchronous replication mode on two high performance nodes and > Pacemaker+Corosync on the third low performance node for quorum only. At

Re: [ClusterLabs] Mutually exclusive resources ?

2023-09-27 Thread Andrei Borzenkov
On Wed, Sep 27, 2023 at 3:21 PM Adam Cecile wrote: > > Hello, > > > I'm struggling to understand if it's possible to create some kind of > constraint to avoid two different resources to be running on the same host. > > Basically, I'd like to have floating IP "1" and floating IP "2" always being

Re: [ClusterLabs] PAF / PGSQLMS on Ubuntu

2023-09-07 Thread Andrei Borzenkov
On Thu, Sep 7, 2023 at 5:01 PM lejeczek via Users wrote: > > Hi guys. > > I'm trying to set ocf_heartbeat_pgsqlms agent but I get: > ... > Failed Resource Actions: > * PGSQL-PAF-5433 stop on ubusrv3 returned 'invalid parameter' because > 'Parameter "recovery_target_timeline" MUST be set to

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
> last_man_standing. > Then, I should set up another server with qdevice and configure that using > the LMS algorithm. > > Thanks > David > > On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger wrote: >> >> >> >> On Mon, Sep 4, 2023 at

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger wrote: > > > > On Mon, Sep 4, 2023 at 12:45 PM David Dolan wrote: >> >> Hi Klaus, >> >> With default quorum options I've performed the following on my 3 node cluster >> >> Bring down cluster services on one node - the running services migrate to >>

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger wrote: > > > Or go for qdevice with LMS where I would expect it to be able to really go > down to > a single node left - any of the 2 last ones - as there is still qdevice.# > Sry for the confusion btw. > According to documentation, "LMS is also

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 1:45 PM David Dolan wrote: > > Hi Klaus, > > With default quorum options I've performed the following on my 3 node cluster > > Bring down cluster services on one node - the running services migrate to > another node > Wait 3 minutes > Bring down cluster services on one of

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Andrei Borzenkov
On 30.08.2023 19:23, David Dolan wrote: Use fencing. Quorum is not a replacement for fencing. With (reliable) fencing you can simply run pacemaker with no-quorum-policy=ignore. The practical problem is that usually the last resort that will work in all cases is SBD + suicide and SBD cannot

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Andrei Borzenkov
On Wed, Aug 30, 2023 at 3:34 PM David Dolan wrote: > > Hi All, > > I'm running Pacemaker on Centos7 > Name: pcs > Version : 0.9.169 > Release : 3.el7.centos.3 > Architecture: x86_64 > > > I'm performing some cluster failover tests in a 3 node cluster. We have 3 > resources in the

Re: [ClusterLabs] pacemaker:start-delay

2023-08-18 Thread Andrei Borzenkov
On Fri, Aug 18, 2023 at 12:13 PM Mr.R via Users wrote: > > Hi all, > > There is a problem with the start-delay of monitor during the process of > configuring and starting resources. > > For example, there is the result of resource config. > > Resource: d1 (class=ocf provider=pacemaker type=Dummy)

Re: [ClusterLabs] [EXT] Re: Fence Agents Format

2023-07-28 Thread Andrei Borzenkov
On 28.07.2023 09:46, Windl, Ulrich wrote: Hi! On " Manual fencing or meatware is when an administrator must manually power-cycle a machine (or unplug its storage cables) and follow up with the cluster, notifying the cluster that the machine has been fenced. This is never recommended.": Maybe

Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Andrei Borzenkov
On 03.07.2023 19:39, Ken Gaillot wrote: On Mon, 2023-07-03 at 19:22 +0300, Andrei Borzenkov wrote: On 03.07.2023 18:07, Ken Gaillot wrote: On Mon, 2023-07-03 at 12:20 +0200, lejeczek via Users wrote: On 03/07/2023 11:16, Andrei Borzenkov wrote: On 03.07.2023 12:05, lejeczek via Users wrote

Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Andrei Borzenkov
On 03.07.2023 18:07, Ken Gaillot wrote: On Mon, 2023-07-03 at 12:20 +0200, lejeczek via Users wrote: On 03/07/2023 11:16, Andrei Borzenkov wrote: On 03.07.2023 12:05, lejeczek via Users wrote: Hi guys. I have pgsql which I constrain like so: -> $ pcs constraint location PGSQL-clone r

Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Andrei Borzenkov
On 03.07.2023 12:05, lejeczek via Users wrote: Hi guys. I have pgsql which I constrain like so: -> $ pcs constraint location PGSQL-clone rule role=Promoted score=-1000 gateway-link ne 1 and I have a few more location constraints with that ethmonitor & those work, but this one does not seem to.

Re: [ClusterLabs] silence resource ? - PGSQL

2023-06-28 Thread Andrei Borzenkov
On 28.06.2023 14:11, lejeczek via Users wrote: Hi guys. Having 'pgsql' set up in what I'd say is a vanilla-default config, pacemaker's journal log is flooded with: ... pam_unix(runuser:session): session closed for user postgres pam_unix(runuser:session): session opened for user postgres(uid=26)

Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated ) and replaced with other options but not an effective solution

2023-06-27 Thread Andrei Borzenkov
So quorum wasn't attained again. 1) For such a scenario we need help to be able to have one cluster live . 2) And in cases where only one node of the cluster is up and others are down we need the resources and cluster to be up . Thanks Priyanka On Tue, Jun 27, 2023 at 12:25 AM Andrei Borzen

Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated ) and replaced with other options but not an effective solution

2023-06-26 Thread Andrei Borzenkov
On 26.06.2023 21:14, Priyanka Balotra wrote: Hi All, We are seeing an issue where we replaced no-quorum-policy=ignore with other options in corosync.conf in order to simulate the same behaviour: wait_for_all: 0, last_man_standing: 1, last_man_standing_window: 2. There

Re: [ClusterLabs] host in standby causes havoc

2023-06-15 Thread Andrei Borzenkov
On 15.06.2023 13:58, Kadlecsik József wrote: Hello, We had a strange issue here: 7 node cluster, one node was put into standby mode to test a new iscsi setting on it. During configuring the machine it was rebooted and after the reboot the iscsi didn't come up. That caused a malformed

Re: [ClusterLabs] 99-VirtualDomain-libvirt.conf under control - ?

2023-05-05 Thread Andrei Borzenkov
On Fri, May 5, 2023 at 12:10 PM lejeczek via Users wrote: > > > > On 05/05/2023 10:08, Andrei Borzenkov wrote: > > On Fri, May 5, 2023 at 11:03 AM lejeczek via Users > > wrote: > >> > >> > >> On 29/04/2023 21:02, Reid Wahl wrote: >

Re: [ClusterLabs] 99-VirtualDomain-libvirt.conf under control - ?

2023-05-05 Thread Andrei Borzenkov
On Fri, May 5, 2023 at 11:03 AM lejeczek via Users wrote: > > > > On 29/04/2023 21:02, Reid Wahl wrote: > > On Sat, Apr 29, 2023 at 3:34 AM lejeczek via Users > > wrote: > >> Hi guys. > >> > >> I presume these are a consequence of having resource of VirtualDomain type > >> set up(& enabled) -

Re: [ClusterLabs] How to block/stop a resource from running twice?

2023-04-24 Thread Andrei Borzenkov
On Mon, Apr 24, 2023 at 11:52 AM Klaus Wenninger wrote: > The checking for a running resource that isn't expected to be running isn't > done periodically (at > least not per default and I don't know a way to achieve that from the top of > my mind). op monitor role=Stopped interval=20s

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-17 Thread Andrei Borzenkov
On Mon, Apr 17, 2023 at 10:48 AM Philip Schiller wrote: > > Hello Andrei, > > you wrote: > > >>As a workaround you could add dummy clone resource colocated with and > >>ordered after your DRBD masters and order VM after this clone. > > Thanks for the idea. This looks like a good option to solve

Re: [ClusterLabs] VirtualDomain - node map - ?

2023-04-16 Thread Andrei Borzenkov
On 16.04.2023 16:29, lejeczek via Users wrote: On 16/04/2023 12:54, Andrei Borzenkov wrote: On 16.04.2023 13:40, lejeczek via Users wrote: Hi guys Some agents do employ that concept of node/host map which I do not see in any manual/docs that this agent does - would you suggest some

Re: [ClusterLabs] VirtualDomain - node map - ?

2023-04-16 Thread Andrei Borzenkov
On 16.04.2023 13:40, lejeczek via Users wrote: Hi guys Some agents do employ that concept of node/host map which I do not see in any manual/docs that this agent does - would you suggest some technique or tips on how to achieve similar? I'm thinking specifically of 'migrate' here, as I

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov
On 14.04.2023 14:35, Andrei Borzenkov wrote: On Fri, Apr 14, 2023 at 11:45 AM Philip Schiller wrote: I would like to know if the order constraint is equivalent to: "First promote ms-drbd_fs then start drbd_vm". No, it is not. It is equivalent to order drbd_vm_after_drbd_fs Man

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov
On Fri, Apr 14, 2023 at 2:35 PM Andrei Borzenkov wrote: > > As far as I can tell, pacemaker simply does not support migration > together with demote/promote actions. I don't really know the reasons. Thinking about it - migrating a resource depending on a master is simply not

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov
On Fri, Apr 14, 2023 at 11:45 AM Philip Schiller wrote: > > I would like to know if the order constraint Mandatory: ms-drbd_fs:promote drbd_vm> > is equivalent to: "First promote ms-drbd_fs then start drbd_vm". > No, it is not. It is equivalent to order drbd_vm_after_drbd_fs Mandatory:

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov
I am not sure if I have to mark it as fixed. > Also I hope that I used the Mailing List correctly as I didn't really reply > to answers. Instead I wrote new Mails to users@... with the topic in CC. > > Can you elaborate a little bit more on the behavior of the order constraint >

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-13 Thread Andrei Borzenkov
On 13.04.2023 22:24, Andrei Borzenkov wrote: ... order drbd_vm_after_drbd_fs Mandatory: ms-drbd_fs:promote drbd_vm:start After I added it back I get the same failed "demote" action. Transition Summary: * Stop zfs_drbd_storage:0 (ha1 ) due to node av

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-13 Thread Andrei Borzenkov
On 12.04.2023 15:44, Philip Schiller wrote: Here is also some additional information for a failover with setting the node standby. Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: State transition S_IDLE -> S_POLICY_ENGINE Apr 12 12:40:28 s1

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Andrei Borzenkov
On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov wrote: > > Hi, > > Just add a Master role for drbd resource in the colocation. Default is > Started (or Slave). > Could you elaborate why it is needed? The problem is not leaving the resource on the node with a demoted instance - when the node

Re: [ClusterLabs] Location not working [FIXED]

2023-04-12 Thread Andrei Borzenkov
On Tue, Apr 11, 2023 at 6:27 PM Ken Gaillot wrote: > > On Tue, 2023-04-11 at 17:31 +0300, Miro Igov wrote: > > I fixed the issue by changing location definition from: > > > > location intranet-ip_on_any_nginx intranet-ip \ > > rule -inf: opa-nginx_1_active eq 0 \ > > rule -inf:

Re: [ClusterLabs] NFS mount fails to stop if NFS server is lost

2023-04-11 Thread Andrei Borzenkov
On 11.04.2023 17:35, Miro Igov wrote: Hello, I have a node nas-sync-test1 with NFS server and NFS export running and another node intranet-test1 with data_1 fs mount: primitive data_1 Filesystem \ params device="nas-sync-test1:/home/pharmya/NAS" fstype=nfs options=v4

Re: [ClusterLabs] Location not working

2023-04-10 Thread Andrei Borzenkov
On Mon, Apr 10, 2023 at 4:26 PM Ken Gaillot wrote: > > On Mon, 2023-04-10 at 14:18 +0300, Miro Igov wrote: > > Hello, > > I have a resource with location constraint set to: > > > > location intranet-ip_on_any_nginx intranet-ip \ > > rule -inf: opa-nginx_1_active eq 0 \ > > rule

Re: [ClusterLabs] Location not working

2023-04-10 Thread Andrei Borzenkov
On Mon, Apr 10, 2023 at 2:19 PM Miro Igov wrote: > Hello, > > I have a resource with location constraint set to: > > > > location intranet-ip_on_any_nginx intranet-ip \ > > rule -inf: opa-nginx_1_active eq 0 \ > > rule -inf: opa-nginx_2_active eq 0 > > > > In syslog I see the

Re: [ClusterLabs] Help with tweaking an active/passive NFS cluster

2023-04-05 Thread Andrei Borzenkov
On Wed, Apr 5, 2023 at 10:36 AM Andrei Borzenkov wrote: > but in your case members of set are on the same node Are NOT on the same node of course.

Re: [ClusterLabs] Help with tweaking an active/passive NFS cluster

2023-04-05 Thread Andrei Borzenkov
On Fri, Mar 31, 2023 at 12:42 AM Ronny Adsetts wrote: > > Hi, > > I wonder if someone more familiar with the workings of pacemaker/corosync > would be able to assist in solving an issue. > > I have a 3-node NFS cluster which exports several iSCSI LUNs. The LUNs are > presented to the nodes via

[ClusterLabs] The latest shim from Leap 15.4 disallows shim from Tumbleweed and possibly other distributions

2023-04-01 Thread Andrei Borzenkov
https://forums.opensuse.org/t/after-a-shim-update-yesterday-no-longer-able-to-boot-with-secure-boot-enabled/165382/16 https://bugzilla.opensuse.org/show_bug.cgi?id=1209985 To explain. There is relatively new standard SBAT which makes it possible to "mass blacklist" EFI binaries supporting it.

Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Andrei Borzenkov
On Thu, Mar 2, 2023 at 4:16 PM Gerald Vogt wrote: > > On 02.03.23 13:51, Klaus Wenninger wrote: > > Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's > > fine. ip2 will be moved immediately to ha3. Good. > > > > However, if pacemaker on ha2 starts up again, it will

Re: [ClusterLabs] Systemd resource started on node after reboot before cluster is stable ?

2023-02-15 Thread Andrei Borzenkov
On Wed, Feb 15, 2023 at 12:49 PM Adam Cecile wrote: > > Hello, > > Just had some issue with unexpected server behavior after reboot. This node > was powered off, so cluster was running fine with this tomcat9 resource > running on a different machine. > > After powering on this node again, it

Re: [ClusterLabs] Load balancing, of a sort

2023-01-25 Thread Andrei Borzenkov
On Wed, Jan 25, 2023 at 3:49 PM Antony Stone wrote: > > Hi. > > I have a corosync / pacemaker 3-node cluster with a resource group which can > run on any node in the cluster. > > Every night a cron job on the node which is running the resources performs > "crm_standby -v on" followed a short

Re: [ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (con't)

2023-01-12 Thread Andrei Borzenkov
On Thu, Jan 12, 2023 at 12:50 PM Keisuke MORI wrote: > > Hi, > > Just a guess but could it be the same issue with this? > > https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background > That is exactly the same issue. The reason for SIGTTOU is explained in

Re: [ClusterLabs] Antw: [EXT] Re: Stonith

2022-12-20 Thread Andrei Borzenkov
On Tue, Dec 20, 2022 at 10:07 AM Ulrich Windl wrote: > > > > But keep in mind that if the whole site is down (or unaccessible) you > > will not have access to IPMI/PDU/whatever on this site so your stonith > > agents will fail ... > > But, considering the design, such site won't have a quorum and

Re: [ClusterLabs] Stonith

2022-12-19 Thread Andrei Borzenkov
On Mon, Dec 19, 2022 at 4:01 PM Antony Stone wrote: > > On Monday 19 December 2022 at 13:55:45, Andrei Borzenkov wrote: > > > On Mon, Dec 19, 2022 at 3:44 PM Antony Stone > > > > wrote: > > > So, do I simply create one stonith resource for each server, and re

Re: [ClusterLabs] Stonith

2022-12-19 Thread Andrei Borzenkov
On Mon, Dec 19, 2022 at 3:44 PM Antony Stone wrote: > > So, do I simply create one stonith resource for each server, and rely on some > other random server to invoke it when needed? > Yes, this is the most simple approach. You need to restrict this stonith resource to only one cluster node (set

Re: [ClusterLabs] [External] : Re: Fence Agent tests

2022-11-09 Thread Andrei Borzenkov
On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden wrote: > > > > -Original Message- > > From: Users On Behalf Of Valentin Vidic > > via Users > > Sent: Sunday, November 6, 2022 5:20 PM > > To: users@clusterlabs.org > > Cc: Valentin Vidić > > Subject: Re: [ClusterLabs] [External] : Re: Fence

Re: [ClusterLabs] Fence Agent tests

2022-11-05 Thread Andrei Borzenkov
On 04.11.2022 23:46, Robert Hayden wrote: I am working on a Fencing agent for the Oracle Cloud Infrastructure (OCI) environment to complete power fencing of compute instances. The only fencing setups I have seen for OCI are using SBD, but that is insufficient with full network interruptions

Re: [ClusterLabs] Cluster does not start resources

2022-08-23 Thread Andrei Borzenkov
On 24.08.2022 08:13, Lentes, Bernd wrote: > > > - On 24 Aug, 2022, at 07:03, arvidjaar arvidj...@gmail.com wrote: > >> On 24.08.2022 07:34, Lentes, Bernd wrote: >>> >>> >>> - On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote: >>> >>> The stop-all-resources cluster

Re: [ClusterLabs] Cluster does not start resources

2022-08-23 Thread Andrei Borzenkov
On 24.08.2022 07:34, Lentes, Bernd wrote: > > > - On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote: > > >> The stop-all-resources cluster property is set to true. Is that intentional? > OMG. Thanks Reid ! > > But unfortunately not all virtual domains are running: > what

Re: [ClusterLabs] Start resource only if another resource is stopped

2022-08-18 Thread Andrei Borzenkov
On 17.08.2022 16:58, Miro Igov wrote: > As you guessed i am using crm res stop nfs_export_1. > I tried the solution with attribute and it does not work correct. > It does what you asked for originally, but you are shifting the goalposts ... > When i stop nfs_export_1 it stops data_1

Re: [ClusterLabs] Start resource only if another resource is stopped

2022-08-11 Thread Andrei Borzenkov
On 11.08.2022 17:34, Miro Igov wrote: > Hello, > > I am trying to create failover resource that would start if another resource > is stopped and stop when the resource is started back. > > It is 4 node cluster (with qdevice) where nodes are virtual machines and two > of them are hosted in a

Re: [ClusterLabs] Antw: [EXT] node1 and node2 communication time question

2022-08-10 Thread Andrei Borzenkov
On 10.08.2022 09:37, Ulrich Windl wrote: > Unfortunately the documentation for fencing agents leaves very much to be > desired: > When I tried to write one myself, I just stopped due to lack of details. > It is not about writing own agent but about using existing ones. There are enough fencing

Re: [ClusterLabs] 2-Node Cluster - fencing with just one node running ?

2022-08-04 Thread Andrei Borzenkov
On 04.08.2022 16:06, Lentes, Bernd wrote: > > - On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote: > >> >> Such constraints are unnecessary. >> >> Let's say we have two stonith devices called "fence_dev1" and >> "fence_dev2" that fence nodes 1 and 2, respectively. If node 2 needs >>

Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Andrei Borzenkov
On 03.08.2022 09:02, Ulrich Windl wrote: Ken Gaillot wrote on 02.08.2022 at 16:09 in > message > <0a2125a43bbfc09d2ca5bad1a693710f00e33731.ca...@redhat.com>: >> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote: >>> Hi, >>> >>> Since O_DIRECT is not specified in open() [1], it reads the

Re: [ClusterLabs] Fencing for quorum device?

2022-07-15 Thread Andrei Borzenkov
On 15.07.2022 09:24, Viet Nguyen wrote: > Hi, > > I just wonder that do we need to have fencing for a quorum device? I have 2 > node cluster with one quorum device. Both 2 nodes have fencing agents. > > But I wonder that should i define the fencing agent for quorum device or > not? You cannot.

Re: [ClusterLabs] Question regarding the security of corosync

2022-06-21 Thread Andrei Borzenkov
On 22.06.2022 02:27, Antony Stone wrote: > On Friday 17 June 2022 at 11:39:14, Mario Freytag wrote: > >> I’d like to ask about the security of corosync. We’re using a Proxmox HA >> setup in our testing environment and need to confirm its compliance with >> PCI guidelines. >> >> We have a few

Re: [ClusterLabs] related to fencing in general , docker containers

2022-06-19 Thread Andrei Borzenkov
ers(pacemaker + >> corosync + sqlServer), it should be of some help, and will check how to >> handle VM failures >> >> You >>> will probably need one fencing agent for each physical host where >>> docker >>> is running and map cluster nodes (containers)

Re: [ClusterLabs] related to fencing in general , docker containers

2022-06-17 Thread Andrei Borzenkov
On 17.06.2022 16:53, Sridhar K wrote: > Hi Team, > > Please share any pointers, references, example usages w.r.t fencing in > general and its use w.r.t docker containers. > > referring as of now > https://clusterlabs.org/pacemaker/doc/crm_fencing.html > > need to check the feasibility of

Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
On 08.06.2022 17:01, Ken Gaillot wrote: > On Wed, 2022-06-08 at 18:31 +0530, Sridhar K wrote: >> Hi Team, >> >> Required guidance w.r.t below problem statement >> >> Need to have a HA setup for SQLServer running as a docker container >> and HA managed by the Pacemaker which is running as a

Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
and sqlserver are running as > different containers > > Regards > Sridharan > > > > > > > > > > > > > On Wed, 8 Jun 2022 at 18:52, Andrei Borzenkov wrote: > >> On Wed, Jun 8, 2022 at 4:01 PM Sridhar K wrote: >>> &g

Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
On Wed, Jun 8, 2022 at 4:01 PM Sridhar K wrote: > > Hi Team, > > Required guidance w.r.t below problem statement > > Need to have a HA setup for SQLServer running as a docker container and HA > managed by the Pacemaker which is running as a separate docker container. > It is very unlikely to be

Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Andrei Borzenkov
On 07.06.2022 11:50, Klaus Wenninger wrote: >> >> From the documentation is not clear to me whether this would be: >> a) multiple fencing where ipmi would be first level and sbd would be a >> second level fencing (where sbd always succeeds) >> b) or this is considered a single level fencing with

Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Andrei Borzenkov
On 07.06.2022 11:26, Zoran Bošnjak wrote: > > In the test scenario, the dummy resource is currently running on node1. I > have simulated node failure by unplugging the ipmi AND host network > interfaces from node1. The result was that node1 gets rebooted (by watchdog), > but the rest of the

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Andrei Borzenkov
For test purpose try to use script that loops until sbd is actually stopped for ExecStop. Note that systemd strongly recommends to use synchronous command for ExecStop (we may argue that this should be handled by service manager itself, but well ...). > > Zoran > > - Original Message ---

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Andrei Borzenkov
On 03.06.2022 11:18, Zoran Bošnjak wrote: > Hi all, > I would appreciate an advice about sbd fencing (without shared storage). > > I am using ubuntu 20.04., with default packages from the repository > (pacemaker, corosync, fence-agents, ipmitool, pcs...). > > HW watchdog is present on servers.

Re: [ClusterLabs] More pacemaker oddities while stopping DC

2022-05-27 Thread Andrei Borzenkov
On 25.05.2022 09:47, Gao,Yan via Users wrote: > On 2022/5/25 8:10, Ulrich Windl wrote: >> Hi! >> >> We are still suffering from kernel RAM corruption on the Xen hypervisor when >> a VM or the hypervisor is doing I/O (three months since the bug report at >> SUSE, but no fix or workaround meaning

Re: [ClusterLabs] how does the VirtualDomain RA know with which options it's called ?

2022-05-12 Thread Andrei Borzenkov
On 12.05.2022 21:03, Lentes, Bernd wrote: > Hi, > > from my understanding the resource agents in > /usr/lib/ocf/resource.d/heartbeat are quite similar > to the old scripts in /etc/init.d started by init. > Init starts these scripts with "script [start|stop|reload|restart|status]". > Inside the

Re: [ClusterLabs] Antw: [EXT] Re: Help understanding recover of promotable resource after a "pcs cluster stop ‑‑all"

2022-05-03 Thread Andrei Borzenkov
On 03.05.2022 10:40, Ulrich Windl wrote: > Hi! > > I don't use DRBD, but I can imagine: > If DRBD does asynchronous replication, it may make sense not to promote the > slave as master after an interrupted connection (such as when the master died) > (as > this will cause some data loss). > Probably

Re: [ClusterLabs] Help understanding recover of promotable resource after a "pcs cluster stop --all"

2022-05-03 Thread Andrei Borzenkov
On 03.05.2022 00:25, Ken Gaillot wrote: > On Mon, 2022-05-02 at 13:11 -0300, Salatiel Filho wrote: >> Hi, Ken, here is the info you asked for. >> >> >> # pcs constraint >> Location Constraints: >> Resource: fence-server1 >> Disabled on: >> Node: server1 (score:-INFINITY) >> Resource:

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Andrei Borzenkov
On 22.04.2022 16:01, john tillman wrote: >> On Fri, Apr 22, 2022 at 12:05 PM Tomas Jelinek >> wrote: >>> >>> As discussed in other branches of this thread, you need to figure out >>> why pacemaker is not starting. Even if one node is not running, corosync >>> and pacemaker are expected to be able

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Andrei Borzenkov
On Fri, Apr 22, 2022 at 12:05 PM Tomas Jelinek wrote: > > As discussed in other branches of this thread, you need to figure out > why pacemaker is not starting. Even if one node is not running, corosync > and pacemaker are expected to be able to start on the other node. Well, when trying to

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-21 Thread Andrei Borzenkov
On 21.04.2022 18:26, john tillman wrote: >> On 20. 04. 22 at 20:21 john tillman wrote: On 20.04.2022 19:53, john tillman wrote: > I have a two node cluster that won't start any resources if only one > node > is booted; the pacemaker service does not start. > > Once the

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-20 Thread Andrei Borzenkov
On 20.04.2022 19:53, john tillman wrote: > I have a two node cluster that won't start any resources if only one node > is booted; the pacemaker service does not start. > > Once the second node boots up, the first node will start pacemaker and the > resources are started. All is well. But I

Re: [ClusterLabs] Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected

2022-04-08 Thread Andrei Borzenkov
On 08.04.2022 20:16, Ken Gaillot wrote: > Hi all, > > I'm hoping to have the first release candidate for Pacemaker 2.1.3 > available in a couple of weeks. > > One of the new features will be a new possible value for the "multiple- > active" resource meta-attribute, which specifies how the

Re: [ClusterLabs] Antw: [EXT] Re: Failed migration causing fencing loop

2022-04-03 Thread Andrei Borzenkov
On 31.03.2022 14:02, Ulrich Windl wrote: "Gao,Yan" wrote on 31.03.2022 at 11:18 in message > <67785c2f-f875-cb16-608b-77d63d9b0...@suse.com>: >> On 2022/3/31 9:03, Ulrich Windl wrote: >>> Hi! >>> >>> I just wanted to point out one thing that hit us with SLES15 SP3: >>> Some failed live

Re: [ClusterLabs] Order constraint with a timeout?

2022-03-29 Thread Andrei Borzenkov
On 29.03.2022 15:38, john tillman wrote: >> On 29.03.2022 00:26, john tillman wrote: On Mon, 2022-03-28 at 14:03 -0400, john tillman wrote: > Greetings all, > > Is it possible to have an order constraint with a timeout? I can't > find > one but perhaps I am using the

Re: [ClusterLabs] Order constraint with a timeout?

2022-03-28 Thread Andrei Borzenkov
On 29.03.2022 00:26, john tillman wrote: >> On Mon, 2022-03-28 at 14:03 -0400, john tillman wrote: >>> Greetings all, >>> >>> Is it possible to have an order constraint with a timeout? I can't >>> find >>> one but perhaps I am using the wrong keywords in google. >>> >>> I have several Filesystem

Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-23 Thread Andrei Borzenkov
On 23.03.2022 08:30, Balotra, Priyanka wrote: > Hi All, > > We have a scenario on SLES 12 SP3 cluster. > The scenario is explained as follows in the order of events: > > * There is a 2-node cluster (FILE-1, FILE-2) > * The cluster and the resources were up and running fine initially . >

Re: [ClusterLabs] What's wrong with IPsrcaddr?

2022-03-17 Thread Andrei Borzenkov
There are both .23 alias and def route src. After a network > failure, there is NO default route at all on both nodes and IPsrcaddr > fails, as it requires default route. > I already explained above why IPsrcaddr was not migrated. > > Wed, 16 Mar 2022 at 19:23, Andrei Borzenkov

Re: [ClusterLabs] What's wrong with IPsrcaddr?

2022-03-16 Thread Andrei Borzenkov
On 16.03.2022 12:24, ZZ Wave wrote: > Hello. I'm trying to implement floating IP with pacemaker but I can't > get IPsrcaddr to work correctly. I want a following thing - floating > IP and its route SRC is started on node1. If node1 loses network > connectivity to node2, node1 should instantly

Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Andrei Borzenkov
On 15.03.2022 21:53, john tillman wrote: >> On 15.03.2022 19:35, john tillman wrote: >>> Hello, >>> >>> I'm trying to guarantee that all my cloned drbd resources start on the >>> same node and I can't figure out the syntax of the constraint to do it. >>> >>> I could nominate one of the drbd

Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Andrei Borzenkov
On 15.03.2022 19:35, john tillman wrote: > Hello, > > I'm trying to guarantee that all my cloned drbd resources start on the > same node and I can't figure out the syntax of the constraint to do it. > > I could nominate one of the drbd resources as a "leader" and have all the > others follow it.

Re: [ClusterLabs] Filesystem resource agent w/ filesystem attribute 'noauto'

2022-03-09 Thread Andrei Borzenkov
On 09.03.2022 18:45, Asseel Sidique wrote: > Hi Team, > > My question is regarding the filesystem resource agent. In the filesystem > resource > agent > , there is a comment that states: > > # Do not put this

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Andrei Borzenkov
On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl wrote: > > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl wrote: > > ... > > > > > > So what happens most likely is that the watchdog terminates the kdump. > > > In that case all the mess with fence_kdump won't help, right? > > > > You can configure

Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-24 Thread Andrei Borzenkov
On Thu, Feb 24, 2022 at 1:17 PM Jan Friesse wrote: > > On 24/02/2022 10:28, Viet Nguyen wrote: > > Hi, > > > > Thank you so so much for your help. May i ask a following up question: > > > > For the option of having one big cluster with 4 nodes without booth, then, > > if one site (having 2 nodes)

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Andrei Borzenkov
On 16.02.2022 20:48, Andrei Borzenkov wrote: > > I guess the real question here is why "Transition aborted" is logged although > transition apparently continues. Transition 128 started at 20:54:30 and > completed > at 21:04:26, but there were multiple "Tr

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Andrei Borzenkov
On 16.02.2022 14:35, Lentes, Bernd wrote: > > > - On Feb 16, 2022, at 12:52 AM, kgaillot kgail...@redhat.com wrote: > > >>> Any idea ? >>> What is about that transition 128, which is aborted ? >> >> A transition is the set of actions that need to be taken in response to >> current

Re: [ClusterLabs] Antw: Antw: [EXT] Re: heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-28 Thread Andrei Borzenkov
On Fri, Jan 28, 2022 at 11:00 AM Ulrich Windl wrote: > > >>> "Ulrich Windl" wrote on 28.01.2022 > at > 08:51 in message <61f3a06602a100047...@gwsmtp.uni-regensburg.de>: > >>>> Andrei Borzenkov wrote on 28.01.2022 at 06:38 in >

Re: [ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Andrei Borzenkov
On Thu, Jan 27, 2022 at 5:10 PM Ulrich Windl wrote: > > Any better ideas anyone? > Perform online upgrade. Any reason you need to do an offline upgrade in the first place?

Re: [ClusterLabs] Question: Mount Monitoring for Non-shared File-system

2021-12-07 Thread Andrei Borzenkov
On 07.12.2021 21:35, Asseel Sidique wrote: > Hi Everyone, > I'm looking for some insight on what the best way is to configure mount > monitoring for a cloned database resource. > Consider the resource model below: > * Clone Set: database_1-clone [database_1] (promotable): > * Masters: [
