Re: [ClusterLabs] pacemaker-remote

2023-09-18 Thread Ken Gaillot
On Thu, 2023-09-14 at 18:28 +0800, Mr.R via Users wrote:
> Hi all,
>
> In Pacemaker-Remote 2.1.6, the pacemaker package is required
> for guest nodes and not for remote nodes. Why is that? What does 
> pacemaker do?
> After adding a guest node, the pacemaker package does not seem to be
> needed. Can I skip installing it here?

I'm not sure what's requiring it in your environment. There's no
dependency in the upstream RPM at least.
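
If you want to check what's pulling it in on your machines (a quick
sketch, assuming an RPM-based distro):

  # what installed packages require pacemaker?
  rpm -q --whatrequires pacemaker

  # what does pacemaker-remote itself require?
  dnf repoquery --requires pacemaker-remote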

The pacemaker package does have the crm_master script needed by some
resource agents, so you will need it if you use any of those. (That
script should have been moved to the pacemaker-cli package in 2.1.3,
oops ...)

> In testing, remote nodes can be taken offline, but guest nodes
> cannot. Is there any way to take them offline? Are there any
> relevant failure test cases?
> 
> thanks,

To make a guest node offline, stop the resource that creates it.
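
For example, with pcs (a sketch -- the VirtualDomain resource name
guest1-vm is hypothetical):

  pcs resource disable guest1-vm   # guest node goes offline
  pcs resource enable guest1-vm    # guest node comes back online
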
-- 
Ken Gaillot 



Re: [ClusterLabs] Limit the number of resources starting/stopping in parallel possible?

2023-09-18 Thread Ken Gaillot
On Mon, 2023-09-18 at 14:24 +, Knauf Steffen wrote:
> Hi,
> 
> we have multiple Clusters (2 node + quorum setup) with more than 100
> Resources (10 x VIP + 90 Microservices) per Node.
> If the Resources are stopped/started at the same time, the Server is
> under heavy load, which may result in timeouts and an unresponsive
> server.
> We configured some Ordering Constraints (VIP --> Microservice). Is
> there a way to limit the number of resources starting/stopping in
> parallel?
> Perhaps you have some other tips to handle such a situation.
> 
> Thanks & greets
> 
> Steffen
> 

Hi,

Yes, see the batch-limit cluster option:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/options.html#cluster-options
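
For example (the value is illustrative -- tune it for your hardware):

  # allow at most 10 cluster actions to execute in parallel
  pcs property set batch-limit=10

  # or, without pcs:
  crm_attribute --type crm_config --name batch-limit --update 10

There's also migration-limit if live migrations specifically are the
source of the load.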

-- 
Ken Gaillot 



Re: [ClusterLabs] Limit the number of resources starting/stopping in parallel possible?

2023-09-18 Thread Antony Stone
On Monday 18 September 2023 at 16:24:02, Knauf Steffen wrote:

> Hi,
> 
> we have multiple Clusters (2 node + quorum setup) with more than 100
> Resources (10 x VIP + 90 Microservices) per Node. If the Resources are
> stopped/started at the same time, the Server is under heavy load, which may
> result in timeouts and an unresponsive server. We configured some
> Ordering Constraints (VIP --> Microservice). Is there a way to limit the
> number of resources starting/stopping in parallel? Perhaps you have some
> other tips to handle such a situation.

Do all the services actually need to be stopped when a VIP is moved away from 
a node (and started again when the VIP is replaced)?

I've found that in many cases keeping a service running all the time (for 
example with monit) and simply moving the VIP between nodes to control which 
services get any requests from remote clients is sufficient to provide High 
Availability / Load Balancing.
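
A minimal sketch of that approach, keeping only the VIP under Pacemaker
control (the name and address are illustrative):

  pcs resource create vip1 ocf:heartbeat:IPaddr2 \
      ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s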

Antony.

-- 
"The future is already here.   It's just not evenly distributed yet."

 - William Gibson

   Please reply to the list;
 please *don't* CC me.


[ClusterLabs] Limit the number of resources starting/stopping in parallel possible?

2023-09-18 Thread Knauf Steffen
Hi,

we have multiple Clusters (2 node + quorum setup) with more than 100 Resources
(10 x VIP + 90 Microservices) per Node.
If the Resources are stopped/started at the same time, the Server is under heavy
load, which may result in timeouts and an unresponsive server.
We configured some Ordering Constraints (VIP --> Microservice). Is there a way
to limit the number of resources starting/stopping in parallel?
Perhaps you have some other tips to handle such a situation.

Thanks & greets

Steffen


Re: [ClusterLabs] [EXTERNE] Re: Centreon HA Cluster - VIP issue

2023-09-18 Thread Ken Gaillot
On Fri, 2023-09-15 at 09:32 +, Adil BOUAZZAOUI wrote:
> Hi Ken,
> 
> Any update please?
> 
> The idea is clear; I just need to know more information about this
> 2-cluster setup:
> 
> 1. Arbitrator:
> 1.1. Only one arbitrator is needed for everything: should I use the
> Quorum provided by Centreon on the official documentation? Or should
> I use the booth ticket manager instead?

I would use booth for distributed data centers. The Centreon setup is
appropriate for a cluster within a single data center or data centers
on the same campus with a low-latency link.
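
A booth.conf sketch for that layout (addresses, port, and ticket name
are illustrative):

  # /etc/booth/booth.conf, same on both clusters and the arbitrator
  transport = UDP
  port = 9929
  site = 192.0.2.10          # cluster at data center 1
  site = 198.51.100.10       # cluster at data center 2
  arbitrator = 203.0.113.10  # third location
  ticket = "centreon-ticket"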

> 1.2. Is fencing configured separately? Or is it configured during the
> booth ticket manager installation?

You'll have to configure fencing in each cluster separately.
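
Something along these lines in each cluster, for example (the agent and
parameters depend entirely on your hardware; these are illustrative):

  pcs stonith create fence-node1 fence_ipmilan ip=203.0.113.21 \
      username=admin password=secret lanplus=1 pcmk_host_list=node1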

> 
> 2. Floating IP:
> 2.1. it doesn't hurt if both Floating IPs are running at the same
> time right?

Correct.

> 
> 3. Fail over:
> 3.1. How to update the DNS to point to the appropriate IP?
> 3.2. We're running our own DNS servers, so how do we configure a booth
> ticket for just the DNS resource?

You can have more than one ticket. On the Pacemaker side, tickets are
tied to resources with rsc_ticket constraints (though you'll probably
be using a higher-level tool that abstracts that).
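
For example, with pcs (a sketch -- the ticket and resource names are
hypothetical):

  pcs constraint ticket add centreon-ticket dns-update loss-policy=stop

loss-policy controls what happens to the resource if the ticket is
revoked or lost.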

How to update the DNS depends on what server you're using -- just
follow its documentation for making changes. You can use the
ocf:pacemaker:Dummy agent as a model and update start to make the DNS
change (in addition to creating the dummy state file). The monitor can
check whether the dummy state file is present and DNS is returning the
desired info. Stop would just remove the dummy state file.
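
A rough sketch of such an agent (not a complete OCF agent -- meta-data,
validate-all, and proper error handling are omitted; the nsupdate call,
key file, record name, and address are illustrative assumptions):

  #!/bin/sh
  STATE="/run/dns-failover.state"
  case "$1" in
      start)
          # push the DNS change, then record local state like Dummy does
          printf 'update delete app.example.com. A\nupdate add app.example.com. 60 A 192.0.2.10\nsend\n' \
              | nsupdate -k /etc/rndc.key || exit 1
          touch "$STATE"
          ;;
      stop)
          rm -f "$STATE"
          ;;
      monitor)
          [ -f "$STATE" ] || exit 7   # OCF_NOT_RUNNING
          # also verify DNS is returning the desired record
          dig +short app.example.com A | grep -q '^192\.0\.2\.10$' || exit 1
          ;;
  esac
  exit 0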

> 4. MariaDB replication:
> 4.1. How can Centreon's MariaDB replicate between the 2 clusters?

Native MySQL replication should work fine for that.

> 5. Centreon:
> 5.1. Will this setup (2 clusters, 2 floating IPs, 1 booth manager)
> work for our Centreon project? 

I don't have any experience with that, but it sounds fine.

> 
> 
> 
> Regards
> Adil Bouazzaoui
> 
> 
> Adil BOUAZZAOUI
> Infrastructure & Technologies Engineer
> GSM: +212 703 165 758
> E-mail: adil.bouazza...@tmandis.ma
> 
> 
> -----Original Message-----
> From: Adil BOUAZZAOUI 
> Sent: Friday, September 8, 2023 5:15 PM
> To: Ken Gaillot ; Adil Bouazzaoui <adilb...@gmail.com>
> Cc: Cluster Labs - All topics related to open-source clustering welcomed 
> Subject: RE: [EXTERNE] Re: [ClusterLabs] Centreon HA Cluster - VIP issue
> 
> Hi Ken,
> 
> Thank you for the update and the clarification.
> The idea is clear; I just need to know more information about this
> 2-cluster setup:
> 
> 1. Arbitrator:
> 1.1. Only one arbitrator is needed for everything: should I use the
> Quorum provided by Centreon on the official documentation? Or should
> I use the booth ticket manager instead?
> 1.2. Is fencing configured separately? Or is it configured during the
> booth ticket manager installation?
> 
> 2. Floating IP:
> 2.1. it doesn't hurt if both Floating IPs are running at the same
> time right?
> 
> 3. Fail over:
> 3.1. How to update the DNS to point to the appropriate IP?
> 3.2. We're running our own DNS servers, so how do we configure a booth
> ticket for just the DNS resource?
> 
> 4. MariaDB replication:
> 4.1. How can Centreon's MariaDB replicate between the 2 clusters?
> 
> 5. Centreon:
> 5.1. Will this setup (2 clusters, 2 floating IPs, 1 booth manager)
> work for our Centreon project? 
> 
> 
> 
> Regards
> Adil Bouazzaoui
> 
> 
> Adil BOUAZZAOUI
> Infrastructure & Technologies Engineer
> GSM: +212 703 165 758
> E-mail: adil.bouazza...@tmandis.ma
> 
> 
> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com]
> Sent: Tuesday, September 5, 2023 10:00 PM
> To: Adil Bouazzaoui 
> Cc: Cluster Labs - All topics related to open-source clustering
> welcomed ; Adil BOUAZZAOUI <adil.bouazza...@tmandis.ma>
> Subject: [EXTERNE] Re: [ClusterLabs] Centreon HA Cluster - VIP issue
> 
> On Tue, 2023-09-05 at 21:13 +0100, Adil Bouazzaoui wrote:
> > Hi Ken,
> > 
> > thank you a big time for the feedback; much appreciated.
> > 
> > I suppose we go with a new Scenario 3: set up 2 Clusters across
> > different DCs connected by booth; so could you please clarify the
> > points below so I can understand better and start working on the
> > architecture:
> > 
> > 1- in case of separate clusters connected by booth: should each 
> > cluster have a quorum device for the Master/slave elections?
> 
> Hi,
> 
> Only one arbitrator is needed for everything.
> 
> Since each cluster in this case has two nodes, Corosync will use the
> "two_node" configuration to determine quorum. When first starting the
> cluster, both nodes must come up before quorum is obtained. After
> that, only one node is required to keep quorum -- which means that
> fencing is essential to prevent split-brain.
> 
> > 2- separate floating IPs at each cluster: please check the
> > attached 
> > diagram and let me know if this is exactly what you mean?
> 

Re: [ClusterLabs] PostgreSQL HA on EL9

2023-09-18 Thread Ken Gaillot
Ah, good catch. FYI, we created a hook for situations like this a while
back: resource-agents-deps.target. Which reminds me we really need to
document it ...

To use it, put a drop-in unit under
/etc/systemd/system/resource-agents-deps.target.d/ (any name ending in
.conf) with:

  [Unit]
  Requires=<unit your resources depend on>
  After=<same unit>

Pacemaker is ordered after resource-agents-deps, so you can use it to
start any non-clustered dependencies.
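
For example, for the iSCSI/remote-filesystem case discussed below (a
sketch; the drop-in file name is arbitrary):

  # /etc/systemd/system/resource-agents-deps.target.d/remote-fs.conf
  [Unit]
  Requires=remote-fs.target
  After=remote-fs.target

followed by "systemctl daemon-reload".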

On Thu, 2023-09-14 at 15:43 +, Larry G. Mills via Users wrote:
> I found my issue with reboots - and it wasn't pacemaker-related at
> all.  My EL9 test system was different from the EL7 system in that it
> hosted the DB on a iSCSI-attached array.  During OS shutdown, the
> array was being unmounted concurrently with pacemaker shutdown, so it
> was not able to cleanly shut down the pgsql resource. I added a
> systemd override to make corosync dependent upon, and require,
> "remote-fs.target".   Everything shuts down cleanly now, as expected.
> 
> Thanks for the suggestions,
> 
> Larry
> 
> > -Original Message-
> > From: Users  On Behalf Of Oyvind
> > Albrigtsen
> > Sent: Thursday, September 14, 2023 5:43 AM
> > To: Cluster Labs - All topics related to open-source clustering
> > welcomed
> > 
> > Subject: Re: [ClusterLabs] PostgreSQL HA on EL9
> > 
> > If you're using network filesystems with the Filesystem agent this
> > patch might solve your issue:
> > https://github.com/ClusterLabs/resource-agents/pull/1869
> > 
> > 
> > Oyvind
> > 
> > On 13/09/23 17:56 +, Larry G. Mills via Users wrote:
> > > > On my RHEL 9 test cluster, both "reboot" and "systemctl reboot"
> > > > wait
> > > > for the cluster to stop everything.
> > > > 
> > > > I think in some environments "reboot" is equivalent to
> > > > "systemctl
> > > > reboot --force" (kill all processes immediately), so maybe see
> > > > if
> > > > "systemctl reboot" is better.
> > > > 
> > > > > On EL7, this scenario caused the cluster to shut itself down
> > > > > on the
> > > > > node before the OS shutdown completed, and the DB resource
> > > > > was
> > > > > stopped/shutdown before the OS stopped.  On EL9, this is not
> > > > > the
> > > > > case, the DB resource is not stopped before the OS shutdown
> > > > > completes.  This leads to errors being thrown when the
> > > > > cluster is
> > > > > started back up on the rebooted node similar to the
> > > > > following:
> > > > > 
> > > 
> > > Ken,
> > > 
> > > Thanks for the reply - and that's interesting that RHEL9 behaves as
> > > expected and AL9 seemingly doesn't.  I did try shutting down via
> > > "systemctl reboot", but the cluster and resources were still not
> > > stopped cleanly before the OS stopped.  In fact, the commands
> > > "shutdown" and "reboot" are just symlinks to systemctl on AL9.2, so
> > > it makes sense that the behavior is the same.
> > >
> > > Just as a point of reference, my systemd version is:
> > > systemd.x86_64  252-14.el9_2.3
> > >
> > > Larry

-- 
Ken Gaillot 
