[ClusterLabs] pacemaker-remote

2023-09-14 Thread Mr.R via Users
Hi all,
In Pacemaker-Remote 2.1.6, the pacemaker package is required
for guest nodes but not for remote nodes. Why is that? What does
pacemaker do there?
After adding a guest node, the pacemaker package does not seem to be
needed. Can I skip installing it there?

In my testing, remote nodes can be taken offline, but guest nodes
cannot. Is there any way to take them offline? Are there relevant
failure test cases?
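
(For context, the two node types are defined differently - a rough sketch
with pcs, using hypothetical host and resource names; exact syntax depends
on the pcs version:

    # remote node: a bare host running only pacemaker-remote
    pcs cluster node add-remote remote1.example.com
    # guest node: an existing VM resource (e.g. VirtualDomain) turned into a node
    pcs cluster node add-guest guest1.example.com vm-guest1
)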


thanks,
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] PostgreSQL HA on EL9

2023-09-14 Thread Oyvind Albrigtsen

If you're using network filesystems with the Filesystem agent this
patch might solve your issue:
https://github.com/ClusterLabs/resource-agents/pull/1869
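
(For reference, a network filesystem managed by that agent would look
something like the sketch below - device, mount point, and resource name
are hypothetical:

    pcs resource create shared_fs ocf:heartbeat:Filesystem \
        device="nfsserver:/export/data" directory="/mnt/data" fstype="nfs" \
        op monitor interval=20s
)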


Oyvind

On 13/09/23 17:56 +, Larry G. Mills via Users wrote:


On my RHEL 9 test cluster, both "reboot" and "systemctl reboot" wait
for the cluster to stop everything.

I think in some environments "reboot" is equivalent to "systemctl
reboot --force" (kill all processes immediately), so maybe see if
"systemctl reboot" is better.

>
> On EL7, this scenario caused the cluster to shut itself down on the
> node before the OS shutdown completed, and the DB resource was
> stopped/shutdown before the OS stopped.  On EL9, this is not the
> case, the DB resource is not stopped before the OS shutdown
> completes.  This leads to errors being thrown when the cluster is
> started back up on the rebooted node similar to the following:
>


Ken,

Thanks for the reply - and it's interesting that RHEL9 behaves as expected and AL9 seemingly doesn't. I
did try shutting down via "systemctl reboot", but the cluster and resources were still not stopped
cleanly before the OS stopped. In fact, the commands "shutdown" and "reboot" are just
symlinks to systemctl on AL9.2, so it makes sense that the behavior is the same.
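
(A quick way to confirm this, assuming the usual AL9 layout, is to check
where the commands point, e.g.:

    ls -l /usr/sbin/reboot /usr/sbin/shutdown
)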

Just as a point of reference, my systemd version is: systemd.x86_64 
252-14.el9_2.3

Larry
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] MySQL cluster with auto failover

2023-09-14 Thread Damiano Giuliani
I fired up a multi-master Galera cluster.
I would now like to add it to Pacemaker so it can manage and monitor its status.
Reading the galera resource agent documentation, it seems it only supports
master/slave replication; is that true?
The mysql resource agent also seems to support master/slave replication and
externally managed replication (multi-master Galera?) using a clone.
My question is how the mysql agent can figure out which is the most advanced
node if the entire cluster is shut down.

I wonder if it would be better to move to a master/slave replication
configuration.
There doesn't seem to be enough documentation on the web about how to build a
MySQL cluster with Pacemaker :/
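
(For what it's worth, a minimal sketch of Galera as a promotable clone with
pcs - the node names node1..node3 are hypothetical and the exact pcs syntax
varies by version:

    pcs resource create galera ocf:heartbeat:galera \
        wsrep_cluster_address="gcomm://node1,node2,node3" \
        promotable promoted-max=3

With promoted-max equal to the node count, all nodes are promoted, which is
how a multi-master Galera cluster is usually modeled in Pacemaker.)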


On Tue, Sep 12, 2023 at 10:28 Damiano Giuliani <
damianogiulian...@gmail.com> wrote:

> thanks Ken,
>
> could you point me in the right direction for a guide or some already
> working configuration?
>
> Thanks
>
> Damiano
>
> On Mon, Sep 11, 2023 at 16:26 Ken Gaillot 
> wrote:
>
>> On Thu, 2023-09-07 at 10:27 +0100, Antony Stone wrote:
>> > On Wednesday 06 September 2023 at 17:01:24, Damiano Giuliani wrote:
>> >
>> > > Everything is clear now.
>> > > So the point is to use pacemaker to create the floating VIP and
>> > > bind it to
>> > > sqlproxy, which health-checks and routes the traffic to the available
>> > > and healthy
>> > > galera nodes.
>> >
>> > Good summary.
>> >
>> > > Could it also be useful to let pacemaker manage the galera services?
>> >
>> > No; MySQL / Galera needs to be running on all nodes all the
>> > time.  Pacemaker
>> > is for managing resources which move between nodes.
>>
>> It's still helpful to configure galera as a clone in the cluster. That
>> way, Pacemaker can monitor it and restart it on errors, it will respect
>> things like maintenance mode and standby, and it can be used in
>> ordering constraints with other resources, as well as advanced features
>> such as node utilization.
>>
>> >
>> > If you want something that ensures processes are running on
>> > machines,
>> > irrespective of where the floating IP is, look at monit - it's very
>> > simple,
>> > easy to configure and knows how to manage resources which should run
>> > all the
>> > time.
>> >
>> > > Do you have any guide that pack this everything together?
>> >
>> > No; I've largely made this stuff up myself as I've needed it.
>> >
>> >
>> > Antony.
>> >
>> --
>> Ken Gaillot 
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] PostgreSQL HA on EL9

2023-09-14 Thread Larry G. Mills via Users
I found my issue with reboots - and it wasn't pacemaker-related at all.  My EL9 
test system was different from the EL7 system in that it hosted the DB on an 
iSCSI-attached array.  During OS shutdown, the array was being unmounted 
concurrently with pacemaker shutdown, so pacemaker was not able to cleanly shut 
down the pgsql resource. I added a systemd override to make corosync depend on, 
and require, "remote-fs.target".  Everything shuts down cleanly now, as 
expected.
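
(A minimal sketch of such an override, assuming a drop-in under
/etc/systemd/system/corosync.service.d/ - the exact unit names and
dependencies may differ per environment:

    mkdir -p /etc/systemd/system/corosync.service.d
    cat > /etc/systemd/system/corosync.service.d/remote-fs.conf <<'EOF'
    [Unit]
    Requires=remote-fs.target
    After=remote-fs.target
    EOF
    systemctl daemon-reload

With this in place, systemd starts corosync after remote-fs.target and,
correspondingly, stops it before the remote filesystems are unmounted on
shutdown.)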

Thanks for the suggestions,

Larry

> -Original Message-
> From: Users  On Behalf Of Oyvind Albrigtsen
> Sent: Thursday, September 14, 2023 5:43 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> 
> Subject: Re: [ClusterLabs] PostgreSQL HA on EL9
> 
> If you're using network filesystems with the Filesystem agent this
> patch might solve your issue:
> https://github.com/ClusterLabs/resource-agents/pull/1869
> 
> 
> Oyvind
> 
> On 13/09/23 17:56 +, Larry G. Mills via Users wrote:
> >>
> >> On my RHEL 9 test cluster, both "reboot" and "systemctl reboot" wait
> >> for the cluster to stop everything.
> >>
> >> I think in some environments "reboot" is equivalent to "systemctl
> >> reboot --force" (kill all processes immediately), so maybe see if
> >> "systemctl reboot" is better.
> >>
> >> >
> >> > On EL7, this scenario caused the cluster to shut itself down on the
> >> > node before the OS shutdown completed, and the DB resource was
> >> > stopped/shutdown before the OS stopped.  On EL9, this is not the
> >> > case, the DB resource is not stopped before the OS shutdown
> >> > completes.  This leads to errors being thrown when the cluster is
> >> > started back up on the rebooted node similar to the following:
> >> >
> >
> >Ken,
> >
> >Thanks for the reply - and it's interesting that RHEL9 behaves as expected
> and AL9 seemingly doesn't. I did try shutting down via "systemctl reboot",
> but the cluster and resources were still not stopped cleanly before the OS
> stopped. In fact, the commands "shutdown" and "reboot" are just symlinks
> to systemctl on AL9.2, so it makes sense that the behavior is the same.
> >
> >Just as a point of reference, my systemd version is: systemd.x86_64
> 252-14.el9_2.3
> >
> >Larry
> >___
> >Manage your subscription:
> >https://lists.clusterlabs.org/mailman/listinfo/users
> >
> >ClusterLabs home: https://www.clusterlabs.org/
> >
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Centreon 2 Node HA cluster

2023-09-14 Thread Adil Bouazzaoui
Hi Jan,

Any update please?

Sent from my Huawei phone

 Original message 
From: Adil Bouazzaoui
Date: Mon, Sep 4, 2023, 21:28
To: users@clusterlabs.org, jfrie...@redhat.com
Cc: Adil BOUAZZAOUI
Subject: Re: Users Digest, Vol 104, Issue 5

Hi Jan,

To add more information: we deployed a Centreon 2 Node HA Cluster (Master in
DC 1 & Slave in DC 2). The quorum device, which handles split-brain, is in
DC 1 too, and the poller, which is responsible for monitoring, is in DC 1 as
well. The problem is that a VIP address is required (attached to the Master
node; in case of failover it is moved to the Slave) and we don't know what
VIP we should use. We also don't know the right setup for our scenario, so
that if DC 1 goes down the Slave in DC 2 becomes the Master; that's why we
don't know where to place the quorum device and the poller.

I hope to get some ideas so we can set up this cluster correctly.

Thanks in advance.

Adil Bouazzaoui
IT Infrastructure engineer
adil.bouazza...@tmandis.ma
adilb...@gmail.com

On Mon, Sep 4, 2023 at 15:24,  wrote:

Send Users mailing list submissions to
        users@clusterlabs.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.clusterlabs.org/mailman/listinfo/users
or, via email, send a message with subject or body 'help' to
        users-requ...@clusterlabs.org

You can reach the person managing the list at
        users-ow...@clusterlabs.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Users digest..."


Today's Topics:

   1. Re: issue during Pacemaker failover testing (Klaus Wenninger)
   2. Re: issue during Pacemaker failover testing (Klaus Wenninger)
   3. Re: issue during Pacemaker failover testing (David Dolan)
   4. Re: Centreon HA Cluster - VIP issue (Jan Friesse)


--

Message: 1
Date: Mon, 4 Sep 2023 14:15:52 +0200
From: Klaus Wenninger 
To: Cluster Labs - All topics related to open-source clustering
        welcomed 
Cc: David Dolan 
Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
Message-ID:
        wody...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Mon, Sep 4, 2023 at 1:44 PM Andrei Borzenkov  wrote:

> On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger 
> wrote:
> >
> >
> > Or go for qdevice with LMS, where I would expect it to be able to really
> go down to
> > a single node left - either of the 2 last ones - as there is still qdevice.
> > Sorry for the confusion btw.
> >
>
> According to documentation, "LMS is also incompatible with quorum
> devices, if last_man_standing is specified in corosync.conf then the
> quorum device will be disabled".
>

That is why I said qdevice with LMS - but it was probably not explicit
enough without saying that I meant the qdevice algorithm and not
the corosync flag.
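
(To be explicit, the qdevice algorithm is chosen in the quorum.device.net
section of corosync.conf - a rough sketch, with a hypothetical qnetd host:

    quorum {
        provider: corosync_votequorum
        device {
            model: net
            net {
                host: qnetd.example.com
                algorithm: lms
            }
        }
    }
)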

Klaus

> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

--

Message: 2
Date: Mon, 4 Sep 2023 14:32:39 +0200
From: Klaus Wenninger 
To: Cluster Labs - All topics related to open-source clustering
        welcomed 
Cc: David Dolan 
Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
Message-ID:
        
Content-Type: text/plain; charset="utf-8"

On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov  wrote:

> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger 
> wrote:
> >
> >
> >
> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan 
> wrote:
> >>
> >> Hi Klaus,
> >>
> >> With default quorum options I've performed the following on my 3 node
> cluster
> >>
> >> Bring down cluster services on one node - the running services migrate
> to another node
> >> Wait 3 minutes
> >> Bring down cluster services on one of the two remaining nodes - the
> surviving node in the cluster is then fenced
> >>
> >> Instead of the surviving node being fenced, I hoped that the services
> would migrate and run on that remaining node.
> >>
> >> Just looking for confirmation that my understanding is ok and if I'm
> missing something?
> >
> >
> > As said I've never used it ...
> > Well, when down to 2 nodes, LMS by definition gets into trouble, as
> after another
> > outage any of them is going to be alone. In case of an ordered shutdo