[ClusterLabs] Pacemaker 2.1.9 released

2024-10-31 Thread Ken Gaillot
Hi all,

The final release of Pacemaker 2.1.9 is now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.9

This is primarily a bug fix release, to provide a clean separation
point for the upcoming 3.0.0 release. See the link above for more
details.

Many thanks to all contributors to this release, including Aleksei
Burlakov, Chris Lumens, Hideo Yamauchi, Ken Gaillot, and Reid Wahl.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] ClusterLabs website overhauled

2024-10-30 Thread Ken Gaillot
Hi all,

Today, we unveiled the new ClusterLabs website design:

 https://clusterlabs.org/

The old site had a lot of outdated info as well as broken CSS after
multiple OS upgrades, and the Jekyll-based source for site generation
was difficult to maintain. The new site is Hugo-based with a much
simpler theme and design.

It's not the most beautiful or modern site in the world, but it's
simple and clean. If someone wants to pitch in and improve it, you can
see the source and submit pull requests at:

 https://github.com/ClusterLabs/clusterlabs-www

Enjoy!
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.9-rc3 released

2024-10-21 Thread Ken Gaillot
Hi all,

The third (and likely final) release candidate for Pacemaker 2.1.9 is
now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.9-rc3

I decided to squeeze in a couple more minor fixes. For details, see the
above link.

Everyone is encouraged to download, compile and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

If no one reports any issues with this candidate, it will likely become
the final release around the end of the month.

Many thanks to all contributors to this release, including Aleksei
Burlakov and Ken Gaillot.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker & rsyslog

2024-10-21 Thread Ken Gaillot
On Thu, 2024-10-17 at 11:19 +, 管理者 via Users wrote:
> Nice to meet you. Thank you for your help.
> 
> I would like your opinion on setting up rsyslog as a pacemaker
> resource and giving it a VIP.
> I am aware that if rsyslog is clustered with pacemaker, rsyslog will
> be active/standby, so the logs will not be output on the standby
> machine.

Hi,

Welcome to the community.

What is your use case for rsyslog? Do you want local logging on each
node, or should one node be an aggregator for all the other nodes, or
is this cluster providing an rsyslog aggregator for hosts elsewhere?

> Is there any way to make only certain logs active standby?
> If so, should we consider other means instead of using pacemaker?
> 
> 
> -- 
> _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
> 李沢 誠二 (Seiji Sumomozawa)
> TEL:080-5099-4247
> Mail:s-sumomoz...@sumomozawa.com
> _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Interleaving clones with different number of instances per node

2024-10-17 Thread Ken Gaillot
On Thu, 2024-10-17 at 17:51 +0200, Jochen wrote:
> Thanks very much for the info and the help.
> 
> For my problem I have decided to use systemd and its dependency
> management to create an "instance" service for each clone instance,
> that then starts the "main" service on a node if any instance service
> is started. Hopefully works without any coding required...
> 
> I still might have a try at extending the attribute resource. One
> question though: Is incrementing and decrementing the count robust
> enough? I would think a solution that actually counts the running
> instances each time so we don't get any drift would be preferable.
> But what would be the best way for the agent to get this count?

Incrementing and decrementing should be sufficient in general. If an
instance crashes without decrementing the count, Pacemaker will stop it
as part of recovery.

The main opportunity for trouble would be an instance started outside
Pacemaker control. Pacemaker would detect it and either stop it
(decrementing when we shouldn't) or leave it alone (not incrementing
when we should).

To count each time instead, probably the best way would be to look for
state files with instance numbers.
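
For example (untested sketch -- the state file naming and attribute name
are made up; a real agent would use its own conventions):

```sh
# Recount running instances from per-instance state files instead of
# incrementing/decrementing (assumes each instance's start action
# touches /run/myclone-<instance>.state and its stop removes it)
count=$(ls /run/myclone-*.state 2>/dev/null | wc -l)

if [ "$count" -gt 0 ]; then
    attrd_updater --name myclone-active --update "$count"
else
    attrd_updater --name myclone-active --delete
fi
```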

> 
> > On 17. Oct 2024, at 16:50, Ken Gaillot  wrote:
> > 
> > On Thu, 2024-10-17 at 16:34 +0200, Jochen wrote:
> > > Thanks for the help!
> > > 
> > > Before I break out my editor and start writing custom resource
> > > agents, one question: Is there a way to use a cloned
> > > ocf:pacemaker:attribute resource to set a clone-specific
> > > attribute on
> > > a node? I.e. attribute "started-0=1" and "started-1=1", depending
> > > on
> > > the clone ID? For this I would need e.g. a rule to configure a
> > > clone
> > > specific resource parameter, or is there something like variable
> > > substitution in resource parameters?
> > 
> > No, ocf:pacemaker:attribute won't work properly as a unique clone.
> > If
> > one instance is started and another stopped, it will get the status
> > of
> > one of them wrong.
> > 
> > I noticed that yesterday and came up with an idea for a general
> > solution if you feel like tackling it:
> > 
> > https://projects.clusterlabs.org/T899
> > 
> > > > On 16. Oct 2024, at 16:22, Ken Gaillot 
> > > > wrote:
> > > > 
> > > > On Mon, 2024-10-14 at 18:49 +0200, Jochen wrote:
> > > > > Hi, I have two cloned resources in my cluster that have the
> > > > > following
> > > > > properties:
> > > > > 
> > > > > * There are a maximum of two instances of R1 in the cluster,
> > > > > with
> > > > > a
> > > > > maximum of two per node
> > > > > * When any instance of R1 is started on a node, exactly one
> > > > > instance
> > > > > of R2 should run on that node
> > > > > 
> > > > > When I configure this, and verify the configuration with
> > > > > "crm_verify
> > > > > -LV", I get the following error:
> > > > > 
> > > > > clone_rsc_colocation_rh)  error: Cannot interleave R2-
> > > > > clone and
> > > > > R1-clone because they do not support the same number of
> > > > > instances
> > > > > per
> > > > > node
> > > > > 
> > > > > How can I make this work? Any help would be greatly
> > > > > appreciated.
> > > > 
> > > > Hi,
> > > > 
> > > > I believe the number of instances has to be the same because
> > > > each
> > > > instance pair on a single node is interleaved.
> > > > 
> > > > There's no direct way to configure what you want, but it might
> > > > be
> > > > possible with a custom OCF agent for R1 and attribute-based
> > > > rules.
> > > > 
> > > > On start, the R1 agent could set a custom node attribute to
> > > > some
> > > > value.
> > > > On stop, it could check whether any other instances are active
> > > > (assuming that's possible), and if not, clear the attribute.
> > > > Then,
> > > > R2
> > > > could have a location rule enabling it only on nodes where the
> > > > attribute has the desired value.
> > > > 
> > > > R2 wouldn't stop until *after* the last instance of R1 stops,
> > > > which could be a problem depending on the particulars of the service.

Re: [ClusterLabs] Interleaving clones with different number of instances per node

2024-10-17 Thread Ken Gaillot
On Thu, 2024-10-17 at 16:34 +0200, Jochen wrote:
> Thanks for the help!
> 
> Before I break out my editor and start writing custom resource
> agents, one question: Is there a way to use a cloned
> ocf:pacemaker:attribute resource to set a clone-specific attribute on
> a node? I.e. attribute "started-0=1" and "started-1=1", depending on
> the clone ID? For this I would need e.g. a rule to configure a clone
> specific resource parameter, or is there something like variable
> substitution in resource parameters?

No, ocf:pacemaker:attribute won't work properly as a unique clone. If
one instance is started and another stopped, it will get the status of
one of them wrong.

I noticed that yesterday and came up with an idea for a general
solution if you feel like tackling it:

 https://projects.clusterlabs.org/T899

> 
> > On 16. Oct 2024, at 16:22, Ken Gaillot  wrote:
> > 
> > On Mon, 2024-10-14 at 18:49 +0200, Jochen wrote:
> > > Hi, I have two cloned resources in my cluster that have the
> > > following
> > > properties:
> > > 
> > > * There are a maximum of two instances of R1 in the cluster, with
> > > a
> > > maximum of two per node
> > > * When any instance of R1 is started on a node, exactly one
> > > instance
> > > of R2 should run on that node
> > > 
> > > When I configure this, and verify the configuration with
> > > "crm_verify
> > > -LV", I get the following error:
> > > 
> > > clone_rsc_colocation_rh)  error: Cannot interleave R2-clone and
> > > R1-clone because they do not support the same number of instances
> > > per
> > > node
> > > 
> > > How can I make this work? Any help would be greatly appreciated.
> > 
> > Hi,
> > 
> > I believe the number of instances has to be the same because each
> > instance pair on a single node is interleaved.
> > 
> > There's no direct way to configure what you want, but it might be
> > possible with a custom OCF agent for R1 and attribute-based rules.
> > 
> > On start, the R1 agent could set a custom node attribute to some
> > value.
> > On stop, it could check whether any other instances are active
> > (assuming that's possible), and if not, clear the attribute. Then,
> > R2
> > could have a location rule enabling it only on nodes where the
> > attribute has the desired value.
> > 
> > R2 wouldn't stop until *after* the last instance of R1 stops, which
> > could be a problem depending on the particulars of the service.
> > There
> > might also be a race condition if two instances are stopping at the
> > same time, so it might be worthwhile to set ordered=true on the
> > clone.
> > 
> > > 
> > > Current configuration is as follows:
> > > 
> > > [XML stripped by the list archive: the clone definitions for
> > > R1-clone and R2-clone (each with a target-role="Stopped" meta
> > > attribute), plus an order constraint (R1-clone then R2-clone) and a
> > > colocation constraint (R2-clone with R1-clone).]
> > -- 
> > Ken Gaillot 
> > 
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Interleaving clones with different number of instances per node

2024-10-16 Thread Ken Gaillot
On Mon, 2024-10-14 at 18:49 +0200, Jochen wrote:
> Hi, I have two cloned resources in my cluster that have the following
> properties:
> 
> * There are a maximum of two instances of R1 in the cluster, with a
> maximum of two per node
> * When any instance of R1 is started on a node, exactly one instance
> of R2 should run on that node
> 
> When I configure this, and verify the configuration with "crm_verify
> -LV", I get the following error:
> 
> clone_rsc_colocation_rh)  error: Cannot interleave R2-clone and
> R1-clone because they do not support the same number of instances per
> node
> 
> How can I make this work? Any help would be greatly appreciated.

Hi,

I believe the number of instances has to be the same because each
instance pair on a single node is interleaved.

There's no direct way to configure what you want, but it might be
possible with a custom OCF agent for R1 and attribute-based rules.

On start, the R1 agent could set a custom node attribute to some value.
On stop, it could check whether any other instances are active
(assuming that's possible), and if not, clear the attribute. Then, R2
could have a location rule enabling it only on nodes where the
attribute has the desired value.

R2 wouldn't stop until *after* the last instance of R1 stops, which
could be a problem depending on the particulars of the service. There
might also be a race condition if two instances are stopping at the
same time, so it might be worthwhile to set ordered=true on the clone.
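
Roughly, the idea would look like this (sketch only -- the attribute
name, the rule, and the agent logic are illustrative, not a tested
configuration):

```sh
# In the custom R1 agent (pseudocode as shell comments):
#   start:  attrd_updater --name r1-active --update 1
#   stop:   if no other R1 instance is still active on this node,
#           attrd_updater --name r1-active --delete

# Keep R2-clone off any node where the attribute is absent or not 1:
pcs constraint location R2-clone rule score=-INFINITY \
    not_defined r1-active or r1-active ne 1
```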

> 
> 
> Current configuration is as follows:
> 
> [XML stripped by the list archive: the clone definitions for R1-clone
> and R2-clone (each with a target-role="Stopped" meta attribute), plus
> an order constraint (R1-clone then R2-clone) and a colocation
> constraint (R2-clone with R1-clone).]
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.9-rc2 released

2024-10-15 Thread Ken Gaillot
Hi all,

The second (and possibly final) release candidate for Pacemaker 2.1.9
is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.9-rc2

This adds a few more bug fixes. For details, see the above link.

Everyone is encouraged to download, compile and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

If no one reports any issues with this candidate, it will likely become
the final release in a couple of weeks.

Many thanks to all contributors to this release, including Chris
Lumens, Ken Gaillot, and Reid Wahl.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.9-rc1 released

2024-10-03 Thread Ken Gaillot
Hi all,

The first release candidate for Pacemaker 2.1.9 is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.9-rc1

This is primarily a bug-fix release to give a clean separation point
with the upcoming 3.0.0 release. It also supports the ability to build
with the latest version of libxml2, and introduces no-quorum-
policy="fence" as a synonym for the newly deprecated "suicide".

For details, see the above link.

Many thanks to all contributors to this release, including Chris
Lumens, Hideo Yamauchi, Ken Gaillot, and Reid Wahl.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Exception in adding resources to pacemaker

2024-09-26 Thread Ken Gaillot
> > > > > 
> > > > > The `pcs resource create` should error in case resource
> > > > > cannot be
> > > > > created.
> > > > > 
> > > > > How do you run pcs commands? Manually or by some script?
> > > > > Could you
> > > > > provide more information like:
> > > > > * version of pcs
> > > > > * example of pcs commands used
> > > > > * cluster configuration (CIB, output of pcs config...)
> > > > > > Could you please shed some light on what might have caused
> > > > > > this 
> > > > > > phenomenon and whether there exists a known limitation with
> > > > > > respect to 
> > > > > > the total number of resources or resource groups that can
> > > > > > be 
> > > > > > effectively managed within a pacemaker cluster of this
> > > > > > size?
> > > > > > 
> > > > > > Looking forward to your reply.
> > > > > > 
> > > > > > 
> > > > > > ___
> > > > > > Manage your subscription:
> > > > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > > > > 
> > > > > > ClusterLabs home: https://www.clusterlabs.org/
> > > > > Regards,
> > > > > Miroslav
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Exception in adding resources to pacemaker

2024-09-24 Thread Ken Gaillot
Hi,

There is no hard limit on the number of resources. Can you share the
CIB that had the problem, and the operation that failed?

On Tue, 2024-09-24 at 15:34 +0800, 毕晓妍 wrote:
> 你好:
> 我想请问一下pacemaker集群中添加的资源有数量上限吗?我在4节点集群中添加资源, 一组5个资源,
> 添加104组资源时,使用pcs命令添加没有任何报错,但是对组资源进行操作时显示unable to find a
> resource/clone/group。请问是什么原因导致的这一现象?
> Hello,
> I would like to inquire about the potential limit on the quantity of
> resources that can be added to a pacemaker cluster. Specifically,
> I've been working with a 4-node cluster configuration, initially
> adding a single group comprising 5 resources. However, upon
> attempting to augment the cluster by adding 104 additional resource
> groups using the pcs command, I encountered an unexpected behavior:
> while the command execution itself did not report any errors,
> subsequent operations targeting those groups resulted in a message
> stating 'unable to find a resource/clone/group'.
> Could you please shed some light on what might have caused this
> phenomenon and whether there exists a known limitation with respect
> to the total number of resources or resource groups that can be
> effectively managed within a pacemaker cluster of this size?
> Looking forward to your reply.
> 

Ken Gaillot 
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker release plans for 2.1.9 and 3.0.0

2024-09-23 Thread Ken Gaillot
Hi all,

The first Pacemaker 3 release will happen in a few months.

Pacemaker major version bumps indicate backward-incompatible changes
that may break rolling upgrades from certain earlier versions, as well
as changes that could affect user scripts or cluster behavior.

The changes in 3.0.0 mostly relate to deprecated and/or undocumented
features and the low-level C API, and shouldn't affect most users. The
full list of changes is on the ClusterLabs wiki:

https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/

Next week, we will start the release process for 2.1.9, to give a clear
line between the 2 and 3 series. 2.1.8 just came out last month, but
we've made a number of fixes since then, and this ensures that the only
differences between 2.1.9 and 3.0.0 will be changes in backward
compatibility.

When the 2.1.9 final comes out around the beginning of November, we'll
release the first release candidate of 3.0.0.

Once 3.0.0 is released around the end of the year, the 3.0 series will
get all new primary development. Selected backward-compatible bug fixes
from each Pacemaker 3 release will be backported to the 2.1 series,
which will continue to get releases for probably another 3 to 5 years.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Got pacemaker into a hung state

2024-09-16 Thread Ken Gaillot
On Sun, 2024-09-15 at 19:47 -0400, Madison Kelly wrote:
> Hi all,
> 
>I was working on our OCF RA, and had a bug where the RA hung. 
> (specifically, a DNS query returned a fake IP, probably a search
> engine 
> after entering an invalid domain, and the RA hung checking if the
> target 
> was in ~/.ssh/known_hosts). Specifically, I was trying to do a 
> migration, which of course timed out and went into a FAILED state.
> 
>I expected the FAILED state, but after that, both nodes were 
> repeatedly showing:
> 
> 
> Sep 15 19:41:07 an-a01n02.alteeve.com pacemaker-controld[1283158]:  
> warning: Delaying join-33 finalization while transition in progress
> Sep 15 19:41:07 an-a01n02.alteeve.com pacemaker-controld[1283158]:  
> warning: Delaying join-33 finalization while transition in progress
> 

That sounds like a bug. Once the timeout happened, the transition
should have been complete. The DC's pacemaker.log should show what
actions were needed in the transition just before the most recent
"saving inputs" message before this point. Then you can check the logs
for the results of those actions to see if maybe something was still in
progress for a long time.

Also, only the DC logs that message. Are you sure it was on both nodes
at the same time? If so, they must have lost cluster communication. But
of course that should lead to Corosync failure and fencing.

> 
>I could not do a 'pcs resource cleanup', I could not withdraw the 
> node I triggered the migration from, and even after I fenced the
> node 
> that I had run the migration from, the peer remained stuck. In the
> end, 
> I had to reboot both nodes in the pacemaker cluster.
> 
>This was a dev system, so no harm, but now I am worried something 
> could leave a production system hung. How would you recover from a 
> situation like this, without rebooting?
> 
> Madi
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker newbie

2024-09-16 Thread Ken Gaillot
On Fri, 2024-09-13 at 17:32 +0200, Antony Stone wrote:
> On Friday 13 September 2024 at 17:23:59, Taylor, Marc D wrote:
> 
> > We bought a storage system from Dell and they recommended to us
> > that we
> > should use a two-node cluster
> 
> I do hope you realise that a literal two-node cluster is not a good
> idea?
> 
> If the two nodes lose contact with each other, you get a situation
> called 
> "split brain" where neither node can know what state the other node
> is in, and 
> neither node can safely take over resources.

As long as fencing is configured (and tested!), a two-node cluster is
not a problem. If the nodes lose communication, one side will fence the
other and take over all resources. (Various fencing options are
available to avoid a "death match" where both nodes fence each other.)
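
For example, a fencing delay on one device is a simple way to avoid
that (device names here are illustrative; pcmk_delay_base and
pcmk_delay_max are standard fence device parameters):

```sh
# Delay actions on the device that fences node1, so that in a split
# node1 tends to survive and fence node2 first:
pcs stonith update fence-node1 pcmk_delay_base=10s

# Or add a random delay on both devices to break the tie:
pcs stonith update fence-node1 pcmk_delay_max=10s
pcs stonith update fence-node2 pcmk_delay_max=10s
```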

> 
> You should always have an odd number of nodes in a cluster, and
> provided more 
> than 50% of the nodes can see each other, they will run resources;
> any node 
> which cannot see enough other nodes to be in a group of more than 50%
> will 
> stop running resources.
> 
> > to share the storage out as either NFS or SMB.
> 
> Do they explicitly say you can do both?
> 
> It might be possible to share a single storage resource using both
> NFS and 
> SMB, but it must have some interesting file-locking capabilities.
> 
> 
> Antony
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker newbie

2024-09-16 Thread Ken Gaillot
On Fri, 2024-09-13 at 15:08 +, Taylor, Marc D wrote:
> Hello,
>  
> I just found this list and have a few questions.
>  
> My understanding is that you can’t run a cluster that is both
> active/active and active/passive on the same cluster nodes.  Is this
> correct?

Nope, Pacemaker has no concept of active or passive nodes; that's just
an easy way for people to think about it. In Pacemaker, any node can
run any resource in any mode unless told otherwise.

> We need to run a cluster to share out storage.  On one LUN we need to
> share out NFS and on another LUN we need to share out Samba.  We
> already shared out the NFS LUN in an active/passive configuration. 
> It looks like Samba should be shared out as Active/Active though we
> did find procedures for Active/Passive.  What is the best wisdom
> here?

Both resources would be clones. I'm not familiar with those particular
resource agents, but assuming they support promote/demote actions, you
would just configure both clones with promotable="true", and set
promoted-max="2" on the samba clone.

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/collective.html#clone-options
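
For example (sketch only -- the agent names are placeholders for
whatever promotable-capable agents you end up using):

```sh
# NFS: promotable clone, only one promoted instance at a time
pcs resource create nfs-share nfs-agent promotable promoted-max=1

# Samba: promotable clone, allow both instances to be promoted
pcs resource create smb-share samba-agent promotable promoted-max=2
```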


>  
> Thanks in advance,
>  
> Marc Taylor
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Recommendation Fencing

2024-09-03 Thread Ken Gaillot
Hi,

It sounds like watchdog-only sbd is your best option.

The problem with fence_vmware in this situation is that if the
connection between the two sites is severed, the other site won't be
able to reach the vmware host to request fencing.

An alternative is to use a fencing topology, which allows you to
combine multiple fencing devices. You would have fence_vmware as the
first level and watchdog-only sbd as the second level. When fencing is
needed, the cluster would try fence_vmware first, and if that failed,
rely on sbd.
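
Roughly (illustrative only -- the device options depend on the fence
agent you pick, and the explicit "watchdog" level assumes a
Pacemaker/pcs version that exposes the implicit watchdog device under
that name):

```sh
# Level 1: the VMware fence device (parameters are placeholders)
pcs stonith create vmfence fence_vmware_rest \
    ip=vcenter.example.com username=fenceuser password=secret \
    pcmk_host_map="node1:vm-node1;node2:vm-node2"

# Watchdog-only SBD fallback (sbd enabled on all nodes, no shared disk)
pcs property set stonith-watchdog-timeout=10s

# Try vmfence first, fall back to the watchdog if it fails
pcs stonith level add 1 node1 vmfence
pcs stonith level add 2 node1 watchdog
pcs stonith level add 1 node2 vmfence
pcs stonith level add 2 node2 watchdog
```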

On Sat, 2024-08-31 at 15:12 +, Angelo M Ruggiero via Users wrote:
> Hi,
> 
> Thanks for the previous replies. I am currently reading them.
> 
> Can i be cheeky i have been researching and having some other
> "organisation issues" around fencing. 
> 
> Maybe it is possible to recommend what fencing and whether to use sbd
> with just watchdog or also with storage.  Or some pointers...
> 
> Here is my setup
> 
> 2 Node,  with a Quorum Device
> 
> nodes run on vmware and RHEL 8+ there is vmware watch available (i
> even tried it out a bit)
> 
> Each node is a different site, but they are close i.e approx 15 km
> with a good connection (i work for a big bank in switzerland)
> 
> Application DB Shared storage is given via nfs mounting and available
> on both side. The shared storage can be used at both sites. 
> 
> We want to run SAP with HANA the above setup and using pacemaker
> there are some restrictions around sbd and vmware but lets put that
> to one side just from a view of pacemaker what option i choose i have
> to make sure it is SAP Certified... Oh joy. 🙂
> 
> I see the following main options
> 
> Option 1. Just fence_vmware with no sbd at all
> 
> Option 2. Fence_vmware with sbd but just watchdog
> 
> Option 3. Fence_vmware with sbd, watchdog and share storage 
> 
> The organisational issue is that the sbd shared storage is considered
> non standard, although we are setting up Oracle RAC which needs
> similar setup.
> 
> What i read around at sort of think is good is option 3 as it provide
> posion pill, self fencing
> 
> What i do not have clear in my head and what i will work on next week
> to work out the pros and cons and what situations can be handled.
> 
> I have discounted other fence agents as I am not sure they work on
> vmware, but happy to be  told other options.
> 
> Any input gratefully received.
> 
> regards
> Angelo
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Batch start (approx > 5) of systemd resources fail (exitreason='inactive')

2024-08-28 Thread Ken Gaillot
succeeded
> Jul 05 08:02:40.975 m1 pacemaker-execd [3088074]
> (action_complete)  debug: wait5_2 start may still be in
> progress: re-scheduling (elapsed=487ms, remaining=99513ms,
> start_delay=2000ms)
> ```
> 
> Last resource started (success):
> ```
> Jul 05 08:02:44.747 m1 pacemaker-execd [3088074]
> (process_unit_method_reply)debug: DBus request for start of
> wait5_10 using /org/freedesktop/systemd1/job/640170 succeeded
> Jul 05 08:02:44.747 m1 pacemaker-execd [3088074]
> (action_complete)  debug: wait5_10 start may still be in
> progress: re-scheduling (elapsed=4259ms, remaining=95741ms,
> start_delay=2000ms)
> ```
> 
> Thus any number of concurrent systemd resource starts greater than
> ceil(2/reload_time) is prone to failure.
> 
> Some extra information:
> Resources that reached the activating status before those 2 seconds
> ran out succeeded to start as they reported 'activating' when the
> first monitor was performed:
> ```
> Jul 05 08:02:44.827 m1 pacemaker-execd [3088074] (log_execute)  
>debug: executing - rsc:wait5_6 action:monitor
> call_id:249
> Jul 05 08:02:44.827 m1 pacemaker-execd [3088074]
> (services__execute_systemd)debug: Performing asynchronous status
> op on systemd unit wait_5_to_start@6 for resource wait5_6
> Jul 05 08:02:44.831 m1 pacemaker-execd [3088074]
> (action_complete)   info: wait5_6 monitor is still in
> progress: re-scheduling (elapsed=4342ms, remaining=95658ms,
> start_delay=2000ms)
> ```
> 
> The systemd service used for those tests:
> ```
> root@m1:~# systemctl cat wait_5_to_start@.service
> # /etc/systemd/system/wait_5_to_start@.service
> [Unit]
> Description=notify start after 5 seconds service %i
> 
> [Service]
> Type=notify
> ExecStart=/usr/bin/python3 -c 'import time; import systemd.daemon;
> time.sleep(5); systemd.daemon.notify("READY=1"); time.sleep(86400)'
> ```
> How the resources were created (and tested):
> ```
> # for I in $(seq 1 10); do pcs resource create wait5_$I
> systemd:wait_5_to_start@$I op monitor interval="60s" timeout="100s"
> op start interval="0s" timeout="100s" op stop interval="0s"
> timeout="100s" --disabled; done
> # for I in $(seq 2 10); do pcs constraint colocation add wait5_$I
> with wait5_1 INFINITY; done
> # for I in $(seq 2 10); do pcs constraint order start wait5_1 then
> start wait5_$I kind=Mandatory; done
> # for I in $(seq 2 10); do pcs resource enable wait5_$I; done
> # pcs resource move wait5_1
> ```
> 
> Best regards,
> Borja.
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] corosync won't start after node failure

2024-08-20 Thread Ken Gaillot
On Mon, 2024-08-19 at 12:58 +0300, Murat Inal wrote:
> [Resending the below due to message format problem]
> 
> 
> Dear List,
> 
> I have been running two different 3-node clusters for some time. I
> am 
> having a fatal problem with corosync: After a node failure, rebooted 
> node does NOT start corosync.
> 
> Clusters;
> 
>   * All nodes are running Ubuntu Server 24.04
>   * corosync is 3.1.7
>   * corosync-qdevice is 3.0.3
>   * pacemaker is 2.1.6
>   * The third node at both clusters is a quorum device. Cluster is on
> ffsplit algorithm.
>   * All nodes are baremetal & attached to a dedicated kronosnet
> network.
>   * STONITH is enabled in one of the clusters and disabled for the
> other.
> 
> corosync & pacemaker service starts (systemd) are disabled. I am 
> starting any cluster with the command pcs cluster start.
> 
> corosync NEVER starts AFTER a node failure (node is rebooted). There 

Do you mean that the first time you run "pcs cluster start" after a
node reboot, corosync does not come up completely?

Try adding "debug: on" to the logging section of
/etc/corosync/corosync.conf
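
e.g. something like this (merge into your existing logging section):

```
logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: on
}
```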

> is 
> nothing in /var/log/corosync/corosync.log, service freezes as:
> 
> Aug 01 12:54:56 [3193] charon corosync notice  [MAIN  ] Corosync
> Cluster 
> Engine 3.1.7 starting up
> Aug 01 12:54:56 [3193] charon corosync info[MAIN  ] Corosync 
> built-in features: dbus monitoring watchdog augeas systemd xmlconf
> vqsim 
> nozzle snmp pie relro bindnow
> 
> corosync never starts kronosnet. I checked kronosnet interfaces, all
> OK, 
> there is IP connectivity in between. If I do corosync -t, it is the
> same 
> freeze.
> 
> I could ONLY manage to start corosync by reinstalling it: apt
> reinstall 
> corosync ; pcs cluster start.
> 
> The above issue repeated itself at least 5-6 times. I do NOT see 
> anything in syslog either. I will be glad if you lead me on how to
> solve 
> this.
> 
> Thanks,
> 
> Murat
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Initial Setup

2024-08-19 Thread Ken Gaillot
Your summary looks correct to me.

Note that by default, pcs cluster setup will configure the cluster
communication layer (knet) to use encryption with cipher AES-256 and
hash SHA-256. That covers Corosync and Pacemaker communication between
nodes during cluster operation.

In addition, Pacemaker's configuration (CIB) is readable and writable
only by root. Users may optionally be added to the haclient group to
gain read/write access, and ACLs may optionally be configured to
restrict that access to specific portions.
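
For example (user and role names are arbitrary; the user must also be a
member of the haclient group):

```sh
# Grant user "alice" read-only access to the whole CIB
pcs acl role create read-only description="Read-only access" \
    read xpath /cib
pcs acl user create alice read-only
pcs acl enable
```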

On Fri, 2024-08-16 at 12:41 +, Angelo M Ruggiero via Users wrote:
> Hello,
> 
> I have been learning and playing with the pacemaker. Its great. We
> are going to use is in SAP R3/HANA on RHEL8 hopefully in the next few
> months.
> 
> I am trying to make sure I know how it works from a security point of
> view. As in my world I have to explain to security powers at be 
> 
> So been looking at the man pages, netstatin,tcpdumping, lsofing etc
> and looking at the code even as far as i can.
> 
> Here is an initial sort of description what actually happens during
> the initial setup until all processes are up and "trusted" thereafter
> with resources is less of an issue.
> 
> I know it some how not exact enough. But I need some sort of pointers
> or some basic corrections then I will make it better. Happy to
> contribute something here if people think valuable.
> I got some pics as well. 
> 
> Just to be clear, I do not have a problem; it is all working. 
> 
> So can someone help me to review the below.
> - packages pcs, pacemaker, corosync, ... installed on each host
> - hacluster password set and pcsd started
> - On one of the intended cluster hosts: pcs host add <host> ...
>   - pcs(1) connects to the local pcsd(8) via a unix domain socket
>     writable only by root
>   - the local pcsd connects to each remote host on port 2224 via TLS
>     and the configured cipher
>   - the remote pcsd requests uid/password authentication via PAM
>     (hacluster and the password set above)
>   - if successful, the remote pcsd:
>     - writes its own entry into its local /var/lib/pcsd/known_hosts
>     - writes the node list entry into /etc/corosync/corosync.conf
>     - if there is no /etc/corosync/authkey, corosync-keygen is run to
>       generate and write the key
>   - the local pcsd:
>     - also writes the remote host's entry into its known_hosts
>     - writes the node list entry into /etc/corosync/corosync.conf
>     - if there is no /etc/corosync/authkey, corosync-keygen is run to
>       generate and write the key
> - On one of the intended cluster hosts: pcs cluster setup <cluster
>   name> <list of hosts>
>   - pcs(1) connects to the local pcsd(8) via the root-only unix domain
>     socket
>   - allocates a random /etc/pacemaker/authkey
>   - connects to each of the listed hosts via TLS and, for each,
>     - presents the remote host token from the previously set up
>       known_hosts entry for authentication
>     - presents the /etc/pacemaker/authkey if not yet on the remote host
>     - sends the configuration data
> 
> Angelo
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Resource stop sequence with massive CIB update

2024-08-12 Thread Ken Gaillot
On Mon, 2024-08-12 at 22:47 +0300, ale...@pavlyuts.ru wrote:
> Hi All,
>  
> We use Pacemaker in a specific scenario where a complex network
> environment, including VLANs, IPs and routes, is managed by an
> external system and integrated by glue code that does the following:
> 1. Load the CIB config section with cibadmin --query
>    --scope=configuration
> 2. Add/delete primitives and constraints
> 3. Apply the new config with cibadmin --replace --scope=configuration
>    --xml-pipe --sync-call
> The CIB is taken from stdout and the new CIB is loaded via stdin, all
> done by Python code.
>  
> All types are handled with standard ocf:heartbeat resource scripts.
>  
> VLANs are defined as clones to ensure they are up on all nodes. Then,
> order constraints are given to start the IP after the vlan clone (to
> ensure the VLAN exists), then the route after the proper IP.
>  
> This works very good on mass-create, but we got some problems on
> mass-delete.
>  
> For my understanding of Pacemaker architecture and behavior: if it
> got the new config, it recalculates resource allocation, build up the
> target map with respect to [co]location constraints and then schedule
> changes with respect to order constraints. So, if we delete at once
> VLANS, IPs and routes, we also have to delete its constraints. Then,
> the scheduling of resource stop will NOT take order constraints from
> OLD config into consideration. Then, all the stops for VLAN, IPs and
> routes will start in random order. However:

Correct

> If the VLAN is deleted (stopped), it also deletes all IPs bound to the
> interface, and all routes.
> Then the IP resource tries to remove an IP address that is already
> deleted, and fails. Moreover, as scripts run in parallel, it may see
> the IP active when it checks, but the IP is already gone when it tries
> to delete it. As it failed, it is left as an orphan (stopped, blocked)
> and can only be cleared with a cleanup command. This also ruins future
> CIB updates.
> About same logic between IP and routes.
>  
> After realizing this, I changed the logic to use two stages:
> 1. Read the CIB
> 2. Disable all resources to be deleted (setting target-role=Stopped)
>    and send it with cibadmin
> 3. Delete all the resources from the CIB and send it with cibadmin

Good plan

> My idea was that Pacemaker will plan and do resource shutdown at step
> 2 with respect to order constraints which are still in the config.
> And then, it is safe to delete already stopped resources.
>  
> But I see the same troubles with this two-stage approach. Sometimes
> some resources fail to stop because referenced entity is already
> deleted.
>  
> It seems like one of two things:
> 1. Pacemaker does not respect order constraints when we put the new
>    config section directly
> 2. I misunderstand the --sync-call cibadmin option and it won't wait
>    for the new config to really be applied, returning immediately, so
>    the delete starts before all stops complete. I did not find any
>    explanation; my guess was it should wait for the changes to be
>    applied by Pacemaker, but I am not sure.

The second one: --sync-call only waits for the change to be committed
to the CIB, not for the cluster to respond. For that, call crm_resource
--wait afterward.
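
In other words, something along these lines (file names are
placeholders):

```sh
# Stage 1: push the config with the resources disabled, then wait for
# the cluster to actually finish stopping them
cibadmin --replace --scope=configuration --xml-file stage1-disabled.xml
crm_resource --wait

# Stage 2: now it is safe to push the config with the resources removed
cibadmin --replace --scope=configuration --xml-file stage2-deleted.xml
```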

>  
> I need advice about this situation and more information about the
> --sync-call option. Is this the right approach, or do I need an extra
> delay? Or should I wait for everything to stop by polling the state
> over and over?
>  
> I will be very grateful for any ideas or information!
>  
> Sincerely,
>  
> Alex
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.8 released

2024-08-12 Thread Ken Gaillot
Hi all,

The final release of Pacemaker 2.1.8 is now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.8

This release includes a significant number of bug fixes (including 10
regression fixes) and a few new features. It also deprecates some
obscure features and many C APIs in preparation for Pacemaker 3.0.0
dropping support for them later this year.

See the link above for more details.

Many thanks to all contributors to this release, including bixiaoyan1,
Chris Lumens, Ferenc Wágner, Gao,Yan, Grace Chin, Ken Gaillot, Klaus
Wenninger, liupei, Oyvind Albrigtsen, Reid Wahl, tomyouyou, wangluwei,
xin liang, and xuezhixin.

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Does anyone use or care about custom OCF actions?

2024-08-07 Thread Ken Gaillot
Hi all,

Pacemaker supports the usual actions for OCF agents such as start,
stop, and monitor.

Usually monitor is the only action configured as recurring. However it
is possible to create a custom named action in an agent and configure
it as recurring in Pacemaker. 

Currently, a recurring custom action that fails will be handled
similarly to a failed monitor (taking the on-fail response, marking the
resource as failed, etc.). That's not documented or tested, and I
suspect it might be handled inconsistently.

For Pacemaker 3.0.0, we're considering ignoring the results of custom
recurring actions. They would still be run, they just wouldn't affect
the state of the resource. With OCF_CHECK_LEVEL, it's possible to
configure three different monitor behaviors, so I don't see much
benefit to custom actions.
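
For comparison, the OCF_CHECK_LEVEL approach looks something like this
(resource name is illustrative; whether pcs accepts OCF_CHECK_LEVEL as
an op setting depends on the pcs version -- the CIB ultimately stores
it as an instance attribute of the operation):

```sh
# A normal shallow monitor plus a deeper probe at a longer interval
pcs resource op add my-db monitor interval=60s timeout=30s
pcs resource op add my-db monitor interval=300s timeout=60s OCF_CHECK_LEVEL=20
```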

What's your opinion? Does anyone use custom actions or find them
potentially worthwhile?
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] 9 nodes pacemaker cluster setup non-DC nodes reboot parallelly

2024-07-24 Thread Ken Gaillot
On Tue, 2024-07-16 at 16:05 +, S Sathish S wrote:
> Hi Ken,
> 
> Thank you for quick response.
> 
> We have checked pacemaker logs found signal 15 on pacemaker component
> . Post that we have executed pcs cluster start then pacemaker and
> corosync service started properly and joined cluster also.
> 
> With respect to the reboot query: in our application's pacemaker
> cluster, no quorum or fencing is configured. Please find below the
> reboot procedure followed in our upgrade procedure, which will be
> executed in parallel on all 9 cluster nodes. Is this a recommended
> way to reboot?
> 
> 1) Put the pacemaker cluster in maintenance mode.
> 2) Bring down the pacemaker cluster services using the below commands:
>    # pcs cluster stop
>    # pcs cluster disable
> 3) reboot
> 4) Bring up the pacemaker cluster services

That's fine. The disable command means the cluster services will not
start at boot, which I presume is intentional.

No quorum or fencing means you are at risk of service interruption and
possibly data unavailability or corruption, depending on what resources
you are running.

Without quorum, if one or more nodes are split from the cluster, each
side of the split will bring up all resources. The effect of that
varies by the type of resources. For example, with an IP address,
packets might be routed randomly to the two sides, rendering it
useless. With a database in single-primary mode, you will end up with
divergent data sets. And so on.

Without fencing, if a node is malfunctioning (high CPU load, a device
driver hanging, a flaky network card, etc.), the cluster may be unable
to communicate with it and will bring its resources up on other nodes.
The malfunctioning node is likely still running those resources and,
especially if it recovers, you may have similar problems as a quorum
split.


> 
> 
> Regards,
> S Sathish S
> From: Ken Gaillot 
> Sent: Tuesday, July 16, 2024 7:53 PM
> To: Cluster Labs - All topics related to open-source clustering
> welcomed 
> Cc: S Sathish S 
> Subject: Re: [ClusterLabs] 9 nodes pacemaker cluster setup non-DC
> nodes reboot parallelly
>  
> On Tue, 2024-07-16 at 11:18 +, S Sathish S via Users wrote:
> > Hi Team,
> >  
> > In our product we have 9 nodes pacemaker cluster setup non-DC nodes
> > reboot parallelly. Most of nodes join cluster properly and only one
> > node pacemaker and corosync service is not came up properly with
> > below error message.
> >  
> > Error Message:
> > Error: error running crm_mon, is pacemaker running?
> >   crm_mon: Connection to cluster failed: Connection refused
> 
> All that indicates is that Pacemaker is not responding. You'd have to
> look at the system log and/or pacemaker.log from that time to find
> out
> more.
> 
> > 
> > Query : Is it recommended to reboot parallelly of non-DC nodes ?
> 
> As long as they are cleanly rebooted, there should be no fencing or
> other actual problems. However the cluster will lose quorum and have
> to
> stop all resources. If you reboot less than half of the nodes at one
> time and wait for them to rejoin before rebooting more, you would
> avoid
> that.
> 
> >  
> > Thanks and Regards,
> > S Sathish S
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Podman active-active bundle

2024-07-22 Thread Ken Gaillot
On Fri, 2024-07-19 at 09:43 +, MAIER Jakob wrote:
> Hi,
> 
> I am aware of the documentation that you have linked here, and I have
> managed to setup a bundle running on two nodes. What I am having
> problems with is running them as master-slave. Would this be handled
> by the primitive running inside the bundle, since the podman
> resource-agent does not have the promote/demote actions? If possible,
> I would like the promotion and demotion only depend on the state of
> the container, not what is running inside (so no primitive). 

Yes, promotion is handled by the primitive inside the bundle. Only OCF
resource agents support promote and demote actions, so there has to be
a primitive inside the bundle. Set the promoted-max option on the
bundle to activate support for promotion (how many instances can be
promoted at any given time, usually 1).

> 
> Also, how would you setup networking to have a single virtual ip for
> all clones in the bundle? I have not found any documentation on how
> to do something like that.

I'm not aware of that being possible. I think what you want is a
separate virtual IP resource colocated with the bundle's promoted
instance, and a port mapping to forward connections to the container.
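
A very rough sketch of the shape of it (all names, the image, and the
exact pcs syntax are placeholders -- check `pcs resource bundle create`
and your pcs version's colocation syntax for promoted roles):

```sh
# Bundle with two replicas, one of which may be promoted
pcs resource bundle create my-bundle \
    container podman image=registry.example.com/my-app:latest \
        replicas=2 promoted-max=1 \
    network control-port=3121 port-map port=8080 \
    storage-map source-dir=/srv/my-app target-dir=/var/lib/my-app

# The promotable primitive that runs inside the bundle
pcs resource create my-app ocf:example:my-agent bundle my-bundle

# Floating IP that follows the promoted instance
pcs resource create my-vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24
pcs constraint colocation add my-vip with Promoted my-bundle INFINITY
```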

> 
> Concerning the RHEL UBI image, I have created my own base image that
> has pacemaker-remoted installed, so that should not be a problem. 
> 
> Best,
> Jakob
> 
> 
> -Original Message-
> From: Ken Gaillot  
> Sent: Mittwoch, 17. Juli 2024 16:09
> To: Cluster Labs - All topics related to open-source clustering
> welcomed 
> Cc: MAIER Jakob 
> Subject: Re: [ClusterLabs] Podman active-active bundle
> 
> *EXTERNAL source*
> 
> 
> Hi,
> 
> The upstream bundle documentation is at:
> 
> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/collective.html#bundles-containerized-resources
> 
> There is also a walk-through:
> 
> https://projects.clusterlabs.org/w/projects/pacemaker/bundle_walk-through/
> 
> Unfortunately, since that was written, the HA packages were taken off
> the RHEL UBI container image, so you'll have to adjust the
> instructions for some other distro base image that does have the HA
> packages. It should be nearly identical though.
> 
> On Wed, 2024-07-17 at 11:56 +, MAIER Jakob via Users wrote:
> > Hi!
> > 
> > I would like to setup a number of podman containers in two nodes
> > such 
> > that every container is running on both nodes (active-active). One
> > of 
> > the containers should be master while the other is slave, and when 
> > there is a failure with the containers healthcheck, there should be
> > a 
> > failover. These two master-slave containers should also have a
> > virtual 
> > ip address assigned to them, that points to the current master.
> > There 
> > is sadly not a lot of documentation on bundles. I have found some 
> > documentation from redhat on such “complex bundles”, but
> > unfortunately 
> > that is also not very detailed, and I don’t really know how to 
> > proceed.
> > 
> > Does anybody know how I could go about setting something like this
> > up?
> > 
> > Kind regards,
> > Jakob
> > 
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> --
> Ken Gaillot 
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.8-rc4 released (likely final)

2024-07-17 Thread Ken Gaillot
Hi all,

The fourth (and likely final) release candidate for Pacemaker 2.1.8 is
available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.8-rc4

This release candidate fixes a regression introduced in 2.1.7 causing
crm_node -i to fail intermittently. We squeezed in a couple of tool
improvements as well. For details, see the above link.

If no further regressions are found, this will likely become the final
release. Everyone is encouraged to download, compile, and test the new
release. We do many regression tests and simulations, but we can't
cover all possible use cases, so your feedback is important and
appreciated.

Many thanks to all contributors to this release, including Gao,Yan,
Grace Chin, Ken Gaillot, and Reid Wahl.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Podman active-active bundle

2024-07-17 Thread Ken Gaillot
Hi,

The upstream bundle documentation is at:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/collective.html#bundles-containerized-resources

There is also a walk-through:

https://projects.clusterlabs.org/w/projects/pacemaker/bundle_walk-through/

Unfortunately, since that was written, the HA packages were taken off
the RHEL UBI container image, so you'll have to adjust the instructions
for some other distro base image that does have the HA packages. It
should be nearly identical though.

On Wed, 2024-07-17 at 11:56 +, MAIER Jakob via Users wrote:
> Hi!
>  
> I would like to setup a number of podman containers in two nodes such
> that every container is running on both nodes (active-active). One of
> the containers should be master while the other is slave, and when
> there is a failure with the containers healthcheck, there should be a
> failover. These two master-slave containers should also have a
> virtual ip address assigned to them, that points to the current
> master. There is sadly not a lot of documentation on bundles. I have
> found some documentation from redhat on such “complex bundles”, but
> unfortunately that is also not very detailed, and I don’t really know
> how to proceed.
>  
> Does anybody know how I could go about setting something like this
> up?
>  
> Kind regards,
> Jakob
>  
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] 9 nodes pacemaker cluster setup non-DC nodes reboot parallelly

2024-07-16 Thread Ken Gaillot
On Tue, 2024-07-16 at 11:18 +, S Sathish S via Users wrote:
> Hi Team,
>  
> In our product we have 9 nodes pacemaker cluster setup non-DC nodes
> reboot parallelly. Most of nodes join cluster properly and only one
> node pacemaker and corosync service is not came up properly with
> below error message.
>  
> Error Message:
> Error: error running crm_mon, is pacemaker running?
>   crm_mon: Connection to cluster failed: Connection refused

All that indicates is that Pacemaker is not responding. You'd have to
look at the system log and/or pacemaker.log from that time to find out
more.

> 
> Query : Is it recommended to reboot parallelly of non-DC nodes ?

As long as they are cleanly rebooted, there should be no fencing or
other actual problems. However the cluster will lose quorum and have to
stop all resources. If you reboot less than half of the nodes at one
time and wait for them to rejoin before rebooting more, you would avoid
that.

>  
> Thanks and Regards,
> S Sathish S
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.8-rc3 released (possibly final)

2024-07-03 Thread Ken Gaillot
Hi all,

The third (and possibly final) release candidate for Pacemaker 2.1.8 is
available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.8-rc3

This adds a few bug fixes, including issues found in earlier release
candidates. For details, see the above link.

If no further regressions are found, this will likely become the final
release. Everyone is encouraged to download, compile, and test the new
release. We do many regression tests and simulations, but we can't
cover all possible use cases, so your feedback is important and
appreciated.

Many thanks to all contributors to this release, including Chris
Lumens, Gao,Yan, Ken Gaillot, and Reid Wahl.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] clutering rabbitmq

2024-06-24 Thread Ken Gaillot
Hi,

The rabbitmq-cluster agent was written specifically for the OpenStack
use case, which is fine with recreating the cluster from scratch after
problems. I'm not sure about the other two, and I'm not really familiar
with any of the agents. Hopefully someone with more experience with
RabbitMQ can jump in.

On Thu, 2024-06-20 at 10:33 +0200, Damiano Giuliani wrote:
> Hi,
> 
> hope you guys can help me,
> 
> we have built up a rabbitmq cluster using the pacemaker resource
> called rabbitmq-cluster.
> everything works as expected, until for maintenance reasons we shut
> down the entire cluster gracefully.
> at startup we noticed all the users and permissions were dropped,
> and probably also the quorum queues.
> So investigating the resource agent (rabbitmq-cluster), i found out it
> calls this wipe function
> 
> rmq_wipe_data()
> {
> rm -rf $RMQ_DATA_DIR > /dev/null 2>&1 
> }
> 
> when the first start function is called 
> 
> rmq_start_first()
> {
> local rc
> 
> ocf_log info "Bootstrapping rabbitmq cluster"
> rmq_wipe_data
> rmq_init_and_wait
> rc=$?
> 
> So probably when the whole cluster is fired up by pacemaker, all the
> rabbitmq instances are wiped out.
> 
> the rabbitmq-cluster agent is quite old (3-4 years) and probably
> didn't take quorum queues, which are persistent, into account, so a
> full wipe is not acceptable.
> 
> So i moved to the RA called rabbitmq-server-ha, which is quite a huge
> script, but im a bit lost because i notice this one also seems to
> clean the mnesia folder.
> 
> So the third and last one is the RA rabbitmq-server, which seems a
> simple resource that does not manage cluster status, only simple
> actions like start, stop, etc.
> i could probably build the cluster using this one + a rabbitmq.conf
> file where i define the cluster instances, something like this:
> https://www.rabbitmq.com/docs/cluster-formation#peer-discovery-classic-config
> 
> so im a bit lost because seems there is no easy way to build up a
> rabbitmq cluster using pacemaker.
> 
> can you guys help me heading on the correct way?
> 
> thanks
> 
> Damiano
> 
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] lost corosync/pacemaker pair

2024-06-13 Thread Ken Gaillot
On Thu, 2024-06-13 at 03:22 +, eli estrella wrote:
> Hello.
> I recently lost one of my LB servers running in a corosync/pacemaker
> pair, would it be possible to clone the live one to create the lost
> pair, changing the IP, hostname etc?
> Thanks for any help you can provide.
> 

Yes, that should be fine as far as the cluster goes. Of course your
specific resources may have other needs (especially a database or
clustered file system).
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.8-rc2 released

2024-06-11 Thread Ken Gaillot
Hi all,

The second release candidate for Pacemaker 2.1.8 is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.8-rc2

This mainly fixes issues introduced in 2.1.8-rc1, with a couple of
other memory fixes and one new feature: the PCMK_panic_action
environment variable may now be set to "off" or "sync-off".

For details, see the above link.

Everyone is encouraged to download, compile and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors to this release, including Chris
Lumens, Gao,Yan, Ken Gaillot, Reid Wahl, and tomyouyou.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Need advice: deep pacemaker integration, best approach?

2024-06-10 Thread Ken Gaillot
On Sun, 2024-06-09 at 23:13 +0300, ale...@pavlyuts.ru wrote:
> Hi All,
>  
> We intend to integrate Pacemaker as failover engine into a very
> specific product. The handmade prototype works pretty well. It
> includes a couple of dozens coordinated resources to implement one
> target application instance with its full network configuration. The
> prototype was made with pcs shell, but the process is very complex
> and annoying for mass- rollout by field engineers.
>  
> Our goal is to develop a kind of configuration shell to allow a user
> to setup, monitor and manage app instance as entities, not as a set
> of cluster resources. Means, user deals with app settings and status,
> the shell translates it to resources configuration and status and
> back.
>  
> The shell be made with Python, as it is the best for us for now. The
> question for me: what is the best approach to put Pacemaker under the
> capote. I did not consider to build it over pcs as pcs output quite
> hard to render, so I have to use more machine-friendly interface to
> pacemaker for sure but the question is which ones fits our needs the
> best.

pcs, crm shell, and the Pacemaker command-line tools are the basic
options.
 
> It seems like the best way is to use custom resource agents, XML
> structures and cibadmin to manage configuration and get status
> information. However, it is not clean: should cibadmin be used
> exclusively, or there also other API to pacemaker config pull/push?

If you're using the command-line tools, yes, cibadmin is the interface
for CIB XML configuration changes. Other tools can perform certain
configuration changes at a logically higher level (for example,
crm_attribute can set node attributes in the CIB), but cibadmin can
handle any XML changes.
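
For example, a minimal sketch of the kind of calls involved (section and
attribute names below are only illustrative):

  # dump the full CIB, or just one section of it
  cibadmin --query > cib.xml
  cibadmin --query --scope resources > resources.xml
  # push an edited section back
  cibadmin --replace --scope resources --xml-file resources.xml
  # higher-level tools for specific pieces, e.g. a permanent node attribute
  crm_attribute --node node1 --name my-attr --update 1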

>  
> Also, it is not clear how to manage resource errors and cleanup? Are
> there other ways that call to crm_resource for cleanup and failed
> resource restart? Could it be made via CIB manipulation like force
> lrm history records delete?

That's what crm_resource --cleanup is for
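
For instance (resource and node names are placeholders):

  # clear the failure history of one resource on one node
  crm_resource --cleanup --resource my_app --node node1
  # or clean up all failed actions on that node
  crm_resource --cleanup --node node1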

>  
> I understand that the source is the ultimate answer for any question,
> but I will be very grateful for any advice from ones who has all the
> answers on their fingertips.
>  
> Thank you in advance for sharing your thoughts and experience!
>  
> Sincerely,
>  
> Alex
>  
>  
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Is XSD definition of CIB available?

2024-06-06 Thread Ken Gaillot
On Thu, 2024-06-06 at 16:07 +0300, ale...@pavlyuts.ru wrote:
> Hi all,
>  
> Is there an XSD schema for the Pacemaker CIB available as a document,
> to see the full XML syntax and definitions?

We use RNG. The starting point is xml/pacemaker.rng in the repository,
typically installed in /usr/share/pacemaker.
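
If you want to validate a CIB against it, something like this works (the
exact .rng file name depends on the installed version):

  # check the live cluster configuration
  crm_verify --live-check
  # or check a saved copy against a specific schema file
  xmllint --relaxng /usr/share/pacemaker/pacemaker-3.9.rng --noout cib.xml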

>  
> I tried searching the sources, but had no success.
>  
> Thank you in advance!
>  
> Alex
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Thoughts on priority-fence-delay

2024-06-03 Thread Ken Gaillot
On Mon, 2024-06-03 at 14:32 +0800, Mr.R via Users wrote:
> Hi, all
> 
> The priority-fence-delay attribute adds a delay to the node
> running with a higher priority
> in the primary and secondary resources, making the node fence take
> effect later. Is it possible
> to apply a similar mechanism to common types of resources, so that
> nodes running more
> resources avoid being fenced?
> 
> Thanks
> 

That is the original use case for priority-fencing-delay. Simply assign
every resource a priority of 1 (which you can do in resource defaults).
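
With pcs, that could look something like this (the delay value is only an
example, and the defaults syntax varies a bit between pcs versions):

  pcs resource defaults update priority=1
  pcs property set priority-fencing-delay=20s
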
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] crm services not getting started after upgrading to snmp40

2024-05-22 Thread Ken Gaillot
On Wed, 2024-05-22 at 07:33 +, ., Anoop wrote:
> Hello,
>  
> We have HA setup with 2 node cluster using CRM. OS is Suse 15sp3.
> After upgrading to snmp40, cluster services are not getting started
> like pacemaker , corosync etc. After booting we have to manually
> mount certain filesystem and start the crm services like pacemaker
> etc. We have a SharedFileSystem group as the resource with 5
> fileystems , but not getting mounted while booting.  Let me know any
> other info required.
>  
> Regards
> Anoop
> 

What is the "certain filesystem"? If cluster services require it, that
would explain why they can't start.

What do the systemd journal logs say about the filesystem and cluster
services? Did it try to start them at all?
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.8-rc1 released

2024-05-15 Thread Ken Gaillot
Hi all,

The first release candidate for Pacemaker 2.1.8 is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.8-rc1

This release has a number of new features and bug fixes, along with
deprecations of some obscure (and mostly undocumented and/or broken)
features.

Two new commands, crm_attribute --list-options and crm_resource --list-
options, show all possible cluster options, resource meta-attributes,
and special fence device parameters.

A longstanding bug that could convert utilization attributes to normal
node attributes has been fixed.

The C API sees a large number of changes, particularly deprecations of
stuff that will get removed in Pacemaker 3.0.0 later this year.

For more details, see the above link.

Many thanks to all contributors to this release, including bixiaoyan1,
Chris Lumens, Ferenc Wágner, Gao,Yan, Grace Chin, Ken Gaillot, Klaus
Wenninger, liupei, Oyvind Albrigtsen, Reid Wahl, wangluwei, xin liang,
and xuezhixin.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Mixing globally-unique with non-globally-unique resources

2024-05-14 Thread Ken Gaillot
On Tue, 2024-05-14 at 13:56 +0200, Jochen wrote:
> I have the following use case: There are several cluster IP addresses
> in the cluster. Each address is different, and multiple addresses can
> be scheduled on the same node. This makes the address clone a
> globally-unique clone as far as I understood.

Actually each IP address would be a separate (uncloned) resource.

The IPaddr2 resource agent does support cloned IPs, but only using an
obsolete technology that's not around anymore. That technology allowed
a *single* IP address to be answered by any of a set of nodes
(effectively load-balancing by the client's address).
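
So each address becomes its own primitive, for example (names and
addresses are placeholders):

  pcs resource create vip-a ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24
  pcs resource create vip-b ocf:heartbeat:IPaddr2 ip=192.0.2.11 cidr_netmask=24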

> Then I have one service per node which manages traffic for all
> addresses on a node where an address is active, which makes the
> service clone not-globally-unique. The service should only run if at
> least one address is active on the node, and there cannot be more
> than one instance of the service on each node.
> 
> How would I create this pattern in Pacemaker?
> 

What you want is an ordering constraint with the IP resources in a
resource set with require-all=false:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/constraints.html#resource-sets
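
A rough pcs sketch, assuming the per-node service is a clone and the
resource names are placeholders:

  pcs constraint order set vip-a vip-b vip-c sequential=false require-all=false \
      set my-service-clone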

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] FYI: clusterlabs.org planned outages

2024-05-07 Thread Ken Gaillot
Hi all,

We are in the process of changing the OS on the servers used to run the
clusterlabs.org sites. There is an expected outage of all services from
4AM to 9AM UTC this Thursday. If problems arise, there may be more
outages later Thursday and Friday.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connenction disrupted.

2024-05-06 Thread Ken Gaillot
On Mon, 2024-05-06 at 10:05 -0500, Ken Gaillot wrote:
> On Fri, 2024-05-03 at 16:18 +0300, ale...@pavlyuts.ru wrote:
> > Hi,
> > 
> > > > Thanks great for your suggestion, probably I need to think
> > > > about
> > > > this
> > > > way too, however, the project environment is not a good one to
> > > > rely on
> > > > fencing and, moreover, we can't control the bottom layer a
> > > > trusted
> > > > way.
> > > 
> > > That is a problem. A VM being gone is not the only possible
> > > failure
> > > scenario. For
> > > example, a kernel or device driver issue could temporarily freeze
> > > the node, or
> > > networking could temporarily drop out, causing the node to appear
> > > lost to
> > > Corosync, but the node could be responsive again (with the app
> > > running) after the
> > > app has been started on the other node.
> > > 
> > > If there's no problem with the app running on both nodes at the
> > > same time, then
> > > that's fine, but that's rarely the case. If an IP address is
> > > needed, or shared storage
> > > is used, simultaneous access will cause problems that only
> > > fencing
> > > can avoid.
> > The pacemaker use very pessimistic approach if you set resources to
> > require quorum. 
> > If a network outage is trigger changes, it will ruin quorum first
> > and
> > after that try to rebuild it. Therefore there are two questions: 
> > 1. How to keep active app running?
> > 2. How to prevent two copies started.
> > As for me, quorum-dependent resource management performs well on
> > both
> > points.
> 
> That's fine as long as the cluster is behaving properly. Fencing is
> for
> when it's not.
> 
> Quorum prevents multiple copies only if the nodes can communicate and
> operate normally. There are many situations when that's not true: a
> device driver or kernel bug locks up a node for more than the
> Corosync
> token timeout, CPU or I/O load gets high enough for a node to become
> unresponsive for long stretches of time, a failing network controller
> randomly drops large numbers of packets, etc.
> 
> In such situations, a node that appears lost to the other nodes may
> actually just be temporarily unreachable, and may come back at any
> moment (with its resources still active).
> 
> If an IP address is active in more than one location, packets will be
> randomly routed to one node or another, rendering all communication
> via
> that IP useless. If an application that uses shared storage is active
> in more than one location, data can be corrupted. And so forth.
> 
> Fencing ensures that the lost node is *definitely* not running
> resources before recovering them elsewhere.
> 
> > > > my goal is to keep the app from moves (e.g. restarts) as long
> > > > as
> > > > possible. This means only two kinds of moves accepted: current
> > > > host
> > > > fail (move to other with restart) or admin move (managed move
> > > > at
> > > > certain time with restart). Any other troubles should NOT
> > > > trigger
> > > > app
> > > > down/restart. Except of total connectivity loss where no second
> > > > node,
> > > > no arbiter => stop service.
> > > 
> > > Total connectivity loss may not be permanent. Fencing ensures the
> > > connectivity
> > > will not be restored after the app is started elsewhere.
> > Nothing bad if it restored and the node alive, but got app down
> > because of no quorum.
> 
> Again, that assumes it is operating normally. HA is all about the
> times
> when it's not.
>  
> > > Pacemaker 2.0.4 and later supports priority-fencing-delay which
> > > allows the node
> > > currently running the app to survive. The node not running the
> > > app
> > > will wait the
> > > configured amount of time before trying to fence the other node.
> > > Of
> > > course that
> > > does add more time to the recovery if the node running the app is
> > > really gone.
> > I feel I am not sure about how it works.
> > Imagine just connectivity loss between nodes but no to the other
> > pars.
> > And Node1 runs app. Everything well, node2 off.
> > So, we start Node2 with intention to restore cluster.
> > Node 2 starts and trying to find it's partner, failure and fence
> > node1 out.
> > While Node1 not even know ab

Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connenction disrupted.

2024-05-06 Thread Ken Gaillot
 why quorum-
> based resource management is unreliable without fencing.
> May a host hold quorum bit longer than another host got quorum and
> run app. Probably, it may do this.
> But fencing is not immediate too. So, it can't protect for 100% from
> short-time parallel runs.

Certainly -- with fencing enabled, the cluster will not recover
resources elsewhere until fencing succeeds.

> 
> > That does complicate the situation. Ideally there would be some way
> > to request
> > the VM to be immediately destroyed (whether via fence_xvm, a cloud
> > provider
> > API, or similar).
> What you mean by "destroyed"? Mean get down?

Correct. For fencing purposes, it should not be a clean shutdown but an
immediate halt.

> 
> > > Please, mind all the above is from my common sense and quite poor
> > > fundamental knowledge in clustering. And please be so kind to
> > > correct
> > > me if I am wrong at any point.
> > > 
> > > Sincerely,
> > > 
> > > Alex
> > > -Original Message-
> > > From: Users  On Behalf Of Ken
> > > Gaillot
> > > Sent: Thursday, May 2, 2024 5:55 PM
> > > To: Cluster Labs - All topics related to open-source clustering
> > > welcomed 
> > > Subject: Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd:
> > > qdevice
> > > connenction disrupted.
> > > 
> > > I don't see fencing times in here -- fencing is absolutely
> > > essential.
> > > 
> > > With the setup you describe, I would drop qdevice. With fencing,
> > > quorum is not strictly required in a two-node cluster (two_node
> > > should
> > > be set in corosync.conf). You can set priority-fencing-delay to
> > > reduce
> > > the chance of simultaneous fencing. For VMs, you can use
> > > fence_xvm,
> > > which is extremely quick.
> > > 
> > > On Thu, 2024-05-02 at 02:56 +0300, ale...@pavlyuts.ru wrote:
> > > > Hi All,
> > > > 
> > > > I am trying to build application-specific 2-node failover
> > > > cluster
> > > > using ubuntu 22, pacemaker 2.1.2 + corosync 3.1.6 and DRBD
> > > > 9.2.9,
> > > > knet transport.
> > > > 
> > > > For some reason I can’t use 3-node then I have to use
> > > > qnetd+qdevice
> > > > 3.0.1.
> > > > 
> > > > The main goal Is to protect custom app which is not cluster-
> > > > aware by
> > > > itself. It is quite stateful, can’t store the state outside
> > > > memory
> > > > and take some time to get converged with other parts of the
> > > > system,
> > > > then the best scenario is “failover is a restart with same
> > > > config”,
> > > > but each unnecessary restart is painful. So, if failover done,
> > > > app
> > > > must retain on the backup node until it fail or admin push it
> > > > back,
> > > > this work well with stickiness param.
> > > > 
> > > > So, the goal is to detect serving node fail ASAP and restart it
> > > > ASAP
> > > > on other node, using DRBD-synced config/data. ASAP means within
> > > > 5-
> > > > 7
> > > > sec, not 30 or more.
> > > > 
> > > > I was tried different combinations of timing, and finally got
> > > > acceptable result within 5 sec for the best case. But! The case
> > > > is
> > > > very unstable.
> > > > 
> > > > My setup is a simple: two nodes on VM, and one more VM as
> > > > arbiter
> > > > (qnetd), VMs under Proxmox and connected by net via external
> > > > ethernet switch to get closer to reality where “nodes VM”
> > > > should
> > > > locate as VM on different PHY hosts in one rack.
> > > > 
> > > > Then, it was adjusted for faster detect and failover.
> > > > In Corosync, left the token default 1000ms, but add
> > > > “heartbeat_failures_allowed: 3”, this made corosync catch node
> > > > failure for about 200ms (4x50ms heartbeat).
> > > > Both qnet and qdevice was run
> > > > with  net_heartbeat_interval_min=200
> > > > to allow play with faster hearbeats and detects Also,
> > > > quorum.device.net has timeout: 500, sync_timeout: 3000, algo:
> > > > LMS.
> > > > 
> > > > The testing is to issue “ate +%M:%S.%N && qm stop 201”, and
> > > > then
> >

Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connenction disrupted.

2024-05-02 Thread Ken Gaillot
On Thu, 2024-05-02 at 22:56 +0300, ale...@pavlyuts.ru wrote:
> Dear Ken, 
> 
> First of all, there is no fencing at all; it is off.
> 
> Thanks great for your suggestion, probably I need to think about this
> way too, however, the project environment is not a good one to rely
> on fencing and, moreover, we can't control the bottom layer a trusted
> way.

That is a problem. A VM being gone is not the only possible failure
scenario. For example, a kernel or device driver issue could
temporarily freeze the node, or networking could temporarily drop out,
causing the node to appear lost to Corosync, but the node could be
responsive again (with the app running) after the app has been started
on the other node.

If there's no problem with the app running on both nodes at the same
time, then that's fine, but that's rarely the case. If an IP address is
needed, or shared storage is used, simultaneous access will cause
problems that only fencing can avoid.

> 
> As I understand, fence_xvm just kills VM that not inside the quorum
> part, or, in a case of two-host just one survive who shoot first. But

Correct

> my goal is to keep the app from moves (e.g. restarts) as long as
> possible. This means only two kinds of moves accepted: current host
> fail (move to other with restart) or admin move (managed move at
> certain time with restart). Any other troubles should NOT trigger app
> down/restart. Except of total connectivity loss where no second node,
> no arbiter => stop service.
> 

Total connectivity loss may not be permanent. Fencing ensures the
connectivity will not be restored after the app is started elsewhere.

> AFAIK, fencing in two-nodes creates undetermined fence racing, and
> even it warrants only one node survive, it has no respect to if the
> app already runs on the node or not. So, the situation: one node 
> already run app, while other lost its connection to the first, but
> not to the fence device. And win the race => kill current active =>
> app restarts. That's exactly what I am trying to avoid.


Pacemaker 2.0.4 and later supports priority-fencing-delay which allows
the node currently running the app to survive. The node not running the
app will wait the configured amount of time before trying to fence the
other node. Of course that does add more time to the recovery if the
node running the app is really gone.

> 
> Therefore, quorum-based management seems better way for my exact
> case.

Unfortunately it's unsafe without fencing.

> 
> Also, VM fencing rely on the idea that all VMs are inside a well-
> managed first layer cluster with it's own quorum/fencing on place or
> separate nodes and VMs never moved between without careful fencing
> reconfig. In mu case, I can't be sure about both points, I do not
> manage bottom layer. The max I can do is to request that every my MV
> (node, arbiter) located on different phy node and this may protect
> app from node failure and bring more freedom to get nodes off for
> service. Also, I have to limit overall MV count while there need for
> multiple app instances (VM pairs) running at once and one extra VM as
> arbiter for all them (2*N+1), but not 3-node for each instance (3*N)
> which could be more reasonable for my opinion, but not for one who
> allocate resources.

That does complicate the situation. Ideally there would be some way to
request the VM to be immediately destroyed (whether via fence_xvm, a
cloud provider API, or similar).

> 
> Please, mind all the above is from my common sense and quite poor
> fundamental knowledge in clustering. And please be so kind to correct
> me if I am wrong at any point.
> 
> Sincerely,
> 
> Alex
> -Original Message-
> From: Users  On Behalf Of Ken Gaillot
> Sent: Thursday, May 2, 2024 5:55 PM
> To: Cluster Labs - All topics related to open-source clustering
> welcomed 
> Subject: Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice
> connenction disrupted.
> 
> I don't see fencing times in here -- fencing is absolutely essential.
> 
> With the setup you describe, I would drop qdevice. With fencing,
> quorum is not strictly required in a two-node cluster (two_node
> should be set in corosync.conf). You can set priority-fencing-delay
> to reduce the chance of simultaneous fencing. For VMs, you can use
> fence_xvm, which is extremely quick.
> 
> On Thu, 2024-05-02 at 02:56 +0300, ale...@pavlyuts.ru wrote:
> > Hi All,
> >  
> > I am trying to build application-specific 2-node failover cluster 
> > using ubuntu 22, pacemaker 2.1.2 + corosync 3.1.6 and DRBD 9.2.9,
> > knet 
> > transport.
> >  
> > For some reason I can’t use 3-node then I have to use
> > qnetd+qdevice 
> > 3.0.1.
> >  

Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connenction disrupted.

2024-05-02 Thread Ken Gaillot
1 23:30:56 node2 corosync-qdevice[781]: Received preinit reply
> msg
> May 01 23:30:56 node2 corosync-qdevice[781]: Received init reply msg
> May 01 23:30:56 node2 corosync-qdevice[781]: Scheduling send of
> heartbeat every 400ms
> May 01 23:30:56 node2 corosync-qdevice[781]: Executing after-connect
> heuristics.
> May 01 23:30:56 node2 corosync-qdevice[781]: worker:
> qdevice_heuristics_worker_cmd_process_exec: Received exec command
> with seq_no "25" and timeout "250"
> May 01 23:30:56 node2 corosync-qdevice[781]: Received heuristics exec
> result command with seq_no "25" and result "Disabled"
> May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm decided to
> send config node list, send membership node list, send quorum node
> list, heuristics is Undefined and result vote is Wait for reply
> May 01 23:30:56 node2 corosync-qdevice[781]: Sending set option seq =
> 98, HB(0) = 0ms, KAP Tie-breaker(1) = Enabled
> May 01 23:30:56 node2 corosync-qdevice[781]: Sending config node list
> seq = 99
> May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
> May 01 23:30:56 node2 corosync-qdevice[781]: 0 node_id = 1,
> data_center_id = 0, node_state = not set
> May 01 23:30:56 node2 corosync-qdevice[781]: 1 node_id = 2,
> data_center_id = 0, node_state = not set
> May 01 23:30:56 node2 corosync-qdevice[781]: Sending membership node
> list seq = 100, ringid = (2.801), heuristics = Undefined.
> May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
> May 01 23:30:56 node2 corosync-qdevice[781]: 0 node_id = 2,
> data_center_id = 0, node_state = not set
> May 01 23:30:56 node2 corosync-qdevice[781]: Sending quorum node list
> seq = 101, quorate = 0
> May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
> May 01 23:30:56 node2 corosync-qdevice[781]: 0 node_id = 1,
> data_center_id = 0, node_state = dead
> May 01 23:30:56 node2 corosync-qdevice[781]: 1 node_id = 2,
> data_center_id = 0, node_state = member
> May 01 23:30:56 node2 corosync-qdevice[781]: Cast vote timer is now
> stopped.
> May 01 23:30:56 node2 corosync-qdevice[781]: Received set option
> reply seq(1) = 98, HB(0) = 0ms, KAP Tie-breaker(1) = Enabled
> May 01 23:30:56 node2 corosync-qdevice[781]: Received initial config
> node list reply
> May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 99
> May 01 23:30:56 node2 corosync-qdevice[781]:   vote = No change
> May 01 23:30:56 node2 corosync-qdevice[781]:   ring id = (2.801)
> May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm result vote is
> No change
> May 01 23:30:56 node2 corosync-qdevice[781]: Received membership node
> list reply
> May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 100
> May 01 23:30:56 node2 corosync-qdevice[781]:   vote = ACK
> May 01 23:30:56 node2 corosync-qdevice[781]:   ring id = (2.801)
> May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm result vote is
> ACK
> May 01 23:30:56 node2 corosync-qdevice[781]: Cast vote timer is now
> scheduled every 250ms voting ACK.
> May 01 23:30:56 node2 corosync-qdevice[781]: Received quorum node
> list reply
> May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 101
> May 01 23:30:56 node2 corosync-qdevice[781]:   vote = ACK
> May 01 23:30:56 node2 corosync-qdevice[781]:   ring id = (2.801)
> May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm result vote is
> ACK
> May 01 23:30:56 node2 corosync-qdevice[781]: Cast vote timer remains
> scheduled every 250ms voting ACK.
> May 01 23:30:56 node2 corosync-qdevice[781]: Votequorum quorum notify
> callback:
> May 01 23:30:56 node2 corosync-qdevice[781]:   Quorate = 1
> May 01 23:30:56 node2 corosync-qdevice[781]:   Node list (size = 3):
> May 01 23:30:56 node2 corosync-qdevice[781]: 0 nodeid = 1, state
> = 2
> May 01 23:30:56 node2 corosync-qdevice[781]: 1 nodeid = 2, state
> = 1
> May 01 23:30:56 node2 corosync-qdevice[781]: 2 nodeid = 0, state
> = 0
> May 01 23:30:56 node2 corosync-qdevice[781]: algo-lms: quorum_notify.
> quorate = 1
> May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm decided to
> send list and result vote is No change
> May 01 23:30:56 node2 corosync-qdevice[781]: Sending quorum node list
> seq = 102, quorate = 1
> May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
> May 01 23:30:56 node2 corosync-qdevice[781]: 0 node_id = 1,
> data_center_id = 0, node_state = dead
> May 01 23:30:56 node2 corosync-qdevice[781]: 1 node_id = 2,
> data_center_id = 0, node_state = member
> May 01 23:30:56 node2 corosync-qdevice[781]: Received quorum node
> list reply
> May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 102
> May 01 23:30:56 node2 corosync-qdevice[781]:   vote = ACK
> May 01 23:30:56 node2 corosyn

Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

2024-04-18 Thread Ken Gaillot
What OS are you using? Does it use systemd?

What does happen when you kill Corosync?

On Thu, 2024-04-18 at 13:13 +, NOLIBOS Christophe via Users wrote:
> Classified as: {OPEN}
> 
> Dear All,
>  
> I have a question about the "pacemakerd: recover properly from
> Corosync crash" fix implemented in version 2.1.2.
> I have observed the issue when testing pacemaker version 2.0.5, just
> by killing the ‘corosync’ process: Corosync was not recovered.
>  
> I am using now pacemaker version 2.1.5-8.
> Doing the same test, I have the same result: Corosync is still not
> recovered.
>  
> Please confirm the "pacemakerd: recover properly from Corosync crash"
> fix implemented in version 2.1.2 covers this scenario.
> If it is, did I miss something in the configuration of my cluster?
>  
> Best Regard.
>  
> Christophe.
>  
> 
> Christophe Nolibos
> DL-FEP Component Manager
> THALES Land & Air Systems
> 105, avenue du Général Eisenhower, 31100 Toulouse, FRANCE
> Tél. : +33 (0)5 61 19 79 09
> Mobile : +33 (0)6 31 22 20 58
> Email : christophe.noli...@thalesgroup.com
>  
>  
> 
> {OPEN}
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Likely deprecation: ocf:pacemaker:o2cb resource agent

2024-04-17 Thread Ken Gaillot
Hi all,

I just discovered today that the OCFS2 file system hasn't needed
ocf_controld.pcmk in nearly a decade. I can't recall ever running
across anyone using the ocf:pacemaker:o2cb agent that manages that
daemon in a cluster.

Unless anyone has a good reason to the contrary, we'll deprecate the
agent for the Pacemaker 2.1.8 release and drop it for 3.0.0.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Potential deprecation: Node-attribute-based rules in operation meta-attributes

2024-04-02 Thread Ken Gaillot
Hi all,

I have recently been cleaning up Pacemaker's rule code, and came across
an inconsistency.

Currently, meta-attributes may have rules with date/time-based
expressions (the <date_expression> element). Node attribute
expressions (the <expression> element) are not allowed, with the
exception of operation meta-attributes (beneath an <op> or
<op_defaults> element).

I'd like to deprecate support for node attribute expressions for
operation meta-attributes in Pacemaker 2.1.8, and drop support in
3.0.0.

I don't think it makes sense to vary meta-attributes by node. For
example, if a clone monitor has on-fail="block" (to cease all actions
on instances everywhere) on one node and on-fail="stop" (to stop all
instances everywhere) on another node, what should the cluster do if
monitors fail on both nodes? It seems to me that it's more likely to be
confusing than helpful.

If anyone has a valid use case for node attribute expressions for
operation meta-attributes, now is the time to speak up!
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Potential deprecation: Disabling schema validation for the CIB

2024-04-02 Thread Ken Gaillot
Hi all,

Pacemaker uses an XML schema to prevent invalid syntax from being added
to the CIB. The CIB's "validate-with" option is typically set to a
version of this schema (like "pacemaker-3.9").

It is possible to explicitly disable schema validation by setting
validate-with to "none". This is clearly a bad idea since it allows
invalid syntax to be added, which will at best be ignored and at worst
cause undesired or buggy behavior.
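
If you want to see what a cluster is currently using, or move off "none",
something like this works (a sketch; take a CIB backup first):

  cibadmin --query | grep -o 'validate-with="[^"]*"'
  cibadmin --upgrade    # bump validate-with to the latest installed schema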

I'm thinking of deprecating the ability to use "none" in Pacemaker
2.1.8 and dropping support in 3.0.0. If anyone has a valid use case for
this feature, now is the time to speak up!
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] resources cluster stoped with one node

2024-03-20 Thread Ken Gaillot
On Wed, 2024-03-20 at 23:29 +0100, mierdatutis mi wrote:
> Hi,
> I've configured a cluster of two nodes.
> When I start one node only I see that the resources won't start.

Hi,

In a two-node cluster, it is not safe to start resources until the
nodes have seen each other once. Otherwise, there's no way to know
whether the other node is unreachable because it is safely down or
because communication has been interrupted (meaning it could still be
running resources).

Corosync's two_node setting automatically takes care of that by also
enabling wait_for_all. If you are certain that the other node is down,
you can disable wait_for_all in the Corosync configuration, start the
node, then re-enable wait_for_all.
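
For example, with pcs (run while the cluster is stopped, and only if you
are certain the other node is really down):

  pcs quorum update wait_for_all=0
  # ... start the surviving node; once both nodes are back:
  pcs quorum update wait_for_all=1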


> 
> [root@nodo1 ~]# pcs status --full
> Cluster name: mycluster
> Stack: corosync
> Current DC: nodo1 (1) (version 1.1.23-1.el7-9acf116022) - partition
> WITHOUT quorum
> Last updated: Wed Mar 20 23:28:45 2024
> Last change: Wed Mar 20 19:33:09 2024 by root via cibadmin on nodo1
> 
> 2 nodes configured
> 3 resource instances configured
> 
> Online: [ nodo1 (1) ]
> OFFLINE: [ nodo2 (2) ]
> 
> Full list of resources:
> 
>  Virtual_IP (ocf::heartbeat:IPaddr2):   Stopped
>  Resource Group: HA-LVM
>  My_VG  (ocf::heartbeat:LVM-activate):  Stopped
>  My_FS  (ocf::heartbeat:Filesystem):Stopped
> 
> Node Attributes:
> * Node nodo1 (1):
> 
> Migration Summary:
> * Node nodo1 (1):
> 
> Fencing History:
> 
> PCSD Status:
>   nodo1: Online
>   nodo2: Offline
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> Do you know what these behaviors are?
> Thanks
> _______
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Can I add a pacemaker v2 node to a v2 cluster ?

2024-03-04 Thread Ken Gaillot
On Fri, 2024-03-01 at 15:41 +, Morgan Cox wrote:
> Hi - I have a fair few rhel7 pacemaker clusters (running on pacemaker
> v1). I want to migrate to rhel8 (which uses pacemaker v2) - due to
> using a shared/cluster IP it would be a pain and involve downtime to
> take down the cluster IP on the rhel7 cluster and then set it up on
> rhel8.
> 
> Can I add a pacemaker v2 node to a pacemaker v1 cluster (in order to
> migrate easily)?

No, the Corosync versions are incompatible. The closest you could do
would be to set up booth between the Pacemaker 1 and 2 clusters, then
grant the ticket to the new cluster. That's effectively the same as
manually disabling the IP on the old cluster and enabling it on the new
one, but a little more automated (which is more helpful the more
resources you have to move).

> 
> i.e. I have these versions:
> 
> rhel7 : pacemaker-1.1.23-1.el7_9.1.x86_64
> rhel8: pacemaker-2.1.5-8.el8.x86_64
> 
> For the purposes of migrating to rhel8 - could I add a rhel8
> (pacemaker v2) node to an existing rhel7 (pacemaker v1.x) cluster?
> 
> If it matters here is the 'supporting' versions of both (using #
> pacemakerd --features )
> 
> rhel7: Supporting v3.0.14:  generated-manpages agent-manpages ncurses
> libqb-logging libqb-ipc systemd nagios  corosync-native atomic-attrd
> acls
> 
> rhel8 :  Supporting v3.16.2: agent-manpages cibsecrets compat-2.0
> corosync-ge-2 default-concurrent-fencing default-sbd-sync generated-
> manpages monotonic nagios ncurses remote systemd
> 
> If this is not possible I will have to think of another solution.
> 
> Thanks 
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Is it possible to downgrade feature-set in 2.1.6-8

2024-02-26 Thread Ken Gaillot
On Thu, 2024-02-22 at 08:05 -0500, vitaly wrote:
> Hello. 
> We have a product with 2-node clusters.
> Our current version is using Pacemaker 2.1.4; the new version will be
> using Pacemaker 2.1.6.
> During upgrade failure it is possible that one node will come up with
> the new Pacemaker and work alone for a while.
> Then the old node would later come up and try to join the cluster.
> This would fail due to the different feature-sets of the cluster
> nodes. The older feature-set would not be able to join the newer
> feature-set.
>  
> Question: 
> Is it possible to force the new node with Pacemaker 2.1.6 to use the
> older feature-set (3.15.0) for a while, until the second node is
> upgraded and able to work with Pacemaker 2.1.6?

No

>  
> Thank you very much!
> _Vitaly
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] clone_op_key pcmk__notify_key - Triggered fatal assertion

2024-02-19 Thread Ken Gaillot
On Sat, 2024-02-17 at 13:39 +0100, lejeczek via Users wrote:
> Hi guys.
> 
> Everything seems to be working OK, yet pacemaker logs
> ...
>  error: clone_op_key: Triggered fatal assertion at
> pcmk_graph_producer.c:207 : (n_type != NULL) && (n_task != NULL)
>  error: pcmk__notify_key: Triggered fatal assertion at actions.c:187
> : op_type != NULL
>  error: clone_op_key: Triggered fatal assertion at
> pcmk_graph_producer.c:207 : (n_type != NULL) && (n_task != NULL)
>  error: pcmk__notify_key: Triggered fatal assertion at actions.c:187
> : op_type != NULL
> ...
>  error: pcmk__create_history_xml: Triggered fatal assertion at
> pcmk_sched_actions.c:1163 : n_type != NULL
>  error: pcmk__create_history_xml: Triggered fatal assertion at
> pcmk_sched_actions.c:1164 : n_task != NULL
>  error: pcmk__notify_key: Triggered fatal assertion at actions.c:187
> : op_type != NULL
> ...
> 
> Looks critical, is it? Would you know?
> many thanks, L.
> 

That's odd. This suggests the scheduler created a notify action without
adding all the necessary information, which would be a bug. Do you have
the scheduler input that causes these messages? 

Also, what version are you using, and how did you get it?
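
For reference, the scheduler inputs are kept on the node that was DC at
the time; the file name below is just an example:

  ls -lt /var/lib/pacemaker/pengine/ | head
  crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-123.bz2 --show-scores
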
--
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker resource configure issue

2024-02-08 Thread Ken Gaillot
On Thu, 2024-02-08 at 10:12 +0800, hywang via Users wrote:
> Hello, everyone,
>  I want to have a node fenced or the cluster stopped after a
> resource start has failed 3 times. How do I configure the resource to
> achieve that?
> Thanks!
> 

The current design doesn't allow it. You can set start-failure-is-fatal 
to false to let the cluster reattempt the start and migration-threshold 
to 3 to have it try to start on a different node after three failures,
or you can set on-fail to fence to have it fence the node if the
(first) start fails, but you can't combine those approaches.
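
As a sketch of those two (mutually exclusive) approaches with pcs,
assuming a resource named my_rsc (the timeout value is just an example):

  # approach 1: retry the start, move away after 3 failures
  pcs property set start-failure-is-fatal=false
  pcs resource meta my_rsc migration-threshold=3
  # approach 2: fence the node if the first start fails
  pcs resource update my_rsc op start timeout=20s on-fail=fence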

It's a longstanding goal to allow more flexibility in failure handling,
but there hasn't been time to deal with it.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] how to disable pacemaker throttle mode

2024-02-05 Thread Ken Gaillot
On Mon, 2024-02-05 at 21:30 +0100, Vladislav Bogdanov wrote:
> IIRC, there is one issue with that: IO load is considered
> CPU load, so on busy storage servers you get throttling with an almost
> idle CPU. I may be wrong, but I recall load is calculated from loadavg, which 

Yep, it checks the 1-minute average from /proc/loadavg (it also checks
the CIB manager separately using utime/stime from /proc/PID/stat)
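
If you want to see exactly what it looks at (pacemaker-based is the CIB
manager process):

  cat /proc/loadavg
  awk '{print "utime:", $14, "stime:", $15}' /proc/$(pidof pacemaker-based)/stat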

> is a different story altogether, as it indicates the number of processes
> that are ready to consume CPU time, including those waiting for
> IOs to complete, but that is what my mind recalls.
> 
> I easily get loadavg of 128 on iscsi storage servers with almost free
> CPU, no thermal reaction at all.
> 
> Best,
> Vlad
> 
> On February 5, 2024 19:22:11 Ken Gaillot  wrote:
> 
> > On Mon, 2024-02-05 at 18:08 +0800, hywang via Users wrote:
> > > Hello, everyone:
> > > Is there any way to disable pacemaker throttle mode? If there is,
> > > where can I find it?
> > > Thanks!
> > > 
> > > 
> > 
> > You can influence it via the load-threshold and node-action-limit
> > cluster options.
> > 
> > The cluster throttles when CPU usage approaches load-threshold
> > (defaulting to 80%), and limits the number of simultaneous actions
> > on a
> > node to node-action-limit (defaulting to twice the number of
> > cores).
> > 
> > The node action limit can be overridden per node by setting the
> > PCMK_node_action_limit environment variable (typically in
> > /etc/sysconfig/pacemaker, /etc/default/pacemaker, etc. depending on
> > distro).
> > -- 
> > Ken Gaillot 
> > 
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> > 
> 
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] how to disable pacemaker throttle mode

2024-02-05 Thread Ken Gaillot
On Mon, 2024-02-05 at 18:08 +0800, hywang via Users wrote:
> Hello, everyone:
> Is there any way to disable pacemaker throttle mode? If there is,
> where can I find it?
> Thanks!
> 

You can influence it via the load-threshold and node-action-limit
cluster options.

The cluster throttles when CPU usage approaches load-threshold
(defaulting to 80%), and limits the number of simultaneous actions on a
node to node-action-limit (defaulting to twice the number of cores).

The node action limit can be overridden per node by setting the
PCMK_node_action_limit environment variable (typically in
/etc/sysconfig/pacemaker, /etc/default/pacemaker, etc. depending on
distro).
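
For example (values are illustrative):

  pcs property set load-threshold=90%
  pcs property set node-action-limit=8
  # per-node override, on that node:
  echo 'PCMK_node_action_limit=16' >> /etc/sysconfig/pacemaker
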
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Gracefully Failing Live Migrations

2024-02-01 Thread Ken Gaillot
On Thu, 2024-02-01 at 12:57 -0600, Billy Croan wrote:
> How do I figure out which of the three steps failed and why?

They're normal resource actions: migrate_to, migrate_from, and stop.
You can investigate them in the usual way (status, logs).
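
For example (the unit and resource names are placeholders):

  crm_mon --one-shot    # "Failed Resource Actions" shows which step failed
  journalctl -u pacemaker | grep -E 'migrate_to|migrate_from|vm_myvm'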

> 
> On Thu, Feb 1, 2024 at 11:15 AM Ken Gaillot 
> wrote:
> > On Thu, 2024-02-01 at 10:20 -0600, Billy Croan wrote:
> > > Sometimes I've tried to move a resource from one node to another,
> > and
> > > it migrates live without a problem.  Other times I get 
> > > > Failed Resource Actions:
> > > > * vm_myvm_migrate_to_0 on node1 'unknown error' (1): call=102,
> > > > status=complete, exitreason='myvm: live migration to node2
> > failed:
> > > > 1',
> > > > last-rc-change='Sat Jan 13 09:13:31 2024', queued=1ms,
> > > > exec=35874ms
> > > > 
> > > 
> > > And I find out the live part of the migration failed, when the vm
> > > reboots and an (albeit minor) outage occurs.
> > > 
> > > Is there a way to configure pacemaker, so that if it is unable to
> > > migrate live it simply does not migrate at all?
> > > 
> > 
> > No. Pacemaker automatically replaces a required stop/start sequence
> > with live migration when possible. If there is a live migration
> > attempted, by definition the resource must move one way or another.
> > Also, live migration involves three steps, and if one of them
> > fails,
> > the resource is in an unknown state, so it must be restarted
> > anyway.
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Gracefully Failing Live Migrations

2024-02-01 Thread Ken Gaillot
On Thu, 2024-02-01 at 10:20 -0600, Billy Croan wrote:
> Sometimes I've tried to move a resource from one node to another, and
> it migrates live without a problem.  Other times I get 
> > Failed Resource Actions:
> > * vm_myvm_migrate_to_0 on node1 'unknown error' (1): call=102,
> > status=complete, exitreason='myvm: live migration to node2 failed:
> > 1',
> > last-rc-change='Sat Jan 13 09:13:31 2024', queued=1ms,
> > exec=35874ms
> > 
> 
> And I find out the live part of the migration failed, when the vm
> reboots and an (albeit minor) outage occurs.
> 
> Is there a way to configure pacemaker, so that if it is unable to
> migrate live it simply does not migrate at all?
> 

No. Pacemaker automatically replaces a required stop/start sequence
with live migration when possible. If there is a live migration
attempted, by definition the resource must move one way or another.
Also, live migration involves three steps, and if one of them fails,
the resource is in an unknown state, so it must be restarted anyway.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] trigger something at ?

2024-02-01 Thread Ken Gaillot
On Thu, 2024-02-01 at 14:31 +0100, lejeczek via Users wrote:
> 
> On 31/01/2024 18:11, Ken Gaillot wrote:
> > On Wed, 2024-01-31 at 16:37 +0100, lejeczek via Users wrote:
> > > On 31/01/2024 16:06, Jehan-Guillaume de Rorthais wrote:
> > > > On Wed, 31 Jan 2024 16:02:12 +0100
> > > > lejeczek via Users  wrote:
> > > > 
> > > > > On 29/01/2024 17:22, Ken Gaillot wrote:
> > > > > > On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users
> > > > > > wrote:
> > > > > > > Hi guys.
> > > > > > > 
> > > > > > > Is it possible to trigger some... action - I'm thinking
> > > > > > > specifically
> > > > > > > at shutdown/start.
> > > > > > > If not within the cluster then - if you do that - perhaps
> > > > > > > outside.
> > > > > > > I would like to create/remove constraints, when cluster
> > > > > > > starts &
> > > > > > > stops, respectively.
> > > > > > > 
> > > > > > > many thanks, L.
> > > > > > > 
> > > > > > You could use node status alerts for that, but it's risky
> > > > > > for
> > > > > > alert
> > > > > > agents to change the configuration (since that may result
> > > > > > in
> > > > > > more
> > > > > > alerts and potentially some sort of infinite loop).
> > > > > > 
> > > > > > Pacemaker has no concept of a full cluster start/stop, only
> > > > > > node
> > > > > > start/stop. You could approximate that by checking whether
> > > > > > the
> > > > > > node
> > > > > > receiving the alert is the only active node.
> > > > > > 
> > > > > > Another possibility would be to write a resource agent that
> > > > > > does what
> > > > > > you want and order everything else after it. However it's
> > > > > > even
> > > > > > more
> > > > > > risky for a resource agent to modify the configuration.
> > > > > > 
> > > > > > Finally you could write a systemd unit to do what you want
> > > > > > and
> > > > > > order it
> > > > > > after pacemaker.
> > > > > > 
> > > > > > What's wrong with leaving the constraints permanently
> > > > > > configured?
> > > > > yes, that would be for a node start/stop
> > > > > I struggle with using constraints to move pgsql (PAF) master
> > > > > onto a given node - seems that co/locating paf's master
> > > results in troubles (replication breaks) at/after node
> > > > > shutdown/reboot (not always, but way too often)
> > > > What? What's wrong with colocating PAF's masters exactly? How
> > > > does
> > it break any
> > replication? What are these constraints you are dealing with?
> > > > 
> > > > Could you share your configuration?
> > > Constraints beyond/above of what is required by PAF agent
> > > itself, say...
> > > you have multiple pgSQL cluster with PAF - thus multiple
> > > (separate, for each pgSQL cluster) masters and you want to
> > > spread/balance those across HA cluster
> > > (or in other words - avoid having more than 1 pgsql master
> > > per HA node)
> > > These below, I've tried, those move the master onto chosen
> > > node but.. then the issues I mentioned.
> > > 
> > > -> $ pcs constraint location PGSQL-PAF-5438-clone prefers
> > > ubusrv1=1002
> > > or
> > > -> $ pcs constraint colocation set PGSQL-PAF-5435-clone
> > > PGSQL-PAF-5434-clone PGSQL-PAF-5433-clone role=Master
> > > require-all=false setoptions score=-1000
> > > 
> > Anti-colocation sets tend to be tricky currently -- if the first
> > resource can't be assigned to a node, none of them can. We have an
> > idea
> > for a better implementation:
> > 
> >   https://projects.clusterlabs.org/T383
> > 
> > In the meantime, a possible workaround is to use placement-
> > strategy=balanced and define utilization for the clones only. The
> > promoted roles will each get a slight additional utilization, and

Re: [ClusterLabs] trigger something at ?

2024-01-31 Thread Ken Gaillot
On Wed, 2024-01-31 at 16:37 +0100, lejeczek via Users wrote:
> 
> On 31/01/2024 16:06, Jehan-Guillaume de Rorthais wrote:
> > On Wed, 31 Jan 2024 16:02:12 +0100
> > lejeczek via Users  wrote:
> > 
> > > On 29/01/2024 17:22, Ken Gaillot wrote:
> > > > On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote:
> > > > > Hi guys.
> > > > > 
> > > > > Is it possible to trigger some... action - I'm thinking
> > > > > specifically
> > > > > at shutdown/start.
> > > > > If not within the cluster then - if you do that - perhaps
> > > > > outside.
> > > > > I would like to create/remove constraints, when cluster
> > > > > starts &
> > > > > stops, respectively.
> > > > > 
> > > > > many thanks, L.
> > > > > 
> > > > You could use node status alerts for that, but it's risky for
> > > > alert
> > > > agents to change the configuration (since that may result in
> > > > more
> > > > alerts and potentially some sort of infinite loop).
> > > > 
> > > > Pacemaker has no concept of a full cluster start/stop, only
> > > > node
> > > > start/stop. You could approximate that by checking whether the
> > > > node
> > > > receiving the alert is the only active node.
> > > > 
> > > > Another possibility would be to write a resource agent that
> > > > does what
> > > > you want and order everything else after it. However it's even
> > > > more
> > > > risky for a resource agent to modify the configuration.
> > > > 
> > > > Finally you could write a systemd unit to do what you want and
> > > > order it
> > > > after pacemaker.
> > > > 
> > > > What's wrong with leaving the constraints permanently
> > > > configured?
> > > yes, that would be for a node start/stop
> > > I struggle with using constraints to move pgsql (PAF) master
> > > onto a given node - seems that co/locating paf's master
> > > results in troubles (replication breaks) at/after node
> > > shutdown/reboot (not always, but way too often)
> > What? What's wrong with colocating PAF's masters exactly? How does
> > it break any
> > replication? What are these constraints you are dealing with?
> > 
> > Could you share your configuration?
> Constraints beyond/above of what is required by PAF agent 
> itself, say...
> you have multiple pgSQL cluster with PAF - thus multiple 
> (separate, for each pgSQL cluster) masters and you want to 
> spread/balance those across HA cluster
> (or in other words - avoid having more than 1 pgsql master 
> per HA node)
> These below, I've tried, those move the master onto chosen 
> node but.. then the issues I mentioned.
> 
> -> $ pcs constraint location PGSQL-PAF-5438-clone prefers 
> ubusrv1=1002
> or
> -> $ pcs constraint colocation set PGSQL-PAF-5435-clone 
> PGSQL-PAF-5434-clone PGSQL-PAF-5433-clone role=Master 
> require-all=false setoptions score=-1000
> 

Anti-colocation sets tend to be tricky currently -- if the first
resource can't be assigned to a node, none of them can. We have an idea
for a better implementation:

 https://projects.clusterlabs.org/T383

In the meantime, a possible workaround is to use placement-
strategy=balanced and define utilization for the clones only. The
promoted roles will each get a slight additional utilization, and the
cluster should spread them out across nodes whenever possible. I don't
know if that will avoid the replication issues but it may be worth a
try.
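
A rough sketch of that workaround with pcs (resource/node names, the
attribute name, and the values are only illustrative):

  pcs property set placement-strategy=balanced
  pcs node utilization ubusrv1 pgsql=10
  pcs node utilization ubusrv2 pgsql=10
  pcs node utilization ubusrv3 pgsql=10
  pcs resource utilization PGSQL-PAF-5433 pgsql=1
  pcs resource utilization PGSQL-PAF-5434 pgsql=1
  pcs resource utilization PGSQL-PAF-5435 pgsql=1
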
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-30 Thread Ken Gaillot
On Tue, 2024-01-30 at 13:20 +, Walker, Chris wrote:
> >>> However, now it seems to wait that amount of time before it
> elects a
> >>> DC, even when quorum is acquired earlier.  In my log snippet
> below,
> >>> with dc-deadtime 300s,
> >>
> >> The dc-deadtime is not waiting for quorum, but for another DC to
> show
> >> up. If all nodes show up, it can proceed, but otherwise it has to
> wait.
> 
> > I believe all the nodes showed up by 14:17:04, but it still waited
> until 14:19:26 to elect a DC:
> 
> > Jan 29 14:14:25 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher12 is now member
> (was in unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher11 is now member
> (was in unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (quorum_notification_cb)  notice: Quorum acquired | membership=54
> members=2
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log) 
> info: Input I_ELECTION_DC received in state S_ELECTION from
> election_win_cb
> 
> > This is a cluster with 2 nodes, gopher11 and gopher12.
> 
> This is our experience with dc-deadtime too: even if both nodes in
> the cluster show up, dc-deadtime must elapse before the cluster
> starts.  This was discussed on this list a while back (
> https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and
> an RFE came out of it (
> https://bugs.clusterlabs.org/show_bug.cgi?id=5310). 

Ah, I misremembered, I thought we had done that :(

>  
> I’ve worked around this by having an ExecStartPre directive for
> Corosync that does essentially:
>  
> while ! systemctl -H ${peer} is-active corosync; do sleep 5; done
>  
> With this in place, the nodes wait for each other before starting
> Corosync and Pacemaker.  We can then use the default 20s dc-deadtime
> so that the DC election happens quickly once both nodes are up.

That makes sense

> Thanks,
> Chris
>  
> From: Users  on behalf of Faaland,
> Olaf P. via Users 
> Date: Monday, January 29, 2024 at 7:46 PM
> To: Ken Gaillot , Cluster Labs - All topics
> related to open-source clustering welcomed 
> Cc: Faaland, Olaf P. 
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> 
> >> However, now it seems to wait that amount of time before it elects
> a
> >> DC, even when quorum is acquired earlier.  In my log snippet
> below,
> >> with dc-deadtime 300s,
> >
> > The dc-deadtime is not waiting for quorum, but for another DC to
> show
> > up. If all nodes show up, it can proceed, but otherwise it has to
> wait.
> 
> I believe all the nodes showed up by 14:17:04, but it still waited
> until 14:19:26 to elect a DC:
> 
> Jan 29 14:14:25 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher12 is now member
> (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher11 is now member
> (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (quorum_notification_cb)  notice: Quorum acquired | membership=54
> members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info:
> Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> 
> This is a cluster with 2 nodes, gopher11 and gopher12.
> 
> Am I misreading that?
> 
> thanks,
> Olaf
> 
> 
> From: Ken Gaillot 
> Sent: Monday, January 29, 2024 3:49 PM
> To: Faaland, Olaf P.; Cluster Labs - All topics related to open-
> source clustering welcomed
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> 
> On Mon, 2024-01-29 at 22:48 +, Faaland, Olaf P. wrote:
> > Thank you, Ken.
> >
> > I changed my configuration management system to put an initial
> > cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> > values I was setting via pcs commands, including dc-deadtime.  I
> > removed those "pcs property set" commands from the ones that are
> run
> > at startup time.
> >
> > That worked in the sense that after Pacemaker start, the node waits
> > my newly specified dc-deadtime of 300s before giving up on the
> > partner node and fencing it, if the partner never appears as a
> > member.
> >
> > However, now it seems to wait that amount of time before it elects
> a
> > DC, even when quorum is acquired earlier.  In my log snippet below,
> > with dc-deadtime 300s,
> 
> The dc-deadtim

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Ken Gaillot
On Mon, 2024-01-29 at 14:35 -0800, Reid Wahl wrote:
> 
> 
> On Monday, January 29, 2024, Ken Gaillot  wrote:
> > On Mon, 2024-01-29 at 18:05 +, Faaland, Olaf P. via Users
> wrote:
> >> Hi,
> >>
> >> I have configured clusters of node pairs, so each cluster has 2
> >> nodes.  The cluster members are statically defined in
> corosync.conf
> >> before corosync or pacemaker is started, and quorum {two_node: 1}
> is
> >> set.
> >>
> >> When both nodes are powered off and I power them on, they do not
> >> start pacemaker at exactly the same time.  The time difference may
> be
> >> a few minutes depending on other factors outside the nodes.
> >>
> >> My goals are (I call the first node to start pacemaker "node1"):
> >> 1) I want to control how long pacemaker on node1 waits before
> fencing
> >> node2 if node2 does not start pacemaker.
> >> 2) If node1 is part-way through that waiting period, and node2
> starts
> >> pacemaker so they detect each other, I would like them to proceed
> >> immediately to probing resource state and starting resources which
> >> are down, not wait until the end of that "grace period".
> >>
> >> It looks from the documentation like dc-deadtime is how #1 is
> >> controlled, and #2 is expected normal behavior.  However, I'm
> seeing
> >> fence actions before dc-deadtime has passed.
> >>
> >> Am I misunderstanding Pacemaker's expected behavior and/or how dc-
> >> deadtime should be used?
> >
> > You have everything right. The problem is that you're starting with
> an
> > empty configuration every time, so the default dc-deadtime is being
> > used for the first election (before you can set the desired value).
> 
> Why would there be fence actions before dc-deadtime expires though?

There isn't -- after the (default) dc-deadtime pops, the node elects
itself DC and runs the scheduler, which considers the other node unseen
and in need of startup fencing. The dc-deadtime has been raised in the
meantime, but that no longer matters.

> 
> >
> > I can't think of anything you can do to get around that, since the
> > controller starts the timer as soon as it starts up. Would it be
> > possible to bake an initial configuration into the PXE image?
> >
> > When the timer value changes, we could stop the existing timer and
> > restart it. There's a risk that some external automation could make
> > repeated changes to the timeout, thus never letting it expire, but
> that
> > seems preferable to your problem. I've created an issue for that:
> >
> >   https://projects.clusterlabs.org/T764
> >
> > BTW there's also election-timeout. I'm not sure offhand how that
> > interacts; it might be necessary to raise that one as well.
> >
> >>
> >> One possibly unusual aspect of this cluster is that these two
> nodes
> >> are stateless - they PXE boot from an image on another server -
> and I
> >> build the cluster configuration at boot time with a series of pcs
> >> commands, because the nodes have no local storage for this
> >> purpose.  The commands are:
> >>
> >> ['pcs', 'cluster', 'start']
> >> ['pcs', 'property', 'set', 'stonith-action=off']
> >> ['pcs', 'property', 'set', 'cluster-recheck-interval=60']
> >> ['pcs', 'property', 'set', 'start-failure-is-fatal=false']
> >> ['pcs', 'property', 'set', 'dc-deadtime=300']
> >> ['pcs', 'stonith', 'create', 'fence_gopher11', 'fence_powerman',
> >> 'ip=192.168.64.65', 'pcmk_host_check=static-list',
> >> 'pcmk_host_list=gopher11,gopher12']
> >> ['pcs', 'stonith', 'create', 'fence_gopher12', 'fence_powerman',
> >> 'ip=192.168.64.65', 'pcmk_host_check=static-list',
> >> 'pcmk_host_list=gopher11,gopher12']
> >> ['pcs', 'resource', 'create', 'gopher11_zpool', 'ocf:llnl:zpool',
> >> 'import_options="-f -N -d /dev/disk/by-vdev"', 'pool=gopher11',
> 'op',
> >> 'start', 'timeout=805']
> >> ...
> >> ['pcs', 'property', 'set', 'no-quorum-policy=ignore']

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Ken Gaillot
On Mon, 2024-01-29 at 22:48 +, Faaland, Olaf P. wrote:
> Thank you, Ken.
> 
> I changed my configuration management system to put an initial
> cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> values I was setting via pcs commands, including dc-deadtime.  I
> removed those "pcs property set" commands from the ones that are run
> at startup time.
> 
> That worked in the sense that after Pacemaker start, the node waits
> my newly specified dc-deadtime of 300s before giving up on the
> partner node and fencing it, if the partner never appears as a
> member.
> 
> However, now it seems to wait that amount of time before it elects a
> DC, even when quorum is acquired earlier.  In my log snippet below,
> with dc-deadtime 300s,

The dc-deadtime is not waiting for quorum, but for another DC to show
up. If all nodes show up, it can proceed, but otherwise it has to wait.

> 
> 14:14:24 Pacemaker starts on gopher12
> 14:17:04 quorum is acquired
> 14:19:26 Election Trigger just popped (start time + dc-deadtime
> seconds)
> 14:19:26 gopher12 wins the election
> 
> Is there other configuration that needs to be present in the cib at
> startup time?
> 
> thanks,
> Olaf
> 
> === log extract using new system of installing partial cib.xml before
> startup
> Jan 29 14:14:24 gopher12 pacemakerd  [123690]
> (main)notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7
> features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-
> concurrent-fencing generated-manpages monotonic nagios ncurses remote
> systemd
> Jan 29 14:14:25 gopher12 pacemaker-attrd [123695]
> (attrd_start_election_if_needed)  info: Starting an election to
> determine the writer
> Jan 29 14:14:25 gopher12 pacemaker-attrd [123695]
> (election_check)  info: election-attrd won by local node
> Jan 29 14:14:25 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)info: Cluster node gopher12 is now member
> (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (quorum_notification_cb)  notice: Quorum acquired | membership=54
> members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (crm_timer_popped)info: Election Trigger just popped |
> input=I_DC_TIMEOUT time=30ms
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (do_log)  warning: Input I_DC_TIMEOUT received in state S_PENDING
> from crm_timer_popped
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (do_state_transition) info: State transition S_PENDING ->
> S_ELECTION | input=I_DC_TIMEOUT cause=C_TIMER_POPPED
> origin=crm_timer_popped
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (election_check)  info: election-DC won by local node
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info:
> Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (do_state_transition) notice: State transition S_ELECTION ->
> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL
> origin=election_win_cb
> Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696]
> (recurring_op_for_active) info: Start 10s-interval monitor
> for gopher11_zpool on gopher11
> Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696]
> (recurring_op_for_active) info: Start 10s-interval monitor
> for gopher12_zpool on gopher12
> 
> 
> === initial cib.xml contents
>  num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 29 11:07:06
> 2024" update-origin="gopher12" update-client="root" update-
> user="root" have-quorum="0" dc-uuid="2">
>   
> 
>   
>  name="stonith-action" value="off"/>
> 
>     
>  name="cluster-infrastructure" value="corosync"/>
>  name="cluster-name" value="gopher11"/>
>  name="cluster-recheck-interval" value="60"/>
>  name="start-failure-is-fatal" value="false"/>
> 
>   
> 
> 
>   
>   
> 
> 
> 
>   
> 
> 
> 
> From: Ken Gaillot 
> Sent: Monday, January 29, 2024 10:51 AM
> To: Cluster Labs - All topics related to open-source clustering
> welcomed
> Cc: Faaland, Olaf P.
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> 
> On Mon, 2024-01-29 at 18:05 +, Faaland, Olaf P. via Users wrote:
> > Hi,
> > 
> > I have configured clusters of node pairs, so each cluster has 2
> > nodes.

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Ken Gaillot
gent-manpages ascii-docs compat-2.0 corosync-ge-2 default-
> concurrent-fencing generated-manpages monotonic nagios ncurses remote
> systemd
> Jan 25 17:55:39 gopher12 pacemaker-controld  [116040]
> (peer_update_callback)info: Cluster node gopher12 is now member
> (was in unknown state)
> Jan 25 17:55:43 gopher12 pacemaker-based [116035]
> (cib_perform_op)  info: ++
> /cib/configuration/crm_config/cluster_property_set[@id='cib-
> bootstrap-options']:   name="dc-deadtime" value="300"/>
> Jan 25 17:56:00 gopher12 pacemaker-controld  [116040]
> (crm_timer_popped)info: Election Trigger just popped |
> input=I_DC_TIMEOUT time=30ms
> Jan 25 17:56:01 gopher12 pacemaker-based [116035]
> (cib_perform_op)  info: ++
> /cib/configuration/crm_config/cluster_property_set[@id='cib-
> bootstrap-options']:  
> Jan 25 17:56:01 gopher12 pacemaker-controld  [116040]
> (abort_transition_graph)  info: Transition 0 aborted by cib-
> bootstrap-options-no-quorum-policy doing create no-quorum-
> policy=ignore: Configuration change | cib=0.26.0
> source=te_update_diff_v2:464
> path=/cib/configuration/crm_config/cluster_property_set[@id='cib-
> bootstrap-options'] complete=true
> Jan 25 17:56:01 gopher12 pacemaker-controld  [116040]
> (controld_execute_fence_action)   notice: Requesting fencing (off)
> targeting node gopher11 | action=11 timeout=60
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] trigger something at ?

2024-01-29 Thread Ken Gaillot
On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote:
> Hi guys.
> 
> Is it possible to trigger some... action - I'm thinking specifically
> at shutdown/start.
> If not within the cluster then - if you do that - perhaps outside.
> I would like to create/remove constraints, when cluster starts &
> stops, respectively.
> 
> many thanks, L.
> 

You could use node status alerts for that, but it's risky for alert
agents to change the configuration (since that may result in more
alerts and potentially some sort of infinite loop).

Pacemaker has no concept of a full cluster start/stop, only node
start/stop. You could approximate that by checking whether the node
receiving the alert is the only active node.

Another possibility would be to write a resource agent that does what
you want and order everything else after it. However it's even more
risky for a resource agent to modify the configuration.

Finally you could write a systemd unit to do what you want and order it
after pacemaker.

What's wrong with leaving the constraints permanently configured?
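
If you go the systemd route, a rough sketch might look like this (the
unit name, the "WEB" resource, and "node1" are made up for illustration,
and this is untested):

  # /etc/systemd/system/cluster-constraints.service
  [Unit]
  Description=Temporary constraints while the cluster is running
  Requires=pacemaker.service
  After=pacemaker.service

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  # wait until the CIB is reachable before touching the configuration
  ExecStartPre=/bin/sh -c 'until pcs cluster status >/dev/null 2>&1; do sleep 5; done'
  ExecStart=/usr/sbin/pcs constraint location WEB prefers node1
  # because of After=, this ExecStop runs at shutdown while Pacemaker is still up
  ExecStop=/usr/sbin/pcs constraint delete location-WEB-node1-INFINITY

  [Install]
  WantedBy=multi-user.target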
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-25 Thread Ken Gaillot
On Thu, 2024-01-25 at 10:31 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 24 Jan 2024 16:47:54 -0600
> Ken Gaillot  wrote:
> ...
> > > Erm. Well, as this is a major upgrade where we can affect
> > > people's
> > > conf and
> > > break old things & so on, I'll jump in this discussion with a
> > > wishlist to
> > > discuss :)
> > >   
> > 
> > I made sure we're tracking all these (links below),
> 
> Thank you Ken, for creating these tasks. I subscribed to them, but it
> seems I
> can not discuss on them (or maybe I failed to find how to do it).

Hmm, that's bad news. :( I don't immediately see a way to allow
comments without making the issue fully editable. Hopefully we can find
some configuration magic ...

> 
> > but realistically we're going to have our hands full dropping all
> > the
> > deprecated stuff in the time we have.
> 
> Let me know how I can help on these subject. Also, I'm still silently
> sitting on
> IRC chan if needed.
>
> 
> > Most of these can be done in any version.
> 
> Four out of seven can be done in any version. For the three other
> left, in my
> humble opinion and needs from the PAF agent point of view:
> 
> 1. «Support failure handling of notify actions»
>https://projects.clusterlabs.org/T759
> 2. «Change allowed range of scores and value of +/-INFINITY»
>https://projects.clusterlabs.org/T756
> 3. «Default to sending clone notifications when agent supports it»
>https://projects.clusterlabs.org/T758
> 
> The first is the most important as it allows to implement an actual
> election
> before the promotion, breaking the current transition if promotion
> score doesn't
> reflect the reality since last monitor action. Current PAF's code
> makes a lot of
> convolution to have a decent election mechanism preventing the
> promotion of a
> lagging node.
> 
> The second one would help removing some useless complexity from some
> resource
> agent code (at least in PAF).
> 
> The third one is purely for confort and cohesion between actions
> setup.
>
> Have a good day!
> 
> Regards,
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-24 Thread Ken Gaillot
On Tue, 2024-01-23 at 18:49 +0100, Jehan-Guillaume de Rorthais wrote:
> Hi there !
> 
> On Wed, 03 Jan 2024 11:06:27 -0600
> Ken Gaillot  wrote:
> 
> > Hi all,
> > 
> > I'd like to release Pacemaker 3.0.0 around the middle of this
> > year. 
> > I'm gathering proposed changes here:
> > 
> >  
> > https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/
> > 
> > Please review for anything that might affect you, and reply here if
> > you
> > have any concerns.
> 
> Erm. Well, as this is a major upgrade where we can affect people's
> conf and
> break old things & so on, I'll jump in this discussion with a
> wishlist to
> discuss :)
> 

I made sure we're tracking all these (links below), but realistically
we're going to have our hands full dropping all the deprecated stuff in
the time we have. Most of these can be done in any version.

> 1. "recover", "migration-to" and "migration-from" actions support ?
> 
>   See discussion:
>   
> https://lists.clusterlabs.org/pipermail/developers/2020-February/002258.html

https://projects.clusterlabs.org/T317

https://projects.clusterlabs.org/T755

> 
> 2.1. INT64 promotion scores?

https://projects.clusterlabs.org/T756

> 2.2. discovering promotion score ahead of promotion?

https://projects.clusterlabs.org/T505

> 2.3. make OCF_RESKEY_CRM_meta_notify_* or equivalent officially
> available in all
>  actions 
> 
>   See discussion:
>   
> https://lists.clusterlabs.org/pipermail/developers/2020-February/002255.html
> 

https://projects.clusterlabs.org/T757


> 3.1. deprecate "notify=true" clone option, make it true by default

https://projects.clusterlabs.org/T758

> 3.2. react to notify action return code
> 
>   See discussion:
>   
> https://lists.clusterlabs.org/pipermail/developers/2020-February/002256.html
> 

https://projects.clusterlabs.org/T759

> Off course, I can volunteer to help on some topics.
> 
> Cheers!
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] New ClusterLabs wiki

2024-01-23 Thread Ken Gaillot
Hi all,

The ClusterLabs project manager is now publicly viewable, without
needing a GitHub account:

  https://projects.clusterlabs.org/

Anyone can now follow issues tracked there. (Issues created before the
site was public will still require an account unless someone updates
their settings.)

The site has a simple built-in wiki, so to reduce sysadmin overhead, we
have moved the ClusterLabs wiki there:

  https://projects.clusterlabs.org/w/

The old wiki.clusterlabs.org site is gone, and redirects to the new
one. A lot of the wiki pages were more than a decade old, so they were
dropped if they didn't apply to current software and OSes.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Beginner lost with promotable "group" design

2024-01-17 Thread Ken Gaillot
On Wed, 2024-01-17 at 14:23 +0100, Adam Cécile wrote:
> Hello,
> 
> 
> I'm trying to achieve the following setup with 3 hosts:
> 
> * One master gets a shared IP, then remove default gw, add another
> gw, 
> start a service
> 
> * Two slaves should have none of them but add a different default gw
> 
> I managed quite easily to get the master workflow running with
> ordering 
> constraints but I don't understand how I should move forward with
> the 
> slave configuration.
> 
> I think I must create a promotable resource first, then assign my
> other resources with started/stopped settings depending on the promote
> status of the node. Is that correct? How do I create a promotable
> "placeholder" where I can later attach my existing resources?

A promotable resource would be appropriate if the service should run on
all nodes, but one node runs with a special setting. That doesn't sound
like what you have.

If you just need the service to run on one node, the shared IP,
service, and both gateways can be regular resources. You just need
colocation constraints between them:

- colocate service and external default route with shared IP
- clone the internal default route and anti-colocate it with shared IP

If you want the service to be able to run even if the IP can't, make
its colocation score finite (or colocate the IP and external route with
the service).

Ordering is separate. You can order the shared IP, service, and
external route however needed. Alternatively, you can put the three of
them in a group (which does both colocation and ordering, in sequence),
and anti-colocate the cloned internal route with the group.
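
As a very rough sketch (resource names, addresses, and agents below are
made up for illustration, and untested):

  pcs resource create shared-ip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24
  pcs resource create ext-route ocf:heartbeat:Route destination=default gateway=192.0.2.1
  pcs resource create my-service systemd:my-service
  pcs resource group add active-grp shared-ip ext-route my-service

  pcs resource create int-route ocf:heartbeat:Route destination=default gateway=198.51.100.1 clone
  pcs constraint colocation add int-route-clone with active-grp -INFINITY

The group handles ordering and colocation of the IP, external route, and
service in one step, and the -INFINITY colocation keeps the cloned
internal route off whichever node currently holds the group.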

> 
> Sorry for the stupid question but I really don't understand what type
> of 
> elements I should create...
> 
> 
> Thanks in advance,
> 
> Regards, Adam.
> 
> 
> PS: Bonus question should I use "pcs" or "crm" ? It seems both
> command 
> seem to be equivalent and documentations use sometime one or another
> 

They are equivalent -- it's a matter of personal preference (and often
what choices your distro give you).
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Migrating off CentOS

2024-01-15 Thread Ken Gaillot
On Sat, 2024-01-13 at 09:07 -0600, Billy Croan wrote:
> I'm planning to migrate a two-node cluster off CentOS 7 this year.  I
> think I'm taking it to Debian Stable, but open for suggestions if any
> distribution is better supported by pacemaker.


Debian, RHEL, SUSE, Ubuntu, and compatible distros should all have good
support.

Fedora and FreeBSD get regular builds and basic testing but have fewer
users exercising them in production.

FYI, if you want to keep the interfaces you're familiar with, the free
RHEL developer license now allows most personal and small-business
production use: https://access.redhat.com/discussions/5719451

> 
> Have any of you had success doing major upgrades (bullseye to
> bookworm on Debian) of your physical nodes one at a time while each
> node is in standby+maintenance, and rolling the vm from one to the
> other so it doesn't reboot while the hosts are upgraded?  That has
> worked well for me for minor OS updates, but I'm curious about the
> majors.  
> 
> My project this year is even more major, not just upgrading the OS
> but changing distributions.
> 
> I think I have three possible ways I can try this:
> 1) wipe all server disks and start fresh.

A variation, if you can get new hosts, is to set up a test cluster on
new hosts, and once you're comfortable that it will work, stop the old
cluster and turn the new one into production.

> 
> 2) standby and maintenance one node, then reinstall it with a new OS
> and make a New Cluster.  shutdown the vm and copy it, offline, to the
> new one-node cluster. and start it up there. Then once that's
> working, wipe and reinstall the other node, and add it to the new
> cluster.

This should be fine.

> 
> 3) standby and maintenance one node, then Remove it from the
> cluster.  Then reinstall it with the new distribution's OS.  Then re-
> add it to the Existing Cluster.  Move the vm resource to it and
> verify it's working, then do the same with the other physical node,
> and take it out of standby&maint to finish.
> 

This would be fine as long as the corosync and pacemaker versions are
compatible. However as Michele mentioned, RHEL 7 uses Corosync 2, and
the latest of any distro will use Corosync 3, so that will sink this
option.

> (Obviously any of those methods begin with a full backup to offsite
> and local media. and end with a verification against that backup.)
> 
> #1 would be the longest outage but the "cleanest result"
> #3 would be possibly no outage, but I think the least likely to
> work.  I understand EL uses pcs and debian uses crm for example...

Debian offers both IIRC. But that won't affect the upgrade, they both
use Pacemaker command-line tools under the hood. The only difference is
what commands you run to get the same effect.
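
For instance, these should be equivalent (resource and node names made
up):

  pcs resource move my-vm node2    # pcs
  crm resource move my-vm node2    # crmsh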

> #2 is a compromise that should(tm) have only a few seconds of
> outage.  But could blow up i suppose.  They all could blow up though
> so I'm not sure that should play a factor in the decision.
> 
> I can't be the first person to go down this path.  So what do you all
> think?  how have you done it in the past?

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-04 Thread Ken Gaillot
Thanks, I hadn't heard that!

On Thu, 2024-01-04 at 01:13 +0100, Valentin Vidić via Users wrote:
> On Wed, Jan 03, 2024 at 11:06:27AM -0600, Ken Gaillot wrote:
> > I'd like to release Pacemaker 3.0.0 around the middle of this
> > year. 
> > I'm gathering proposed changes here:
> > 
> >  
> > https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/
> > 
> > Please review for anything that might affect you, and reply here if
> > you
> > have any concerns.
> 
> Probably best to drop support for rkt bundles as that project has
> ended:
> 
>   https://github.com/rkt/rkt/issues/4024
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Planning for Pacemaker 3

2024-01-03 Thread Ken Gaillot
Hi all,

I'd like to release Pacemaker 3.0.0 around the middle of this year. 
I'm gathering proposed changes here:

 https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/

Please review for anything that might affect you, and reply here if you
have any concerns.

Pacemaker major-version releases drop support for deprecated features,
to make the code easier to maintain. The biggest planned changes are
dropping support for Upstart and Nagios resources, as well as rolling
upgrades from Pacemaker 1. Much of the lowest-level public C API will
be dropped.

Because the changes will be backward-incompatible, we will continue to
make 2.1 releases for a few years, with backports of compatible fixes,
to help distribution packagers who need to keep backward compatibility.
-- 
Ken Gaillot 




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] colocate Redis - weird

2024-01-01 Thread Ken Gaillot
On Wed, 2023-12-20 at 11:16 +0100, lejeczek via Users wrote:
> 
> 
> On 19/12/2023 19:13, lejeczek via Users wrote:
> > hi guys,
> > 
> > Is this below not the weirdest thing?
> > 
> > -> $ pcs constraint ref PGSQL-PAF-5435
> > Resource: PGSQL-PAF-5435
> >   colocation-HA-10-1-1-84-PGSQL-PAF-5435-clone-INFINITY
> >   colocation-REDIS-6385-clone-PGSQL-PAF-5435-clone-INFINITY
> >   order-PGSQL-PAF-5435-clone-HA-10-1-1-84-Mandatory
> >   order-PGSQL-PAF-5435-clone-HA-10-1-1-84-Mandatory-1
> >   colocation_set_PePePe

Can you show the actual constraint information (resources and scores)
for the whole cluster? In particular I'm wondering about that set.

> > 
> > Here Redis master should folow pgSQL master.
> > Which such constraint:
> > 
> > -> $ pcs resource status PGSQL-PAF-5435
> >   * Clone Set: PGSQL-PAF-5435-clone [PGSQL-PAF-5435] (promotable):
> > * Promoted: [ ubusrv1 ]
> > * Unpromoted: [ ubusrv2 ubusrv3 ]
> > -> $ pcs resource status REDIS-6385-clone
> >   * Clone Set: REDIS-6385-clone [REDIS-6385] (promotable):
> > * Unpromoted: [ ubusrv1 ubusrv2 ubusrv3 ]
> > 
> > If I remove that constrain:
> > -> $ pcs constraint delete colocation-REDIS-6385-clone-PGSQL-PAF-
> > 5435-clone-INFINITY
> > -> $ pcs resource status REDIS-6385-clone
> >   * Clone Set: REDIS-6385-clone [REDIS-6385] (promotable):
> > * Promoted: [ ubusrv1 ]
> > * Unpromoted: [ ubusrv2 ubusrv3 ]
> > 
> > and ! I can manually move Redis master around, master moves to each
> > server just fine.
> > I again, add that constraint:
> > 
> > -> $ pcs constraint colocation add master REDIS-6385-clone with
> > master PGSQL-PAF-5435-clone
> > 
> > and the same...
> > 
> > 
>  What might it be about that one node? The resource was removed and
> created anew, and the cluster insists on keeping the master there.
> I can manually move the master anywhere, but if I _clear_ the
> resource (no constraints), the cluster moves it back to the same node.
> 
> I wonder about:  a) "transient" node attrs & b) if this cluster is
> somewhat broken.
> On a) - can we read more about those somewhere? (not the
> code/internals)
> thanks, L.
> 

Transient attributes are the same as permanent ones except they get
cleared when a node leaves the cluster.

The constraint says that the masters must be located together, but they
each still need to be enabled on a given node with either a master
score attribute (permanent or transient) or a location constraint.
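
To see what the scheduler is actually working with, you can dump the
node attributes and scores, e.g. (the promotion score attribute is
usually named master-<resource>, but check what your agent really sets):

  crm_mon -1A    # one-shot status including node attributes
  crm_simulate --simulate --live-check --show-scores | grep -i redis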
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] colocation constraint - do I get it all wrong?

2024-01-01 Thread Ken Gaillot
On Fri, 2023-12-22 at 17:02 +0100, lejeczek via Users wrote:
> hi guys.
> 
> I have a colocation constraint:
> 
> -> $ pcs constraint ref DHCPD
> Resource: DHCPD
>   colocation-DHCPD-GATEWAY-NM-link-INFINITY
> 
> and the trouble is... I thought DHCPD is to follow GATEWAY-NM-link,
> always!
> If that is true, then I see very strange behavior, namely:
> when there is an issue with the DHCPD resource and it cannot be started,
> GATEWAY-NM-link gets tossed around by the cluster.
> 
> Is that normal & expected - is my understanding of _colocation_
> completely wrong - or my cluster is indeed "broken"?
> many thanks, L.
> 

Pacemaker considers the preferences of colocated resources when
assigning a resource to a node, to ensure that as many resources as
possible can run. So if a colocated resource becomes unable to run on a
node, the primary resource might move to allow the colocated resource
to run.
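
If you want GATEWAY-NM-link to stay put even when DHCPD cannot start
anywhere, one option (a sketch, untested) is to make the colocation
score finite instead of INFINITY:

  pcs constraint delete colocation-DHCPD-GATEWAY-NM-link-INFINITY
  pcs constraint colocation add DHCPD with GATEWAY-NM-link 1000

With a finite score the dependent's preferences still count, but they
can be outweighed by the primary's own stickiness and location
preferences.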
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.7 final release now available

2023-12-19 Thread Ken Gaillot
Hi all,

Source code for Pacemaker version 2.1.7 is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7

This is primarily a bug fix release. See the ChangeLog or the link
above for details.

Many thanks to all contributors of source code to this release,
including Chris Lumens, Gao,Yan, Grace Chin, Hideo Yamauchi, Jan
Pokorný, Ken Gaillot, liupei, Oyvind Albrigtsen, Reid Wahl, xin liang,
and xuezhixin.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Build cluster one node at a time

2023-12-19 Thread Ken Gaillot
Correct. You want to enable pcsd to start at boot. Also, after starting
pcsd the first time on a node, authorize it from the first node with
"pcs host auth  -u hacluster".

On Tue, 2023-12-19 at 22:42 +0200, Tiaan Wessels wrote:
> So i run the pcs add command for every new node on the first original
> node, not on the node being added? Only corosync, pacemaker and pcsd
> need to run on the node to be added, and the commands being run on
> the original node will speak to these on the new node?
> 
> On Tue, 19 Dec 2023, 21:39 Ken Gaillot,  wrote:
> > On Tue, 2023-12-19 at 17:03 +0200, Tiaan Wessels wrote:
> > > Hi,
> > > Is it possible to build a corosync pacemaker cluster on redhat9
> > one
> > > node at a time? In other words, when I'm finished with the first
> > node
> > > and reboot it, all services are started on it. Then i build a
> > second
> > > node to integrate into the cluster and once done, pcs status
> > shows
> > > two nodes on-line ?
> > > Thanks 
> > 
> > Yes, you can use pcs cluster setup with the first node, then pcs
> > cluster node add for each additional node.
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Build cluster one node at a time

2023-12-19 Thread Ken Gaillot
On Tue, 2023-12-19 at 17:03 +0200, Tiaan Wessels wrote:
> Hi,
> Is it possible to build a corosync pacemaker cluster on redhat9 one
> node at a time? In other words, when I'm finished with the first node
> and reboot it, all services are started on it. Then i build a second
> node to integrate into the cluster and once done, pcs status shows
> two nodes on-line ?
> Thanks 

Yes, you can use pcs cluster setup with the first node, then pcs
cluster node add for each additional node.
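
Roughly like this (node and cluster names are made up, and details vary
a bit between pcs versions):

  # on the first node
  pcs host auth node1 -u hacluster
  pcs cluster setup mycluster node1
  pcs cluster start --all
  pcs cluster enable --all

  # later, once the second node is built and pcsd is running on it
  pcs host auth node2 -u hacluster
  pcs cluster node add node2 --start --enable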
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-18 Thread Ken Gaillot
# for i in lustre-mgs lustre-mds1 lustre-mds2 lustre{1..2}; do pcs
> constraint location OST4 avoids $i; done
> # pcs resource create ping ocf:pacemaker:ping dampen=5s
> host_list=192.168.34.250 op monitor interval=3s timeout=7s meta
> target-role="started" globally-unique="false" clone
> # for i in lustre-mgs lustre-mds{1..2} lustre{1..4}; do pcs
> constraint location ping-clone prefers $i; done
> # pcs constraint location OST3 rule score=0 pingd lt 1 or not_defined
> pingd
> # pcs constraint location OST4 rule score=0 pingd lt 1 or not_defined
> pingd
> # pcs constraint location OST3 rule score=125 defined pingd
> # pcs constraint location OST4 rule score=125 defined pingd
> 
> ###  same home base:
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: 90
> pcmk__primitive_assign: OST4 allocation score on lustre4: 210
> # pcs status
>   * OST3(ocf::lustre:Lustre):Started lustre3
>   * OST4(ocf::lustre:Lustre):Started lustre4
> 
> ### VM with lustre4 (OST4) is OFF. 
> 
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: 90
> pcmk__primitive_assign: OST4 allocation score on lustre4: 100
> Start  OST4( lustre3 )
> Resource action: OST4start on lustre3
> Resource action: OST4monitor=2 on lustre3
> # pcs status
>   * OST3(ocf::lustre:Lustre):Started lustre3
>   * OST4(ocf::lustre:Lustre):Stopped
> 
> Again lustre3 seems unable to overrule due to lower score and pingd
> DOESN'T help at all!
> 
> 
> 4) Can I make a reliable HA failover without pingd to keep things as
> simple as possible?
> 5) Pings might help to affect cluster decisions in case GW is lost,
> but its not working as all the guides say. Why?
> 
> 
> Thanks in advance,
> Artem
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.7-rc4 now available (likely final for real)

2023-12-12 Thread Ken Gaillot
Hi all,

Source code for the fourth (and very likely final) release candidate
for Pacemaker version 2.1.7 is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc4

This release candidate fixes a newly found regression that was
introduced in rc1.

This is probably your last chance to test before the final release,
which I expect will be next Tuesday.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Ken Gaillot
On Tue, 2023-12-12 at 18:08 +0300, Artem wrote:
> Hi Andrei. pingd==0 won't satisfy both statements. It would if I used
> GTE, but I used GT.
> pingd lt 1 --> [0]
> pingd gt 0 --> [1,2,3,...]

It's the "or defined pingd" part of the rule that will match pingd==0.
A value of 0 is defined.

I'm guessing you meant to use "pingd gt 0 *AND* pingd defined", but
then the defined part would become redundant since any value greater
than 0 is inherently defined. So, for that rule, you only need "pingd
gt 0".

> 
> On Tue, 12 Dec 2023 at 17:21, Andrei Borzenkov 
> wrote:
> > On Tue, Dec 12, 2023 at 4:47 PM Artem  wrote:
> > >> > pcs constraint location FAKE3 rule score=0 pingd lt 1 or
> > not_defined pingd
> > >> > pcs constraint location FAKE3 rule score=125 pingd gt 0 or
> > defined pingd
> > > Are they really contradicting?
> > 
> > Yes. pingd == 0 will satisfy both rules. My use of "always" was
> > incorrect, it does not happen for all possible values of pingd, but
> > it
> > does happen for some.
> 
> Maybe defined/not_defined should be put in front of lt/gt? It is
> possible that the VM goes down, pingd becomes not_defined, then the rule
> evaluates "lt 1" first, catches an error, and doesn't evaluate the next
> part (after OR)?

No, the order of and/or clauses doesn't matter.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Ken Gaillot
On Mon, 2023-12-11 at 21:05 +0300, Artem wrote:
> Hi Ken,
> 
> On Mon, 11 Dec 2023 at 19:00, Ken Gaillot 
> wrote:
> > > Question #2) I shut lustre3 VM down and leave it like that
> > How did you shut it down? Outside cluster control, or with
> > something
> > like pcs resource disable?
> > 
> 
> I did it outside of the cluster to simulate a failure. I turned off
> this VM from vCenter. Cluster is unaware of anything behind OS.

In that case check pacemaker.log for messages around the time of the
failure. They should tell you what error originally occurred and why
the cluster is blocked on it.

>  
> > >   * FAKE3   (ocf::pacemaker:Dummy):  Stopped
> > >   * FAKE4   (ocf::pacemaker:Dummy):  Started lustre4
> > >   * Clone Set: ping-clone [ping]:
> > > * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1
> > lustre2
> > > lustre4 ] << lustre3 missing
> > > OK for now
> > > VM boots up. pcs status: 
> > >   * FAKE3   (ocf::pacemaker:Dummy):  FAILED (blocked) [
> > lustre3
> > > lustre4 ]  << what is it?
> > >   * Clone Set: ping-clone [ping]:
> > > * ping  (ocf::pacemaker:ping):   FAILED lustre3
> > (blocked)   
> > > << why not started?
> > > * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1
> > lustre2
> > > lustre4 ]
> > > I checked server processes manually and found that lustre4 runs
> > > "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3
> > > doesn't
> > > All is according to documentation but results are strange.
> > > Then I tried to add meta target-role="started" to pcs resource
> > create
> > > ping and this time ping started after node rebooted. Can I expect
> > > that it was just missing from official setup documentation, and
> > now
> > > everything will work fine?
> > 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] resource fails manual failover

2023-12-12 Thread Ken Gaillot
On Tue, 2023-12-12 at 16:50 +0300, Artem wrote:
> Is there a detailed explanation for resource monitor and start
> timeouts and intervals with examples, for dummies?

No, though Pacemaker Explained has some reference information:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#resource-operations

> 
> my resource configured s follows:
> [root@lustre-mds1 ~]# pcs resource show MDT00
> Warning: This command is deprecated and will be removed. Please use
> 'pcs resource config' instead.
> Resource: MDT00 (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: MDT00-instance_attributes
> device=/dev/mapper/mds00
> directory=/lustre/mds00
> force_unmount=safe
> fstype=lustre
>   Operations:
> monitor: MDT00-monitor-interval-20s
>   interval=20s
>   timeout=40s
> start: MDT00-start-interval-0s
>   interval=0s
>   timeout=60s
> stop: MDT00-stop-interval-0s
>   interval=0s
>   timeout=60s
> 
> I issued manual failover with the following commands:
> crm_resource --move -r MDT00 -H lustre-mds1
> 
> resource tried but returned back with the entries in pacemaker.log
> like these:
> Dec 12 15:53:23  Filesystem(MDT00)[1886100]:INFO: Running start
> for /dev/mapper/mds00 on /lustre/mds00
> Dec 12 15:53:45  Filesystem(MDT00)[1886100]:ERROR: Couldn't mount
> device [/dev/mapper/mds00] as /lustre/mds00
> 
> tried again with the same result:
> Dec 12 16:11:04  Filesystem(MDT00)[1891333]:INFO: Running start
> for /dev/mapper/mds00 on /lustre/mds00
> Dec 12 16:11:26  Filesystem(MDT00)[1891333]:ERROR: Couldn't mount
> device [/dev/mapper/mds00] as /lustre/mds00
> 
> Why it cannot move?

The error is outside the cluster software, in the mount attempt itself.
The resource agent logged the ERROR above, so if you can't find more
information in the system logs you may want to look at the agent code
to see what it's doing around that message.
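
As a quick check, you could try the same mount by hand on lustre-mds1
and see what the kernel says, e.g.:

  mount -t lustre /dev/mapper/mds00 /lustre/mds00
  dmesg | tail

That is usually more informative than the agent's generic "Couldn't
mount" message.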

> 
> Does this 20 sec interval (between start and error) have anything to
> do with monitor interval settings?

No. The monitor interval says when to schedule another recurring
monitor check after the previous one completes. The first monitor isn't
scheduled until after the start succeeds.

> 
> [root@lustre-mgs ~]# pcs constraint show --full
> Location Constraints:
>   Resource: MDT00
> Enabled on:
>   Node: lustre-mds1 (score:100) (id:location-MDT00-lustre-mds1-
> 100)
>   Node: lustre-mds2 (score:100) (id:location-MDT00-lustre-mds2-
> 100)
> Disabled on:
>   Node: lustre-mgs (score:-INFINITY) (id:location-MDT00-lustre-
> mgs--INFINITY)
>   Node: lustre1 (score:-INFINITY) (id:location-MDT00-lustre1
> --INFINITY)
>   Node: lustre2 (score:-INFINITY) (id:location-MDT00-lustre2
> --INFINITY)
>   Node: lustre3 (score:-INFINITY) (id:location-MDT00-lustre3
> --INFINITY)
>   Node: lustre4 (score:-INFINITY) (id:location-MDT00-lustre4
> --INFINITY)
> Ordering Constraints:
>   start MGT then start MDT00 (kind:Optional) (id:order-MGT-MDT00-
> Optional)
>   start MDT00 then start OST1 (kind:Optional) (id:order-MDT00-OST1-
> Optional)
>   start MDT00 then start OST2 (kind:Optional) (id:order-MDT00-OST2-
> Optional)
> 
> with regards to ordering constraint: OST1 and OST2 are started now,
> while I'm exercising MDT00 failover.
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-11 Thread Ken Gaillot
On Fri, 2023-12-08 at 17:44 +0300, Artem wrote:
> Hello experts.
> 
> I use pacemaker for a Lustre cluster. But for simplicity and
> exploration I use a Dummy resource. I didn't like how the resource
> performed failover and failback. When I shut down the VM with the remote
> agent, pacemaker tries to restart it. According to pcs status it
> marks the resource (not the RA) Online for some time while the VM stays
> down. 
> 
> OK, I wanted to improve its behavior and set up a ping monitor. I
> tuned the scores like this:
> pcs resource create FAKE3 ocf:pacemaker:Dummy
> pcs resource create FAKE4 ocf:pacemaker:Dummy
> pcs constraint location FAKE3 prefers lustre3=100
> pcs constraint location FAKE3 prefers lustre4=90
> pcs constraint location FAKE4 prefers lustre3=90
> pcs constraint location FAKE4 prefers lustre4=100
> pcs resource defaults update resource-stickiness=110
> pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local
> op monitor interval=3s timeout=7s clone meta target-role="started"
> for i in lustre{1..4}; do pcs constraint location ping-clone prefers
> $i; done
> pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined
> pingd
> pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined
> pingd
> pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined
> pingd
> pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined
> pingd

The gt 0 part is redundant since "defined pingd" matches *any* score.

> 
> 
> Question #1) Why I cannot see accumulated score from pingd in
> crm_simulate output? Only location score and stickiness. 
> pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
> pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210
> Either when all is OK or when VM is down - score from pingd not added
> to total score of RA

ping scores aren't added to resource scores, they're just set as node
attribute values. Location constraint rules map those values to
resource scores (in this case any defined ping score gets mapped to
125).
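
If you want to see the raw attribute that ping publishes, something
like this works:

  attrd_updater --query --name pingd --node lustre3
  crm_mon -1A    # one-shot status with node attributes listed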

> 
> 
> Question #2) I shut lustre3 VM down and leave it like that. pcs
> status:

How did you shut it down? Outside cluster control, or with something
like pcs resource disable?

>   * FAKE3   (ocf::pacemaker:Dummy):  Stopped
>   * FAKE4   (ocf::pacemaker:Dummy):  Started lustre4
>   * Clone Set: ping-clone [ping]:
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2
> lustre4 ] << lustre3 missing
> OK for now
> VM boots up. pcs status: 
>   * FAKE3   (ocf::pacemaker:Dummy):  FAILED (blocked) [ lustre3
> lustre4 ]  << what is it?
>   * Clone Set: ping-clone [ping]:
> * ping  (ocf::pacemaker:ping):   FAILED lustre3 (blocked)   
> << why not started?
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2
> lustre4 ]
> I checked server processes manually and found that lustre4 runs
> "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3
> doesn't
> All is according to documentation but results are strange.
> Then I tried to add meta target-role="started" to pcs resource create
> ping and this time ping started after node rebooted. Can I expect
> that it was just missing from official setup documentation, and now
> everything will work fine?
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.7-rc3 now available (likely final)

2023-12-07 Thread Ken Gaillot
Hi all,

Source code for the third (and likely final) release candidate for
Pacemaker version 2.1.7 is available at:

 
https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc3

This release candidate fixes a couple issues introduced in rc1. See the
ChangeLog or the link above for details.

Everyone is encouraged to download, build, and test the new release. We
do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

This is probably your last chance to test before the final release,
which I expect in about two weeks. If anyone needs more time, let me
know and I can delay it till early January.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Prevent cluster transition when resource unavailable on both nodes

2023-12-06 Thread Ken Gaillot
On Wed, 2023-12-06 at 17:55 +0100, Alexander Eastwood wrote:
> Hello, 
> 
> I administrate a Pacemaker cluster consisting of 2 nodes, which are
> connected to each other via ethernet cable to ensure that they are
> always able to communicate with each other. A network switch is also
> connected to each node via ethernet cable and provides external
> access.
> 
> One of the managed resources of the cluster is a virtual IP, which is
> assigned to a physical network interface card and thus depends on the
> network switch being available. The virtual IP is always hosted on
> the active node.
> 
> We had the situation where the network switch lost power or was
> rebooted, as a result both servers reported `NIC Link is Down`. The
> recover operation on the Virtual IP resource then failed repeatedly
> on the active node, and a transition was initiated. Since the other 

The default reaction to a start failure is to ban the resource from
that node. If it tries to recover repeatedly on the same node, I assume
you set start-failure-is-fatal to false, and/or have a very low
failure-timeout on starts?

> node was also unable to start the resource, the cluster was swaying
> between the 2 nodes until the NIC links were up again.
> 
> Is there a way to change this behaviour? I am thinking of the
> following sequence of events, but have not been able to find a way to
> configure this:
> 
>  1. active node detects NIC Link is Down, which affects a resource
> managed by the cluster (monitor operation on the resource starts to
> fail)
>  2. active node checks if the other (passive) node in the cluster
> would be able to start the resource

There's really no way to check without actually trying to start it, so
basically you're describing what Pacemaker does.

>  3. if passive node can start the resource, transition all resources
> to passive node

I think maybe the "all resources" part is key. Presumably that means
you have a bunch of other resources colocated with and/or ordered after
the IP, so they all have to stop to try to start the IP elsewhere.

If those resources really do require the IP to be active, then that's
the correct behavior. If they don't, then the constraints could be
dropped, reversed, or made optional or asymmetric.

It sounds like you might want an optional colocation, or a colocation
of the IP with the other resources (rather than vice versa).
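
For example (a sketch with made-up resource names):

  # optional: the app prefers the node with the IP but can run without it
  pcs constraint colocation add my-app with virtual-ip 500
  # or reversed: the IP follows the app rather than the other way around
  pcs constraint colocation add virtual-ip with my-app INFINITY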

>  4. if passive node is unable to start the resource, then there is
> nothing to be gained a transition, so no action should be taken

If start-failure-is-fatal is left to true, and no failure-timeout is
configured, then it will try once per node then wait for manual
cleanup. If the colocation is made optional or reversed, the other
resources can continue to run.

> 
> Any pointers or advice will be much appreciated!
> 
> Thank you and kind regards,
> 
> Alex Eastwood
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Redundant entries in log

2023-12-05 Thread Ken Gaillot
On Tue, 2023-12-05 at 17:21 +, Jean-Baptiste Skutnik wrote:
> Hi,
> 
> It was indeed a configuration of 1m on the recheck interval that
> triggered the transitions.
> 
> Could you elaborate on why this is not relevant anymore? I am training
> on the HA stack, and if there are mechanisms to detect failure more
> advanced than a recheck, I would be interested in what to look for in
> the documentation.

Hi,

The recheck interval has nothing to do with detecting resource failures
-- that is done per-resource via the configured monitor operation
interval.

In the past, time-based configuration such as failure timeouts and
date/time-based rules were only guaranteed to be checked as often as
the recheck interval. That was the most common reason why people
lowered it. However, since the 2.0.3 release, these are checked at the
exact appropriate time, so the recheck interval is no longer relevant
for these.

The recheck interval is still useful in two situations: evaluation of
rules using the (cron-like) date_spec element is still only guaranteed
to occur this often; and if there are scheduler bugs resulting in an
incompletely scheduled transition that can be corrected with a new
transition, this will be the maximum time until that happens.
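
A date_spec rule is the cron-like kind, e.g. (sketch; resource name made
up):

  pcs constraint location backup-job rule score=-INFINITY date-spec hours=8-17 weekdays=1-5

i.e. "keep backup-job off the cluster during business hours" -- that
sort of rule is still only guaranteed to be re-evaluated once per
recheck interval.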

> 
> Cheers,
> 
> JB
> 
> > On Nov 29, 2023, at 18:52, Ken Gaillot  wrote:
> > 
> > Hi,
> > 
> > Something is triggering a new transition. The most likely candidate
> > is
> > a low value for cluster-recheck-interval.
> > 
> > Many years ago, a low cluster-recheck-interval was necessary to
> > make
> > certain things like failure-timeout more timely, but that has not
> > been
> > the case in a long time. It should be left to default (15 minutes)
> > in
> > the vast majority of cases. (A new transition will still occur on
> > that
> > schedule, but that's reasonable.)
> > 
> > On Wed, 2023-11-29 at 10:05 +, Jean-Baptiste Skutnik via Users
> > wrote:
> > > Hello all,
> > > 
> > > I am managing a cluster using pacemaker for high availability. I
> > > am
> > > parsing the logs for relevant information on the cluster health
> > > and
> > > the logs are full of the following:
> > > 
> > > ```
> > > Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: State
> > > transition S_IDLE -> S_POLICY_ENGINE
> > > Nov 29 09:17:41 esvm2 pacemaker-schedulerd[2892]:  notice:
> > > Calculated
> > > transition 8629, saving inputs in /var/lib/pacemaker/pengine/pe-
> > > input-250.bz2
> > > Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice:
> > > Transition
> > > 8629 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> > > Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> > > Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: State
> > > transition S_TRANSITION_ENGINE -> S_IDLE
> > > Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: State
> > > transition S_IDLE -> S_POLICY_ENGINE
> > > Nov 29 09:18:41 esvm2 pacemaker-schedulerd[2892]:  notice:
> > > Calculated
> > > transition 8630, saving inputs in /var/lib/pacemaker/pengine/pe-
> > > input-250.bz2
> > > Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice:
> > > Transition
> > > 8630 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> > > Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> > > Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: State
> > > transition S_TRANSITION_ENGINE -> S_IDLE
> > > Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: State
> > > transition S_IDLE -> S_POLICY_ENGINE
> > > Nov 29 09:19:41 esvm2 pacemaker-schedulerd[2892]:  notice:
> > > Calculated
> > > transition 8631, saving inputs in /var/lib/pacemaker/pengine/pe-
> > > input-250.bz2
> > > Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice:
> > > Transition
> > > 8631 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> > > Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> > > Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: State
> > > transition
> > > ...
> > > ```
> > > 
> > > The transition IDs seem to differ however the file containing the
> > > transition data stays the same, implying that the transition does
> > > not
> > > affect the cluster. (/var/lib/pacemaker/pengine/pe-input-250.bz2)
> > > 
> > > I noticed the option to restrict the logging to higher levels
> > > however
> > > some valuable information is logged under the `notice` level and
> > > I
> > > would like to keep it in the logs.
> > > 
> > > Please let me know if I am doing something wrong or if there is a
> > > way
> > > to turn off these messages.
> > > 
> > > Thanks,
> > > 
> > > Jean-Baptiste Skutnik
> > > ___
> > 
> > -- 
> > Ken Gaillot 
> > 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] RemoteOFFLINE status, permanently

2023-12-04 Thread Ken Gaillot
o: No peers with id=0 and/or uname=lustre1
> to purge from the membership cache
> Nov 29 12:50:03 lustre-mgs.ntslab.ru pacemaker-attrd [2484]
> (attrd_client_peer_remove)  info: Client e1142409-f793-4839-a938-
> f512958a925e is requesting all values for lustre1 be removed
> Nov 29 12:50:03 lustre-mgs.ntslab.ru pacemaker-attrd [2484]
> (attrd_peer_remove) notice: Removing all lustre1 attributes for
> peer lustre-mgs
> Nov 29 12:50:03 lustre-mgs.ntslab.ru pacemaker-attrd [2484]
> (reap_crm_member)   info: No peers with id=0 and/or uname=lustre1
> to purge from the membership cache
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ /cib/configuration/resources:   class="ocf" id="lustre1" provider="pacemaker" type="remote"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++
>  
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-instance_attributes-server" name="server"
> value="lustre1"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-migrate_from-interval-0s" interval="0s"
> name="migrate_from" timeout="60s"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-migrate_to-interval-0s" interval="0s" name="migrate_to"
> timeout="60s"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-monitor-interval-60s" interval="60s" name="monitor"
> timeout="30s"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-reload-interval-0s" interval="0s" name="reload"
> timeout="60s"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-reload-agent-interval-0s" interval="0s" name="reload-
> agent" timeout="60s"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-start-interval-0s" interval="0s" name="start"
> timeout="60s"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++ id="lustre1-stop-interval-0s" interval="0s" name="stop"
> timeout="60s"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-execd [2483]
> (process_lrmd_get_rsc_info) info: Agent information for 'lustre1'
> not in cache
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-controld  [2486]
> (do_lrm_rsc_op) notice: Requesting local execution of probe
> operation for lustre1 on lustre-mgs | transition_key=5:88:7:288b2e10-
> 0bee-498d-b9eb-4bc5f0f8d5bf op_key=lustre1_monitor_0
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-controld  [2486]
> (log_executor_event)notice: Result of probe operation for lustre1
> on lustre-mgs: not running (Remote connection inactive) | graph
> action confirmed; call=7 key=lustre1_monitor_0 rc=7
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++
> /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources:
>   type="remote"/>
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: ++
> operation_key="lustre1_monitor_0" operation="monitor" crm-debug-
> origin="controld_update_resource_history" crm_feature_set="3.17.4"
> transition-key="3:88:7:288b2e10-0bee-498d-b9eb-4bc5f0f8d5bf"
> transition-magic="-1:193;3:88:7:288b2e10-0bee-498d-b9eb-4bc5f0f8d5bf" 
> exit-reason="" on_node="lustre-mds1" call-id="-1" rc-code="193" op-st
> Nov 29 12:50:11 lustre-mgs.ntslab.ru pacemaker-based [2481]
> (log_info)  info: +
>  /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resou
> rce[@id='lustre1']/lrm_rsc_op[@id='lustre1_last_0']:  @transitio

Re: [ClusterLabs] Redundant entries in log

2023-11-29 Thread Ken Gaillot
Hi,

Something is triggering a new transition. The most likely candidate is
a low value for cluster-recheck-interval.

Many years ago, a low cluster-recheck-interval was necessary to make
certain things like failure-timeout more timely, but that has not been
the case in a long time. It should be left to default (15 minutes) in
the vast majority of cases. (A new transition will still occur on that
schedule, but that's reasonable.)

On Wed, 2023-11-29 at 10:05 +, Jean-Baptiste Skutnik via Users
wrote:
> Hello all,
> 
> I am managing a cluster using pacemaker for high availability. I am
> parsing the logs for relevant information on the cluster health and
> the logs are full of the following:
> 
> ```
> Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: State
> transition S_IDLE -> S_POLICY_ENGINE
> Nov 29 09:17:41 esvm2 pacemaker-schedulerd[2892]:  notice: Calculated
> transition 8629, saving inputs in /var/lib/pacemaker/pengine/pe-
> input-250.bz2
> Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: Transition
> 8629 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: State
> transition S_TRANSITION_ENGINE -> S_IDLE
> Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: State
> transition S_IDLE -> S_POLICY_ENGINE
> Nov 29 09:18:41 esvm2 pacemaker-schedulerd[2892]:  notice: Calculated
> transition 8630, saving inputs in /var/lib/pacemaker/pengine/pe-
> input-250.bz2
> Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: Transition
> 8630 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: State
> transition S_TRANSITION_ENGINE -> S_IDLE
> Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: State
> transition S_IDLE -> S_POLICY_ENGINE
> Nov 29 09:19:41 esvm2 pacemaker-schedulerd[2892]:  notice: Calculated
> transition 8631, saving inputs in /var/lib/pacemaker/pengine/pe-
> input-250.bz2
> Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: Transition
> 8631 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: State
> transition
> ...
> ```
> 
> The transition IDs seem to differ however the file containing the
> transition data stays the same, implying that the transition does not
> affect the cluster. (/var/lib/pacemaker/pengine/pe-input-250.bz2)
> 
> I noticed the option to restrict the logging to higher levels however
> some valuable information is logged under the `notice` level and I
> would like to keep it in the logs.
> 
> Please let me know if I am doing something wrong or if there is a way
> to turn off these messages.
> 
> Thanks,
> 
> Jean-Baptiste Skutnik
> ___

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Pacemaker 2.1.7-rc1 now available

2023-10-31 Thread Ken Gaillot
Hi all,

Source code for the first release candidate for Pacemaker version 2.1.7
is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc1

This is primarily a bug fix release. See the ChangeLog or the link
above for details.

Everyone is encouraged to download, build, and test the new release. We
do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors of source code to this release,
including Chris Lumens, Gao,Yan, Grace Chin, Hideo Yamauchi, Jan
Pokorný, Ken Gaillot, liupei, Oyvind Albrigtsen, Reid Wahl, and
xuezhixin.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] How to output debug messages in the log file?

2023-10-03 Thread Ken Gaillot
On Tue, 2023-10-03 at 18:19 +0800, Jack via Users wrote:
> I wrote a resource file Stateful1 in /lib/ocf/resources/pacemaker on
> Ubuntu 22.04. It didn't working. So I wrote  ocf_log debug "hello
> world"  in the file Stateful1. But it didn't output debug messages.
> How can I output debug messages?
> 

Hi,

Set PCMK_debug=true wherever your distro keeps environment variables
for daemons (/etc/sysconfig/pacemaker, /etc/default/pacemaker, etc.).

Debug messages will show up in the Pacemaker detail log (typically
/var/log/pacemaker/pacemaker.log).
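For example (the exact path varies by distro, so treat this as a
sketch):

  # /etc/sysconfig/pacemaker or /etc/default/pacemaker
  PCMK_debug=true

Also note that ocf_log is only defined after the agent sources the OCF
shell functions shipped with resource-agents, roughly:

  : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
  . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
  ocf_log debug "hello world"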
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Mutually exclusive resources ?

2023-09-27 Thread Ken Gaillot
On Wed, 2023-09-27 at 16:24 +0200, Adam Cecile wrote:
> On 9/27/23 16:02, Ken Gaillot wrote:
> > On Wed, 2023-09-27 at 15:42 +0300, Andrei Borzenkov wrote:
> > > On Wed, Sep 27, 2023 at 3:21 PM Adam Cecile 
> > > wrote:
> > > > Hello,
> > > > 
> > > > 
> > > > I'm struggling to understand if it's possible to create some
> > > > kind
> > > > of constraint to avoid two different resources to be running on
> > > > the
> > > > same host.
> > > > 
> > > > Basically, I'd like to have floating IP "1" and floating IP "2"
> > > > always being assigned to DIFFERENT nodes.
> > > > 
> > > > Is that something possible ?
> > > 
> > > Sure, negative colocation constraint.
> > > 
> > > > Can you give me a hint ?
> > > > 
> > > 
> > > Using crmsh:
> > > 
> > > colocation IP1-no-with-IP2 -inf: IP1 IP2
> > > 
> > > > Thanks in advance, Adam.
> > 
> > To elaborate, use -INFINITY if you want the IPs to *never* run on
> > the
> > same node, even if there are no other nodes available (meaning one
> > of
> > them has to stop). If you *prefer* that they run on different
> > nodes,
> > but want to allow them to run on the same node in a degraded
> > cluster,
> > use a finite negative score.
> 
> That's exactly what I tried to do:
> crm configure primitive Freeradius systemd:freeradius.service op
> start interval=0 timeout=120 op stop interval=0 timeout=120 op
> monitor interval=60 timeout=100
> crm configure clone Clone-Freeradius Freeradius
> 
> crm configure primitive Shared-IPv4-Cisco-ISE-1 IPaddr2 params
> ip=10.1.1.1 nic=eth0 cidr_netmask=24 meta migration-threshold=2 op
> monitor interval=60 timeout=30 resource-stickiness=50
> crm configure primitive Shared-IPv4-Cisco-ISE-2 IPaddr2 params
> ip=10.1.1.2 nic=eth0 cidr_netmask=24 meta migration-threshold=2 op
> monitor interval=60 timeout=30 resource-stickiness=50
> 
> crm configure location Shared-IPv4-Cisco-ISE-1-Prefer-BRT Shared-
> IPv4-Cisco-ISE-1 50: infra-brt
> crm configure location Shared-IPv4-Cisco-ISE-2-Prefer-BTZ Shared-
> IPv4-Cisco-ISE-2 50: infra-btz
> crm configure colocation Shared-IPv4-Cisco-ISE-Different-Nodes -100:
> Shared-IPv4-Cisco-ISE-1 Shared-IPv4-Cisco-ISE-2
> My hope is that IP1 stays in infra-brt and IP2 goes on infra-btz. I
> want to allow them to keep running on different host so I also added
> stickiness. However, I really do not want them to both run on same
> node so I added a colocation with negative higher score.
> Does it looks good to you ?

Yep, that should work.

The way you have it, if there's some sort of problem and both IPs end
up on the same node, the IP that doesn't prefer that node will move
back to its preferred node once the problem is resolved. That sounds
like what you want, but if you'd rather it not move, you could raise
stickiness above 100.
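For example, one way to do that is through the resource defaults (a
sketch in crmsh syntax; the value is arbitrary, and per-resource meta
attributes work just as well):

  crm configure rsc_defaults resource-stickiness=200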
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Mutually exclusive resources ?

2023-09-27 Thread Ken Gaillot
On Wed, 2023-09-27 at 15:42 +0300, Andrei Borzenkov wrote:
> On Wed, Sep 27, 2023 at 3:21 PM Adam Cecile 
> wrote:
> > Hello,
> > 
> > 
> > I'm struggling to understand if it's possible to create some kind
> > of constraint to avoid two different resources to be running on the
> > same host.
> > 
> > Basically, I'd like to have floating IP "1" and floating IP "2"
> > always being assigned to DIFFERENT nodes.
> > 
> > Is that something possible ?
> 
> Sure, negative colocation constraint.
> 
> > Can you give me a hint ?
> > 
> 
> Using crmsh:
> 
> colocation IP1-no-with-IP2 -inf: IP1 IP2
> 
> > Thanks in advance, Adam.

To elaborate, use -INFINITY if you want the IPs to *never* run on the
same node, even if there are no other nodes available (meaning one of
them has to stop). If you *prefer* that they run on different nodes,
but want to allow them to run on the same node in a degraded cluster,
use a finite negative score.
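For example, in crmsh syntax (resource names are placeholders; you
would pick one form or the other):

  # never on the same node, even in a degraded cluster
  colocation IP1-never-with-IP2 -inf: IP1 IP2
  # prefer different nodes, but allow the same node as a last resort
  colocation IP1-avoid-IP2 -100: IP1 IP2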
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-remote

2023-09-18 Thread Ken Gaillot
On Thu, 2023-09-14 at 18:28 +0800, Mr.R via Users wrote:
> Hi all,
>
> In Pacemaker-Remote 2.1.6, the pacemaker package is required
> for guest nodes and not for remote nodes. Why is that? What does 
> pacemaker do?
> After adding guest node, pacemaker package does not seem to be 
> needed. Can I not install it here?

I'm not sure what's requiring it in your environment. There's no
dependency in the upstream RPM at least.

The pacemaker package does have the crm_master script needed by some
resource agents, so you will need it if you use any of those. (That
script should have been moved to the pacemaker-cli package in 2.1.3,
oops ...)

> After testing, remote nodes can be offline, but guest nodes cannot
>  be offline. Is there any way to get them offline? Are there
> relevant 
> failure test cases?
> 
> thanks,

To make a guest node offline, stop the resource that creates it.
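For example, if the guest node is created by a VirtualDomain resource
named guest1-vm (a hypothetical name):

  pcs resource disable guest1-vm

Re-enabling the resource brings the guest node back online.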
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Limit the number of resources starting/stoping in parallel possible?

2023-09-18 Thread Ken Gaillot
On Mon, 2023-09-18 at 14:24 +, Knauf Steffen wrote:
> Hi,
> 
> we have multiple Cluster (2 node + quorum setup) with more then 100
> Resources ( 10 x VIP + 90 Microservices) per Node.  
> If the Resources are stopped/started at the same time the Server is
> under heavy load, which may result into timeouts and an unresponsive
> server. 
> We configured some Ordering Constraints (VIP --> Microservice). Is
> there a way to limit the number of resources starting/stoping in
> parallel?
> Perhaps you have some other tips to handle such a situation.
> 
> Thanks & greets
> 
> Steffen
> 

Hi,

Yes, see the batch-limit cluster option:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/options.html#cluster-options
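For example, to allow at most 10 actions to execute in parallel across
the cluster (the value is only an illustration):

  pcs property set batch-limit=10
  # or, with crmsh:
  crm configure property batch-limit=10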

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [EXTERNE] Re: Centreon HA Cluster - VIP issue

2023-09-18 Thread Ken Gaillot
On Fri, 2023-09-15 at 09:32 +, Adil BOUAZZAOUI wrote:
> Hi Ken,
> 
> Any update please?
> 
> The idea is clear; I just need to know more information about this 2
> clusters setup:
> 
> 1. Arbitrator:
> 1.1. Only one arbitrator is needed for everything: should I use the
> Quorum provided by Centreon on the official documentation? Or should
> I use the booth ticket manager instead?

I would use booth for distributed data centers. The Centreon setup is
appropriate for a cluster within a single data center or data centers
on the same campus with a low-latency link.

> 1.2. is fencing configured separately? Or is is configured during the
> booth ticket manager installation?

You'll have to configure fencing in each cluster separately.

> 
> 2. Floating IP:
> 2.1. it doesn't hurt if both Floating IPs are running at the same
> time right?

Correct.

> 
> 3. Fail over:
> 3.1. How to update the DNS to point to the appropriate IP?
> 3.2. we're running our own DNS servers; so How to configure booth
> ticket for just the DNS resource?

You can have more than one ticket. On the Pacemaker side, tickets are
tied to resources with rsc_ticket constraints (though you'll probably
be using a higher-level tool that abstracts that).
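As a rough illustration with pcs (the ticket and resource names are
placeholders):

  pcs constraint ticket add centreon-ticket dns-update loss-policy=stop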

How to update the DNS depends on what server you're using -- just
follow its documentation for making changes. You can use the
ocf:pacemaker:Dummy agent as a model: have start make the DNS change
(in addition to creating the dummy state file), have monitor check
that the dummy state file is present and that DNS is returning the
desired info, and have stop simply remove the dummy state file.
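A very rough sketch of such an agent, modeled on Dummy -- the record
name, addresses, and use of nsupdate are assumptions, and a real agent
would also need meta-data and validate-all:

  #!/bin/sh
  : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
  . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

  STATE="${HA_RSCTMP}/dns-site.state"

  dns_start() {
      # repoint the record at this site's floating IP (placeholder values)
      printf 'update delete app.example.com. A\nupdate add app.example.com. 60 A 192.0.2.10\nsend\n' \
          | nsupdate -k /etc/ddns.key || return "$OCF_ERR_GENERIC"
      touch "$STATE"
  }

  dns_stop() {
      rm -f "$STATE"
      return "$OCF_SUCCESS"
  }

  dns_monitor() {
      # "running" means this site owns the record: state file present
      # and DNS answering with the expected address
      [ -f "$STATE" ] || return "$OCF_NOT_RUNNING"
      dig +short app.example.com. | grep -qx '192.0.2.10' || return "$OCF_ERR_GENERIC"
  }

  case "$1" in
      start)   dns_start ;;
      stop)    dns_stop ;;
      monitor) dns_monitor ;;
      *)       exit "$OCF_ERR_UNIMPLEMENTED" ;;
  esac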

> 4. MariaDB replication:
> 4.1. How can Centreon MariaDB replicat between the 2 clusters?

Native MySQL replication should work fine for that.

> 5. Centreon:
> 5.1. Will this setup (2 clusters, 2 floating IPs, 1 booth manager)
> work for our Centreon project? 

I don't have any experience with that, but it sounds fine.

> 
> 
> 
> Regards
> Adil Bouazzaoui
> 
> 
> Adil BOUAZZAOUI
> Ingénieur Infrastructures & Technologies
> GSM : +212 703 165 758
> E-mail  : adil.bouazza...@tmandis.ma
> 
> 
> -Original Message-
> From: Adil BOUAZZAOUI 
> Sent: Friday, September 8, 2023 5:15 PM
> To: Ken Gaillot ; Adil Bouazzaoui <
> adilb...@gmail.com>
> Cc: Cluster Labs - All topics related to open-source clustering
> welcomed 
> Subject: RE: [EXTERNE] Re: [ClusterLabs] Centreon HA Cluster - VIP
> issue
> 
> Hi Ken,
> 
> Thank you for the update and the clarification.
> The idea is clear; I just need to know more information about this 2
> clusters setup:
> 
> 1. Arbitrator:
> 1.1. Only one arbitrator is needed for everything: should I use the
> Quorum provided by Centreon on the official documentation? Or should
> I use the booth ticket manager instead?
> 1.2. is fencing configured separately? Or is is configured during the
> booth ticket manager installation?
> 
> 2. Floating IP:
> 2.1. it doesn't hurt if both Floating IPs are running at the same
> time right?
> 
> 3. Fail over:
> 3.1. How to update the DNS to point to the appropriate IP?
> 3.2. we're running our own DNS servers; so How to configure booth
> ticket for just the DNS resource?
> 
> 4. MariaDB replication:
> 4.1. How can Centreon MariaDB replicat between the 2 clusters?
> 
> 5. Centreon:
> 5.1. Will this setup (2 clusters, 2 floating IPs, 1 booth manager)
> work for our Centreon project? 
> 
> 
> 
> Regards
> Adil Bouazzaoui
> 
> 
> Adil BOUAZZAOUI
> Ingénieur Infrastructures & Technologies GSM : +212 703 165
> 758 E-mail  : adil.bouazza...@tmandis.ma
> 
> 
> -Original Message-
> From: Ken Gaillot [mailto:kgail...@redhat.com] Sent: Tuesday,
> September 5, 2023 10:00 PM To: Adil Bouazzaoui 
> Cc: Cluster Labs - All topics related to open-source clustering
> welcomed ; Adil BOUAZZAOUI <
> adil.bouazza...@tmandis.ma> Subject: [EXTERNE] Re: [ClusterLabs]
> Centreon HA Cluster - VIP issue
> 
> On Tue, 2023-09-05 at 21:13 +0100, Adil Bouazzaoui wrote:
> > Hi Ken,
> > 
> > thank you a big time for the feedback; much appreciated.
> > 
> > I suppose we go with a new Scenario 3: Setup 2 Clusters across 
> > different DCs connected by booth; so could you please clarify
> > below 
> > points to me so i can understand better and start working on the
> > architecture:
> > 
> > 1- in case of separate clusters connected by booth: should each 
> > cluster have a quorum device for the Master/slave elections?
> 
> Hi,
> 
> Only one arbitrator is needed for everything.
> 
> Since ea

Re: [ClusterLabs] PostgreSQL HA on EL9

2023-09-18 Thread Ken Gaillot
Ah, good catch. FYI, we created a hook for situations like this a while
back: resource-agents-deps.target. Which reminds me we really need to
document it ...

To use it, put a drop-in unit under /etc/systemd/system/resource-
agents-deps.target.d/ (any name ending in .conf) with:

  [Unit]
  Requires=
  After=

Pacemaker is ordered after resource-agents-deps, so you can use it to
start any non-clustered dependencies.
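In this thread's case, a drop-in like the following should do it (the
file name is arbitrary as long as it ends in .conf), e.g.
/etc/systemd/system/resource-agents-deps.target.d/remote-fs.conf:

  [Unit]
  Requires=remote-fs.target
  After=remote-fs.target

followed by a "systemctl daemon-reload".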

On Thu, 2023-09-14 at 15:43 +, Larry G. Mills via Users wrote:
> I found my issue with reboots - and it wasn't pacemaker-related at
> all.  My EL9 test system was different from the EL7 system in that it
> hosted the DB on a iSCSI-attached array.  During OS shutdown, the
> array was being unmounted concurrently with pacemaker shutdown, so it
> was not able to cleanly shut down the pgsql resource. I added a
> systemd override to make corosync dependent upon, and require,
> "remote-fs.target".   Everything shuts down cleanly now, as expected.
> 
> Thanks for the suggestions,
> 
> Larry
> 
> > -Original Message-
> > From: Users  On Behalf Of Oyvind
> > Albrigtsen
> > Sent: Thursday, September 14, 2023 5:43 AM
> > To: Cluster Labs - All topics related to open-source clustering
> > welcomed
> > 
> > Subject: Re: [ClusterLabs] PostgreSQL HA on EL9
> > 
> > If you're using network filesystems with the Filesystem agent this
> > patch might solve your issue:
> > https://github.com/ClusterLabs/resource-agents/pull/1869
> > 
> > 
> > Oyvind
> > 
> > On 13/09/23 17:56 +, Larry G. Mills via Users wrote:
> > > > On my RHEL 9 test cluster, both "reboot" and "systemctl reboot"
> > > > wait
> > > > for the cluster to stop everything.
> > > > 
> > > > I think in some environments "reboot" is equivalent to
> > > > "systemctl
> > > > reboot --force" (kill all processes immediately), so maybe see
> > > > if
> > > > "systemctl reboot" is better.
> > > > 
> > > > > On EL7, this scenario caused the cluster to shut itself down
> > > > > on the
> > > > > node before the OS shutdown completed, and the DB resource
> > > > > was
> > > > > stopped/shutdown before the OS stopped.  On EL9, this is not
> > > > > the
> > > > > case, the DB resource is not stopped before the OS shutdown
> > > > > completes.  This leads to errors being thrown when the
> > > > > cluster is
> > > > > started back up on the rebooted node similar to the
> > > > > following:
> > > > > 
> > > 
> > > Ken,
> > > 
> > > Thanks for the reply - and that's interesting that RHEL9 behaves
> > > as expected
> > and AL9 seemingly doesn't.   I did try shutting down via "systemctl
> > reboot",
> > but the cluster and resources were still not stopped cleanly before
> > the OS
> > stopped.  In fact, the commands "shutdown" and "reboot" are just
> > symlinks
> > to systemctl on AL9.2, so that make sense why the behavior is the
> > same.
> > > Just as a point of reference, my systemd version is:
> > > systemd.x86_64
> > 252-14.el9_2.3
> > > Larry
> > > ___
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > ClusterLabs home: 
> > > https://www.clusterlabs.org/
> > 
> > ___________
> > Manage your subscription:
> > https://urldefense.pr

Re: [ClusterLabs] PostgreSQL HA on EL9

2023-09-13 Thread Ken Gaillot
On Wed, 2023-09-13 at 16:45 +, Larry G. Mills via Users wrote:
> Hello Pacemaker community,
>  
> I have several two-node postgres 14 clusters that I am migrating from
> EL7 (Scientific Linux 7) to EL9 (AlmaLinux 9.2).
>  
> My configuration:
>  
> Cluster size: two nodes
> Postgres version: 14
> Corosync version: 3.1.7-1.el9  
> Pacemaker version: 2.1.5-9.el9_2
> pcs version: 0.11.4-7.el9_2
>  
> The migration has mostly gone smoothly, but I did notice one non-
> trivial change in recovery behavior between EL7 and EL9.  The
> recovery scenario is:
>  
> With the cluster running normally with one primary DB (i.e. Promoted)
> and one standby (i.e. Unpromoted), reboot one of the cluster nodes
> without first shutting down the cluster on that node.  The reboot is
> a “clean” system shutdown done via either the “reboot” or “shutdown”
> OS commands.

On my RHEL 9 test cluster, both "reboot" and "systemctl reboot" wait
for the cluster to stop everything.

I think in some environments "reboot" is equivalent to "systemctl
reboot --force" (kill all processes immediately), so maybe see if
"systemctl reboot" is better.

>  
> On EL7, this scenario caused the cluster to shut itself down on the
> node before the OS shutdown completed, and the DB resource was
> stopped/shutdown before the OS stopped.  On EL9, this is not the
> case, the DB resource is not stopped before the OS shutdown
> completes.  This leads to errors being thrown when the cluster is
> started back up on the rebooted node similar to the following:
> 
>   * pgsql probe on mynode returned 'error' (Instance "pgsql"
> controldata indicates a running secondary instance, the instance has
> probably crashed)
>  
> While this is not too serious for a standby DB instance, as the
> cluster is able to recover it back to the standby/Unpromoted state,
> if you reboot the Primary/Promoted DB node, the cluster is not able
> to recover it (because that DB still thinks it’s a primary), and the
> node is fenced.
>  
> Is this an intended behavior for the versions of pacemaker/corosync
> that I’m running, or a regression?   It may be possible to put an
> override into the systemd unit file for corosync to force the cluster
> to shutdown before the OS stops, but I’d rather not do that if
> there’s a better way to handle this recovery scenario.
>  
> Thanks for any advice,
>  
> Larry
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] MySQL cluster with auto failover

2023-09-13 Thread Ken Gaillot
On Tue, 2023-09-12 at 10:28 +0200, Damiano Giuliani wrote:
> thanks Ken,
> 
> could you point me in th right direction for a guide or some already
> working configuration?
> 
> Thanks
> 
> Damiano

Nothing specific to galera, just the usual Pacemaker Explained
documentation about clones.

There are some regression tests in the code base that include galera
resources. Some use clones and others bundles (containerized). For
example:

https://github.com/ClusterLabs/pacemaker/blob/main/cts/scheduler/xml/unrunnable-2.xml


> 
> Il giorno lun 11 set 2023 alle ore 16:26 Ken Gaillot <
> kgail...@redhat.com> ha scritto:
> > On Thu, 2023-09-07 at 10:27 +0100, Antony Stone wrote:
> > > On Wednesday 06 September 2023 at 17:01:24, Damiano Giuliani
> > wrote:
> > > 
> > > > Everything is clear now.
> > > > So the point is to use pacemaker and create the floating vip
> > and
> > > > bind it to
> > > > sqlproxy to health check and route the traffic to the available
> > and
> > > > healthy
> > > > galera nodes.
> > > 
> > > Good summary.
> > > 
> > > > It could be useful let pacemaker manage also galera services?
> > > 
> > > No; MySQL / Galera needs to be running on all nodes all the
> > > time.  Pacemaker 
> > > is for managing resources which move between nodes.
> > 
> > It's still helpful to configure galera as a clone in the cluster.
> > That
> > way, Pacemaker can monitor it and restart it on errors, it will
> > respect
> > things like maintenance mode and standby, and it can be used in
> > ordering constraints with other resources, as well as advanced
> > features
> > such as node utilization.
> > 
> > > 
> > > If you want something that ensures processes are running on
> > > machines, 
> > > irrespective of where the floating IP is, look at monit - it's
> > very
> > > simple, 
> > > easy to configure and knows how to manage resources which should
> > run
> > > all the 
> > > time.
> > > 
> > > > Do you have any guide that pack this everything together?
> > > 
> > > No; I've largely made this stuff up myself as I've needed it.
> > > 
> > > 
> > > Antony.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] MySQL cluster with auto failover

2023-09-11 Thread Ken Gaillot
On Thu, 2023-09-07 at 10:27 +0100, Antony Stone wrote:
> On Wednesday 06 September 2023 at 17:01:24, Damiano Giuliani wrote:
> 
> > Everything is clear now.
> > So the point is to use pacemaker and create the floating vip and
> > bind it to
> > sqlproxy to health check and route the traffic to the available and
> > healthy
> > galera nodes.
> 
> Good summary.
> 
> > It could be useful let pacemaker manage also galera services?
> 
> No; MySQL / Galera needs to be running on all nodes all the
> time.  Pacemaker 
> is for managing resources which move between nodes.

It's still helpful to configure galera as a clone in the cluster. That
way, Pacemaker can monitor it and restart it on errors, it will respect
things like maintenance mode and standby, and it can be used in
ordering constraints with other resources, as well as advanced features
such as node utilization.
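A rough sketch of what that could look like with pcs -- check the
agent's metadata and your pcs version for the exact parameter and meta
option names before relying on it:

  pcs resource create galera ocf:heartbeat:galera \
      wsrep_cluster_address="gcomm://node1,node2,node3" \
      promotable promoted-max=3

(The galera agent runs as a promotable clone with every node promoted,
so promoted-max matches the node count; older pcs releases call this
master-max.)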

> 
> If you want something that ensures processes are running on
> machines, 
> irrespective of where the floating IP is, look at monit - it's very
> simple, 
> easy to configure and knows how to manage resources which should run
> all the 
> time.
> 
> > Do you have any guide that pack this everything together?
> 
> No; I've largely made this stuff up myself as I've needed it.
> 
> 
> Antony.
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Centreon HA Cluster - VIP issue

2023-09-05 Thread Ken Gaillot
On Tue, 2023-09-05 at 21:13 +0100, Adil Bouazzaoui wrote:
> Hi Ken,
> 
> thank you a big time for the feedback; much appreciated.
> 
> I suppose we go with a new Scenario 3: Setup 2 Clusters across
> different DCs connected by booth; so could you please clarify below
> points to me so i can understand better and start working on the
> architecture:
> 
> 1- in case of separate clusters connected by booth: should each
> cluster have a quorum device for the Master/slave elections?

Hi,

Only one arbitrator is needed for everything.

Since each cluster in this case has two nodes, Corosync will use the
"two_node" configuration to determine quorum. When first starting the
cluster, both nodes must come up before quorum is obtained. After
that, only one node is required to keep quorum -- which means that
fencing is essential to prevent split-brain.
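In corosync.conf terms, that is the usual two-node quorum block
(two_node implies wait_for_all):

  quorum {
      provider: corosync_votequorum
      two_node: 1
  }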

> 2- separate floating IPs at each cluster: please check the attached
> diagram and let me know if this is exactly what you mean?

Yes, that looks good

> 3- To fail over, you update the DNS to point to the appropriate IP:
> can you suggest any guide to work on so we can have the DNS updated
> automatically?

Unfortunately I don't know of any. If your DNS provider offers an API
of some kind, you can write a resource agent that uses it. If you're
running your own DNS servers, the agent has to update the zone files
appropriately and reload.

Depending on what your services are, it might be sufficient to use a
booth ticket for just the DNS resource, and let everything else stay
running all the time. For example it doesn't hurt anything for both
sites' floating IPs to stay up.
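For reference, a minimal booth.conf sketch for this kind of layout
(addresses and the ticket name are placeholders; the same file is used
on both clusters and the arbitrator):

  transport = UDP
  port = 9929
  site = 192.0.2.10
  site = 198.51.100.10
  arbitrator = 203.0.113.10
  ticket = "dns-ticket"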

> Regards
> Adil Bouazzaoui
> 
> On Tue, Sep 5, 2023 at 16:48, Ken Gaillot 
> wrote:
> > Hi,
> > 
> > The scenario you describe is still a challenging one for HA.
> > 
> > A single cluster requires low latency and reliable communication. A
> > cluster within a single data center or spanning data centers on the
> > same campus can be reliable (and appears to be what Centreon has in
> > mind), but it sounds like you're looking for geographical
> > redundancy.
> > 
> > A single cluster isn't appropriate for that. Instead, separate
> > clusters
> > connected by booth would be preferable. Each cluster would have its
> > own
> > nodes and fencing. Booth tickets would control which cluster could
> > run
> > resources.
> > 
> > Whatever design you use, it is pointless to put a quorum tie-
> > breaker at
> > one of the data centers. If that data center becomes unreachable,
> > the
> > other one can't recover resources. The tie-breaker (qdevice for a
> > single cluster or a booth arbitrator for multiple clusters) can be
> > very
> > lightweight, so it can run in a public cloud for example, if a
> > third
> > site is not available.
> > 
> > The IP issue is separate. For that, you will need separate floating
> > IPs
> > at each cluster, on that cluster's network. To fail over, you
> > update
> > the DNS to point to the appropriate IP. That is a tricky problem
> > without a universal automated solution. Some people update the DNS
> > manually after being alerted of a failover. You could write a
> > custom
> > resource agent to update the DNS automatically. Either way you'll
> > need
> > low TTLs on the relevant records.
> > 
> > On Sun, 2023-09-03 at 11:59 +, Adil BOUAZZAOUI wrote:
> > > Hello,
> > >  
> > > My name is Adil, I’m working for Tman company, we are testing the
> > > Centreon HA cluster to monitor our infrastructure for 13
> > companies,
> > > for now we are using the 100 IT license to test the platform,
> > once
> > > everything is working fine then we can purchase a license
> > suitable
> > > for our case.
> > >  
> > > We're stuck at scenario 2: setting up Centreon HA Cluster with
> > Master
> > > & Slave on a different datacenters.
> > > For scenario 1: setting up the Cluster with Master & Slave and
> > VIP
> > > address on the same network (VLAN) it is working fine.
> > >  
> > > Scenario 1: Cluster on Same network (same DC) ==> works fine
> > > Master in DC 1 VLAN 1: 172.30.9.230 /24
> > > Slave in DC 1 VLAN 1: 172.30.9.231 /24
> > > VIP in DC 1 VLAN 1: 172.30.9.240/24
> > > Quorum in DC 1 LAN: 192.168.253.230/24
> > > Poller in DC 1 LAN: 192.168.253.231/24
> > >  
> > > Scenario 2: Cluster on different networks (2 separate DCs
> > connected
> > > with VPN) ==> still not work

Re: [ClusterLabs] Centreon HA Cluster - VIP issue

2023-09-05 Thread Ken Gaillot
Hi,

The scenario you describe is still a challenging one for HA.

A single cluster requires low latency and reliable communication. A
cluster within a single data center or spanning data centers on the
same campus can be reliable (and appears to be what Centreon has in
mind), but it sounds like you're looking for geographical redundancy.

A single cluster isn't appropriate for that. Instead, separate clusters
connected by booth would be preferable. Each cluster would have its own
nodes and fencing. Booth tickets would control which cluster could run
resources.

Whatever design you use, it is pointless to put a quorum tie-breaker at
one of the data centers. If that data center becomes unreachable, the
other one can't recover resources. The tie-breaker (qdevice for a
single cluster or a booth arbitrator for multiple clusters) can be very
lightweight, so it can run in a public cloud for example, if a third
site is not available.

The IP issue is separate. For that, you will need separate floating IPs
at each cluster, on that cluster's network. To fail over, you update
the DNS to point to the appropriate IP. That is a tricky problem
without a universal automated solution. Some people update the DNS
manually after being alerted of a failover. You could write a custom
resource agent to update the DNS automatically. Either way you'll need
low TTLs on the relevant records.

On Sun, 2023-09-03 at 11:59 +, Adil BOUAZZAOUI wrote:
> Hello,
>  
> My name is Adil, I’m working for Tman company, we are testing the
> Centreon HA cluster to monitor our infrastructure for 13 companies,
> for now we are using the 100 IT license to test the platform, once
> everything is working fine then we can purchase a license suitable
> for our case.
>  
> We're stuck at scenario 2: setting up Centreon HA Cluster with Master
> & Slave on a different datacenters.
> For scenario 1: setting up the Cluster with Master & Slave and VIP
> address on the same network (VLAN) it is working fine.
>  
> Scenario 1: Cluster on Same network (same DC) ==> works fine
> Master in DC 1 VLAN 1: 172.30.9.230 /24
> Slave in DC 1 VLAN 1: 172.30.9.231 /24
> VIP in DC 1 VLAN 1: 172.30.9.240/24
> Quorum in DC 1 LAN: 192.168.253.230/24
> Poller in DC 1 LAN: 192.168.253.231/24
>  
> Scenario 2: Cluster on different networks (2 separate DCs connected
> with VPN) ==> still not working
> Master in DC 1 VLAN 1: 172.30.9.230 /24
> Slave in DC 2 VLAN 2: 172.30.10.230 /24
> VIP: example 102.84.30.XXX. We used a public static IP from our
> internet service provider, we thought that using a IP from a site
> network won't work, if the site goes down then the VIP won't be
> reachable!
> Quorum: 192.168.253.230/24
> Poller: 192.168.253.231/24
>  
>  
> Our goal is to have Master & Slave nodes on different sites, so when
> Site A goes down, we keep monitoring with the slave.
> The problem is that we don't know how to set up the VIP address? Nor
> what kind of VIP address will work? or how can the VIP address work
> in this scenario? or is there anything else that can replace the VIP
> address to make things work.
> Also, can we use a backup poller? so if the poller 1 on Site A goes
> down, then the poller 2 on Site B can take the lead?
>  
> we looked everywhere (The watch, youtube, Reddit, Github...), and we
> still couldn't get a workaround!
>  
> the guide we used to deploy the 2 Nodes Cluster: 
> https://docs.centreon.com/docs/installation/installation-of-centreon-ha/overview/
>  
> attached the 2 DCs architecture example, and also most of the
> required screenshots/config.
>  
>  
> We appreciate your support.
> Thank you in advance.
>  
>  
>  
> Regards
> Adil Bouazzaoui
>  
>Adil BOUAZZAOUI Ingénieur Infrastructures & Technologies   
>  GSM : +212 703 165 758 E-mail  : adil.bouazza...@tmandis.ma 
>  
>  
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] corosync 2.4 and 3.0 in one cluster.

2023-09-05 Thread Ken Gaillot
On Fri, 2023-09-01 at 17:56 +0300, Мельник Антон wrote:
> Hello,
> 
> I have a cluster with two nodes with corosync version 2.4 installed
> there.
> I need to upgrade to corosync version 3.0 without shutting down the
> cluster.

Hi,

It's not possible for Corosync 2 and 3 nodes to form a cluster. They're
"wire-incompatible".

> I thought to do it in this way:
> 1. Stop HA on the first node, do upgrade to newer version of Linux
> with upgrade corosync, change corosync config.
> 2. Start upgraded node and migrate resources there.
> 3. Do upgrade on the second node.

You could still do something similar if you use two separate clusters.
You'd remove the first node from the cluster configuration (Corosync
and Pacemaker) before shutting it down, and create a new cluster on it
after upgrading. The new cluster would have itself as the only node,
and all resources would be disabled, but otherwise it would be
identical. Then you could manually migrate resources by disabling them
on the second node and enabling them on the first.

You could even automate it using booth. To migrate the resources, you'd
just have to reassign the ticket.
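For example (a sketch only, assuming a ticket named ticket-A; run the
grant on a node of the cluster that should take over):

  booth client revoke ticket-A
  booth client grant ticket-A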

> Currently on version 2.4 corosync is configured with udpu transport
> and crypto_hash set to sha256.
> As far as I know version 3.0 does not support udpu with configured
> options crypto_hash and crypto_cipher.
> The question is how to allow communication between corosync instances
> with version 2 and 3, if corosync version 2 is configured with
> crypto_hash sha256.
> 
> 
> Thanks,
> Anton.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Coming in Pacemaker 2.1.7: Pacemaker Remote nodes honor PCMK_node_start_state

2023-08-28 Thread Ken Gaillot
Hi all,

The Pacemaker 2.1.7 release, expected in a couple of months, will
primarily be a bug fix release.

One new feature is that the PCMK_node_start_state start-up variable
(set in /etc/sysconfig, /etc/default, etc.) will support Pacemaker
Remote nodes. Previously, it was supported only for full cluster nodes.
It lets you tell the cluster that a new node should start in standby
mode when it is added.
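For example, on the Pacemaker Remote node (the path varies by distro):

  # /etc/sysconfig/pacemaker or /etc/default/pacemaker
  PCMK_node_start_state=standby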
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

