[ClusterLabs] Detecting pacemaker version incompatibility during node rebuild

2024-06-13 Thread Madison Kelly

Hi all,

  I'm working on a tool to rebuild a node that was lost. Given this 
scenario, upgrading the surviving node is not viable (at least, not 
until after the rebuild is completed and the services can be migrated).


  I ran into a problem where 'pcs cluster start' exits with RC 0, and 
it _looks_ like the cluster is starting, but then the cluster exits 
without any message on STDOUT. In the logs, though, I can see this:



Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]: notice: Node an-a01n01 state is now member
Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]: error: Local feature set (3.17.4) is incompatible with DC's (3.19.0)
Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]: notice: Forcing immediate exit with status 100 (Fatal error occurred, will not respawn)
Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]: warning: Inhibiting respawn



  So I have two questions:

1. Is there a way to test (using pcs or another tool) to see if the 
local machine is compatible with the peer?


2. If the node being rebuilt isn't compatible, is there a way to tell it 
to start in a compatibility mode, or to tell the surviving peer to 
switch to a compatibility mode? Which one would depend on which node is 
newer.
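
To make question 1 concrete, here is a minimal sketch of the kind of pre-check meant, written against the log line quoted above. The function names are made up for illustration. The comparison rule (same major number, and the joining node's minor number not older than the DC's) is an assumption that happens to match the error in the log; it is not taken from Pacemaker documentation and should be verified against the Pacemaker versions involved. Getting the two feature-set strings without starting the cluster is left open; 'pacemakerd --features' on the local node and the 'crm_feature_set' attribute in the peer's CIB look like candidates, but treat that as something to confirm.

```python
import re

# Matches the pacemaker-controld error quoted in the log excerpt above.
FEATURE_SET_ERROR = re.compile(
    r"Local feature set \((?P<local>[\d.]+)\) "
    r"is incompatible with DC's \((?P<dc>[\d.]+)\)"
)


def parse_feature_set(text):
    """Turn a feature-set string such as '3.17.4' into (3, 17, 4)."""
    return tuple(int(part) for part in text.split("."))


def find_mismatch(log_text):
    """Return (local, dc) feature sets from the first matching log line,
    or None if the log contains no such error."""
    match = FEATURE_SET_ERROR.search(log_text)
    if match is None:
        return None
    return parse_feature_set(match["local"]), parse_feature_set(match["dc"])


def can_join(local, dc):
    """ASSUMPTION (not confirmed by the thread): a node can join when the
    major numbers match and its minor number is not older than the DC's."""
    return local[0] == dc[0] and local[1] >= dc[1]


if __name__ == "__main__":
    line = ("pacemaker-controld[105161]: error: Local feature set (3.17.4) "
            "is incompatible with DC's (3.19.0)")
    local, dc = find_mismatch(line)
    print(local, dc, can_join(local, dc))
```

Under that assumed rule, the rebuilt node in this test case (3.17.4) cannot join a DC at 3.19.0, while the reverse pairing would be allowed.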


  Of course, in this particular test case, the node being rebuilt is 
behind the survivor, so the fix here is a simple update of pacemaker 
before rejoining. However, in the real world, it's far more likely that 
the node being joined will be a newer version.


  The reason for this is that a large number of our deployments are in 
locations with limited or no internet access. So keeping the active 
cluster regularly updated is not feasible (and some clients "lock" their 
deployments to approved/tested versions).


Thanks for any hints/tips!

Madi

--
wiki - https://alteeve.com/w
cell - 647-471-0951
work - 647-417-7486 x 404

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] lost corosync/pacemaker pair

2024-06-13 Thread Ken Gaillot
On Thu, 2024-06-13 at 03:22 +, eli estrella wrote:
> Hello.
> I recently lost one of my LB servers running in a corosync/pacemaker
> pair, would it be possible to clone the live one to create the lost
> pair, changing the IP, hostname etc?
> Thanks for any help you can provide.
> 

Yes, that should be fine as far as the cluster goes. Of course your
specific resources may have other needs (especially a database or
clustered file system).
-- 
Ken Gaillot 



[ClusterLabs] lost corosync/pacemaker pair

2024-06-13 Thread eli estrella
Hello.
I recently lost one of my LB servers running in a corosync/pacemaker pair, 
would it be possible to clone the live one to create the lost pair, changing 
the IP, hostname etc?
Thanks for any help you can provide.