On Sat, Jan 13, 2024 at 09:07:52AM -0600, Billy Croan wrote:
> I'm planning to migrate a two-node cluster off CentOS 7 this year.  I think
> I'm taking it to Debian Stable, but open for suggestions if any
> distribution is better supported by pacemaker.
> 
> Have any of you had success doing major upgrades (bullseye to bookworm on
> Debian) of your physical nodes one at a time while each node is in
> standby+maintenance, and rolling the vm from one to the other so it doesn't
> reboot while the hosts are upgraded?  That has worked well for me for minor
> OS updates, but I'm curious about the majors.

We did it with OpenStack, but since our upgrade also implied a major corosync
upgrade we *had* to bring down the whole plane. I am not sure whether this
has changed recently, but a few years ago at least you could not mix and
match nodes running corosync 2.x and 3.x. So make sure you take
that aspect into consideration.
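
As a rough way to sanity-check this before planning a rolling upgrade, something
like the sketch below could compare the corosync major version each node will
run. The node names and version strings are made-up examples, and the helper
names are hypothetical. It assumes you have collected plain "X.Y.Z" version
strings yourself. Matching majors is only a necessary check here, not proof
that the nodes will interoperate.

```python
def corosync_major(version: str) -> int:
    """Return the major number from a plain version string such as '3.1.7'."""
    return int(version.strip().split(".")[0])


def same_major(versions: dict[str, str]) -> bool:
    """True if every node reports the same corosync major version."""
    majors = {corosync_major(v) for v in versions.values()}
    return len(majors) <= 1


# Hypothetical example: one node still on the old OS, one already reinstalled.
nodes = {"node1": "2.4.5", "node2": "3.1.7"}

if same_major(nodes):
    print("Same corosync major on all nodes; a rolling upgrade may be possible.")
else:
    print("Mixed corosync majors; plan for a full cluster stop.")
```

If the majors differ, that points toward building a new cluster rather than
re-adding a reinstalled node to the existing one.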
> 
> My project this year is even more major, not just upgrading the OS but
> changing distributions.
> 
> I think I have three possible ways I can try this:
> 1) wipe all server disks and start fresh.
> 
> 2) standby and maintenance one node, then reinstall it with a new OS and
> make a New Cluster.  Shut down the vm and copy it, offline, to the new
> one-node cluster, and start it up there.  Then once that's working, wipe and
> reinstall the other node, and add it to the new cluster.
> 
> 3) standby and maintenance one node, then Remove it from the cluster.  Then
> reinstall it with the new distribution's OS.  Then re-add it to the
> Existing Cluster.  Move the vm resource to it and verify it's working, then
> do the same with the other physical node, and take it out of standby&maint
> to finish.
> 
> (Obviously any of those methods begin with a full backup to offsite and
> local media, and end with a verification against that backup.)
> 
> #1 would be the longest outage but the "cleanest result".
> #3 would be possibly no outage, but I think the least likely to work.  I
> understand EL uses pcs and Debian uses crm, for example...
> #2 is a compromise that should(tm) have only a few seconds of outage.  But
> it could blow up, I suppose.  They all could blow up though, so I'm not sure
> that should play a factor in the decision.
> 
> I can't be the first person to go down this path.  So what do you all
> think?  how have you done it in the past?

> -- 
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/


-- 
Michele Baldessari            <mich...@acksyn.org>
C2A5 9DA3 9961 4FFB E01B  D0BC DDD4 DCCB 7515 5C6D
