On Sat, Jan 13, 2024 at 09:07:52AM -0600, Billy Croan wrote:
> I'm planning to migrate a two-node cluster off CentOS 7 this year. I think
> I'm taking it to Debian Stable, but open for suggestions if any
> distribution is better supported by pacemaker.
>
> Have any of you had success doing major upgrades (bullseye to bookworm on
> Debian) of your physical nodes one at a time while each node is in
> standby+maintenance, and rolling the vm from one to the other so it doesn't
> reboot while the hosts are upgraded? That has worked well for me for minor
> OS updates, but I'm curious about the majors.

We did it with OpenStack, but since our upgrade also implied a major
corosync upgrade we *had* to bring down the whole plane. I am not sure
whether this has changed recently, but a few years ago at least you could
not mix and match nodes running corosync 2.x and 3.x. So make sure you
take that aspect into consideration.

>
> My project this year is even more major, not just upgrading the OS but
> changing distributions.
>
> I think I have three possible ways I can try this:
> 1) wipe all server disks and start fresh.
>
> 2) standby and maintenance one node, then reinstall it with a new OS and
> make a New Cluster. shutdown the vm and copy it, offline, to the new
> one-node cluster. and start it up there. Then once that's working, wipe and
> reinstall the other node, and add it to the new cluster.
>
> 3) standby and maintenance one node, then Remove it from the cluster. Then
> reinstall it with the new distribution's OS. Then re-add it to the
> Existing Cluster. Move the vm resource to it and verify it's working, then
> do the same with the other physical node, and take it out of standby&maint
> to finish.
>
> (Obviously any of those methods begin with a full backup to offsite and
> local media. and end with a verification against that backup.)
>
> #1 would be the longest outage but the "cleanest result"
> #3 would be possibly no outage, but I think the least likely to work. I
> understand EL uses pcs and debian uses crm for example...
> #2 is a compromise that should(tm) have only a few seconds of outage. But
> could blow up i suppose. They all could blow up though so I'm not sure
> that should play a factor in the decision.
>
> I can't be the first person to go down this path. So what do you all
> think? how have you done it in the past?
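
The corosync 2.x/3.x caveat can be turned into a simple pre-flight check before attempting a node-by-node migration. Below is a minimal Python sketch, assuming the rule is simply "every node must run the same corosync major version" (verify this against the release notes for your versions). The function names, node names, and version strings are hypothetical; the versions would have to be collected by hand, for example by running `corosync -v` on the current node and on a test install of the target distribution.

```python
import re


def corosync_major(version: str) -> int:
    """Return the major number from a version string such as '3.1.7'."""
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1))


def can_roll(versions: dict[str, str]) -> bool:
    """True if all nodes share one corosync major version.

    `versions` maps a (hypothetical) node name to the corosync version
    that node will run during the mixed phase of the migration.
    """
    majors = {corosync_major(v) for v in versions.values()}
    return len(majors) <= 1


if __name__ == "__main__":
    # Illustrative values only: one node still on the old OS,
    # the other already reinstalled with the new distribution.
    mixed_phase = {"node1": "2.4.5", "node2": "3.1.7"}
    if can_roll(mixed_phase):
        print("same corosync major everywhere: rolling approach may be possible")
    else:
        print("mixed corosync majors: plan for a full cluster stop")
```

If the check reports mixed majors, option 3 (re-adding a reinstalled node to the existing cluster) is the one most directly affected, since it is the only option where old and new nodes must form a single cluster.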
> --
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

-- 
Michele Baldessari <mich...@acksyn.org>
C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D