We went through this exercise, though our starting point was ubuntu 16.04 /
nautilus. We reduced our double builds as follows:
1. Rebuild each monitor host on 18.04/bionic and rejoin still on nautilus
2. Upgrade all mons, mgrs (and optionally rgws) to pacific
3. Convert each mon, mgr, rgw to cephadm and enable orchestrator
4. Rebuild each mon, mgr, rgw on 20.04/focal and rejoin the pacific cluster
5. Drain and rebuild each osd host on focal and pacific
This has the advantage of only having to drain and rebuild the OSD hosts once.
Double building the control cluster hosts isn’t so bad, and orchestrator makes
all of the ceph parts easy once it’s enabled.
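As a rough sketch of step 3, the helper below only builds the command strings
for adopting legacy daemons into cephadm and enabling the orchestrator; it does
not run anything. It assumes each mon and mgr daemon is named after its host
(mon.<hostname>, mgr.<hostname>), which may not match your cluster, and the
function names are made up for illustration. Legacy rgw daemons are left out
on purpose, since they are normally redeployed through the orchestrator rather
than adopted.

```python
import shlex


def adoption_commands(hostname, daemons=("mon", "mgr")):
    """Build `cephadm adopt` commands to run on `hostname`.

    Assumption: the daemon id equals the hostname (e.g. mon.mon1).
    """
    return [
        shlex.join(
            ["cephadm", "adopt", "--style", "legacy",
             "--name", f"{daemon}.{hostname}"]
        )
        for daemon in daemons
    ]


def enable_orchestrator_commands():
    """Build the commands that switch the orchestrator backend to cephadm."""
    return [
        "ceph mgr module enable cephadm",
        "ceph orch set backend cephadm",
    ]


if __name__ == "__main__":
    # "mon1" is a hypothetical hostname used only for illustration.
    for cmd in adoption_commands("mon1") + enable_orchestrator_commands():
        print(cmd)
```

Printing the commands first makes it easy to review them per host before
pasting them into a shell or feeding them to whatever automation you use.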
The biggest challenge we ran into was https://tracker.ceph.com/issues/51652,
because we still had a lot of filestore osds. It’s frustrating, but we managed
to get through it without much client interruption on a dozen prod clusters,
most of which had 38 osd hosts and 912 total osds each. One thing that
helped: before beginning the osd host builds, set the primary-affinity of all
the old osds to something <1. This way, when the new pacific (or octopus)
osds join the cluster, they will automatically be favored as primary for their
pgs. If a heartbeat timeout storm starts to get out of control, start by
setting nodown and noout. The flapping osds are the worst. Then figure out
which osds are the culprits and restart them.
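The primary-affinity and nodown/noout steps could be scripted roughly like
this. It is only a sketch that generates the ceph CLI commands as strings; the
function names, the example osd ids, and the 0.5 default are assumptions chosen
for illustration, not values we used.

```python
def primary_affinity_commands(old_osd_ids, affinity=0.5):
    """Build commands that lower primary-affinity on the old osds.

    Any value below 1 makes newer osds (left at the default of 1)
    more likely to be chosen as primary for their pgs.
    """
    if not 0.0 <= affinity < 1.0:
        raise ValueError("affinity must be >= 0 and < 1")
    return [
        f"ceph osd primary-affinity osd.{osd_id} {affinity}"
        for osd_id in old_osd_ids
    ]


def storm_flag_commands(action="set"):
    """Build commands to set (or later unset) nodown and noout."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [f"ceph osd {action} {flag}" for flag in ("nodown", "noout")]


if __name__ == "__main__":
    # Hypothetical osd ids; take the real list from `ceph osd tree`.
    for cmd in primary_affinity_commands([0, 1, 2]):
        print(cmd)
    for cmd in storm_flag_commands("set"):
        print(cmd)
```

Remember to unset nodown and noout once the flapping osds have been restarted
and the cluster has settled.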
Hopefully your nautilus osds are all bluestore and you won’t have this problem.
We put up with it, because the filestore to bluestore conversion was one of
the most important parts of this upgrade for us.
Best of luck, whatever route you take.
Regards,
Josh Beaman
From: Götz Reinicke
Date: Tuesday, August 1, 2023 at 1:01 PM
To: ceph-users@ceph.io
Subject: [EXTERNAL] [ceph-users] Upgrading nautilus / centos7 to octopus /
ubuntu 20.04. - Suggestions and hints?
Hi,
I’ve read and thought a lot about the migration, and as this is a bigger
project, I was wondering if anyone has done it already and might share some
notes or playbooks, because in everything I read there were some parts missing
or hard for me to understand.
I do have some different approaches in mind, so maybe you have some
suggestions or hints.
a) upgrade nautilus on centos 7 with the few missing features like dashboard
and prometheus. After that, migrate one node after another to ubuntu 20.04 with
octopus and then upgrade ceph to the most recent stable version.
b) migrate one node after another to ubuntu 18.04 with nautilus, then
upgrade to octopus, and after that to ubuntu 20.04.
or
c) upgrade one node after another to ubuntu 20.04 with octopus and join it to
the cluster until all nodes are upgraded.
As a test I tried c) with a mon node, but adding it to the cluster fails: the
mon ends up in a failed state, still probing for the other mons. (I don’t have
the right log at hand right now.)
So my questions are:
a) What would be the best (most stable) migration path and
b) is it in general possible to add a new octopus mon (not an upgraded one) to
a nautilus cluster, where the other mons are still on nautilus?
I hope my thoughts and questions are understandable :)
Thanks for any hints and suggestions. Best, Götz