Basically, these are the steps to remove all OSDs from that host (the OSDs are not "replaced", so they are not marked "destroyed") [1]; a rough consolidated sketch follows the list:

1) ceph osd out $id
2) systemctl stop ceph-osd@$id
3) ceph osd purge $id --yes-i-really-mean-it
4) ceph-volume lvm zap --osd-id $id --destroy
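
For reference, a minimal manual equivalent of that loop for one host could look like this (the hostname and the use of 'ceph osd ls-tree' to collect the IDs are my assumptions; the runner resolves the OSDs itself):

  HOST=node01   # hypothetical hostname, adjust to the rebuilt host
  for id in $(ceph osd ls-tree $HOST); do
      ceph osd out $id
      ssh $HOST systemctl stop ceph-osd@$id                    # the stop has to run on the OSD host
      ceph osd purge $id --yes-i-really-mean-it
      ssh $HOST ceph-volume lvm zap --osd-id $id --destroy     # wipes the LVs/disks on that host
  done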

After all disks have been wiped, there is a Salt runner that deploys all available OSDs on that host again [2]. All OSDs are created with their normal weight. All of the OSD restarts I did were on different hosts, not on the rebuilt host. The only difference I can think of that might have an impact is that this cluster consists of two datacenters, while the other clusters were not divided into several buckets. Could that be an issue?
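
In case it helps to narrow it down, this is roughly what I would compare before and after the rebuild (plain status commands, nothing cluster-specific assumed):

  ceph osd tree              # datacenter/host buckets and the weights of the recreated OSDs
  ceph osd crush rule dump   # which bucket type the rule(s) of the affected pools choose over
  ceph osd df tree           # per-OSD PG counts, to spot OSDs that filled up right after recreation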

[1] https://github.com/SUSE/DeepSea/blob/master/srv/modules/runners/osd.py#L179
[2] https://github.com/SUSE/DeepSea/blob/master/srv/salt/_modules/dg.py#L1396

Quoting Josh Baergen <jbaer...@digitalocean.com>:

On Wed, Apr 6, 2022 at 11:20 AM Eugen Block <ebl...@nde.ag> wrote:
I'm pretty sure that their cluster isn't anywhere near the limit for
mon_max_pg_per_osd; they currently have around 100 PGs per OSD and the
configs have not been touched, it's a pretty basic setup.

How is the host being "rebuilt"? Depending on the CRUSH rule, if the
host's OSDs are all marked destroyed and then re-created one at a time
with normal weight, CRUSH may decide to put a large number of PGs on
the first OSD that is created, and so on, until the rest of the host's
OSDs are available to take those PGs.

Josh
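
If the full-weight creation Josh describes turns out to be the trigger, one workaround I could imagine (untested here, just a sketch) would be to let the new OSDs come up with zero CRUSH weight and reweight them once they all exist:

  ceph config set osd osd_crush_initial_weight 0    # new OSDs join with weight 0, so no PGs move yet
  # ... redeploy the OSDs on the rebuilt host ...
  ceph osd crush reweight osd.$id <target-weight>   # repeat per OSD, or step the weights up gradually

<target-weight> being the usual size-based weight (the OSD's capacity in TiB).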


