Hi again,

Oops, I'd missed the part about some PGs being degraded, which prevents the balancer from continuing.
So I assume that you have PGs which are simultaneously undersized+backfill_toofull? That case does indeed sound tricky.

To solve it you would either need to move PGs out of the toofull OSDs, to make room for the undersized PGs, or upmap those undersized PGs onto some other, less-full OSDs. For the former, you could use the rm-upmaps-underfull script and hope that it incidentally moves data out of those toofull OSDs; or a similar script with some variables reversed could remove any upmaps which are directing PGs *to* those toofull OSDs (a rough sketch of that is at the bottom of this mail); or maybe it will be enough to just reweight those OSDs to 0.9.

-- Dan

On Fri, Apr 2, 2021 at 10:47 AM Dan van der Ster <d...@vanderster.com> wrote:
>
> Hi Andras.
>
> Assuming that you've already tightened mgr/balancer/upmap_max_deviation
> to 1, I suspect that this cluster already has too many upmaps.
>
> Last time I checked, the balancer implementation is not able to improve a
> pg-upmap-items entry if one already exists for a PG. (It can add an OSD
> mapping pair to a PG, but not change an existing pair from one OSD to
> another.) So I think that what happens in this case is the balancer gets
> stuck in a sort of local minimum in the overall optimization.
>
> It can therefore help to simply remove some upmaps, and then wait for the
> balancer to do a better job when it re-creates new entries for those PGs.
> And there's usually some low-hanging fruit -- you can start by removing
> pg-upmap-items entries which are mapping PGs away from the least-full
> OSDs. (Those upmap entries are making the least-full OSDs even *less*
> full.)
>
> We have a script for that:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/rm-upmaps-underfull.py
> It's pretty hacky and I don't use it often, so please use it with
> caution -- you can run it and review which upmaps it would remove.
>
> Hope this helps,
>
> Dan
>
> On Fri, Apr 2, 2021 at 10:18 AM Andras Pataki
> <apat...@flatironinstitute.org> wrote:
> >
> > Dear ceph users,
> >
> > On one of our clusters I have some difficulties with the upmap
> > balancer. We started with a reasonably well-balanced cluster (using the
> > balancer in upmap mode). After a node failure, we crush-reweighted all
> > the OSDs of the node to take it out of the cluster and waited for the
> > cluster to rebalance. Obviously, this significantly changes the crush
> > map, hence the nice balance created by the balancer was gone. The
> > recovery mostly completed, but some of the OSDs became too full, so we
> > ended up with a few PGs that were backfill_toofull. The cluster has
> > plenty of space (overall perhaps 65% full); only a few OSDs are >90%
> > (we have backfillfull_ratio at 92%). The balancer refuses to change
> > anything since the cluster is not clean. Yet the cluster can't become
> > clean without a few upmaps to help the top 3 or 4 most-full OSDs.
> >
> > I would think this is a fairly common situation -- trying to recover
> > after some failure. Are there any recommendations on how to proceed?
> > Obviously I can manually find and insert upmaps, but for a large
> > cluster with tens of thousands of PGs that isn't too practical. Is
> > there a way to tell the balancer to still do something even though
> > some PGs are undersized (from a quick look at the python module, I
> > didn't see one)?
> >
> > The cluster is on Nautilus 14.2.15.
> >
> > Thanks,
> >
> > Andras
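
P.S. For completeness, here is a very rough (and untested) sketch of that "reversed" script. It just prints "ceph osd rm-pg-upmap-items" commands for any pg-upmap-items entry that points a PG *at* an OSD above some utilization threshold. The 85% threshold is an arbitrary placeholder, and the JSON field names are the ones I recall from "ceph osd df" / "ceph osd dump" output, so please double-check them on your version before trusting it:

#!/usr/bin/env python3
# Untested sketch: print "ceph osd rm-pg-upmap-items" commands for every
# pg-upmap-items entry that maps a PG *onto* an over-full OSD.
# TOOFULL_PCT is an arbitrary placeholder, not a Ceph default.
import json
import subprocess

TOOFULL_PCT = 85.0

def ceph_json(*args):
    # Run a ceph CLI command and return its parsed JSON output.
    out = subprocess.check_output(('ceph',) + args + ('--format', 'json'))
    return json.loads(out)

# OSDs whose utilization (%) is above the threshold.
osd_df = ceph_json('osd', 'df')
toofull = {n['id'] for n in osd_df['nodes'] if n['utilization'] > TOOFULL_PCT}

# Drop any upmap entry that has a mapping pointing at one of those OSDs.
osd_dump = ceph_json('osd', 'dump')
for item in osd_dump.get('pg_upmap_items', []):
    if any(m['to'] in toofull for m in item['mappings']):
        print('ceph osd rm-pg-upmap-items', item['pgid'])

As with rm-upmaps-underfull.py, review the printed commands by hand before running any of them. And if you haven't already tightened the balancer's max deviation (as mentioned in my earlier mail), that's:

    ceph config set mgr mgr/balancer/upmap_max_deviation 1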