Thank you! I will see about trying these out, probably using your suggestion of
several iterations with #1 and then #3 (a rough sketch of what I have in mind
is below).
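This is the sequence I'm considering, pieced together from the two READMEs; it
is untested, so check each script's --help before running anything:

    # upmap needs at least luminous-capable clients
    ceph osd set-require-min-compat-client luminous

    # [1] upmap-remapped.py pins misplaced PGs back to their current OSDs,
    # so backfill stops and the cluster can report healthy again
    ./upmap-remapped.py > /tmp/upmaps.sh   # review before applying
    sh /tmp/upmaps.sh

    # [3] ceph-balancer generates a small batch of balancing upmaps per run;
    # repeat until the new hosts fill up
    ./placementoptimizer.py -v balance --max-pg-moves 10 > /tmp/balance.sh
    sh /tmp/balance.sh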
________________________________
From: Stefan Kooman <ste...@bit.nl>
Sent: Monday, August 29, 2022 1:38 AM
To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>; ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] OSDs growing beyond full ratio

On 8/28/22 17:30, Wyll Ingersoll wrote:
> We have a Pacific cluster that is overly full and is having major trouble
> recovering. We are desperate for help in improving recovery speed. We have
> modified all of the various recovery throttling parameters.
>
> The full_ratio is 0.95, but we have several OSDs that continue to grow and
> are approaching 100% utilization. They are reweighted to almost 0, yet they
> continue to grow. Why is this happening? I thought the cluster would stop
> writing to an OSD once it was above the full ratio.
>
> We have added additional capacity to the cluster, but the new OSDs are
> filling very slowly. The primary pool in the cluster is the RGW data pool,
> a 12+4 EC pool using "host" placement rules across 18 hosts. Two new hosts
> with 20x10TB OSDs each were recently added, but they are only filling very
> slowly, and I don't see how to force recovery on that particular pool. From
> what I understand, we cannot modify the EC parameters without destroying
> the pool, and we cannot offload that pool to any other because there is no
> other place to store that amount of data.
>
> We have been running "ceph osd reweight-by-utilization" periodically. It
> works for a while (a few hours), but then recovery and backfill IO numbers
> drop to negligible values.
>
> The balancer module will not run because the current misplaced percentage
> is about 97%.
>
> Would it be more effective to use osdmaptool to generate a batch of upmap
> commands and move data around manually, or to keep trying to get
> reweight-by-utilization to work?

I would use the script upmap-remapped.py [1] to get your cluster healthy
again, and after that pgremapper [2] to drain PGs from the full OSDs. At a
certain point (usage) you might want to let the Ceph balancer do its thing,
but from experience I can tell that Jonas Jelten's ceph-balancer script is
currently doing a way better job [3]. Search the list for usage of these
scripts (or use a search engine). With upmaps you have more control over
where PGs should go. You might want to skip step [2] and try ceph-balancer
[3] directly.

Gr. Stefan

[1]: https://gitlab.cern.ch/ceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
[2]: https://github.com/digitalocean/pgremapper/
[3]: https://github.com/TheJJ/ceph-balancer
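On the full-ratio question above: the thresholds are stored in the OSD map and
can be inspected, and temporarily raised to buy backfill headroom, with the
commands below. The 0.92/0.97 values are examples only, not recommendations;
restore the defaults once the cluster has drained:

    ceph osd dump | grep ratio            # shows nearfull/backfillfull/full
    ceph osd set-backfillfull-ratio 0.92  # allow backfill onto fuller OSDs
    ceph osd set-full-ratio 0.97          # last-resort headroom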
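On the osdmaptool question: it can compute upmap commands offline from a saved
copy of the osdmap, which gives fine-grained control without waiting for the
balancer module. A minimal sketch; the pool name default.rgw.buckets.data is a
placeholder for the actual RGW data pool:

    ceph osd getmap -o /tmp/osdmap
    osdmaptool /tmp/osdmap --upmap /tmp/upmap.sh \
        --upmap-pool default.rgw.buckets.data \
        --upmap-deviation 1 --upmap-max 100
    # inspect /tmp/upmap.sh, then apply:
    bash /tmp/upmap.sh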
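And for step [2], draining PGs off a full OSD with pgremapper might look
roughly like this; the drain subcommand is from its README, but the exact flag
names here are assumptions to verify against pgremapper drain --help:

    # hypothetical IDs: drain osd.12 toward the new osd.40 and osd.41
    pgremapper drain 12 --target-osd 40 --target-osd 41 --max-source-backfills 4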