On 8/28/22 17:30, Wyll Ingersoll wrote:
We have a Pacific cluster that is overfilled and is having major trouble
recovering. We are desperate for help in improving recovery speed. We have
already modified all of the various recovery throttling parameters.
The full_ratio is 0.95, but we have several OSDs that continue to grow and are
approaching 100% utilization. They are reweighted to almost 0, yet they
continue to grow.
Why is this happening? I thought the cluster would stop writing to an OSD
once it was above the full ratio.
We have added additional capacity to the cluster, but the new OSDs are filling
very slowly. The primary pool in the cluster is the RGW data pool, a 12+4 EC
pool using "host" placement rules across 18 hosts. Two new hosts with 20x10TB
OSDs each were recently added, but they are only very slowly being filled. I
don't see how to force recovery on that particular pool. From what I understand,
we cannot modify the EC parameters without destroying the pool, and we cannot
offload that pool to any other because there is no other place to store that
amount of data.
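A side note on why the new hosts fill so slowly: with k=12, m=4 and a host failure domain, every PG must place its 16 shards on 16 *distinct* hosts out of 18, so CRUSH has very little placement freedom and the two new hosts only gain data as whole PGs remap onto them. The arithmetic, using the numbers from the post:

```shell
# Capacity math for a k=12, m=4 EC pool across 18 hosts (numbers from the post)
k=12; m=4
awk -v k=$k -v m=$m 'BEGIN {
  printf "raw overhead: %.2fx\n", (k+m)/k   # each logical byte costs 16/12 raw
  printf "hosts per PG: %d of 18\n", k+m    # every PG touches 16 distinct hosts
}'
```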
We have been running "ceph osd reweight-by-utilization" periodically and it
works for a while (a few hours) but then recovery and backfill IO numbers drop to
negligible values.
The balancer module will not run because the current misplaced % is about 97%.
Would it be more effective to use osdmaptool to generate a batch of upmap
commands and move data around manually, or to keep trying to get
reweight-by-utilization to work?
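For reference, the offline-upmap route works roughly like this (the tool is osdmaptool, and the pool name below is a guess at a typical RGW data pool name, substitute your own; upmaps also require min-compat-client luminous or later, which a Pacific cluster normally satisfies):

```shell
# Snapshot the current osdmap and compute upmaps offline against it
ceph osd getmap -o osdmap.bin
osdmaptool osdmap.bin --upmap upmap.sh \
    --upmap-pool default.rgw.buckets.data \
    --upmap-max 100 --upmap-deviation 1

# upmap.sh now contains "ceph osd pg-upmap-items ..." commands;
# review it, then apply
sh upmap.sh
```

Because this runs against a snapshot, you can inspect exactly which PG moves it proposes before touching the cluster.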
I would use the script upmap-remapped.py [1] to get your cluster
healthy again, and after that pgremapper [2] to drain PGs from the full
OSDs. At a certain point (usage) you might want to let the Ceph balancer
do its thing. But from experience I can tell that Jonas Jelten's
ceph-balancer script currently does a far better job [3]. Search the
list for the usage of these scripts (or use a search engine). With
upmaps you have more control over where PGs should go. You might want to
skip step [2] and directly try ceph-balancer [3].
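A hedged sketch of the usual workflow with these scripts (OSD id 12 is a placeholder; consult each script's README for its exact flags before running anything):

```shell
# Pause data movement while upmaps are injected
ceph osd set norebalance

# upmap-remapped.py prints "ceph osd pg-upmap-items" commands that pin
# misplaced PGs back to the OSDs they currently live on, dropping the
# misplaced percentage to near zero so the cluster can go healthy first
./upmap-remapped.py | sh

ceph osd unset norebalance

# Then drain the fullest OSDs a few PGs at a time
# (placeholder OSD id; see the pgremapper README for target/concurrency flags)
pgremapper drain 12
```

Once the cluster is healthy and the fullest OSDs have headroom, removing the pinning upmaps gradually (or letting a balancer do it) moves the data to its intended locations at a controlled pace.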
Gr. Stefan
[1]:
https://gitlab.cern.ch/ceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
[2]: https://github.com/digitalocean/pgremapper/
[3]: https://github.com/TheJJ/ceph-balancer
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io