Thank You!

I will see about trying these out, probably using your suggestion of several 
iterations with #1 and then #3.


________________________________
From: Stefan Kooman <ste...@bit.nl>
Sent: Monday, August 29, 2022 1:38 AM
To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>; ceph-users@ceph.io 
<ceph-users@ceph.io>
Subject: Re: [ceph-users] OSDs growing beyond full ratio

On 8/28/22 17:30, Wyll Ingersoll wrote:
> We have a Pacific cluster that is badly over-full and is having major 
> trouble recovering.  We are desperate for help in improving recovery 
> speed.  We have already modified all of the various recovery throttling 
> parameters.
>
> The full_ratio is 0.95, but we have several OSDs that continue to grow and 
> are approaching 100% utilization.  They are reweighted to almost 0, yet 
> they continue to grow.
> Why is this happening?  I thought the cluster would stop writing to an OSD 
> once it was above the full ratio.
>
>
> We have added capacity to the cluster, but the new OSDs are being used 
> very slowly.  The primary pool in the cluster is the RGW data pool, a 
> 12+4 EC pool using "host" placement rules across 18 hosts.  Two new hosts 
> with 20x10TB OSDs each were recently added, but they are being filled 
> only very slowly.  I don't see how to force recovery on that particular 
> pool.  From what I understand, we cannot modify the EC parameters without 
> destroying the pool, and we cannot offload that pool to any other because 
> there is nowhere else to store that amount of data.
>
>
> We have been running "ceph osd reweight-by-utilization" periodically, and 
> it works for a while (a few hours), but then the recovery and backfill IO 
> numbers drop to negligible values.
>
> The balancer module will not run because the current misplaced % is about 97%.
>
> Would it be more effective to use osdmaptool to generate a batch of upmap 
> commands and move data around manually, or to keep trying to get 
> reweight-by-utilization to work?
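
For reference, the osdmaptool route you mention would look roughly like the 
sketch below.  This is the offline "upmap" optimization run against a copy of 
the osdmap; the pool name is a placeholder, it needs pg-upmap support 
(require-min-compat-client luminous or newer), and you should double-check 
the flags against the osdmaptool man page for your release:

    ceph osd getmap -o om
    osdmaptool om --upmap upmaps.sh --upmap-pool <rgw-data-pool> \
        --upmap-max 100 --upmap-deviation 1
    # review upmaps.sh before applying it:
    bash upmaps.sh

The output file contains plain "ceph osd pg-upmap-items ..." commands, so you 
can inspect exactly what would move before applying anything.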

I would use the upmap-remapped.py script [1] to get your cluster healthy
again, and after that pgremapper [2] to drain PGs from the full OSDs. At
a certain point (once utilization is back under control) you might want
to let the Ceph balancer do its thing, but from experience I can tell
that Jonas Jelten's ceph-balancer script currently does a much better
job [3]. Search the list for examples of how these scripts are used (or
use a search engine). With upmaps you have much more control over where
PGs go. You might also want to skip step [2] and try ceph-balancer [3]
directly.
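
Roughly, the sequence could look like the sketch below.  Treat it as a
sketch only: the OSD/pool ids are placeholders and the exact flags for
pgremapper and the balancer script are from memory, so check each tool's
README before running anything:

    # 1) upmap-remapped.py [1]: map remapped PGs back onto their current
    #    OSDs so the cluster reports (nearly) healthy and the backfill
    #    storm stops; review the generated commands before piping to sh
    ./upmap-remapped.py | sh

    # 2) pgremapper [2]: drain the fullest OSDs towards the new hosts,
    #    e.g. something like (see its README for the exact flags):
    pgremapper drain <full-osd-id> --target-osds <new-osd-ids>

    # 3) ceph-balancer [3]: generate a small batch of upmaps at a time
    ./placementoptimizer.py -v balance --max-pg-moves 10 | tee moves.sh
    bash moves.sh

Step 1 is what makes the rest manageable: once the pending backfill is
cancelled via upmaps, steps 2 and 3 move data onto the new OSDs in small,
controlled batches instead of the cluster trying to shuffle ~97% of the
PGs at once.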

Gr. Stefan

[1]:
https://gitlab.cern.ch/ceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
[2]: https://github.com/digitalocean/pgremapper/
[3]: https://github.com/TheJJ/ceph-balancer
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
