OSDs are BlueStore on HDD with SSD for DB/WAL.  We have already tuned
osd_recovery_sleep_hdd to 0 and raised osd_max_backfills and the recovery
parameters to much higher values.
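For reference, the effective values can be double-checked per OSD like this
(osd.0 is just an example ID):

  ceph config show osd.0 | grep -E 'osd_max_backfills|osd_recovery'
  # or directly via the admin socket on the OSD's host:
  ceph daemon osd.0 config get osd_recovery_sleep_hdd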


________________________________
From: Josh Baergen <jbaer...@digitalocean.com>
Sent: Tuesday, August 30, 2022 9:46 AM
To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
Cc: Dave Schulz <dsch...@ucalgary.ca>; ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: OSDs growing beyond full ratio

Hey Wyll,

I haven't been following this thread very closely so my apologies if
this has already been covered: Are the OSDs on HDDs or SSDs (or
hybrid)? If HDDs, you may want to look at decreasing
osd_recovery_sleep_hdd and increasing osd_max_backfills. YMMV, but
I've seen osd_recovery_sleep_hdd=0.01 and osd_max_backfills=6 work OK
on Bluestore HDDs. This would help speed up the data movements.

If it's a hybrid setup, I'm sure you could apply similar tweaks. Sleep
is already 0 for SSDs but you may be able to increase max_backfills
for some gains.
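Roughly something like this (a sketch only; exact values depend on the
hardware and how much impact you can tolerate, and you can always dial them
back):

  ceph config set osd osd_recovery_sleep_hdd 0.01   # shorter sleep between recovery ops on HDDs
  ceph config set osd osd_max_backfills 6           # more concurrent backfills per OSD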

Josh

On Tue, Aug 30, 2022 at 7:31 AM Wyll Ingersoll
<wyllys.ingers...@keepertech.com> wrote:
>
>
> Yes, this cluster has both: a large CephFS filesystem (60TB) that is
> replicated (2-copy) and a very large RGW data pool that is EC (12+4).  We
> cannot currently delete any data from either of them because the commands to
> access them are unresponsive: the CephFS will not mount, and radosgw-admin
> just hangs.
>
> We have several OSDs that are >99% full and keep approaching 100%, even
> after reweighting them to 0. There is no client activity in this cluster at
> this point (it's dead), but there is a lot of rebalancing and repair going
> on, so data is moving around.
>
> We are currently trying to use upmap commands to relocate PGs in an attempt
> to balance things better and get things moving again, but progress is
> glacially slow.
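> For reference, such upmap commands look roughly like this (the PG and OSD
> IDs below are placeholders, not our real ones):
>
>   ceph osd pg-upmap-items 11.3f 121 44   # remap PG 11.3f's copy from osd.121 to osd.44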
>
> ________________________________
> From: Dave Schulz <dsch...@ucalgary.ca>
> Sent: Monday, August 29, 2022 10:42 PM
> To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>; ceph-users@ceph.io 
> <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: OSDs growing beyond full ratio
>
> Hi Wyll,
>
> Any chance you're using CephFS and have some really large files in the
> CephFS filesystem?  Erasure coding? I recently encountered a similar
> problem, and as soon as the end user deleted the really large files our
> problem became much more manageable.
>
> I had issues reweighting OSDs too. In the end I changed the crush weights
> and had to chase them around every couple of days, reweighting the OSDs that
> were >70% full to zero and then setting them back to 12 when they were
> mostly empty (12TB spinning rust buckets).  Note that I'm really not
> recommending this course of action; it's just the only option that seemed
> to have any effect.
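> Roughly what that looked like, with osd.42 as a stand-in for whichever OSD
> was too full:
>
>   ceph osd crush reweight osd.42 0    # drain the nearly-full OSD
>   # ...once it's mostly empty, restore the weight for the 12TB drive:
>   ceph osd crush reweight osd.42 12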
>
> -Dave
>
> On 2022-08-29 3:00 p.m., Wyll Ingersoll wrote:
> >
> > Can anyone explain why OSDs (ceph pacific, bluestore osds) continue to grow 
> > well after they have exceeded the "full" level (95%) and is there any way 
> > to stop this?
> >
> > "The full_ratio is 0.95 but we have several osds that continue to grow and 
> > are approaching 100% utilization.  They are reweighted to almost 0, but yet 
> > continue to grow.
> > Why is this happening?  I thought the cluster would stop writing to the osd 
> > when it was at above the full ratio."
> >
> > thanks...
> >
> > ________________________________
> > From: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
> > Sent: Monday, August 29, 2022 9:24 AM
> > To: Jarett <starkr...@gmail.com>; ceph-users@ceph.io <ceph-users@ceph.io>
> > Subject: [ceph-users] Re: OSDs growing beyond full ratio
> >
> >
> > I would think so, but it isn't happening nearly fast enough.
> >
> > It's literally been over 10 days with 40 new drives across 2 new servers 
> > and they barely have any PGs yet. A few, but not nearly enough to help with 
> > the imbalance.
> > ________________________________
> > From: Jarett <starkr...@gmail.com>
> > Sent: Sunday, August 28, 2022 8:19 PM
> > To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>; ceph-users@ceph.io 
> > <ceph-users@ceph.io>
> > Subject: RE: [ceph-users] OSDs growing beyond full ratio
> >
> >
> > Isn’t rebalancing onto the empty OSDs default behavior?
> >
> >
> >
> > From: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
> > Sent: Sunday, August 28, 2022 10:31 AM
> > To: ceph-users@ceph.io <ceph-users@ceph.io>
> > Subject: [ceph-users] OSDs growing beyond full ratio
> >
> >
> >
> > We have a pacific cluster that is overly filled and is having major trouble 
> > recovering.  We are desperate for help in improving recovery speed.  We 
> > have modified all of the various recovery throttling parameters.
> >
> >
> >
> > The full_ratio is 0.95 but we have several OSDs that continue to grow and
> > are approaching 100% utilization.  They are reweighted to almost 0, yet
> > they continue to grow.
> >
> > Why is this happening?  I thought the cluster would stop writing to an OSD
> > when it was at or above the full ratio.
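> > (For reference, the ratios and per-OSD fill can be inspected like this;
> > raising full_ratio is only an emergency escape hatch, not a fix:)
> >
> >   ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'
> >   ceph osd df tree                  # per-OSD %USE
> >   ceph osd set-full-ratio 0.97      # emergency headroom only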
> >
> >
> >
> >
> >
> > We have added additional capacity to the cluster but the new OSDs are
> > being used very slowly.  The primary pool in the cluster is the RGW data
> > pool, which is a 12+4 EC pool using "host" placement rules across 18
> > hosts. Two new hosts with 20x10TB OSDs each were recently added, but they
> > are filling up only very slowly.  I don't see how to force recovery on
> > that particular pool.  From what I understand, we cannot modify the EC
> > parameters without destroying the pool, and we cannot offload that pool to
> > any others because there is nowhere else to store that amount of data.
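> > (One option that might apply here is force-prioritizing backfill for that
> > pool's PGs. It only reorders which PGs backfill first and does not add
> > throughput. The pool name below is the stock RGW data pool name;
> > substitute the real one:)
> >
> >   ceph pg ls-by-pool default.rgw.buckets.data \
> >     | awk '$1 ~ /^[0-9]+\./ {print $1}' | xargs ceph pg force-backfill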
> >
> >
> >
> >
> >
> > We have been running "ceph osd reweight-by-utilization"  periodically and 
> > it works for a while (a few hours) but then recovery and backfill IO 
> > numbers drop to negligible values.
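> > (Typical invocation, with the arguments illustrative; the test- form
> > previews the changes without applying them:)
> >
> >   ceph osd test-reweight-by-utilization 110 0.05 50
> >   ceph osd reweight-by-utilization 110 0.05 50   # oload 110%, max change 0.05, max 50 OSDs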
> >
> >
> >
> > The balancer module will not run because the current misplaced % is about 
> > 97%.
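> > (As I understand it, that's the mgr's target_max_misplaced_ratio check,
> > which defaults to 5%; it can be inspected or raised, for example:)
> >
> >   ceph config get mgr target_max_misplaced_ratio
> >   ceph config set mgr target_max_misplaced_ratio 0.10   # example value only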
> >
> >
> >
> > Would it be more effective to use osdmaptool to generate a batch of upmap
> > commands to manually move data around, or to keep trying to get
> > reweight-by-utilization to work?
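> > (The osdmaptool route would look roughly like this; the pool name is a
> > placeholder, and the generated commands should be reviewed before running:)
> >
> >   ceph osd getmap -o om
> >   osdmaptool om --upmap upmap.sh --upmap-pool default.rgw.buckets.data --upmap-max 100
> >   # inspect upmap.sh, then apply it:
> >   bash upmap.sh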
> >
> >
> >
> > Any suggestions would be appreciated, other than deleting data (which we
> > cannot do at this point, since the pools are not accessible) or adding
> > more storage (we already did that, and for some reason it is not being
> > utilized very heavily yet).
> >
> >
> >
> >
> >
> >
> >
> >
> >
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
