[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-12 Anthony D'Atri
I halfway suspect that something akin to the speculation in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7MWAHAY7NCJK2DHEGO6MO4SWTLPTXQMD/ is going on. Below are reservations reported by a random OSD that serves (mostly) an EC RGW bucket pool. This is with the mclock
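The excerpt is cut off before the reservation dump itself. One way to look at that kind of state is an OSD's admin socket command `ceph daemon osd.<id> dump_recovery_reservations`. The Python sketch below condenses such a dump into granted slots versus PGs still waiting. The JSON layout it reads (`local_reservations` / `remote_reservations`, each with `max_allowed`, an `in_progress` list and per-priority `queues` of `items`) is an assumption based on recent releases, so check it against your own output. `summarize_reservations` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def summarize_reservations(raw_json: str) -> Dict[str, Dict[str, Any]]:
    """Condense an OSD reservation dump into slots in use vs. PGs waiting.

    ASSUMPTION: raw_json is the JSON printed by
    `ceph daemon osd.<id> dump_recovery_reservations`, shaped as
    {"local_reservations": {...}, "remote_reservations": {...}}, where each
    side has "max_allowed", an "in_progress" list and a "queues" list of
    {"priority": ..., "items": [...]}. Missing keys are treated as empty.
    """
    data = json.loads(raw_json)
    summary: Dict[str, Dict[str, Any]] = {}
    for side in ("local_reservations", "remote_reservations"):
        section = data.get(side, {})
        # Reservations currently granted on this side
        in_progress = len(section.get("in_progress", []))
        # PGs queued for a reservation, summed across all priority levels
        waiting = sum(len(q.get("items", [])) for q in section.get("queues", []))
        summary[side] = {
            "max_allowed": section.get("max_allowed"),
            "in_progress": in_progress,
            "waiting": waiting,
        }
    return summary
```

Comparing a few dumps taken minutes apart says more than a single one. If the same PGs occupy every granted slot while the waiting count never drops, that suggests reservations are not being released, though it does not prove it.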

2024-05-10 Matthew Darwin
We have had PGs get stuck in Quincy (17.2.7). After changing to wpq, no such problems were observed. We're using a replicated (x3) pool.
On 2024-05-02 10:02, Wesley Dillingham wrote:
> In our case it was with an EC pool as well. I believe the PG state was degraded+recovering / recovery_wait

2024-05-02 Wesley Dillingham
In our case it was with an EC pool as well. I believe the PG state was degraded+recovering / recovery_wait, and IIRC the PGs simply sat in the recovering state without any progress (the degraded object count did not decline). A repeer of the PG was attempted, without success. A restart of
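Wesley's symptom, PGs sitting in `recovering` while the degraded object count never falls, is easy to check mechanically. A minimal Python sketch, assuming you already sample that count periodically (for example the "objects degraded" figure in `ceph status`). `recovery_stalled` and its `window` parameter are hypothetical, illustrative names, not part of any Ceph tool.

```python
from typing import Sequence


def recovery_stalled(degraded_counts: Sequence[int], window: int = 5) -> bool:
    """Return True if the degraded-object count has not declined over the
    last `window` samples (oldest sample first).

    How the samples are collected is up to the caller; this helper only
    judges the trend.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(degraded_counts) < window:
        return False  # not enough history to judge yet
    recent = list(degraded_counts)[-window:]
    # Stalled: objects are still degraded, and no sample in the window
    # dropped below the value at the start of the window.
    return recent[-1] > 0 and min(recent) >= recent[0]
```

`recovery_stalled([1200, 1200, 1200, 1200, 1200])` returns `True`; `recovery_stalled([1200, 1150, 1100, 1100, 1100])` returns `False`, since the count fell within the window. A plateau only counts as a stall once it has lasted a full window, which keeps a single slow interval from raising an alarm.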

2024-05-02 Sridhar Seshasayee
> Multiple people, including me, have also observed backfill/recovery stop completely for no apparent reason.
>
> In some cases poking the lead OSD for a PG with `ceph osd down` restores progress, in other cases it doesn't.
>
> Anecdotally this *may* only happen for EC pools on HDDs but that
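The two nudges mentioned in this thread, re-peering the PG and marking its lead (primary) OSD down, can be scripted as a dry run. A minimal Python sketch: `stuck_pg_steps` and `run_steps` are hypothetical helpers, and you supply the primary OSD id yourself (e.g. from `ceph pg map <pgid>`). As the thread notes, neither step is guaranteed to restart progress.

```python
import shlex
import subprocess
from typing import List, Sequence


def stuck_pg_steps(pgid: str, primary_osd: int) -> List[List[str]]:
    """Escalating nudges for a PG whose recovery/backfill has stopped.

    1. `ceph pg repeer <pgid>` makes the PG go through peering again.
    2. `ceph osd down <id>` marks the lead OSD down; the still-running
       daemon notices and rejoins (unless the `noup` flag is set),
       which re-peers all of its PGs.
    """
    return [
        ["ceph", "pg", "repeer", pgid],
        ["ceph", "osd", "down", str(primary_osd)],
    ]


def run_steps(steps: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Print each command; only execute it when dry_run is False."""
    printed = []
    for argv in steps:
        line = shlex.join(argv)
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(list(argv), check=True)
    return printed
```

Keeping `dry_run=True` as the default means the helper only prints `ceph pg repeer 7.1a` and `ceph osd down 12` for `run_steps(stuck_pg_steps("7.1a", 12))`; in practice you would run the second step only after the first has had time to show an effect.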

2024-05-02 Anthony D'Atri
>> For our customers we are still disabling mclock and using wpq. Might be worth trying.
>
> Could you please elaborate a bit on the issue(s) preventing the use of mClock? Is this specific to only the slow backfill rate and/or other issues?
>
> This feedback would help prioritize

2024-05-02 Mark Nelson
Hi Sridhar,
(Very!) slow backfill was one issue, but if I recall correctly we hit a case where backfill wasn't completing at all until we reverted to WPQ. I was getting hammered with other stuff at the time, so I don't quite remember the details, but Dan might. I think this was in Quincy after the

2024-05-01 Sridhar Seshasayee
Hi Mark,
On Thu, May 2, 2024 at 3:18 AM Mark Nelson wrote:
> For our customers we are still disabling mclock and using wpq. Might be worth trying.
Could you please elaborate a bit on the issue(s) preventing the use of mClock? Is this specific to only the slow backfill rate and/or other

2024-05-01 Sridhar Seshasayee
Hi Götz,
Please see my response below.
On Tue, Apr 30, 2024 at 7:39 PM Pierre Riteau wrote:
> Hi Götz,
>
> You can change the value of osd_max_backfills (for all OSDs or specific ones) using `ceph config`, but you need to enable osd_mclock_override_recovery_settings. See

2024-05-01 Mark Nelson
For our customers we are still disabling mclock and using wpq. Might be worth trying.
Mark
On 4/30/24 09:08, Pierre Riteau wrote:
> Hi Götz,
> You can change the value of osd_max_backfills (for all OSDs or specific ones) using `ceph config`, but you need to enable
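Switching the OSD op scheduler from mClock back to WPQ, as Mark suggests, goes through the `osd_op_queue` option, which is only read at OSD startup, so each OSD must be restarted. The Python sketch below assembles the commands. It assumes a cephadm-managed cluster (`ceph orch daemon restart`); substitute your own restart mechanism otherwise. `wpq_switch_commands` is a hypothetical helper name.

```python
from typing import Iterable, List


def wpq_switch_commands(osd_ids: Iterable[int]) -> List[List[str]]:
    """Commands to move OSDs from the mClock scheduler to WPQ.

    ASSUMPTION: cephadm manages the OSDs, so `ceph orch daemon restart`
    is available. osd_op_queue only takes effect at startup, hence the restarts.
    """
    cmds = [["ceph", "config", "set", "osd", "osd_op_queue", "wpq"]]
    for osd in osd_ids:
        name = f"osd.{osd}"
        cmds.append(["ceph", "orch", "daemon", "restart", name])
        # Once the daemon is back up, confirm the value it is actually running with
        cmds.append(["ceph", "config", "show", name, "osd_op_queue"])
    return cmds
```

Restarting a few OSDs at a time and letting PGs return to active+clean in between is the cautious approach. To return to the default scheduler later, `ceph config rm osd osd_op_queue` followed by the same restarts should undo the change.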

2024-04-30 Pierre Riteau
Hi Götz,
You can change the value of osd_max_backfills (for all OSDs or specific ones) using `ceph config`, but you need to enable osd_mclock_override_recovery_settings. See https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#steps-to-modify-mclock-max-backfills-recovery-limits
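Pierre's two steps map onto two `ceph config set` calls: enable `osd_mclock_override_recovery_settings`, then set `osd_max_backfills` for all OSDs (`osd`) or a single `osd.<id>`, in the order the linked mClock page describes. A minimal Python sketch; `backfill_limit_commands` is a hypothetical helper, and the limit you pass is your choice. Higher values favour recovery over client I/O.

```python
from typing import List, Optional


def backfill_limit_commands(max_backfills: int, osd_id: Optional[int] = None) -> List[List[str]]:
    """Build the `ceph config` calls that raise osd_max_backfills under mClock.

    osd_id=None targets every OSD ("osd"); otherwise only "osd.<id>".
    """
    if max_backfills < 1:
        raise ValueError("max_backfills must be >= 1")
    target = "osd" if osd_id is None else f"osd.{osd_id}"
    return [
        # With mClock active, osd_max_backfills changes are not honoured until this is true
        ["ceph", "config", "set", "osd", "osd_mclock_override_recovery_settings", "true"],
        ["ceph", "config", "set", target, "osd_max_backfills", str(max_backfills)],
    ]
```

Once backfill has finished, `ceph config rm` on both options returns them to their defaults.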