Hi Jan,

using the mclock scheduler you can do the following:

Set osd_mclock_override_recovery_settings to true, e.g.

~ # ceph config set osd osd_mclock_override_recovery_settings true

and then increase the maximum number of backfills, e.g.

~ # ceph config set osd osd_max_backfills 5

The change takes effect immediately. I suggest starting with a low
number of max backfills and monitoring how your cluster behaves before
increasing it further.
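If you want to double-check that the new value actually reached the
daemons (osd.0 below is just an example OSD id), you can query them
directly, e.g.

~ # ceph config show osd.0 osd_max_backfills
~ # ceph tell osd.* config get osd_max_backfills

and keep an eye on the recovery throughput reported by "ceph -s" while
you tune.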
Important: once backfill and recovery are done, lower max backfills
again

~ # ceph config set osd osd_max_backfills 2

and set the override back to false:

~ # ceph config set osd osd_mclock_override_recovery_settings false
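If that alone is not enough, another knob worth trying (see the mclock
profile docs Kirby links below) is the high_recovery_ops mclock
profile, which gives recovery and backfill a larger share of each
OSD's IOPS capacity at the expense of client I/O. A rough sketch,
assuming you want to apply it cluster-wide:

~ # ceph config set osd osd_mclock_profile high_recovery_ops

and switch back to the default profile (balanced on recent releases)
when recovery is done:

~ # ceph config set osd osd_mclock_profile balanced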
Cheers,
Stephan

On Wed, Oct 8, 2025 at 07:08, Kirby Haze <[email protected]> wrote:

> I will assume you are running the default mclock scheduler and not wpq.
> I'm not too familiar with tuning mclock settings, but these are the docs
> to look at:
>
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#recovery-backfill-options
>
> osd_max_backfills is set to 1 by default, and this is the first thing I
> would tune if you want faster backfilling.
>
> I would look at this setting first before looking into the various knobs
> mclock provides:
>
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#confval-osd_mclock_profile
>
> I use wpq, and I have two main levers for backfilling:
> osd_max_backfills - higher means faster backfilling
> osd_recovery_sleep + the other sleep settings - these throttle recovery ops
>
> mclock doesn't use the sleep configs, so I'm not too sure about the
> various knobs mclock has, but the docs above have some good options to
> tweak. I would probably experiment with the different mclock profiles,
> such as high_recovery_ops, to see if one of them speeds up backfilling.
>
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#high-recovery-ops
>
> On Tue, Oct 7, 2025 at 2:20 AM Jan Kasprzak <[email protected]> wrote:
>
> > Hello, Ceph users,
> >
> > on my new cluster, which I filled with test data two weeks ago,
> > there are many remapped PGs in the backfill_wait state, probably as a
> > result of autoscaling the number of PGs per pool. But the recovery
> > speed is quite low, on the order of a few MB/s and < 10 obj/s
> > according to ceph -s.
> >
> > The cluster is otherwise idle, with no client traffic after the
> > initial import, so I wonder why the backfill does not progress faster.
> > Also, it seems like more PGs are getting remapped as existing ones get
> > successfully backfilled - the percentage of misplaced objects has been
> > steadily around 6 % for the last two weeks.
> >
> > The PGs waiting for backfill all belong to the biggest pool I have,
> > according to "ceph pg dump | grep backfill", no surprise there.
> > The pool has 229 TB of data and currently 128 PGs. It is erasure-coded
> > with k=4, m=2. The second biggest pool has only 23 TB of data:
> >
> > rados df
> > POOL_NAME            USED     OBJECTS   CLONES  COPIES    MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD   WR_OPS    WR      USED COMPR  UNDER COMPR
> > pool_with_backfill   229 TiB  10086940       0  60521640                   0        0         0       0  0 B  72545009  54 TiB         0 B          0 B
> > second_biggest_pool   23 TiB   1153174       0   6919044                   0        0         0       0  0 B  38506397  16 TiB         0 B          0 B
> > [...]
> >
> > I tried "ceph osd pool force-backfill $pool"; it helped to speed
> > things up a bit, but it still runs at 50-200 MB/s and 4-20 obj/s.
> > The initial data import ran at around 600 MB/s.
> >
> > Is this normal, or can I speed up the recovery somehow?
> >
> > Output of ceph -s:
> >
> >   cluster:
> >     id:     ...
> >     health: HEALTH_WARN
> >             2 large omap objects
> >
> >   services:
> >     mon: 3 daemons, quorum istor11,istor21,istor31 (age 13d)
> >     mgr: istor31(active, since 3w), standbys: istor21, istor11
> >     osd: 36 osds: 36 up (since 2w), 36 in (since 3w); 14 remapped pgs
> >
> >   data:
> >     pools:   45 pools, 1505 pgs
> >     objects: 13.39M objects, 198 TiB
> >     usage:   303 TiB used, 421 TiB / 724 TiB avail
> >     pgs:     5335074/80345832 objects misplaced (6.640%)
> >              1449 active+clean
> >              34   active+clean+scrubbing
> >              11   active+remapped+backfill_wait+forced_backfill
> >              8    active+clean+scrubbing+deep
> >              2    active+remapped+forced_backfill
> >              1    active+remapped+backfilling+forced_backfill
> >
> >   io:
> >     recovery: 69 MiB/s, 4 objects/s
> >
> > The OSDs are HDD-based with metadata on NVMe, 4 OSDs per node,
> > and all the nodes have a load average somewhere between 0.3 and 0.6.
> >
> > Thanks!
> >
> > -Yenya
> >
> > --
> > | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> > | https://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
> >  We all agree on the necessity of compromise. We just can't agree on
> >  when it's necessary to compromise.                        --Larry Wall

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
