I'll assume you are running the default mclock scheduler rather than wpq. I'm
not too familiar with tuning mclock settings, but these are the docs to look
at:
https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#recovery-backfill-options

osd_max_backfills defaults to 1, and it is the first thing I would tune if
you want faster backfilling.
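As a minimal sketch, something like the following raises it cluster-wide
(with mclock, recent releases may also require
osd_mclock_override_recovery_settings before the override takes effect; the
value 3 is just a starting point, so check the docs above and watch client
impact):

  # allow overriding the recovery/backfill limits under mclock
  ceph config set osd osd_mclock_override_recovery_settings true
  # raise the per-OSD backfill limit from the default of 1
  ceph config set osd osd_max_backfills 3
  # check what a given OSD actually picked up
  ceph config show osd.0 osd_max_backfills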

After that, I would look at the osd_mclock_profile setting before digging
into the various knobs mclock provides:
https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#confval-osd_mclock_profile
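To see which profile your OSDs are currently running (osd.0 here is just an
example id):

  ceph config get osd osd_mclock_profile
  ceph config show osd.0 osd_mclock_profile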

I use wpq, and there I have two main levers for backfilling:
osd_max_backfills - higher means faster backfilling
osd_recovery_sleep (plus the other *_sleep settings) - throttles recovery
ops, so lower means faster recovery
A short sketch of how I apply them is below.
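This is roughly what I do on wpq with HDD OSDs (osd_recovery_sleep_hdd is the
sleep value that applies there; treat the numbers as a starting point, not a
recommendation):

  # wpq only: more concurrent backfills, no recovery throttle
  ceph config set osd osd_max_backfills 3
  ceph config set osd osd_recovery_sleep_hdd 0
  # revert to the defaults once the cluster is healthy again
  ceph config rm osd osd_max_backfills
  ceph config rm osd osd_recovery_sleep_hdd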

mclock doesn't use the sleep configs, so I'm not too sure about the various
knobs mclock has, but the docs above list some good options to tweak. I would
start by experimenting with the different mclock profiles to see whether one
of them, e.g. high_recovery_ops, speeds up backfilling:
https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#high-recovery-ops
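A sketch of trying that profile cluster-wide (removing the option afterwards
drops the OSDs back to the default profile):

  # prioritize recovery/backfill over client IO
  ceph config set osd osd_mclock_profile high_recovery_ops
  # once backfill has finished, return to the default profile
  ceph config rm osd osd_mclock_profile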



On Tue, Oct 7, 2025 at 2:20 AM Jan Kasprzak <[email protected]> wrote:

>         Hello, Ceph users,
>
> on my new cluster, which I filled with testing data two weeks ago,
> there are many remapped PGs in the backfill_wait state, probably as a result
> of autoscaling the number of PGs per pool. But the recovery speed
> is quite low, on the order of a few MB/s and < 10 obj/s according to ceph -s.
>
> The cluster is otherwise idle, with no client traffic after the initial import,
> so I wonder why the backfill does not progress faster. Also, it seems that
> more PGs are getting remapped as existing ones get successfully backfilled
> - the percentage of misplaced objects has stayed around 6 % for the last
> two weeks.
>
> The PGs waiting for backfill all belong to the biggest pool I have
> according to "ceph pg dump | grep backfill", no surprise here.
> The pool has 229 TB of data and currently 128 PGs. It is erasure-coded
> with k=4, m=2. The second biggest pool has only 23 TB of data:
>
> rados df
> POOL_NAME            USED     OBJECTS   CLONES  COPIES    MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD   WR_OPS    WR      USED COMPR  UNDER COMPR
> pool_with_backfill   229 TiB  10086940  0       60521640  0                   0        0         0       0 B  72545009  54 TiB  0 B         0 B
> second_biggest_pool  23 TiB   1153174   0       6919044   0                   0        0         0       0 B  38506397  16 TiB  0 B         0 B
> [...]
>
> I tried "ceph osd pool force-backfill $pool"; it helped to speed
> things up a bit, but it still runs at 50-200 MB/s and 4-20 obj/s.
> The initial data import ran at around 600 MB/s.
>
> Is this normal, or can I speed up the recovery somehow?
>
> Output of ceph -s:
>
>   cluster:
>     id:     ...
>     health: HEALTH_WARN
>             2 large omap objects
>
>   services:
>     mon: 3 daemons, quorum istor11,istor21,istor31 (age 13d)
>     mgr: istor31(active, since 3w), standbys: istor21, istor11
>     osd: 36 osds: 36 up (since 2w), 36 in (since 3w); 14 remapped pgs
>
>   data:
>     pools:   45 pools, 1505 pgs
>     objects: 13.39M objects, 198 TiB
>     usage:   303 TiB used, 421 TiB / 724 TiB avail
>     pgs:     5335074/80345832 objects misplaced (6.640%)
>              1449 active+clean
>              34   active+clean+scrubbing
>              11   active+remapped+backfill_wait+forced_backfill
>              8    active+clean+scrubbing+deep
>              2    active+remapped+forced_backfill
>              1    active+remapped+backfilling+forced_backfill
>
>   io:
>     recovery: 69 MiB/s, 4 objects/s
>
> The OSDs are HDD-based with metadata on NVMe, 4 OSDs per node,
> and all the nodes have load average somewhere between 0.3 and 0.6.
>
> Thanks!
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> | https://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
>     We all agree on the necessity of compromise. We just can't agree on
>     when it's necessary to compromise.                     --Larry Wall