Hello, Ceph users,

on my new cluster, which I filled with test data two weeks ago,
there are many remapped PGs in the backfill_wait state, probably as a result
of the autoscaler increasing the number of PGs per pool. But the recovery
speed is quite low, on the order of a few MB/s and < 10 obj/s according to ceph -s.

The cluster is otherwise idle, with no client traffic since the initial
import, so I wonder why the backfill does not progress faster. Also, it seems
that more PGs get remapped as existing ones finish backfilling:
the percentage of misplaced objects has stayed at around 6 % for the last
two weeks.
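
A guess on my part: the autoscaler raises pg_num in small steps, so new
remapped PGs keep appearing until it reaches its target, which would explain
the flat 6 %. If that is the case, I suppose I can verify it with the
following (the pool name is the one from the rados df output below):

  ceph osd pool autoscale-status
  ceph osd pool ls detail | grep pool_with_backfill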

According to "ceph pg dump | grep backfill", the PGs waiting for backfill
all belong to the biggest pool I have, which is no surprise.
That pool has 229 TB of data and currently 128 PGs, and it is erasure-coded
with k=4, m=2. The second biggest pool has only 23 TB of data:

rados df
POOL_NAME               USED   OBJECTS  CLONES    COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD    WR_OPS      WR  USED COMPR  UNDER COMPR
pool_with_backfill   229 TiB  10086940       0  60521640                   0        0         0       0  0 B  72545009  54 TiB         0 B          0 B
second_biggest_pool   23 TiB   1153174       0   6919044                   0        0         0       0  0 B  38506397  16 TiB         0 B          0 B
[...]
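
Back-of-the-envelope, from the numbers above each PG of the big pool is
quite large (rough arithmetic, assuming the data is spread evenly):

  10086940 objects / 128 PGs  ~ 79000 objects per PG
  229 TiB          / 128 PGs  ~ 1.8 TiB per PG

so at the observed 4-20 obj/s a single PG takes roughly 1 to 5.5 hours
to backfill, if my arithmetic is right.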

I tried "ceph osd pool force-backfill $pool"; it sped things up a bit,
but the backfill still runs at only 50-200 MB/s and 4-20 obj/s.
For comparison, the initial data import ran at around 600 MB/s.
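
In case the answer involves the usual recovery/backfill throttles: I have not
touched any of them, so they should still be at their defaults. This is how I
would check the current values (option names taken from the docs, not yet
verified against my release; the last two only matter if the mClock scheduler
is in use):

  ceph config get osd osd_max_backfills
  ceph config get osd osd_recovery_max_active_hdd
  ceph config get osd osd_recovery_sleep_hdd
  ceph config get osd osd_op_queue
  ceph config get osd osd_mclock_profile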

Is this normal, or can I speed up the recovery somehow?

Output of ceph -s:

  cluster:
    id:     ...
    health: HEALTH_WARN
            2 large omap objects
 
  services:
    mon: 3 daemons, quorum istor11,istor21,istor31 (age 13d)
    mgr: istor31(active, since 3w), standbys: istor21, istor11
    osd: 36 osds: 36 up (since 2w), 36 in (since 3w); 14 remapped pgs
 
  data:
    pools:   45 pools, 1505 pgs
    objects: 13.39M objects, 198 TiB
    usage:   303 TiB used, 421 TiB / 724 TiB avail
    pgs:     5335074/80345832 objects misplaced (6.640%)
             1449 active+clean
             34   active+clean+scrubbing
             11   active+remapped+backfill_wait+forced_backfill
             8    active+clean+scrubbing+deep
             2    active+remapped+forced_backfill
             1    active+remapped+backfilling+forced_backfill
 
  io:
    recovery: 69 MiB/s, 4 objects/s
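
If my arithmetic is right, the observed object rate would need days to weeks
to move the misplaced objects even without any new remappings:

  5335074 misplaced objects / 4 obj/s   ~ 15 days
  5335074 misplaced objects / 20 obj/s  ~  3 days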

The OSDs are HDD-based with metadata on NVMe, 4 OSDs per node,
and all the nodes have load average somewhere between 0.3 and 0.6.
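
Since the disks look mostly idle, I suspect the limiting factor is the
backfill reservation/throttling rather than the hardware. If it helps, I can
also post the reservation state from one of the acting OSDs (osd.0 is just an
example; run on the node hosting that OSD, and I hope I remember the command
name correctly):

  ceph daemon osd.0 dump_recovery_reservations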

Thanks!

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
    We all agree on the necessity of compromise. We just can't agree on
    when it's necessary to compromise.                     --Larry Wall