Hello, Ceph users,
on my new cluster, which I filled with testing data two weeks ago,
there are many remapped PGs in backfill_wait state, probably as a
result of autoscaling the number of PGs per pool. But the recovery
speed is quite low, on the order of a few MB/s and < 10 obj/s
according to ceph -s. The cluster is otherwise idle, with no client
traffic since the initial import, so I wonder why the backfill does
not progress faster. It also seems that more PGs get remapped as
existing ones finish backfilling - the percentage of misplaced
objects has stayed at around 6 % for the last two weeks.
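In case it is relevant: this is how I have been checking whether the
autoscaler is still raising pg_num/pgp_num on that pool - a minimal
sketch, pool name as in the rados df output further down:

ceph osd pool autoscale-status    # compare the PG_NUM and NEW PG_NUM columns
ceph osd pool get pool_with_backfill pg_num
ceph osd pool get pool_with_backfill pgp_num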
The PGs waiting for backfill all belong to the biggest pool I have,
according to "ceph pg dump | grep backfill" - no surprise there.
The pool has 229 TiB of data and currently 128 PGs. It is
erasure-coded with k=4, m=2. The second biggest pool has only
23 TiB of data:
rados df
POOL_NAME            USED     OBJECTS   CLONES  COPIES    MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD    WR_OPS    WR      USED COMPR  UNDER COMPR
pool_with_backfill   229 TiB  10086940  0       60521640  0                   0        0         0       0 B   72545009  54 TiB  0 B         0 B
second_biggest_pool  23 TiB   1153174   0       6919044   0                   0        0         0       0 B   38506397  16 TiB  0 B         0 B
[...]
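For completeness, a slightly more structured version of that "ceph pg
dump | grep backfill" check - just a sketch, assuming pgs_brief prints
the PG id in the first column (the pool id is the part before the dot):

ceph pg dump pgs_brief 2>/dev/null \
    | awk '/backfill/ {split($1, a, "."); print a[1]}' | sort | uniq -c
ceph osd pool ls detail    # maps the pool ids back to pool names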
I tried "ceph osd pool force-backfill $pool"; it helped to speed
things up a bit, but recovery still runs at only 50-200 MB/s and
4-20 obj/s, whereas the initial data import ran at around 600 MB/s.
Is this normal, or can I speed the recovery up somehow?
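For reference, these are the recovery throttles I have been looking at
and what I was considering to try while the cluster has no client
traffic - a sketch only, assuming the usual option names and that my
release honours them (I understand the mclock scheduler on newer
releases may override some of these):

# current effective values on one OSD
ceph config show osd.0 osd_max_backfills
ceph config show osd.0 osd_recovery_max_active
ceph config show osd.0 osd_recovery_sleep_hdd
# candidate changes while the cluster is idle (to be reverted afterwards)
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_sleep_hdd 0
# on mclock-based releases, switching the profile is supposed to
# prioritize recovery; I have not verified this myself
ceph config set osd osd_mclock_profile high_recovery_ops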
Output of ceph -s:
  cluster:
    id:     ...
    health: HEALTH_WARN
            2 large omap objects

  services:
    mon: 3 daemons, quorum istor11,istor21,istor31 (age 13d)
    mgr: istor31(active, since 3w), standbys: istor21, istor11
    osd: 36 osds: 36 up (since 2w), 36 in (since 3w); 14 remapped pgs

  data:
    pools:   45 pools, 1505 pgs
    objects: 13.39M objects, 198 TiB
    usage:   303 TiB used, 421 TiB / 724 TiB avail
    pgs:     5335074/80345832 objects misplaced (6.640%)
             1449 active+clean
               34 active+clean+scrubbing
               11 active+remapped+backfill_wait+forced_backfill
                8 active+clean+scrubbing+deep
                2 active+remapped+forced_backfill
                1 active+remapped+backfilling+forced_backfill

  io:
    recovery: 69 MiB/s, 4 objects/s
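In case it helps with diagnosing, this is how I inspect the one PG that
is actively backfilling - a sketch, with <pgid> being a placeholder for
an id taken from the first command:

ceph pg ls backfilling
ceph pg <pgid> query    # shows the up/acting sets, i.e. which OSDs are involved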
The OSDs are HDD-based with metadata on NVMe, 4 OSDs per node, and
all the nodes have a load average somewhere between 0.3 and 0.6.
Thanks!
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall