Hi all,
we started a forward scrub on our 5.x PB ceph file system and are observing
massive ballooning of the MDS caches.
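For reference, the scrub was kicked off on rank 0 with a command along these
lines (exact invocation from memory):

# ceph tell mds.con-fs2:0 scrub start / recursive

Our status is: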
# ceph status
  cluster:
    id:     xxx
    health: HEALTH_WARN
            1 MDSs report oversized cache
            (muted: MDS_CLIENT_LATE_RELEASE(12d) MDS_CLIENT_RECALL(12d)
             PG_NOT_DEEP_SCRUBBED(5d) PG_NOT_SCRUBBED(5d) POOL_NEAR_FULL(4w))

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 4w)
    mgr: ceph-03(active, since 3w), standbys: ceph-01, ceph-25, ceph-26, ceph-02
    mds: 8/8 daemons up, 4 standby
    osd: 1317 osds: 1316 up (since 5d), 1316 in (since 5w)

  task status:
    scrub status:
        mds.ceph-08: active paths [/]
        mds.ceph-11: active paths [/]
        mds.ceph-12: active paths [/]
        mds.ceph-14: active paths [/]
        mds.ceph-15: active paths [/]
        mds.ceph-17: active paths [/]
        mds.ceph-24: active paths [/]

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 29161 pgs
    objects: 4.38G objects, 5.5 PiB
    usage:   7.2 PiB used, 5.9 PiB / 13 PiB avail
    pgs:     29129 active+clean
             28    active+clean+scrubbing+deep
             2     active+clean+snaptrim
             2     active+clean+scrubbing

  io:
    client: 484 MiB/s rd, 64 MiB/s wr, 4.64k op/s rd, 1.50k op/s wr
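For context, the cache size limit against which the oversized-cache warning is
checked can be queried with

# ceph config get mds mds_cache_memory_limit

(the warning fires once a daemon exceeds mds_health_cache_threshold times this
limit, 150% by default).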
# ceph fs status
con-fs2 - 1554 clients
=======
RANK  STATE     MDS      ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  ceph-12  Reqs:    3 /s  13.8M  13.8M   478k  29.2k
 1    active  ceph-15  Reqs:   31 /s  10.5M  10.5M   140k   441k
 2    active  ceph-14  Reqs:    0 /s  11.5M  11.5M   504k  32.8k
 3    active  ceph-17  Reqs:    5 /s  12.4M  12.4M   487k  30.9k
 4    active  ceph-08  Reqs:    0 /s  15.3M  15.3M   247k  47.2k
 5    active  ceph-11  Reqs:    7 /s  4414k  4413k   262k  72.5k
 6    active  ceph-16  Reqs:  409 /s  1079k  1057k  7766   17.3k
 7    active  ceph-24  Reqs:   41 /s  4074k  4074k   448k   109k
        POOL            TYPE     USED  AVAIL
   con-fs2-meta1      metadata  4078G  6657G
   con-fs2-meta2        data       0   6657G
    con-fs2-data        data    1225T  2272T
con-fs2-data-ec-ssd     data     794G  20.8T
   con-fs2-data2        data    5745T  2066T
STANDBY MDS
  ceph-09
  ceph-10
  ceph-23
  ceph-13
MDS version: ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
The caches of ranks 0-4 are massively oversized. Ranks 5 and 7 show their
usual values. Rank 6 is low because we already restarted it due to this
warning. It looks as if the MDSes do not trim their caches while a scrub is
running; the sizes only ever increase.
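The numbers above are from ceph fs status; the per-daemon cache usage can also
be inspected on each MDS host via the admin socket, for example:

# ceph daemon mds.ceph-08 cache status

which for the oversized ranks confirms the steady growth.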
Is this ballooning normal, or is it a bug? Is there a workaround other than
restarting the MDSes all the time?
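For example, would it be safe to ask a daemon to trim while the scrub is
running, along the lines of

# ceph tell mds.con-fs2:4 cache drop

or to pause the scrub temporarily with

# ceph tell mds.con-fs2:0 scrub pause

and resume it once the caches have shrunk? Both commands are from memory, so
please correct me if the syntax is off.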
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]