Hi all,
we started a forward scrub on our 5.x PB ceph file system and are observing
massive ballooning of the MDS caches.
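For reference, the scrub was kicked off on rank 0 with a command along these
lines (exact invocation from memory):

# ceph tell mds.con-fs2:0 scrub start / recursive

Our status is: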
# ceph status
  cluster:
    id:     xxx
    health: HEALTH_WARN
            1 MDSs report oversized cache
            (muted: MDS_CLIENT_LATE_RELEASE(12d) MDS_CLIENT_RECALL(12d)
             PG_NOT_DEEP_SCRUBBED(5d) PG_NOT_SCRUBBED(5d) POOL_NEAR_FULL(4w))

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 4w)
    mgr: ceph-03(active, since 3w), standbys: ceph-01, ceph-25, ceph-26, ceph-02
    mds: 8/8 daemons up, 4 standby
    osd: 1317 osds: 1316 up (since 5d), 1316 in (since 5w)

  task status:
    scrub status:
        mds.ceph-08: active paths [/]
        mds.ceph-11: active paths [/]
        mds.ceph-12: active paths [/]
        mds.ceph-14: active paths [/]
        mds.ceph-15: active paths [/]
        mds.ceph-17: active paths [/]
        mds.ceph-24: active paths [/]

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 29161 pgs
    objects: 4.38G objects, 5.5 PiB
    usage:   7.2 PiB used, 5.9 PiB / 13 PiB avail
    pgs:     29129 active+clean
             28    active+clean+scrubbing+deep
             2     active+clean+snaptrim
             2     active+clean+scrubbing

  io:
    client: 484 MiB/s rd, 64 MiB/s wr, 4.64k op/s rd, 1.50k op/s wr
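For context, the cache size limit against which the oversized-cache warning is
checked can be queried with

# ceph config get mds mds_cache_memory_limit

(the warning fires once a daemon exceeds mds_health_cache_threshold times this
limit, 150% by default).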
# ceph fs status
con-fs2 - 1554 clients
=======
RANK  STATE     MDS      ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  ceph-12  Reqs:    3 /s  13.8M  13.8M   478k  29.2k
 1    active  ceph-15  Reqs:   31 /s  10.5M  10.5M   140k   441k
 2    active  ceph-14  Reqs:    0 /s  11.5M  11.5M   504k  32.8k
 3    active  ceph-17  Reqs:    5 /s  12.4M  12.4M   487k  30.9k
 4    active  ceph-08  Reqs:    0 /s  15.3M  15.3M   247k  47.2k
 5    active  ceph-11  Reqs:    7 /s  4414k  4413k   262k  72.5k
 6    active  ceph-16  Reqs:  409 /s  1079k  1057k  7766   17.3k
 7    active  ceph-24  Reqs:   41 /s  4074k  4074k   448k   109k
        POOL            TYPE     USED  AVAIL
   con-fs2-meta1      metadata  4078G  6657G
   con-fs2-meta2        data       0   6657G
    con-fs2-data        data    1225T  2272T
con-fs2-data-ec-ssd     data     794G  20.8T
   con-fs2-data2        data    5745T  2066T
STANDBY MDS
  ceph-09
  ceph-10
  ceph-23
  ceph-13
MDS version: ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
The caches of ranks 0-4 are massively oversized. Ranks 5 and 7 show their
usual values. Rank 6 is low because we already restarted it due to this
warning. It looks as if the MDSes do not trim their caches while a scrub is
running; the sizes only ever increase.
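The numbers above are from ceph fs status; the per-daemon cache usage can also
be inspected on each MDS host via the admin socket, for example:

# ceph daemon mds.ceph-08 cache status

which for the oversized ranks confirms the steady growth.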
Is this ballooning normal, or is it a bug? Is there a workaround other than
restarting the MDSes all the time?
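For example, would it be safe to ask a daemon to trim while the scrub is
running, along the lines of

# ceph tell mds.con-fs2:4 cache drop

or to pause the scrub temporarily with

# ceph tell mds.con-fs2:0 scrub pause

and resume it once the caches have shrunk? Both commands are from memory, so
please correct me if the syntax is off.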
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]