We just upgraded a CephFS cluster from 12.2.12 to 14.2.11. Our next step is
to upgrade to 14.2.16 to troubleshoot the memory growth described below, but
I thought I'd reach out here first in case anyone has any ideas. The clients
are still running an older version of ceph-fuse (12.2.4), and remounting all
of them is very difficult; it would probably take a team of us a couple of
days to restart them all. I've looked through the release notes and searched
online, and every known memory leak I could find was fixed prior to 14.2.11,
so this appears to be an unknown memory leak.

All of the memory in use is in buffer_anon [1]. If left unchecked, it will
use up over 700GB of memory within 24 hours. On an identical cluster with
an equivalent workload that is still running 12.2.12, the buffer_anon
figures [2] are much healthier.
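
To put the growth figure in perspective, here is a rough back-of-the-envelope
calculation of the average rate it implies. It is only a sketch: "700GB" is
taken as decimal gigabytes and the growth is assumed to be linear over the
24 hours, neither of which has been measured.

```python
# Figures quoted above. Treating "700GB" as 700 * 10**9 bytes is an assumption.
GROWTH_BYTES = 700 * 10**9
WINDOW_SECONDS = 24 * 60 * 60


def growth_rate(bytes_grown: int, seconds: int) -> float:
    """Average growth in bytes per second, assuming linear growth."""
    return bytes_grown / seconds


rate = growth_rate(GROWTH_BYTES, WINDOW_SECONDS)
print(f"{rate / 10**6:.1f} MB/s")
print(f"{rate * 3600 / 2**30:.1f} GiB/hour")
```

That works out to somewhere around 8 MB/s sustained, which is a useful
number to compare against the client write rate when looking for a cause.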

Absent any other options or ideas, our plan is to upgrade the cluster to
14.2.16 first and then upgrade the clients. Has anyone else come across
high buffer_anon usage?


[1]
"buffer_anon": {
    "items": 33756758,
    "bytes": 135025912897
},

[2]
"buffer_anon": {
    "items": 636,
    "bytes": 273118
},
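
In case it helps anyone compare their own numbers, here is a small sketch
that parses the two fragments above and reports size and average bytes per
item. The fragments are wrapped in braces so they parse as JSON; `find_pool`
and `summarize` are hypothetical helper names, and the recursive search is
there only so the same code could be pointed at a full mempool dump without
assuming how it is nested.

```python
import json

# Fragments [1] and [2] from above, wrapped in braces to make valid JSON.
DUMP_14_2_11 = '{"buffer_anon": {"items": 33756758, "bytes": 135025912897}}'
DUMP_12_2_12 = '{"buffer_anon": {"items": 636, "bytes": 273118}}'


def find_pool(node, name="buffer_anon"):
    """Depth-first search for a pool entry anywhere in the parsed dump."""
    if isinstance(node, dict):
        if isinstance(node.get(name), dict):
            return node[name]
        for child in node.values():
            found = find_pool(child, name)
            if found is not None:
                return found
    return None


def summarize(raw: str) -> dict:
    """Return item count, size in GiB, and average bytes per item."""
    pool = find_pool(json.loads(raw))
    if pool is None:
        raise KeyError("buffer_anon not found in dump")
    items, size = pool["items"], pool["bytes"]
    return {
        "items": items,
        "gib": size / 2**30,
        "avg_bytes_per_item": size / items if items else 0.0,
    }


bad = summarize(DUMP_14_2_11)
good = summarize(DUMP_12_2_12)
for label, s in (("14.2.11", bad), ("12.2.12", good)):
    print(f"{label}: {s['items']} items, {s['gib']:.2f} GiB, "
          f"{s['avg_bytes_per_item']:.0f} bytes/item")
```

On the numbers above, the 14.2.11 cluster averages very close to 4000 bytes
per buffer_anon item, against a few hundred bytes per item on 12.2.12, so
both the item count and the size per item have grown.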