Hello Robert,

On Mon, Mar 9, 2020 at 7:55 PM Robert Ruge <robert.r...@deakin.edu.au> wrote:
> For a 1.1PB raw cephfs system currently storing 191TB of data and 390 million 
> objects (mostly small Python, ML training files etc.) how many MDS servers 
> should I be running?
>
> System is Nautilus 14.2.8.
>
> I ask because up to now I have run one MDS with one standby-replay and 
> occasionally it blows up with large memory consumption, 60GB+, even though I 
> have mds_cache_memory_limit = 32G (it was 16G until recently). It then 
> tries to restart on another MDS node, fails again, and after several 
> attempts usually comes back up. Today I increased to two active MDSs, but the 
> question is: what is the optimal number for a pretty active system? The single 
> MDS seemed to regularly run around 1400 req/s and I often get up to six 
> clients failing to respond to cache pressure.

Ideally, the only reason to add more active MDS daemons (i.e. increase
max_mds) is to increase request throughput.

60GB RSS is not completely unexpected. A 32GB cache size would use
approximately 48GB (150%) RSS in a steady-state situation. You may
have hit a bug, which others have also reported, that causes the
cache size / anonymous memory to grow continually. You will need
to post more information about the client type/version, cache usage,
perf dumps, and workload to help diagnose.
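
As a rough sketch of the arithmetic above: the 1.5 factor below is only the
approximate steady-state ratio quoted here (32GB cache, about 48GB RSS), not
a guarantee, and the function names are made up for illustration. The units
(GB vs GiB) do not matter for the ratio.

```python
# Rough estimate of MDS resident memory (RSS) from mds_cache_memory_limit.
# Assumption: the ~150% steady-state ratio mentioned above; real usage varies.

GIB = 1024 ** 3
RSS_OVERHEAD_FACTOR = 1.5  # approximate, from the 32GB -> 48GB figure


def expected_rss_bytes(cache_limit_bytes: int,
                       factor: float = RSS_OVERHEAD_FACTOR) -> float:
    """Return the approximate steady-state RSS for a given cache limit."""
    return cache_limit_bytes * factor


def exceeds_expected(observed_rss_bytes: int, cache_limit_bytes: int) -> bool:
    """True if the observed RSS is above the rough steady-state estimate."""
    return observed_rss_bytes > expected_rss_bytes(cache_limit_bytes)


if __name__ == "__main__":
    limit = 32 * GIB      # mds_cache_memory_limit = 32G
    observed = 60 * GIB   # RSS reported in the original message
    print(f"expected ~{expected_rss_bytes(limit) / GIB:.0f} GiB RSS")
    print("above estimate:", exceeds_expected(observed, limit))
```

With these numbers, 60GB is above the 48GB estimate but within the same
order of magnitude, which is why it is "not completely unexpected" yet
still worth investigating if it keeps growing.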


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
