Hi Dan and Mark,

could you please let me know if you can read the files with the version info I 
provided in my previous e-mail? I'm in the process of collecting data with more 
FS activity and would like to send it in a format that is useful for 
investigation.

Right now I'm observing a daily swap growth of ca. 100-200MB on servers with 
16 OSDs each (1 SSD and 15 HDDs). The OS and the daemons operate fine, and the 
OS manages to keep enough RAM available. The mempool dump also still shows 
onode and data cached at a seemingly reasonable level. Users report more stable 
FS performance after I increased the cache min sizes on all OSDs.
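
For reference, by "mempool dump" I mean the admin socket output, queried 
roughly like this (osd.0 is just an example, not the exact monitoring I run):

# ceph daemon osd.0 dump_mempools
# ceph tell osd.0 heap stats

The first command shows the per-pool allocations (bluestore_cache_data, 
bluestore_cache_onode, etc.), the second tcmalloc's own view of the heap.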

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <fr...@dtu.dk>
Sent: 17 August 2020 09:37
To: Dan van der Ster
Cc: ceph-users
Subject: [ceph-users] Re: OSD memory leak?

Hi Dan,

I use the container 
docker.io/ceph/daemon:v3.2.10-stable-3.2-mimic-centos-7-x86_64. As far as I can 
see, it uses the packages from http://download.ceph.com/rpm-mimic/el7; it's a 
CentOS 7 build. The version is:

# ceph -v
ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)

On CentOS, the profiler packages are named differently, without the "google-" 
prefix. The version I have installed is:

# pprof --version
pprof (part of gperftools 2.0)

Copyright 1998-2007 Google Inc.

This is BSD licensed software; see the source for copying conditions
and license information.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

It is possible to install pprof inside this container and analyse the 
*.heap files I provided.

If this doesn't work for you and you want me to generate the text output for 
the heap files, I can do that. Please let me know whether I should do all files 
and with which options (e.g. against a base, etc.).
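
For example (file names are placeholders), I would run something like

# pprof --text /usr/bin/ceph-osd osd.0.profile.0001.heap

for a single dump, or, for a diff against a base,

# pprof --text --base=osd.0.profile.0001.heap /usr/bin/ceph-osd osd.0.profile.0042.heap

inside the container, so that the binary matches the heap files.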

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <d...@vanderster.com>
Sent: 14 August 2020 10:38:57
To: Frank Schilder
Cc: Mark Nelson; ceph-users
Subject: Re: [ceph-users] Re: OSD memory leak?

Hi Frank,

I'm having trouble getting the exact version of ceph you used to
create this heap profile.
Could you run the google-pprof --text steps at [1] and share the output?

Thanks, Dan

[1] https://docs.ceph.com/docs/master/rados/troubleshooting/memory-profiling/


On Tue, Aug 11, 2020 at 2:37 PM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Mark,
>
> here is a first collection of heap profiling data (valid 30 days):
>
> https://files.dtu.dk/u/53HHic_xx5P1cceJ/heap_profiling-2020-08-03.tgz?l
>
> This was collected with the following config settings:
>
>   osd    dev      osd_memory_cache_min    805306368
>   osd    basic    osd_memory_target       2147483648
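>
> In case it helps to reproduce: these can be set with the centralized config, 
> e.g.
>
> # ceph config set osd osd_memory_cache_min 805306368
> # ceph config set osd osd_memory_target 2147483648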
>
> Setting the cache_min value seems to help keep cache space available. 
> Unfortunately, the above collection covers only 12 days. I needed to restart 
> the OSD and will need to restart it again soon. I hope I can then run a 
> longer sample. The profiling does cause slow ops, though.
>
> Maybe you can see something already? It seems to have collected some leaked 
> memory. Unfortunately, it was a period of extremely low load; basically, on 
> the day the recording started, utilization dropped to almost zero.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: 21 July 2020 12:57:32
> To: Mark Nelson; Dan van der Ster
> Cc: ceph-users
> Subject: [ceph-users] Re: OSD memory leak?
>
> Quick question: Is there a way to change the frequency of heap dumps? On this 
> page http://goog-perftools.sourceforge.net/doc/heap_profiler.html a function 
> HeapProfilerSetAllocationInterval() is mentioned, but no other way of 
> configuring this is described. Is there a config parameter or a ceph daemon 
> call to adjust this?
>
> If not, can I change the dump path?
>
> It's likely to overrun my log partition quickly if I cannot adjust either of 
> the two.
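>
> If the embedded profiler behaves like stand-alone gperftools, I could also try 
> setting the dump interval via the environment of the OSD container, e.g.
>
> # export HEAP_PROFILE_ALLOCATION_INTERVAL=10737418240
>
> but I don't know whether ceph's heap profiler honours this, hence the question.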
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: 20 July 2020 15:19:05
> To: Mark Nelson; Dan van der Ster
> Cc: ceph-users
> Subject: [ceph-users] Re: OSD memory leak?
>
> Dear Mark,
>
> thank you very much for the very helpful answers. I will raise 
> osd_memory_cache_min, leave everything else alone and watch what happens. I 
> will report back here.
>
> Thanks also for raising this as an issue.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Mark Nelson <mnel...@redhat.com>
> Sent: 20 July 2020 15:08:11
> To: Frank Schilder; Dan van der Ster
> Cc: ceph-users
> Subject: Re: [ceph-users] Re: OSD memory leak?
>
> On 7/20/20 3:23 AM, Frank Schilder wrote:
> > Dear Mark and Dan,
> >
> > I'm in the process of restarting all OSDs and could use some quick advice 
> > on bluestore cache settings. My plan is to set higher minimum values and 
> > deal with accumulated excess usage via regular restarts. Looking at the 
> > documentation 
> > (https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/),
> >  I find the following relevant options (with defaults):
> >
> > # Automatic Cache Sizing
> > osd_memory_target {4294967296} # 4GB
> > osd_memory_base {805306368} # 768MB
> > osd_memory_cache_min {134217728} # 128MB
> >
> > # Manual Cache Sizing
> > bluestore_cache_meta_ratio {.4} # 40% ?
> > bluestore_cache_kv_ratio {.4} # 40% ?
> > bluestore_cache_kv_max {512 * 1024*1024} # 512MB
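> >
> > For reference, the values currently in effect on a running OSD can be checked 
> > via the admin socket (osd.0 as an example):
> >
> > # ceph daemon osd.0 config show | grep -E 'osd_memory|bluestore_cache'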
> >
> > Q1) If I increase osd_memory_cache_min, should I also increase 
> > osd_memory_base by the same or some other amount?
>
>
> osd_memory_base is a hint at how much memory the OSD could consume
> outside the cache once it's reached steady state.  It basically sets a
> hard cap on how much memory the cache will use to avoid over-committing
> memory and thrashing when we exceed the memory limit. It's not necessary
> to get it right, it just helps smooth things out by making the automatic
> memory tuning less aggressive.  IE if you have a 2 GB memory target and
> a 512MB base, you'll never assign more than 1.5GB to the cache on the
> assumption that the rest of the OSD will eventually need 512MB to
> operate even if it's not using that much right now.  I think you can
> probably just leave it alone.  What you and Dan appear to be seeing is
> that this number isn't static in your case but increases over time anyway.
> Eventually I'm hoping that we can automatically account for more
> and more of that memory by reading the data from the mempools.
>
> > Q2) The cache ratio options are shown under the section "Manual Cache 
> > Sizing". Do they also apply when cache auto tuning is enabled? If so, is it 
> > worth changing these defaults for higher values of osd_memory_cache_min?
>
>
> They actually do have an effect on the automatic cache sizing and
> probably shouldn't only be under the manual section.  When you have the
> automatic cache sizing enabled, those options will affect the "fair
> share" values of the different caches at each cache priority level.  IE
> at priority level 0, if both caches want more memory than is available,
> those ratios will determine how much each cache gets.  If there is more
> memory available than requested, each cache gets as much as they want
> and we move on to the next priority level and do the same thing again.
> So in this case the ratios end up being sort of more like fallback
> settings for when you don't have enough memory to fulfill all cache
> requests at a given priority level, but otherwise are not utilized until
> we hit that limit.  The goal with this scheme is to make sure that "high
> priority" items in each cache get first dibs at the memory even if it
> might skew the ratios.  This might be things like rocksdb bloom filters
> and indexes, or potentially very recent hot items in one cache vs very
> old items in another cache.  The ratios become more like guidelines than
> hard limits.
>
>
> When you change to manual mode, you set an overall bluestore cache size
> and each cache gets a flat percentage of it based on the ratios.  With
> 0.4/0.4 you will always have 40% for onode, 40% for omap, and 20% for
> data even if one of those caches does not use all of its memory.
>
>
> >
> > Many thanks for your help with this. I can't find answers to these 
> > questions in the docs.
> >
> > There might be two reasons for high osd_map memory usage. One is that our 
> > OSDs seem to hold a large number of OSD maps:
>
>
> I brought this up in our core team standup last week.  Not sure if
> anyone has had time to look at it yet though.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
