Hi,

Addendum: we're running Ceph 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b).

The workload is a mix of 3x-replicated & erasure-coded pools (RBD, CephFS, RGW).

-KJ

On Tue, Mar 6, 2018 at 3:53 PM, Kjetil Joergensen <kje...@medallia.com>
wrote:

> Hi,
>
> so.. +1
>
> We don't run compression as far as I know, so that wouldn't be it. We do
> run a mix of bluestore & filestore, though - the filestore part of the
> cluster predates a stable bluestore.
>
> The interesting part is - the behavior seems to be specific to our
> bluestore nodes.
>
> Below: the yellow line is a node with 10 x ~4TB SSDs, the green line a
> node with 8 x 800GB SSDs. The blue line is the dump_mempools total bytes
> for all the OSDs running on the yellow-line node. The big dips are forced
> restarts, after previously suffering through the after-effects of letting
> Linux deal with it via OOM->SIGKILL.
>
>
> [inline graph omitted]
> As a gross extrapolation, "right now" the machines' "memory used" seems
> close enough to the sum of the RSS of the ceph-osd processes running on
> them.
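>
> If anyone wants to pull the same numbers on their own nodes, here is a
> minimal sketch, assuming Python 3 on the OSD host and the admin sockets
> in the default location (the dump_mempools JSON layout varies a bit by
> release, so this tries both the flat and the "mempool"-wrapped form):
>
>     #!/usr/bin/env python3
>     # Sum mempool bytes across all local OSDs and compare against the
>     # combined RSS of the ceph-osd processes on this host.
>     import glob
>     import json
>     import subprocess
>
>     mempool_bytes = 0
>     for sock in glob.glob('/var/run/ceph/ceph-osd.*.asok'):
>         out = subprocess.check_output(
>             ['ceph', 'daemon', sock, 'dump_mempools'])
>         data = json.loads(out.decode())
>         # Luminous prints the pools at the top level; newer releases
>         # wrap them in a "mempool" object.
>         pools = data.get('mempool', data)
>         mempool_bytes += pools['total']['bytes']
>
>     # RSS as reported by ps is in KiB.
>     rss_kib = sum(int(v) for v in subprocess.check_output(
>         ['ps', '-C', 'ceph-osd', '-o', 'rss=']).split())
>
>     print('mempools: %.1f GiB' % (mempool_bytes / 2.0 ** 30))
>     print('rss:      %.1f GiB' % (rss_kib * 1024 / 2.0 ** 30))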
>
> -KJ
>
> On Thu, Mar 1, 2018 at 7:18 PM, Alex Gorbachev <a...@iss-integration.com>
> wrote:
>
>> On Thu, Mar 1, 2018 at 5:37 PM, Subhachandra Chandra
>> <schan...@grailbio.com> wrote:
>> > Even with bluestore, we saw memory usage plateau at 3-4GB per OSD with
>> > 8TB drives filled to around 90%. One thing that does increase memory
>> > usage is the number of clients simultaneously sending large write
>> > requests to a particular primary OSD.
>>
>> We have not seen a memory increase on Ubuntu 16.04, but I have
>> repeatedly observed the following phenomenon:
>>
>> When doing a vMotion in ESXi of a large 3TB file (this generates a lot
>> of small IO requests) to a Ceph pool with compression set to force,
>> after some time the Ceph cluster shows a large number of blocked
>> requests, and eventually the latency gets so high that ESXi aborts the
>> IO due to timeouts.  After the abort, the blocked/slow request messages
>> disappear.  There are no OSD errors.  I have OSD logs if anyone is
>> interested.
>>
>> This does not occur when compression is unset.
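>>
>> For anyone wanting to reproduce, a minimal sketch of how the toggling
>> can be scripted (the pool name here is hypothetical; valid
>> compression_mode values are none, passive, aggressive and force):
>>
>>     #!/usr/bin/env python3
>>     # Flip pool compression on/off around a test run.
>>     import subprocess
>>
>>     POOL = 'esxi-pool'  # hypothetical pool name - substitute your own
>>
>>     def set_compression(mode):
>>         subprocess.check_call(['ceph', 'osd', 'pool', 'set',
>>                                POOL, 'compression_mode', mode])
>>
>>     set_compression('force')  # the mode that triggers blocked requests
>>     # ... run the vMotion / large-file workload here ...
>>     set_compression('none')   # back to uncompressed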
>>
>> --
>> Alex Gorbachev
>> Storcium
>>
>> >
>> > Subhachandra
>> >
>> > On Thu, Mar 1, 2018 at 6:18 AM, David Turner <drakonst...@gmail.com> wrote:
>> >>
>> >> With default memory settings, the general rule is 1GB of RAM per 1TB
>> >> of OSD.  If you have a 4TB OSD, you should plan to have at least 4GB
>> >> of RAM.  This was the recommendation for filestore OSDs, where it was
>> >> a bit more memory than they really needed.  From what I've seen, the
>> >> rule is a little more accurate for bluestore and should still be
>> >> observed.
>> >>
>> >> Please note that memory usage in a HEALTH_OK cluster is not the same
>> >> as what the daemons will use during recovery.  I have seen deployments
>> >> with 4x the memory usage during recovery.
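>> >>
>> >> As napkin math (numbers purely as an example), that works out to:
>> >>
>> >>     # Back-of-the-envelope RAM budget: 1GB per 1TB of OSD, plus
>> >>     # recovery headroom.  Example host: ten 4TB OSDs.
>> >>     osd_sizes_tb = [4.0] * 10
>> >>     steady_state_gb = sum(osd_sizes_tb) * 1.0  # 1GB RAM per TB
>> >>     recovery_factor = 4  # worst case I've seen during recovery
>> >>     print('steady state: ~%dGB' % steady_state_gb)
>> >>     print('recovery:     up to ~%dGB'
>> >>           % (steady_state_gb * recovery_factor))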
>> >>
>> >> On Thu, Mar 1, 2018 at 8:11 AM Stefan Kooman <ste...@bit.nl> wrote:
>> >>>
>> >>> Quoting Caspar Smit (caspars...@supernas.eu):
>> >>> > Stefan,
>> >>> >
>> >>> > How many OSDs and how much RAM are in each server?
>> >>>
>> >>> Currently 7 OSDs and 128 GB RAM.  The max will be 10 OSDs in these
>> >>> servers, with 12 cores (at least one core per OSD).
>> >>>
>> >>> > bluestore_cache_size=6G will not mean each OSD is using max 6GB RAM
>> >>> > right?
>> >>>
>> >>> Apparently.  They will surely use more RAM than just the cache to
>> >>> function correctly.  I figured 3 GB per OSD would be enough ...
>> >>>
>> >>> > Our bluestore HDD OSDs with bluestore_cache_size at 1G use ~4GB of
>> >>> > total RAM.  The cache is only part of the memory usage of bluestore
>> >>> > OSDs.
>> >>>
>> >>> A factor of 4 is quite high, isn't it?  What is all this RAM used for
>> >>> besides the cache?  RocksDB?
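>> >>>
>> >>> For digging into that, a sketch along these lines (assuming Python 3
>> >>> and a local admin socket for osd.0) prints the per-pool mempool
>> >>> breakdown, largest first.  Note the mempools don't capture everything
>> >>> (RocksDB internals, allocator overhead), so it won't add up to the
>> >>> full RSS:
>> >>>
>> >>>     #!/usr/bin/env python3
>> >>>     # Per-pool mempool breakdown for one OSD, largest first.
>> >>>     import json
>> >>>     import subprocess
>> >>>
>> >>>     out = subprocess.check_output(
>> >>>         ['ceph', 'daemon', 'osd.0', 'dump_mempools'])
>> >>>     data = json.loads(out.decode())
>> >>>     pools = data.get('mempool', {}).get('by_pool')
>> >>>     if pools is None:  # Luminous prints the pools flat
>> >>>         pools = {k: v for k, v in data.items() if k != 'total'}
>> >>>     for name, stats in sorted(pools.items(),
>> >>>                               key=lambda kv: kv[1]['bytes'],
>> >>>                               reverse=True):
>> >>>         print('%-28s %10.1f MiB' % (name, stats['bytes'] / 2.0 ** 20))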
>> >>>
>> >>> So how should I size the amount of RAM in an OSD server for 10
>> >>> bluestore SSDs in a replicated setup?
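>> >>>
>> >>> My napkin math so far, using Caspar's data point (1G cache -> ~4GB
>> >>> RSS) and assuming - a big assumption - that the non-cache overhead is
>> >>> roughly additive rather than proportional to the cache:
>> >>>
>> >>>     # Rough sizing sketch.  ~3GB of non-cache overhead per OSD is
>> >>>     # Caspar's 4GB total minus his 1G cache, treated as additive.
>> >>>     n_osds = 10
>> >>>     cache_gb = 6.0     # our bluestore_cache_size
>> >>>     overhead_gb = 3.0  # assumed RocksDB/pglog/etc. overhead
>> >>>     total_gb = n_osds * (cache_gb + overhead_gb)
>> >>>     print('host total: ~%dGB before recovery headroom' % total_gb)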
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Stefan
>> >>>
>> >>> --
>> >>> | BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
>> >>> | GPG: 0xD14839C6                   +31 318 648 688 / i...@bit.nl
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>
>
>
> --
> Kjetil Joergensen <kje...@medallia.com>
> SRE, Medallia Inc
>



-- 
Kjetil Joergensen <kje...@medallia.com>
SRE, Medallia Inc
