The leveldb store is smallish: around 70 MB.

I ran with debug mon = 10 for a while, but couldn't find any interesting
information. I would run out of space quite quickly though, as the log
partition only has 10 GB.
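
For reference, that debug level goes in via the usual knobs; roughly the
following, either at runtime or as a "debug mon = 10" line under [mon] in
ceph.conf (syntax from memory, so treat it as a sketch rather than gospel):

  # bump mon debug logging on all monitors at runtime (sketch)
  ceph tell mon.* injectargs '--debug-mon 10/10'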
On 24 Jul 2015 21:13, "Mark Nelson" <mnel...@redhat.com> wrote:

> On 07/24/2015 02:31 PM, Luis Periquito wrote:
>
>> Now it's official: I have a weird one!
>>
>> Restarted one of the ceph-mons with jemalloc and it didn't make any
>> difference. It's still using a lot of CPU and still not freeing up
>> memory...
>>
>> The issue is that the cluster almost stops responding to requests, and
>> if I restart the primary mon (which had almost no memory or CPU usage)
>> the cluster goes merrily back to responding to requests.
>>
>> Does anyone have any idea what may be going on? The worst bit is that I
>> have several clusters just like this one (well, they are smaller), and as
>> we do everything with Puppet they should all be very similar... and all
>> the other clusters are working just fine, without any issues whatsoever...
>>
>
> We've seen cases where leveldb can't compact fast enough and memory
> balloons, but it's usually associated with extreme CPU usage as well. It
> would be showing up in perf though if that were the case...
>
>
>> On 24 Jul 2015 10:11, "Jan Schermer" <j...@schermer.cz> wrote:
>>
>>     You don’t (shouldn’t) need to rebuild the binary to use jemalloc. It
>>     should be possible to do something like
>>
>>     LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd …
>>
>>     The last time we tried it, it segfaulted after a few minutes, so YMMV
>>     and be careful.
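>>
>>     If you do try the preload, it's worth double-checking that it actually
>>     took effect on the running daemon; a quick sanity check along these
>>     lines (daemon name and library path may differ on your distro):
>>
>>     # does the running mon have jemalloc mapped? (sketch)
>>     grep jemalloc /proc/$(pidof ceph-mon)/maps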
>>
>>     Jan
>>
>>      On 23 Jul 2015, at 18:18, Luis Periquito <periqu...@gmail.com> wrote:
>>>
>>>     Hi Greg,
>>>
>>>     I've been looking at the tcmalloc issues, but those seemed to affect
>>>     OSDs, and I do notice them in heavy read workloads (even after the
>>>     patch and after increasing
>>>     TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This is affecting
>>>     the mon process, though.
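>>>
>>>     For what it's worth, the cache-size variable only helps if it is in
>>>     the daemon's environment before start; something along these lines,
>>>     the init/upstart wiring being distro-specific, so just a sketch:
>>>
>>>     # start the daemon with a 128 MiB tcmalloc thread cache (sketch)
>>>     TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 ceph-osd -i <id> …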
>>>
>>>     looking at perf top I'm getting most of the CPU usage in mutex
>>>     lock/unlock:
>>>       5.02%  libpthread-2.19.so    [.] pthread_mutex_unlock
>>>       3.82%  libsoftokn3.so        [.] 0x000000000001e7cb
>>>       3.46%  libpthread-2.19.so    [.] pthread_mutex_lock
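>>>
>>>     For completeness, that kind of profile comes from plain perf top
>>>     against the mon process, something along the lines of the following
>>>     (adjust the pid lookup for your setup; just a sketch):
>>>
>>>       # profile only the ceph-mon process (sketch)
>>>       perf top -p $(pidof ceph-mon)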
>>>
>>>     I could try to use jemalloc; are you aware of any pre-built binaries?
>>>     And can I run a cluster with a mix of different malloc binaries?
>>>
>>>
>>>     On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum <g...@gregs42.com> wrote:
>>>
>>>         On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito <periqu...@gmail.com> wrote:
>>>         > The ceph-mon is already taking a lot of memory, and I ran a
>>>         > heap stats:
>>>         > ------------------------------------------------
>>>         > MALLOC:       32391696 (   30.9 MiB) Bytes in use by application
>>>         > MALLOC: +  27597135872 (26318.7 MiB) Bytes in page heap freelist
>>>         > MALLOC: +     16598552 (   15.8 MiB) Bytes in central cache freelist
>>>         > MALLOC: +     14693536 (   14.0 MiB) Bytes in transfer cache freelist
>>>         > MALLOC: +     17441592 (   16.6 MiB) Bytes in thread cache freelists
>>>         > MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
>>>         > MALLOC:   ------------
>>>         > MALLOC: =  27794649240 (26507.0 MiB) Actual memory used (physical + swap)
>>>         > MALLOC: +     26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
>>>         > MALLOC:   ------------
>>>         > MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
>>>         > MALLOC:
>>>         > MALLOC:           5683              Spans in use
>>>         > MALLOC:             21              Thread heaps in use
>>>         > MALLOC:           8192              Tcmalloc page size
>>>         > ------------------------------------------------
>>>         >
>>>         > after that I ran the heap release and it went back to normal:
>>>         > ------------------------------------------------
>>>         > MALLOC:       22919616 (   21.9 MiB) Bytes in use by application
>>>         > MALLOC: +      4792320 (    4.6 MiB) Bytes in page heap freelist
>>>         > MALLOC: +     18743448 (   17.9 MiB) Bytes in central cache freelist
>>>         > MALLOC: +     20645776 (   19.7 MiB) Bytes in transfer cache freelist
>>>         > MALLOC: +     18456088 (   17.6 MiB) Bytes in thread cache freelists
>>>         > MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
>>>         > MALLOC:   ------------
>>>         > MALLOC: =    201945240 (  192.6 MiB) Actual memory used (physical + swap)
>>>         > MALLOC: +  27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
>>>         > MALLOC:   ------------
>>>         > MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
>>>         > MALLOC:
>>>         > MALLOC:           5639              Spans in use
>>>         > MALLOC:             29              Thread heaps in use
>>>         > MALLOC:           8192              Tcmalloc page size
>>>         > ------------------------------------------------
>>>         >
>>>         > So it just seems the monitor is not returning unused memory to
>>>         > the OS, or reusing already-allocated memory that it deems free...
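>>>         >
>>>         > For completeness, the stats and the release above were done with
>>>         > the usual mon heap commands, something like the following (mon
>>>         > ID left as a placeholder; just a sketch):
>>>         >
>>>         >   # dump tcmalloc heap stats / return freed pages to the OS
>>>         >   ceph tell mon.<id> heap stats
>>>         >   ceph tell mon.<id> heap release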
>>>
>>>         Yep. This is a bug (best we can tell) in some versions of tcmalloc
>>>         combined with certain distribution stacks, although I don't think
>>>         we've seen it reported on Trusty (nor on a tcmalloc distribution
>>>         that new) before. Alternatively, some folks are seeing tcmalloc
>>>         use up lots of CPU in other scenarios involving memory return and
>>>         it may manifest like this, but I'm not sure. You could look
>>>         through the mailing list for information on it.
>>>         -Greg
>>>
>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
