On Wed, Oct 3, 2012 at 4:56 PM, Gregory Farnum <g...@inktank.com> wrote:
> On Wed, Oct 3, 2012 at 4:23 PM, Tren Blackburn <t...@eotnetworks.com> wrote:
>> On Wed, Oct 3, 2012 at 4:15 PM, Gregory Farnum <g...@inktank.com> wrote:
>>> On Wed, Oct 3, 2012 at 3:22 PM, Tren Blackburn <t...@eotnetworks.com> wrote:
>>>> Hi List;
>>>> I was advised to use the "mds cache size" option to limit the memory
>>>> that the mds process will take. I have it set to "32768". However,
>>>> the ceph-mds process is now at 50GB and still growing.
>>>> fern ceph # ps wwaux | grep ceph-mds
>>>> root       895  4.3 26.6 53269304 52725820 ?   Ssl  Sep28 312:29 /usr/bin/ceph-mds -i fern --pid-file /var/run/ceph/mds.fern.pid -c /etc/ceph/ceph.conf
>>>> Have I specified the limit incorrectly? How far will it go?
>>> Oof. That looks correct; it sounds like we have a leak or some other
>>> kind of bug. I believe you're on Gentoo; did you build with tcmalloc?
>>> If so, can you run "ceph -w" in one window and then "ceph mds tell 0
>>> heap stats" and send back the output?
>>> If you didn't build with tcmalloc, can you do so and try again? We
>>> have noticed fragmentation issues with the default memory allocator,
>>> which is why we switched (though I can't imagine it'd balloon that far
>>> — but tcmalloc will give us some better options to diagnose it). Sorry
>>> I didn't mention this before!
>> Hey Greg! Good recall, I am on Gentoo, and I did build with tcmalloc.
> Search is a wonderful thing. ;)
>> Here is the information you requested:
>> 2012-10-03 16:20:43.979673 mds.0 [INF] mds.ferntcmalloc heap stats:------------------------------------------------
>> 2012-10-03 16:20:43.979676 mds.0 [INF] MALLOC:    53796808560 (51304.6 MiB) Bytes in use by application
>> 2012-10-03 16:20:43.979679 mds.0 [INF] MALLOC: +       753664 (    0.7 MiB) Bytes in page heap freelist
>> 2012-10-03 16:20:43.979681 mds.0 [INF] MALLOC: +     93299048 (   89.0 MiB) Bytes in central cache freelist
>> 2012-10-03 16:20:43.979683 mds.0 [INF] MALLOC: +      6110720 (    5.8 MiB) Bytes in transfer cache freelist
>> 2012-10-03 16:20:43.979685 mds.0 [INF] MALLOC: +     84547880 (   80.6 MiB) Bytes in thread cache freelists
>> 2012-10-03 16:20:43.979686 mds.0 [INF] MALLOC: +     84606976 (   80.7 MiB) Bytes in malloc metadata
>> 2012-10-03 16:20:43.979688 mds.0 [INF] MALLOC:   ------------
>> 2012-10-03 16:20:43.979690 mds.0 [INF] MALLOC: =  54066126848 (51561.5 MiB) Actual memory used (physical + swap)
>> 2012-10-03 16:20:43.979691 mds.0 [INF] MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
>> 2012-10-03 16:20:43.979693 mds.0 [INF] MALLOC:   ------------
>> 2012-10-03 16:20:43.979694 mds.0 [INF] MALLOC: =  54066126848 (51561.5 MiB) Virtual address space used
>> 2012-10-03 16:20:43.979700 mds.0 [INF] MALLOC:
>> 2012-10-03 16:20:43.979702 mds.0 [INF] MALLOC:         609757     Spans in use
>> 2012-10-03 16:20:43.979703 mds.0 [INF] MALLOC:            395     Thread heaps in use
>> 2012-10-03 16:20:43.979705 mds.0 [INF] MALLOC:           8192     Tcmalloc page size
>> 2012-10-03 16:20:43.979710 mds.0 [INF]
> So tcmalloc thinks the MDS is actually using >50GB of RAM, i.e., we have a leak.
> Sage suggests we check out the perfcounters (specifically, how many
> log segments are open): "ceph --admin-daemon </path/to/socket>
> perfcounters_dump". I believe the default path is
> /var/run/ceph/ceph-mds.a.asok.

Got it...

--- Start ---
fern ceph # ceph --admin-daemon /var/run/ceph/ceph-mds.fern.asok
--- End ---
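
In case it helps anyone following along, here is a rough Python sketch
for pulling the segment-related counters out of a perfcounters dump. It
assumes the dump is JSON with one object per subsystem, and it simply
matches any counter whose name contains "seg", since I am not certain of
the exact counter names. The function names are my own, not part of Ceph.

```python
import json
import subprocess


def dump_perfcounters(socket_path):
    """Run the admin-socket command quoted above and parse its output.

    Assumes the command prints a single JSON document on stdout.
    """
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", socket_path, "perfcounters_dump"]
    )
    return json.loads(out)


def find_segment_counters(dump):
    """Return {'section.counter': value} for counter names containing 'seg'.

    The 'seg' substring match is a guess at how log-segment counters are
    named; adjust it once the real counter names are known.
    """
    found = {}
    for section, counters in dump.items():
        if not isinstance(counters, dict):
            continue  # skip anything that is not a per-subsystem object
        for name, value in counters.items():
            if "seg" in name.lower():
                found[section + "." + name] = value
    return found


# Usage, on the MDS host:
#   dump = dump_perfcounters("/var/run/ceph/ceph-mds.fern.asok")
#   print(find_segment_counters(dump))
```

Running this a few minutes apart should show whether the number of open
log segments is climbing along with the memory.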

> If this doesn't provide us a clue, I'm afraid we're going to have to
> start keeping track of heap usage with tcmalloc or run the daemon
> through massif...

Hmm, well let me know if there's anything else I can provide. And
thanks again for your help.
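
For tracking heap usage over time, something like the following Python
sketch could parse the "heap stats" lines out of the "ceph -w" output.
It assumes each stat arrives on a single line (not wrapped as in this
mail), and that "mds cache size" is a count of cached inodes rather than
bytes. The function names and the two sample lines are only for
illustration; the numbers are the ones posted above.

```python
import re

# Matches lines such as:
#   "... MALLOC: +  753664 (  0.7 MiB) Bytes in page heap freelist"
_STAT_LINE = re.compile(
    r"MALLOC:\s*[+=]?\s*(\d+)\s*\(\s*[\d.]+\s*MiB\)\s*(.+)$"
)


def parse_heap_stats(text):
    """Return {label: bytes} for every byte-count line in a heap stats dump."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.search(line)
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


def bytes_per_cached_inode(stats, cache_size):
    """Application bytes divided by the configured cache size (in inodes)."""
    return stats["Bytes in use by application"] / cache_size


# Two lines taken from the heap stats posted earlier in this thread.
SAMPLE = (
    "2012-10-03 16:20:43.979676 mds.0 [INF] MALLOC:    53796808560 "
    "(51304.6 MiB) Bytes in use by application\n"
    "2012-10-03 16:20:43.979679 mds.0 [INF] MALLOC: +       753664 "
    "(    0.7 MiB) Bytes in page heap freelist\n"
)

stats = parse_heap_stats(SAMPLE)
print(bytes_per_cached_inode(stats, 32768))
```

With a cache size of 32768 this works out to well over a megabyte per
cached inode, which is another way of seeing that the cache limit alone
cannot account for the memory in use.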

