Basically, process memory was growing very slowly over time and
eventually caused the machine to swap. It was leveling out (not a
leak), but at a level higher than we expected, even with the hashtable
and maxbytes accounted for. So I was poking around at memory usage and
concluded that fragmentation was to blame.


Looking again right now at a machine configured with -m 6000 (so
~6GB), I see "stats maps" showing a 512MB hashtable and a 7.5GB heap.

"stats malloc" (which isn't 64-bit aware) gives:
STAT mmapped_space 564604928   # this has the 512MB hashtable
STAT arena_size -1058820096
STAT total_alloc -2040194320
STAT total_free 981374224
where arena_size = total_alloc+total_free.

Knowing that the total size of the heap is 7.5GB, I can derive that
real_arena_size = -1058820096 + 2**32 * 2 = 7531114496. Doing
total_free/real_arena_size gives 13%, which is my estimate for
free-but-unallocated RAM. (Whether it's free due to fragmentation or
simply not yet allocated is hard to tell, but that number is still
very high.)
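
To spell out that arithmetic: the counters are printed as signed
32-bit values, so I add multiples of 2**32 until the result lines up
with the heap size seen in "stats maps". A minimal sketch of that
un-wrapping (my own throwaway helper, not anything in memcached):

#include <stdint.h>
#include <stdio.h>

/* Reconstruct a 64-bit counter from a value that wrapped around a signed
 * 32-bit int, using an independently observed total (the ~7.5GB heap from
 * "stats maps") to decide how many times it wrapped. */
static uint64_t unwrap32(int32_t reported, uint64_t observed_total)
{
    int64_t value = reported;
    while (value < (int64_t)observed_total - (1LL << 31))
        value += 1LL << 32;            /* add one 2**32 wrap at a time */
    return (uint64_t)value;
}

int main(void)
{
    uint64_t real_arena = unwrap32(-1058820096, 7500ULL << 20); /* ~7.5GB heap */
    printf("real_arena_size = %llu\n", (unsigned long long)real_arena);
    printf("free fraction   = %.1f%%\n",
           100.0 * 981374224.0 / (double)real_arena);
    return 0;
}

That prints 7531114496 and 13.0%, matching the numbers above.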

Alternatively, one could ask why we have a 7.5GB heap for a 6GB
memcache...why so much RAM? I calculated 100MB-200MB for 7600
connections plus some various free lists, but I kept running into the
fact that total_free indicates there are still 981MB of unallocated
RAM in the heap. So at the time I concluded this was due to
fragmentation.


We solved our problem by reducing the amount of RAM we gave to
memcache so we didn't swap, but in theory getting an extra 10-13% of
RAM out of our memcaches sounds like a great idea. So, given my
fragmentation conclusion, I was looking for ways to reduce that
fragmentation.
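
For reference, here's the kind of preallocation I have in mind,
sketched very roughly (placeholder names and logic, not the actual
patch): reserve the whole cache limit as one anonymous mmap up front
and carve the 1MB slab pages out of it, so slab pages never get
interleaved with the allocator's heap.

#define _GNU_SOURCE            /* for MAP_ANONYMOUS */
#include <stdio.h>
#include <sys/mman.h>

#define SLAB_PAGE_SIZE (1024 * 1024)   /* 1MB slab page */

static char  *slab_base;
static size_t slab_limit, slab_used;

/* Reserve the entire cache limit as one contiguous anonymous mapping. */
static int preallocate_slabs(size_t maxbytes)
{
    slab_base = mmap(NULL, maxbytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slab_base == MAP_FAILED)
        return -1;
    slab_limit = maxbytes;
    slab_used  = 0;
    return 0;
}

/* Hand out 1MB pages from the reserved region instead of calling malloc(). */
static void *grab_slab_page(void)
{
    if (slab_used + SLAB_PAGE_SIZE > slab_limit)
        return NULL;
    void *page = slab_base + slab_used;
    slab_used += SLAB_PAGE_SIZE;
    return page;
}

int main(void)
{
    if (preallocate_slabs((size_t)6000 * 1024 * 1024) != 0) {   /* -m 6000 */
        perror("mmap");
        return 1;
    }
    printf("first slab page at %p\n", grab_slab_page());
    return 0;
}

(The hashtable already ends up in its own mmap because malloc() hands
very large requests to mmap(); the idea is just to get the slab memory
the same treatment in one shot.)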


Thoughts? Is there perhaps another explanation for the data above?

Thanks,
Mike

On Wed, Jul 15, 2009 at 19:40, Matt Ingenthron <ingen...@cep.net> wrote:
>
> Hi Mike,
>
> Mike Lambert wrote:
>>
>> Trond, any thoughts?
>>
>
> Trond is actually on vacation, but I did steal a few cycles of his time and
> asked about this.
>>
>> I'd like to double-check that there isn't a reason we can't support
>> preallocation without getpagesizes() before attempting to manually
>> patch memcache and play with our production system here.
>>
>
> There's no reason you can't do that.  There may be a slightly cleaner
> integration approach Trond and I talked through.  I'll try to code that up
> here in the next few days... but for now you may try your approach to see if
> it helps alleviate the issue you were seeing.
>
> Incidentally, how did the memory fragmentation manifest itself on your
> system?  I mean, could you see any effect on apps running on the system?
>
>
>> Thanks,
>> Mike
>>
>> On Jul 13, 8:38 pm, Mike Lambert <mlamb...@gmail.com> wrote:
>>
>>>
>>> On Jul 10, 1:37 pm, Matt Ingenthron <ingen...@cep.net> wrote:
>>>
>>>>
>>>> Mike Lambert wrote:
>>>>
>>>>>
>>>>> Currently the -L flag is only enabled if
>>>>> HAVE_GETPAGESIZES && HAVE_MEMCNTL. I'm curious what the motivation is
>>>>> for something like that? In our experience, for some memcache pools we
>>>>> end up fragmenting memory due to the repeated allocation of 1MB slabs
>>>>> around all the other hashtables and free lists going on. We know we
>>>>> want to allocate all memory upfront, but can't seem to do that on a
>>>>> Linux system.
>>>>>
>>>>
>>>> The primary motivation was more about not beating up the TLB cache on
>>>> the CPU when running with large heaps.  There are users with large heaps
>>>> already, so this should help if the underlying OS supports large pages.
>>>>  TLB cache sizes are getting bigger in CPUs, but virtualization is more
>>>> common and memory heaps are growing faster.
>>>> I'd like to have some empirical data on how big a difference the -L flag
>>>> makes, but that assumes a workload profile.  I should be able to hack
>>>> one up and do this with memcachetest, but I've just not done it yet.  :)
>>>>
>>>>>
>>>>> To put it more concretely, here is a proposed change to make -L do a
>>>>> contiguous preallocation even on machines without getpagesizes tuning.
>>>>> My memcached server doesn't seem to crash, but I'm not sure if that's
>>>>> a proper litmus test. What are the pros/cons of doing something like
>>>>> this?
>>>>>
>>>>
>>>> This feels more related to the -k flag, and it should probably be using
>>>> madvise() in there somewhere too.  It wouldn't necessarily be a bad idea
>>>> to separate these.  I don't know that the day after 1.4.0 is the day to
>>>> redefine -L, but it's not necessarily bad.  We should wait for Trond's
>>>> response to see what he thinks about this, since he implemented
>>>> it.  :)
>>>>
>>>
>>> Haha, yeah, the release of 1.4.0 reminded me I wanted to send this
>>> email. Sorry for the bad timing.
>>>
>>> -k keeps the memory from getting paged out to disk (which is a very
>>> good thing in our case).
>>> -L appears to me (who isn't aware of what getpagesizes does) to be
>>> related to preallocation with big allocations, which I thought was
>>> what I wanted.
>>>
>>> If you want, I'd be just as happy with a -A flag that turns on
>>> preallocation, but without any of the getpagesizes() tuning. It'd force
>>> one big slab allocation and that's it.
>>>
>>>
>>>>
>>>> Also, I did some testing with this (-L) some time back (admittedly on
>>>> OpenSolaris) and the actual behavior will vary based on the memory
>>>> allocation library you're using and what it does with the OS
>>>> underneath.  I didn't try Linux variations, but that may be worthwhile
>>>> for you.  IIRC, default malloc would wait for a page fault to do the
>>>> actual memory allocation, so there'd still be a risk of fragmentation.
>>>>
>>>
>>> We do use Linux, but haven't tested in production with my modified -L
>>> patch. What I *have* noticed is that when we allocate a 512MB
>>> hashtable, it shows up in Linux as an mmap-ed contiguous block of
>>> memory. From http://m.linuxjournal.com/article/6390: "For very
>>> large requests, malloc() uses the mmap() system call to find
>>> addressable memory space. This process helps reduce the negative
>>> effects of memory fragmentation when large blocks of memory are freed
>>> but locked by smaller, more recently allocated blocks lying between
>>> them and the end of the allocated space."
>>>
>>> I was hoping to get the same kind of large mmap for all our slabs, out
>>> of the way in a different part of the address space, so that it didn't
>>> interfere with the actual memory allocator itself and the Linux
>>> allocator could then focus on balancing just the small allocations
>>> without any page waste.
>>>
>>> Thanks,
>>> Mike
>>>
>
>
