Trond, any thoughts? I'd like to double-check that there isn't a reason we can't support preallocation without getpagesizes() before attempting to manually patch memcache and play with our production system here.
Thanks,
Mike

On Jul 13, 8:38 pm, Mike Lambert <mlamb...@gmail.com> wrote:
> On Jul 10, 1:37 pm, Matt Ingenthron <ingen...@cep.net> wrote:
> > Mike Lambert wrote:
> > > Currently the -L flag is only enabled if
> > > HAVE_GETPAGESIZES && HAVE_MEMCNTL. I'm curious what the motivation
> > > is for something like that? In our experience, some of our memcache
> > > pools end up fragmenting memory due to the repeated allocation of
> > > 1MB slabs around all the other hashtables and free lists. We know
> > > we want to allocate all memory upfront, but can't seem to do that
> > > on a Linux system.
> >
> > The primary motivation was more about not beating up the CPU's TLB
> > cache when running with large heaps. There are users with large
> > heaps already, so this should help where the underlying OS supports
> > large pages. TLB caches are getting bigger in CPUs, but
> > virtualization is more common and memory heaps are growing faster.
> >
> > I'd like to have some empirical data on how big a difference the -L
> > flag makes, but that assumes a workload profile. I should be able to
> > hack one up and do this with memcachetest, but I've just not done it
> > yet. :)
> >
> > > To put it more concretely, here is a proposed change to make -L do
> > > a contiguous preallocation even on machines without getpagesizes
> > > tuning. My memcached server doesn't seem to crash, but I'm not
> > > sure whether that's a proper litmus test. What are the pros and
> > > cons of doing something like this?
> >
> > This feels more closely related to the -k flag, and it should
> > probably be using madvise() in there somewhere too. It wouldn't
> > necessarily be a bad idea to separate these. I don't know that the
> > day after 1.4.0 is the day to redefine -L, though, but it's not
> > necessarily bad. We should wait for Trond's response to see what he
> > thinks about this, since he implemented it. :)
>
> Haha, yeah, the release of 1.4.0 reminded me I wanted to send this
> email. Sorry for the bad timing.
> -k keeps the memory from getting paged out to disk (which is a very
> good thing in our case).
> -L appears to me (who isn't aware of what getpagesizes does) to be
> related to preallocation with big allocations, which I thought was
> what I wanted.
>
> If you want, I'd be just as happy with a -A flag that turns on
> preallocation without any of the getpagesizes() tuning. It would
> force one big slab allocation and that's it.
>
> > Also, I did some testing with this (-L) some time back (admittedly
> > on OpenSolaris), and the actual behavior will vary based on the
> > memory allocation library you're using and what it does with the OS
> > underneath. I didn't try Linux variations, but that may be
> > worthwhile for you. IIRC, default malloc would wait for a page
> > fault to do the actual memory allocation, so there would still be a
> > risk of fragmentation.
>
> We do use Linux, but haven't tested in production with my modified -L
> patch. What I *have* noticed is that when we allocate a 512MB
> hashtable, it shows up in Linux as an mmap-ed contiguous block of
> memory. From http://m.linuxjournal.com/article/6390: "For very large
> requests, malloc() uses the mmap() system call to find addressable
> memory space. This process helps reduce the negative effects of
> memory fragmentation when large blocks of memory are freed but locked
> by smaller, more recently allocated blocks lying between them and the
> end of the allocated space."
>
> I was hoping to get the same large mmap for all our slabs, out of the
> way in a different part of the address space, in a way that didn't
> interfere with the allocator itself, so that the Linux allocator
> could then focus on balancing just the small allocations without any
> page waste.
>
> Thanks,
> Mike