Page alignment for cache efficiency was actually one of the things I was hoping to gain from (experimentally, of course) grafting the libumem[1] slab allocator onto memcached. I started this just before the hackathon we hosted at Sun, but found it taking a bit longer than I'd thought and it wasn't high priority for anyone.
Of course, if memcached gets into areas where locality or SMP scaling becomes more critical, I think it could be a bigger issue-- but I otherwise agree with Steve that with the way most deployments are done today, you probably wouldn't see a big difference from this alone.

- Matt

1. libumem is the userspace implementation of Solaris's kernel slab allocator. This is also where memcached's slab allocator design came from (Jeff Bonwick's usenix paper I think). Wez Furlong and some other folks at omniti.com liked it so much, they ported it to other OSs:
https://labs.omniti.com/trac/portableumem

Also worth reading:
http://blogs.sun.com/bonwick/entry/now_it_can_be_told

----- Original Message -----
From: Brian P Brooks <[EMAIL PROTECTED]>
Date: Thursday, December 27, 2007 10:37 pm
Subject: Re: Theoretical set assoc cache org performance boosts?
To: memcached <[email protected]>

> Is it realistically possible for a small server project like memcached
> to ever pose the major bottleneck as the server's processing time
> rather than connections/networking/protocol? Is this even possible
> for a lightweight server? Could the binary protocol offer such a
> performance boost where server processing could be a competitor for
> major profiling bottlenecks?
>
> Of course this is all in curiosity disregarding network speeds, API
> speeds, etc...
>
> Brian Brooks
> http://csel.cs.colorado.edu/~brooksbp/
> Cell: (303)319-8663
>
>
> ---- Original message ----
> >Date: Thu, 27 Dec 2007 22:18:47 -0800
> >From: Steven Grimm <[EMAIL PROTECTED]>
> >Subject: Re: Theoretical set assoc cache org performance boosts?
> >To: Brian P Brooks <[EMAIL PROTECTED]>
> >Cc: memcached <[email protected]>
> >
> >I doubt that would make the network round-trips any faster, and
> >network delay is, to be very conservative about it, four or five
> >orders of magnitude greater than the total request processing time
> >inside the server. You could reduce the server's processing time to
> >zero and it would have no measurable effect on response times or
> >throughput from the client's point of view.
> >
> >Of course, it's open source and you're welcome to experiment; nobody
> >would say no to a significant performance improvement. But I really
> >doubt there's much to be gained there.
> >
> >-Steve
> >
> >
> >On Dec 27, 2007, at 9:59 PM, Brian P Brooks wrote:
> >
> >> To my understanding, at the server level, Memcached is implemented
> >> by a fully associative cache -- most likely using a LRU stack for
> >> overwriting comparisons. Would it be theoretically beneficial if
> >> Memcached were to use a 2 or 4 way set associative cache? Of course
> >> there would be some changes i.e. would have to statically alloc RAM
> >> so it could partition it's blocks.
> >>
> >> But, this would definitely help for apps that cache for speed rather
> >> than cache hit reliability.
> >>
> >> The only way I could see implementing any sort of direct mapped /
> >> set associative cache organization other than specifying the blocks/
> >> partitions you write to in your application (ie spec'ing out the
> >> direct mapped design in your application). Although, I could see
> >> how this could give you more control of the cache, and could
> >> probably result in faster caching performance (both reads and
> >> writes), but lower hit rates.
> >>
> >> Any thoughts?
> >>
> >> Brian Brooks
> >> http://csel.cs.colorado.edu/~brooksbp/
> >> Cell: (303)319-8663
> >
>
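
For anyone who wants to experiment with the idea in the original question, here is a rough Python sketch of an N-way set-associative cache with LRU eviction inside each set. It is purely illustrative and is not how memcached is implemented; the class name SetAssociativeCache, its num_sets and ways parameters, and the use of CRC32 to pick a set are all assumptions made up for this example.

```python
import zlib
from collections import OrderedDict


class SetAssociativeCache:
    """Hypothetical N-way set-associative cache (not memcached's design).

    Each key hashes to exactly one set; a set holds at most `ways` entries
    and evicts its own least recently used entry when it is full.
    """

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One small LRU-ordered mapping per set, oldest entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _bucket(self, key):
        # CRC32 is an arbitrary choice here; it just gives a stable index.
        index = zlib.crc32(key.encode("utf-8")) % self.num_sets
        return self.sets[index]

    def get(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            return None
        bucket.move_to_end(key)  # mark as most recently used
        return bucket[key]

    def set(self, key, value):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.move_to_end(key)
        elif len(bucket) >= self.ways:
            bucket.popitem(last=False)  # evict this set's LRU entry
        bucket[key] = value
```

With num_sets=1 this behaves like a single fully associative LRU of size `ways`. Raising num_sets while keeping `ways` small means a key can be evicted by a conflict within its own set even when other sets have free slots, which is the lower hit rate the original question anticipates.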
