Henrik Schröder wrote:
Hi Dustin,

Out of pure curiosity, have you benchmarked the difference between your single-connection client and a comparable multi-connection one with a connection pool?

And have I understood correctly that the reason your version is as fast as or faster than a multi-connection one is that memcached itself is single-threaded and will process requests serially anyway?

Not answering on Dustin's behalf, but...

Well, the memcached server itself has just turned multi-threaded...

However, the single-threaded memcached server is indeed very fast, mostly because there is no context switching or waiting within the server loop. It simply receives a request, processes it and sends back a response.

This works very well as long as the server CPU is fast enough for the load. If the CPU cannot cope, both throughput and latency will suffer. So in order to service e.g. a 10GbE network, multiple CPU threads and parallel processing may be required to keep up with the speed of the network. But the challenge is not only with the memcached user threads. The networking stack must also be sufficiently parallelized, and/or be able to batch requests within the kernel layer.


/Henrik

On Wed, May 7, 2008 at 11:02 PM, Dustin Sallings <[EMAIL PROTECTED]> wrote:


    On May 7, 2008, at 12:00, Roy Lyseng wrote:

    Does anybody have benchmark data to back this up?

    It depends quite a bit on how you benchmark it.

    My client has *very* little contention, so the number of client
    threads doesn't bother it much.  If you have a lot of threads
    accessing stuff randomly, it should work better.  If you have 100
    threads asking for the same key simultaneously, my client will make
    one request, decode the value once, and dispatch it to all the
    requestors.  I don't expect that to be a common case, but multi-gets
    are faster than multiple individual gets, so the general case for
    multi-get escalation is made.  Deduplicating the keys was just easy
    while I was already doing it.
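
To illustrate the idea (a rough sketch only, not code from Dustin's
client): pending gets can be collected per key, sent as a single
multi-get with each key listed once, and the result handed to every
waiter.  The names CoalescingClient, fetch_many and flush are made up
for this example, and fetch_many stands in for whatever actually talks
to the server.

```python
import threading
from concurrent.futures import Future


class CoalescingClient:
    """Queues get() calls and resolves them with one de-duplicated multi-get."""

    def __init__(self, fetch_many):
        # fetch_many is a hypothetical callable: list of keys -> dict key -> value.
        self._fetch_many = fetch_many
        self._lock = threading.Lock()
        self._waiters = {}  # key -> list of Futures waiting on that key

    def get(self, key):
        """Register interest in a key and return a Future immediately."""
        future = Future()
        with self._lock:
            self._waiters.setdefault(key, []).append(future)
        return future

    def flush(self):
        """Issue one multi-get for all queued keys and fan the values out."""
        with self._lock:
            waiters, self._waiters = self._waiters, {}
        if not waiters:
            return
        try:
            # Dict keys are unique, so each key is requested exactly once.
            values = self._fetch_many(list(waiters))
        except Exception as exc:
            for futures in waiters.values():
                for future in futures:
                    future.set_exception(exc)
            return
        for key, futures in waiters.items():
            for future in futures:
                future.set_result(values.get(key))


# Demo with a fake backend that records every multi-get it receives.
calls = []

def fake_fetch_many(keys):
    calls.append(list(keys))
    return {k: k.upper() for k in keys}

client = CoalescingClient(fake_fetch_many)
futures = [client.get("user:1") for _ in range(100)]
client.flush()
results = [f.result() for f in futures]
```

In a real client the flush would presumably be driven by the I/O loop
rather than called by hand; it is explicit here only to keep the
sketch small.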

    Similarly, all but a few operations are completely asynchronous to
    the client, so sets return roughly immediately in all cases and you
    never have to wait for one unless you actually want to know whether
    it was successful.  Same for gets.  You can request some data at the
    beginning of the method, do some other stuff, and then use it in the
    middle once it has probably arrived.
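
A small sketch of that calling pattern, assuming a toy client (the
AsyncCache class below is hypothetical and uses an in-memory dict in
place of a server; it is not the API of Dustin's client).  A single
worker thread stands in for the single connection, so operations are
processed in the order they were issued.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncCache:
    """Toy asynchronous cache client; a dict stands in for the server."""

    def __init__(self):
        self._store = {}
        # One worker models one connection: requests are handled in order.
        self._io = ThreadPoolExecutor(max_workers=1)

    def set(self, key, value):
        """Return at once; the Future completes when the set has been applied."""
        return self._io.submit(self._store.__setitem__, key, value)

    def get(self, key):
        """Return at once; the Future later holds the value (or None)."""
        return self._io.submit(self._store.get, key)

    def close(self):
        self._io.shutdown(wait=True)


cache = AsyncCache()
cache.set("greeting", "hello")       # fire and forget, result ignored
pending = cache.get("greeting")      # ask for the data early
total = sum(range(1000))             # do some other stuff meanwhile
value = pending.result(timeout=1.0)  # block only when the value is needed
cache.close()
```

Keeping the Future returned by set and calling result() on it is the
"actually want to know whether it was successful" case; otherwise it
can simply be dropped.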


    On the other hand, I'm not currently doing a good job of utilizing
    available CPUs to decode results from multiple requests.  There are
    a few easy workarounds for this, but I'm hoping to provide something
    directly for it.

    Also, I only open one connection to the server, so the TCP congestion
    avoidance algorithm isn't cheated (cheating it seems to be desirable
    for some people).  This can be worked around in some of the same ways
    as the above (the easiest way currently is to have a couple of active
    clients), but it's not a desirable way to do things.  I'm hoping to
    be able to clear more of this up in my "three" branch.


    Of course, patches and bug reports are welcome for anything that
    doesn't perform as well as it should.

-- Dustin Sallings

