Henrik Schröder wrote:
Hi Dustin,
Out of pure curiosity, have you benchmarked the difference between your
single-connection client and a comparable multi-connection one with a
connection pool?
And have I understood it correctly that the reason your version is as
fast as or faster than a multi-connection one is that memcached itself
is single-threaded and will process requests in a serialized fashion
anyway?
Not answering on Dustin's behalf, but...
Well, the memcached server itself has just turned multi-threaded...
However, the memcached single-threaded server is indeed very fast, and
mostly because there is no context-switching or waiting within the
server loop. It will simply receive a request, process it and send back
a response.
This works very well as long as the server CPU is very fast. However, if
the CPU cannot cope with the load, both throughput and latency will
suffer. So in order to service e.g. a 10GbE network, multiple CPU
threads and parallel processing may be required to keep up with the
speed of the network. But the challenge is not only with the memcached
user threads. The networking stack must also be sufficiently
parallelized, and/or be able to batch requests within the kernel layer.
/Henrik
On Wed, May 7, 2008 at 11:02 PM, Dustin Sallings <[EMAIL PROTECTED]> wrote:
On May 7, 2008, at 12:00, Roy Lyseng wrote:
Does anybody have benchmark data to back this up?
It depends quite a bit on how you benchmark it.
My client has *very* little contention, so the number of client
threads doesn't bother it so much. If you have a lot of threads
accessing stuff randomly, it should work better. If you have 100
threads asking for the same key simultaneously, my client will make
one request, decode the value once, and dispatch it to all the
requestors. I don't expect that to be a common case, but multi-gets
are faster than multiple individual gets, so the general case for
multi-get escalation is made. Deduplicating the keys was just easy
while I was already doing it.
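
As a rough illustration of the multi-get escalation and key deduplication described above, here is a minimal Python sketch. It is not the client's actual code: FakeServer, CoalescingClient, and the explicit flush() call are all hypothetical names invented for this example, and the sketch is single-threaded, so it ignores the locking a real client would need.

```python
from collections import defaultdict
from concurrent.futures import Future


class FakeServer:
    """Hypothetical stand-in for a memcached server; counts round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0
        self.keys_fetched = 0

    def multi_get(self, keys):
        self.round_trips += 1
        self.keys_fetched += len(keys)
        return {k: self.data[k] for k in keys if k in self.data}


class CoalescingClient:
    """Queues individual gets and flushes them as one deduplicated multi-get."""

    def __init__(self, server):
        self.server = server
        self.waiters = defaultdict(list)  # key -> futures waiting on that key

    def get(self, key):
        # Nothing is sent yet; the caller gets a future to wait on later.
        future = Future()
        self.waiters[key].append(future)
        return future

    def flush(self):
        pending, self.waiters = self.waiters, defaultdict(list)
        if not pending:
            return
        # The dict keys are already unique, so each key is requested once.
        values = self.server.multi_get(list(pending))
        for key, futures in pending.items():
            for future in futures:
                future.set_result(values.get(key))  # None models a cache miss


server = FakeServer({"user:1": "alice", "user:2": "bob"})
client = CoalescingClient(server)
futures = [client.get("user:1") for _ in range(100)]  # 100 requestors, same key
futures.append(client.get("user:2"))
client.flush()
results = [f.result() for f in futures]
```

In this sketch the 101 individual gets become a single multi-get carrying two keys, and the one decoded value for "user:1" is handed to all 100 requestors.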
Similarly, all but a few operations are completely asynchronous to
the client. Sets return roughly immediately in all cases, so you
never have to wait for one unless you actually want to know whether
it was successful. Same for gets: you can request some data at the
beginning of the method, do some other stuff, and then use it in the
middle, by which time it may have arrived.
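
The "request early, use later" pattern can be sketched in Python with a standard-library future. This is only an illustration under assumptions: slow_get, CACHE, and handle_request are hypothetical, and the thread pool stands in for whatever asynchronous I/O the real client does.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CACHE = {"greeting": "hello"}  # hypothetical stand-in for the remote cache


def slow_get(key):
    time.sleep(0.05)  # simulated network round trip
    return CACHE.get(key)


def handle_request(executor):
    # Issue the get at the beginning of the method; submit() returns at once.
    pending = executor.submit(slow_get, "greeting")
    # Unrelated work overlaps with the fetch.
    other_work = sum(range(1000))
    # Block only here, and only if the value has not arrived yet.
    value = pending.result(timeout=1.0)
    return value, other_work


with ThreadPoolExecutor(max_workers=1) as executor:
    value, other_work = handle_request(executor)
```

The caller pays for the round trip only if it reaches result() before the response has come back.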
On the other hand, I'm not currently doing a good job of utilizing
available CPUs to decode results from multiple requests. There are
a few easy workarounds for this, but I'm hoping to provide something
directly for it.
Also, I only open one connection to the server, so the TCP congestion
avoidance algorithm isn't cheated (cheating it seems to be desirable for
some people). This can be worked around in some of the same ways as
the above (the easiest way currently is to have a couple of active
clients), but it's not a desirable way to do things. I'm hoping to
be able to clear more of this up in my ``three'' branch.
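
The "couple of active clients" workaround could look roughly like the round-robin wrapper below. StubClient and ClientPool are hypothetical names for this sketch; a real client would own its own connection to the server, and the wrapper as written is meant for single-threaded illustration only.

```python
import itertools


class StubClient:
    """Hypothetical client; a real one would own one connection to the server."""

    def __init__(self, name):
        self.name = name
        self.requests = 0

    def get(self, key):
        self.requests += 1
        return f"{self.name}:{key}"  # tag the answer so routing is visible


class ClientPool:
    """Hands requests to a fixed set of clients in round-robin order."""

    def __init__(self, clients):
        self.clients = list(clients)
        self._next = itertools.cycle(self.clients)

    def get(self, key):
        return next(self._next).get(key)


pool = ClientPool([StubClient("a"), StubClient("b")])
answers = [pool.get(f"k{i}") for i in range(4)]
```

Each underlying client sees half of the requests, so both connections (and both decoders) stay busy.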
Of course, patches and bug reports are welcome for anything that
doesn't perform as well as it should.
--
Dustin Sallings