On Thu, Jun 5, 2008 at 8:01 PM, Jeff Rodenburg <[EMAIL PROTECTED]>
wrote:

> Henrik - can you elaborate on what you've found with this?  I'm not looking
> to resolve the issues, just trying to get a better picture of where the
> bodies are buried, and to convince an all-windows shop that it's OK to run a
> few linux instances to support certain application services.
>

About half a year ago we implemented memcached for a customer of ours, and
we decided to try running the memcached servers on the physical web servers.
Those were pretty old machines, a mixed bunch, and maybe 4 years old on
average. After a little while, the memcached process on each just ate all
the available CPU and promptly starved the IIS webserver processes, causing
the entire web application to go down. However, even with no traffic to the
website and no pressure on the memcached servers, they still consumed all
CPU. We switched it around and made a separate memcached server cluster, but
those machines were still brought down by memcached consuming all CPU.
Finally, we installed some Linux on the same machines, and they just stay at
0% CPU while serving the entire web application perfectly. So on the same
hardware, under the same load, the Linux version totally outperforms the
Windows version by such a margin that it can't just be platform differences;
there has to be a bug or five in the Windows version that causes it to
consume far more resources than it should. My guess is that the culprit is
the libevent port, or a combination of memcached and libevent that just
doesn't play well together on Windows.
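
Just to illustrate the kind of failure mode I have in mind (a purely
hypothetical sketch, not memcached's actual code): if the event loop's
timeout calculation ever comes out as zero, libevent stops blocking and the
process spins flat out even with no connections at all. With the libevent
1.x API, something like this pegs a core while doing nothing:

/* Hypothetical illustration only -- not the actual bug. A timer that
 * keeps re-arming itself with a zero timeout (say, because a time
 * conversion in a port misbehaves) means event_dispatch() never blocks,
 * so the process burns CPU with zero traffic. */
#include <event.h>

static void on_timer(int fd, short which, void *arg)
{
    struct event *ev = arg;
    struct timeval tv = { 0, 0 };   /* always "due right now" */
    evtimer_add(ev, &tv);           /* re-fires immediately, forever */
}

int main(void)
{
    struct event ev;
    struct timeval tv = { 0, 0 };

    event_init();
    evtimer_set(&ev, on_timer, &ev);
    evtimer_add(&ev, &tv);
    event_dispatch();               /* never sleeps: 100% of a core */
    return 0;
}

On Linux the real loop blocks in epoll/poll as it should; my suspicion is
just that the Windows port ends up in some state that behaves like the
above.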

On our current project, we run memcached on two servers that are also web
servers, and on both machines the memcached process consumes exactly 25%
CPU. The weird thing is that those two servers have different hardware:
one is a two-processor dual-core Xeon at 2.5 GHz, the other a two-processor
dual-core Xeon at 1.6 GHz. The first runs Windows Server 2008, the second
Windows Server 2003. Yet the memcached process on each takes up exactly 25%
CPU all the time. I can also see in the stats that the second server gets
more memcached traffic than the first, so the slower machine handles more
load, yet CPU use sits at 25% on both servers.
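
For reference, those traffic figures come straight from memcached's ASCII
protocol: telnet to the port and issue "stats". The counters worth
comparing between the two boxes are roughly these (values omitted; the
field names are from the standard stats output):

$ telnet <memcached-host> 11211
stats
STAT pid ...
STAT uptime ...
STAT threads ...
STAT curr_connections ...
STAT cmd_get ...
STAT cmd_set ...
STAT rusage_user ...
STAT rusage_system ...
END
quit

The rusage_user/rusage_system lines are the CPU seconds memcached itself
has burned, which makes a handy cross-check against what Task Manager
reports.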

I first assumed that the memcached process was simply eating an entire core
on both machines to produce such a perfect load number (25% on a four-core
box is exactly one core's worth), but when I look at the CPU graphs, it
isn't: the load is spread evenly across the cores, yet somehow capped at
25%. So the CPU consumption clearly isn't proportional to the load on the
server or the speed of the CPU, which suggests a timing/polling(?) error of
some sort that somehow isn't confined to a single thread.

But the worst thing is that I don't know how to reproduce this state. If I
restart the memcached processes, they drop back to 0% CPU and stay there,
but later climb to 25% once the web application has been running for a
while. We did extensive testing when we developed our own memcached client,
but we never encountered this behaviour then, even though those tests put a
lot more pressure on memcached than the live web application does.

And that's about all I know about these problems.


/Henrik Schröder
