On Thu, Jun 5, 2008 at 8:01 PM, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
> Henrik - can you elaborate on what you've found with this? I'm not looking
> to resolve the issues, just trying to get a better picture of where the
> bodies are buried, and to convince an all-windows shop that it's OK to run a
> few linux instances to support certain application services.

About half a year ago we implemented memcached for a customer of ours, and we decided to try running the memcached servers on the physical web servers. Those were pretty old machines, a mixed bunch, maybe 4 years old on average. After a little while, the memcached process on each just ate all the available CPU and promptly starved the IIS webserver processes, causing the entire web application to go down. Even with no traffic to the website and no pressure on the memcached servers, they still consumed all CPU. We switched it around and made a separate memcached server cluster, but those machines were still brought down by memcached consuming all CPU. Finally, we installed Linux on the same machines, and they just stayed at 0% CPU while serving the entire web application perfectly.

So on the same hardware and under the same load, the Linux version outperforms the Windows version by such a margin that it cannot be down to platform differences; there has to be a bug or five in the Windows version that makes it consume far more resources than it should. My guess is that the culprit is the libevent port, or a combination of memcached + libevent that just doesn't play along well on Windows.

On our current project, we run memcached on two servers that are also web servers, and on both machines the memcached process consumes exactly 25% CPU. The weird thing is that the two servers have different hardware: one has two dual-core Xeons at 2.5GHz, the other two dual-core Xeons at 1.6GHz. The first one runs Windows Server 2008, the other Windows Server 2003. Yet the memcached process on each takes up exactly 25% CPU all the time. I can also see from the stats that the second server gets more memcached traffic than the first, so the slower server gets more traffic, and still the CPU use is 25% on both.

I first assumed that the memcached process was simply eating one of the four cores on each machine, which would explain such a perfect number, but when I look at the CPU graphs it isn't: the load is spread evenly across the cores, yet somehow capped at 25%. So the CPU consumption clearly isn't proportional to the load on the server or to the speed of the CPU, which suggests it's a timing/polling(?) error of some sort that somehow isn't confined to a single thread.

But the worst thing is that I don't know how to reproduce this state. If I restart the memcached processes, they drop to 0% CPU and stay there for a while, but pop back up to 25% once the web application has been running for some time. We did extensive testing when we developed our own memcached client, but we never encountered this behaviour then, even though that testing put a lot more pressure on memcached than the live web application does.

And that's about what I know about the problems.

/Henrik Schröder
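P.S. For what it's worth, a busy-polling loop would produce exactly that pattern: one core's worth of work (25% of four cores) burned continuously, smeared across the cores by the scheduler, and completely independent of traffic and clock speed. Below is a minimal hypothetical sketch of that kind of loop, written with plain POSIX select() for simplicity; it is not the actual memcached or libevent code, just an illustration of the failure mode I suspect.

/* Hypothetical sketch -- not the actual memcached or libevent source.
 * A polling loop with a zero timeout never sleeps, so it burns one
 * full core even when there is nothing at all to do. */
#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    unsigned long wakeups = 0;
    struct timeval tv;

    for (;;) {
        /* A real server would fill an fd_set with its sockets here and
         * normally sleep in select() until one becomes readable. */

        /* Zero timeout: select() returns immediately instead of
         * blocking, so the loop spins flat out doing nothing... */
        tv.tv_sec = 0;
        tv.tv_usec = 0;
        select(0, NULL, NULL, NULL, &tv);

        /* ...which is why the CPU use stays constant: it depends only
         * on how fast the loop can spin, not on how many requests
         * actually arrive. */
        if (++wakeups % 100000000UL == 0)
            printf("%lu wakeups, no work done\n", wakeups);
    }
}

If the Windows build or its libevent port can slip into a state like this after the first connections come in, that would also fit the restart behaviour: 0% CPU right after a restart, then a permanent 25% once the web application has been hitting it for a while. That's speculation on my part, though.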
