We were seeing some weird, sporadic errors in our product. After a lot of head-scratching and digging, it turned out that we were getting "SERVER_ERROR out of memory" when storing items in our memcached cluster. But here's the weird part:
We only got the error occasionally. Most writes went through fine, but some of them just failed; we estimated the error rate at about 1 in 30. All memcached servers had grown to the memory limit we had set (512 MB). I ran "stats slabs", and there were plenty of slabs of all sizes. The eviction count ticked up slowly, but not nearly as fast as it should have, given the rate at which we were storing items. The items that failed were all very small, with an expiry of 5 seconds. We were running version 1.2.5 for Windows, without the -M option.

We have now upgraded to version 1.4.4 and restarted the servers. It will take a week or two for them to fill up again, and we're hoping the error won't come back.

But what happened? How could we get that error when the servers should simply have evicted lots of objects instead? Why did only a fraction of the writes fail that way? What does the error actually mean, since the servers obviously weren't out of memory? And how can we prevent it from happening again?

/Henrik Schröder