Huzzah! Actually, that's a bad thing: it means that when the server tries to evict an object, there are 50+ "locked" objects ahead of it in the LRU, which may indicate a refcount leak (though I haven't proved this yet).
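To make the "50+ locked objects" failure mode concrete, here is a minimal sketch of a tail-walk eviction over a doubly-linked LRU with per-item refcounts. The struct layout, names, and walk limit are illustrative, not memcached's actual code: the point is that if every candidate near the tail has a non-zero refcount, the walk gives up and the caller has nothing to reuse.

```c
#include <stddef.h>

/* Illustrative item struct; not memcached's real layout. */
typedef struct item {
    struct item *prev;       /* toward the LRU head (most recent) */
    struct item *next;       /* toward the LRU tail (least recent) */
    unsigned short refcount; /* >0 means "locked": in use by a request */
} item;

/* Walk up from the tail looking for an evictable (refcount == 0) item.
 * If 50+ locked items sit at the tail -- the case described above --
 * the walk exhausts its budget and the caller must report out of memory. */
item *try_evict(item *tail, int max_walk) {
    item *it = tail;
    for (int i = 0; it != NULL && i < max_walk; i++, it = it->prev) {
        if (it->refcount == 0)
            return it; /* found a victim whose memory can be reused */
    }
    return NULL; /* every candidate was locked */
}
```

With a refcount leak, items near the tail stay "locked" forever, so this walk fails permanently for that slab class even though the cache is nominally full of evictable data.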
Do you have the full text of the out-of-memory error? For 1.2.5 I changed all of the errors to include more context, so we can tell exactly where in the code each one came from. Do you have the 'stats items' outofmemory counters? Are the errors isolated to specific slab classes? Does the 'evictions' stat for those classes ever increase? Do you get an error setting an item into those classes 100% of the time? (From your text it would appear the answer is no?) You're definitely not running with -M (LRU-disabled) mode?

-Dormando

Miguel DeAvila wrote:
> We have a 12-node memcached (v1.2.5) cluster with ~72GB of memory (6GB
> per server, ~1300 requests/sec per server).
>
> We've started getting "SERVER_ERROR out of memory" errors during both
> object stores and counter increments. The errors are isolated to 3 of
> the 12 servers, and to the same slab class (class 1) on each server.
>
> It seems like an out-of-memory error occurs when there are no free
> chunks in the class, no additional slabs can be allocated, and no
> items can be evicted from the LRU (due to non-zero refcounts).
>
> The cluster stores items with a wide range of sizes. It is certainly
> possible that the item sizes that were prevalent while the cache was
> filling are different than the item sizes on an ongoing basis
> (leading to an imperfect slab-to-class allocation).
>
> We're using the default "powers-of-N" growth factor (1.25).
>
> The number of errors, relative to the number of successes, is quite
> small, but previously there were no out-of-memory errors at all.
>
> Are these types of errors typical for a busy, mid-sized cluster with a
> wide item-size distribution? (Or is this a harbinger of things to
> come ...)
>
> thanks,
>
> Miguel
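For anyone following along, the "powers-of-N" growth factor mentioned in the quoted mail works roughly like this sketch: chunk sizes grow geometrically by the factor (1.25 by default), and each item lands in the smallest class whose chunk fits it. The base size, class limit, and max chunk size here are illustrative assumptions, not memcached's exact values, but they show why a shift in the item-size distribution can strand memory in the wrong classes.

```c
#define MAX_CLASSES 64

/* Build geometric slab-class chunk sizes: base, base*factor, ...
 * up to (but not including) max_chunk. Returns the class count. */
int build_classes(unsigned int sizes[], unsigned int base, double factor,
                  unsigned int max_chunk) {
    int n = 0;
    double sz = base;
    while (n < MAX_CLASSES && (unsigned int)sz < max_chunk) {
        sizes[n++] = (unsigned int)sz;
        sz *= factor;
    }
    return n;
}

/* Smallest class whose chunk can hold the item; -1 if it is too large.
 * Slack between `needed` and the chunk size is wasted per item. */
int class_for(const unsigned int sizes[], int n, unsigned int needed) {
    for (int i = 0; i < n; i++)
        if (sizes[i] >= needed)
            return i;
    return -1;
}
```

Once a 1MB slab page is assigned to a class, its chunks stay that size; if traffic later skews toward other sizes, those classes can run dry while the old ones hold idle memory, which matches the "imperfect slab-to-class allocation" theory in the quoted mail.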
