Hello all,

In adding support for the sFlow monitoring standard to the latest memcached  
(see http://code.google.com/p/memcached/issues/detail?id=202) I used a scheme 
for lock-free counter accumulation that might make sense for the memcached 
stats counters too.

Background:  the memcached critical path includes a few calls to 
pthread_mutex_lock()  that might not strictly be necessary.   Some of these are 
per-thread mutex locks that will probably not see much contention,  but even a 
user-space-only mutex lock still involves an atomic operation and I think this 
can add up to rather a large number of clock cycles,  right?   Enough that it 
might be limiting the max throughput of a memcached cluster(?)

First,  I apologize if this has all been thrashed out before!

Second,  a question:  Is is safe to say that all platforms using memcached 
today are at least 32-bit natively?   In other words,  can we assume that an 
aligned 32-bit STORE or FETCH can be considered atomic?  (even though a ++ 
increment is emphatically NOT atomic).

If so,  how about a new scheme for maintaining stats counters:

1) define aligned 32-bit thread-local variables for *all* the counters (even 
the global "server" ones),  and bump only these ones in the critical path - 
with no mutex locking.
2) every 100mS or so,  or when a worker-thread terminates, accumulate these 
counters into their 64-bit equivalents so that they can be served in response 
to the GET-STATS command.  Again, no locking on the per-thread counters:  just 
a lock on the 64-bit counter-structures.

I suggest 100mS because it is enough to get this out of the critical path,  
avoids undetected 32-bit rollovers,  and also avoids any significant 
re-sampling error, e.g. if someone is polling GET-STATS every 10 seconds they 
will only see a 1% error due to re-sampling.

Thoughts?

Neil



Reply via email to