Hi all,

I am a graduate student at UC San Diego, and we recently presented a paper 
on latency bottlenecks in the cloud. Our study used Memcached as an example 
of a latency-sensitive application, and I want to share some of our results 
with you all.

In our study we found that Memcached's application logic is very fast: 2-3 
microseconds per request. This number includes the time to parse a request, 
look up the key in the hash table, and compose the reply. The bottlenecks 
lie elsewhere in the system: the kernel networking stack limits Memcached's 
throughput and increases latency. We observed that the kernel's 
contribution to overall latency in the cloud is between 80% and 90%.

Bypassing the kernel networking stack (i.e. delivering packets directly 
from the NIC to the Memcached application) improves throughput by 5x, but 
latency variance remains. We found that pthread lock contention on 
Memcached's global hash table causes this variance: the locks not only 
serialize request processing, but contended pthread locks must be resolved 
in the kernel, which adds further variance.

Caveat: we did this study a while back, and I haven't tested with the 
latest Memcached (1.4.15) with Dormando's lock fixes. I believe those fixes 
would reduce or eliminate the issue.

We found that partitioning the global hash table into multiple buckets (we 
used the vbuckets feature in Memcached 1.6) and using NIC features to steer 
requests directly to the thread handling the bucket reduces latency.

More details are in the paper: 
http://cseweb.ucsd.edu/~rkapoor/papers/rkapoor_chronos.pdf


We would welcome any comments/questions/criticisms.

Thanks,
Rishi Kapoor
UCSD
