Hi all, I am a graduate student at UC San Diego, and we recently presented a paper on latency bottlenecks in the cloud. Our study used Memcached as an example of a latency-sensitive application. I want to share some of our results with you all.
In our study we found that the Memcached application logic itself is very fast: 2-3 microseconds per request. This includes the time to parse the request, look up the key in the hash table, and compose the reply. The bottlenecks lie elsewhere in the system. In particular, the kernel networking stack limits Memcached throughput and increases latency; we observed that the kernel contributes 80% to 90% of overall latency in the cloud. Bypassing the kernel networking stack (i.e., delivering packets directly from the NIC to the Memcached application) improves throughput by 5x, but latency variance remains.

We traced that variance to pthread lock contention on Memcached's global hash table. The locks not only cause serialization, but contended pthread locks must be resolved in the kernel, which adds further variance. Caveat: we did this study a while back, and I haven't tested with the latest Memcached 1.4.15 with Dormando's lock fixes. I believe those fixes would reduce or eliminate the issue.

We also found that partitioning the global hash table into multiple buckets (we used the vbuckets feature in Memcached 1.6) and using NIC features to steer requests directly to the thread handling each bucket can further reduce latency.

More details are in the paper: http://cseweb.ucsd.edu/~rkapoor/papers/rkapoor_chronos.pdf

We would welcome any comments/questions/criticisms.

Thanks,
Rishi Kapoor
UCSD
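To make the partitioning idea concrete, here is a minimal sketch in C (not the paper's or Memcached's actual implementation) of mapping a key to one of several independently locked partitions, so that threads serving different partitions never contend on a single global mutex. The partition count, struct layout, and the choice of FNV-1a as the hash are illustrative assumptions on my part:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NPART 8  /* hypothetical partition count; a real server would tune this */

/* 32-bit FNV-1a hash over the key bytes (assumed here for illustration). */
static uint32_t fnv1a(const char *key, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (uint8_t)key[i];
        h *= 16777619u;
    }
    return h;
}

/* Each partition owns its own lock and its own slice of the hash table,
 * so a contended global lock never has to be resolved in the kernel
 * unless two requests actually hit the same partition. */
struct partition {
    pthread_mutex_t lock;
    /* ... per-partition hash-table state would live here ... */
};

static struct partition parts[NPART];

/* Pick the partition (and hence the owning thread) for a key.
 * With NIC steering, requests for a partition would be delivered
 * directly to the thread that owns it. */
static unsigned partition_for(const char *key, size_t len) {
    return fnv1a(key, len) % NPART;
}
```

The same key always maps to the same partition, which is what lets NIC-level steering deliver a request straight to the one thread that owns that key's slice of the table.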