Hmm, could you reproduce the tests with the latest 3.6.3 kernel? I'm not sure, but I have read about some latency-related changes in the kernel, so this might help too, since you tested with 2.6.28.
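
For readers following the thread, the partitioning idea from the quoted message below can be sketched roughly as follows. This is only an illustration under my own assumptions, not Memcached's vbucket code and not the implementation from the paper: the class name `PartitionedTable`, the CRC32 key hash and the partition count are all hypothetical. It shows the locking side only (one lock per partition instead of one global lock) and does not model steering requests from the NIC to a specific thread.

```python
import threading
from zlib import crc32


class PartitionedTable:
    """Hash table split into independent partitions, each with its own lock.

    Hypothetical sketch: threads touching different partitions never
    contend on the same lock, unlike a table guarded by one global lock.
    """

    def __init__(self, num_partitions=4):
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._tables = [{} for _ in range(num_partitions)]

    def partition_of(self, key):
        # Stable hash so a given key always maps to the same partition.
        return crc32(key.encode("utf-8")) % len(self._tables)

    def set(self, key, value):
        i = self.partition_of(key)
        with self._locks[i]:  # only this partition is serialized
            self._tables[i][key] = value

    def get(self, key, default=None):
        i = self.partition_of(key)
        with self._locks[i]:
            return self._tables[i].get(key, default)
```

If each worker thread owned exactly one partition and requests were steered to the owning thread, as the quoted message describes, the per-partition locks would be uncontended in the common case.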

2012/10/22 Rishi <rkapoor.ri...@gmail.com>:
> Hi all,
>
> I am a graduate student at UC San Diego and we recently presented a paper on
> latency bottlenecks in cloud. Our study used Memcached as an example of
> latency sensitive application. I want to share some of our results with you
> all.
>
> In our study we found that Memcached application logic is super fast 2-3
> microseconds. This latency number includes time to parse a request, look up
> key in hash table and compose reply. We found other bottlenecks in the
> system i.e. kernel stack limits Memcached throughput and increases latency.
> We observed that kernel contribution to overall latency in cloud is between
> 80% to 90%.
>
> Bypassing the kernel networking stack (i.e. deliver packets directly from
> NIC to Memcached application) improves throughput by 5x but still results in
> latency variance. We found that pthread lock contention on global hash table
> in Memcached results in latency variance and reason being that locks not
> only causes serialization but the contested pthread locks need to be
> resolved in kernel, thus adding to variance.
>
> Caveat: We have done this study long back and I haven't tested with latest
> Memcached 1.4.15 with Dormando lock fixes. I believe this fix would reduce
> or eliminate the issue.
>
> We found that partitioning the global hash table into multiple buckets (we
> use vbuckets feature in Memcached 1.6) and using NIC features to steer
> requests directly to thread handling the bucket could result in reduced
> latency.
>
> More details are in the paper. Link:
> http://cseweb.ucsd.edu/~rkapoor/papers/rkapoor_chronos.pdf
>
>
> We would welcome any comments/questions/criticisms.
>
> Thanks,
> Rishi Kapoor
> UCSD
>

--
Roberto Spadim