I mean, if there is some work to be done to improve memcached, I will try to code it.
memcached is widely used, and it's perfect software. I just want to learn from it, and contribute to it if I can.

2013/1/4 liubo <lb.falc...@gmail.com>

> Now, reading the code.
>
> But not just reading: I want to try writing some code to improve it.
>
>
> 2013/1/4 dormando <dorma...@rydia.net>
>
>> Yeah. Removing contended locks gives more speedup.
>>
>> But noting the performance numbers from 1.4.15, going even faster than
>> that is almost useless. It's very hard to get your network to perform up
>> to those levels.
>>
>> Though there's still room for improvement.
>>
>> Are you just reading the code academically, or do you have a problem
>> you're trying to solve?
>>
>> On Fri, 4 Jan 2013, liubo wrote:
>>
>> > Removing the global mutexes will give more speedup, right?
>> >
>> >
>> > 2013/1/4 liubo <lb.falc...@gmail.com>
>> > For example, slabs_lock? Some global mutex.
>> >
>> >
>> > 2013/1/4 dormando <dorma...@rydia.net>
>> > > Hello.
>> > > I found that all stats are protected by the thread's mutex.
>> > > All events run in a single thread's context.
>> > >
>> > > Why is the protection needed? For the sum, or for the STAT command?
>> > >
>> > > thanks
>> >
>> > It's for when the summation happens, you can get consistent reads.
>> >
>> > NOTES, SINCE I HEAR THIS A LOT:
>> >
>> > *uncontested* mutexes aren't free, but are very nearly free. *contested*
>> > mutexes slow things down a lot.
>> >
>> > Since those thread locks are only ever called in the brief times in which
>> > you actually run stats commands, they have a very very small amount of
>> > overhead.
>> >
>> > When I was doing the lock scaling patches for 1.4.10-1.4.15 I did test
>> > this out:
>> >
>> > https://github.com/dormando/memcached/commit/56ad41e1a19a7fc99da51bdca4fdcb524a300984
>> >
>> > (a little further work would be required to make that change permanent).
>> >
>> > On 64bit systems you can do 64bit-aligned 8 byte memory reads atomically,
>> > so as long as the stats structure is all 64bit items, is 64bit aligned,
>> > and the external reader is ... just a reader, you can get pretty accurate
>> > readings. On 32bit you need the lock.
>> >
>> > So I thought I'd try removing the locks on my 64bit system and test it.
>> > There was *ALMOST NO* change in performance. I can't stress this enough.
>> > Everyone focuses on these locks but if you bust out a God Damned Ruler
>> > they don't even use crap for cycles. The other work I did ended up having
>> > a much higher effect when tested, and I merged those branches instead. I
>> > think it was between 1-5% change in speed. By comparison making the lock
>> > shorter in the item_alloc code was a 15-30% bump.
>> >
>> > It'll be nice to remove the uncontested locks and save some CPU, but it
>> > was a much lower priority than other work.
>> >
>> > have fun,
>> > -Dormando

--
-- liubo
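
For anyone following the thread, here is a minimal sketch of the pattern discussed above: each worker updates its own counters under its own mutex, and the stats reader briefly takes each lock while summing. This is not memcached's actual code. The names `worker_stats`, `stats_totals`, `stats_count_get`, `stats_count_set`, `stats_aggregate`, and `NUM_WORKERS` are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORKERS 4  /* hypothetical worker count for this sketch */

/* Hypothetical per-thread counters; every field is a 64-bit item. */
struct worker_stats {
    pthread_mutex_t lock;  /* contended only while a stats reader is summing */
    uint64_t get_cmds;
    uint64_t set_cmds;
};

/* Plain totals returned to the stats reader. */
struct stats_totals {
    uint64_t get_cmds;
    uint64_t set_cmds;
};

static struct worker_stats workers[NUM_WORKERS];

/* Initialise every worker's lock and zero its counters. */
void stats_init(void) {
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_mutex_init(&workers[i].lock, NULL);
        workers[i].get_cmds = 0;
        workers[i].set_cmds = 0;
    }
}

/* Worker `id` bumps its own counter; the lock is almost always uncontested. */
void stats_count_get(int id) {
    assert(id >= 0 && id < NUM_WORKERS);
    pthread_mutex_lock(&workers[id].lock);
    workers[id].get_cmds++;
    pthread_mutex_unlock(&workers[id].lock);
}

void stats_count_set(int id) {
    assert(id >= 0 && id < NUM_WORKERS);
    pthread_mutex_lock(&workers[id].lock);
    workers[id].set_cmds++;
    pthread_mutex_unlock(&workers[id].lock);
}

/* Stats reader: hold each worker's lock only long enough to copy its
 * counters, so both fields of one worker are read consistently. */
struct stats_totals stats_aggregate(void) {
    struct stats_totals totals = {0, 0};
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_mutex_lock(&workers[i].lock);
        totals.get_cmds += workers[i].get_cmds;
        totals.set_cmds += workers[i].set_cmds;
        pthread_mutex_unlock(&workers[i].lock);
    }
    return totals;
}
```

Since a worker only ever waits on its own lock while a stats command is being summed, the hot path pays the uncontested cost nearly all the time. That fits the small 1-5% difference reported above when the locks were removed.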