Hello Dormando, thanks for the detailed and constructive reply.

On Tuesday, 17 February 2015 01:39:02 UTC+7, Dormando wrote:

> Yo, 
>
> On Mon, 16 Feb 2015, Roman Leventov wrote: 
>
> > Hello Memcached users and developers, 
> > I've analyzed Memcached implementation and suggested some improvements 
> > here: 
> http://key-value-stories.blogspot.com/2015/02/memcached-internals-design.html 
> > 
> > If I'm wrong in some details or conclusions, please correct me instead 
> of blaming, my only concern is making article as relevant as possible. 
>
> The basic overview of internals seems largely correct, but as soon as you 
> start recommending things it goes weird. I'll answer points below, but 
> I'll counter some unsolicited advice with some of my own: you should 
> really verify ideas with simple tests rather than simply write them out 
> without backing them up.

In this case preparation of a single post would take a month instead of a 
week. I would throw away some hypotheses which haven't been proven in my 
synthetic tests, though that could be because I built a wrong workload 
model or applied a slightly wrong approach. Just voicing many ideas 
(even controversial and imprecise ones) seems to be better: it generates 
discussions, from which some really valuable ideas or conclusions could 
emerge.

"There is a surprising text area " flags length\r\n" (the value length is 
> meant) between the stored key and value. Apparently, it is added for 
> debugging," 
>
> It's not for debugging. This is part of the ASCII protocol:
>
Ok, updated the post. I have no one to ask about Memcached (or any other 
database), so I use another approach:
make an assumption and wait until someone proves or refutes it.
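
For what it's worth, that text area is exactly the tail of the ASCII reply 
line "VALUE <key> <flags> <bytes>\r\n", kept pre-formatted next to the key so 
a GET can be answered without re-printing the integers on every read. A 
minimal sketch of how such a suffix might be built (my own illustration; the 
helper is hypothetical, not memcached's actual code):

/* Hypothetical helper, for illustration only: pre-format the
 * " <flags> <bytes>\r\n" suffix once, at store time. */
#include <stdio.h>

static int make_suffix(char *buf, size_t buflen,
                       unsigned int flags, size_t nbytes)
{
    /* ASCII protocol reply: "VALUE <key> <flags> <bytes>\r\n<data>\r\n" */
    return snprintf(buf, buflen, " %u %zu\r\n", flags, nbytes);
}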
 

> "Even through the whole memory dedicated to for entry allocation could be 
> malloc()-ed in the start of memcached process (it is done so if memcached 
> is configured to use large pages), entries of the size class are 
> "established" by chunks of maximum entry size (i. e. 1 MB). This chunk 
> itself also called "slab" that is confusing a bit." 
>
> It originally did that, but so many people couldn't make large allocations 
> at start time it was changed to as-needed. 
>
Hm. https://github.com/memcached/memcached/blob/master/memcached.c#L5245
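
As far as I can see, that is the preallocation path. A rough sketch of the 
distinction being discussed (simplified by me, not the real slabs.c code):

/* Sketch only: with preallocation the whole cache limit is malloc()-ed once
 * at start-up and 1MB slab pages are carved out of it; otherwise each slab
 * page is malloc()-ed lazily when a size class first needs one. */
#include <stdlib.h>

#define SLAB_PAGE_SIZE (1024 * 1024)

static char  *mem_base  = NULL;   /* non-NULL only in the preallocated case */
static size_t mem_avail = 0;

void slabs_init_sketch(size_t limit, int prealloc)
{
    if (prealloc) {
        mem_base  = malloc(limit);     /* one big allocation up front */
        mem_avail = (mem_base != NULL) ? limit : 0;
    }
}

void *grab_slab_page_sketch(void)
{
    if (mem_base != NULL && mem_avail >= SLAB_PAGE_SIZE) {
        void *page = mem_base;         /* carve from the preallocated region */
        mem_base  += SLAB_PAGE_SIZE;
        mem_avail -= SLAB_PAGE_SIZE;
        return page;
    }
    return malloc(SLAB_PAGE_SIZE);     /* as-needed path */
}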
 

> "Mutex locks are slow, because they cause two context switches. Context 
> switch cost estimates vary in a wide range, depending on if this is just a 
> in-process thread switch, or an OS-level process switch (requiring much 
> more kernel operations and TLB cache flush), CPU model, workload, if the 
> server is dedicated for Memcached or have other intensively-working 
> processes, and so on." 
>
> This is all baseless speculation. It was spinlocking on a few mutexes 
> (without a time limit, I guess). I was never able to get it to be an 
> improvement in performance, and as the number of worker threads grows the 
> performance would worsen. Removing all of them produced significantly 
> higher performance and better latency averages. The locks are held for 
> generally consistent amounts of times (no syscalls/etc) so spinlocking 
> waiting for a lock just burns cpu for no reason. 
>
> Spinlocks don't seem to be much help outside of the kernel. I don't really 
> understand why people are so obsessed with them. If two worker threads 
> block on the same mutex, the OS can tell which one to wake up sooner. In 
> the spinlock case you could spend an entire timeslice (or two) on the 
> wrong thread. Unless there's a POSIX primitive for spinlocking which 
> introduces dependencies... 
>
> Even within the linux kernel they're pretty obnoxious. I speak from having 
> to peel them out constantly at $dayjob. 
>
> Also, don't forget futexes. For the per-thread statistics lock futexes 
> make it an agreeable situation. Assuming it's terrible is how we got 
> twemcache's release announcement to claim they've "significantly improved 
> performance" despite simple benchmarking proving otherwise. 
>
> In the future I would like to move more stats into the per-thread locks, 
> and then remove the per-thread locks on 64bit x86_64 systems (8 byte 
> aligned writes/reads being atomic). Since the performance difference is 
> so small the priority is low.
>

Well, I was not precise (I've now rephrased the post a bit, but I understand 
that something is still not properly explained).
There is simple arithmetic: with mutexes, you pay for two context switches. 
With spin locks, you pay the average time each thread spends under the lock, 
multiplied by the average number of threads concurrently waiting to acquire 
the lock, plus some context switches that still need to occur for OS 
background tasks (if the server is dedicated to the cache instance). I'm not 
even sure statistics updates should be under locks at all (couldn't they just 
be a number of atomic counters?), but if they should, the operation is so 
thin that even with pretty high contention spin locks are far more efficient 
than mutexes.
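
To make the atomic-counters point concrete, here is a minimal sketch with 
C11 atomics (my own illustration, not a patch against memcached's actual 
stats code):

/* Sketch only: per-thread stats as C11 atomics instead of a mutex-guarded
 * struct. The relaxed increments compile to single locked adds on x86_64,
 * and the aligned 64-bit loads for the "stats" command are plain reads. */
#include <stdatomic.h>
#include <stdint.h>

struct thread_stats_sketch {
    atomic_uint_fast64_t get_cmds;
    atomic_uint_fast64_t get_hits;
};

static inline void stats_get_hit(struct thread_stats_sketch *s)
{
    atomic_fetch_add_explicit(&s->get_cmds, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->get_hits, 1, memory_order_relaxed);
}

static inline uint_fast64_t stats_read_hits(struct thread_stats_sketch *s)
{
    /* the aggregating reader does not need any lock either */
    return atomic_load_explicit(&s->get_hits, memory_order_relaxed);
}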

Problems cause one another: for contended locks wrapping mildly expensive 
operations mutexes are better, but locks shouldn't be that contended in the 
first place.
 

> "Hash table bucket locks should be read-write granular." 
>
> rwlocks are slower than mutexes, and I've not gotten the bucket locks to 
> be a major source of pain. In recent benchmarks I've been able to fetch 20 
> million keys/sec from 16+ cores.
>
Bucket locks are well striped, so you can afford to have just a 64-bit word 
acting as a hand-made unfair read-write spin-lock. 
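
Something along these lines, just as a toy sketch of what I mean (my 
invention, definitely not production-ready): bit 63 marks a writer, the low 
bits count readers, and readers can starve a waiting writer, which is the 
unfairness.

/* Toy sketch of an unfair read-write spinlock packed into one 64-bit word:
 * bit 63 = writer held, bits 0..62 = reader count. */
#include <stdatomic.h>
#include <stdint.h>

#define RW_WRITER (UINT64_C(1) << 63)

typedef atomic_uint_fast64_t rwspin_t;   /* initialize to 0 */

static inline void rw_read_lock(rwspin_t *l)
{
    for (;;) {
        uint64_t v = atomic_fetch_add_explicit(l, 1, memory_order_acquire);
        if ((v & RW_WRITER) == 0)
            return;                       /* no writer: we hold a read lock */
        /* a writer holds the lock: undo our increment and wait */
        atomic_fetch_sub_explicit(l, 1, memory_order_relaxed);
        while (atomic_load_explicit(l, memory_order_relaxed) & RW_WRITER)
            ;                             /* spin until the writer releases */
    }
}

static inline void rw_read_unlock(rwspin_t *l)
{
    atomic_fetch_sub_explicit(l, 1, memory_order_release);
}

static inline void rw_write_lock(rwspin_t *l)
{
    uint64_t expected = 0;
    /* wait until there are no readers and no writer, then take the writer bit */
    while (!atomic_compare_exchange_weak_explicit(l, &expected, RW_WRITER,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;
}

static inline void rw_write_unlock(rwspin_t *l)
{
    /* clear only the writer bit; transient reader increments are preserved */
    atomic_fetch_and_explicit(l, ~RW_WRITER, memory_order_release);
}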
