It took a day of running torture tests that take 30-90 minutes to fail (interleaved with a bunch of house chores), but I believe I've found the problem:

https://github.com/dormando/memcached/tree/slab_rebal_next has a new commit, specifically this one: https://github.com/dormando/memcached/commit/1c32e5eeff5bd2a8cc9b652a2ed808157e4929bb

It's somewhat relieving that when I brained this super hard back in January I may have actually gotten the complex set of interactions correct; I simply failed to keep typing when converting the comments to code. So this has been broken since 1.4.24, but apparently hardly anyone uses the page mover. Once fixed, it survived a 5-hour torture test (one I wrote in 2011!), where previously it died after 30-90 minutes.

So please give this one a try and let me know how it goes. If it goes well I can merge up some other fixes from the PR list and cut a release, unless someone has feedback for something to change. Thanks!

On Thu, 1 Oct 2015, dormando wrote:

> I've seen items.c:1183 reported elsewhere in 1.4.24... so probably the bug
> was introduced when I rewrote the page mover for that.
>
> I didn't mean for you to send me a core file: I meant that if you dump the
> core you can load it in gdb and get the backtrace (bt + thread apply all bt).
>
> Don't have a handler for convenient attaching :(
>
> Didn't get a chance to poke at this today... I'll need another day to try
> it out.
>
> On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
> > Sorry for the data dumps here, but I want to give you everything I have.
> > I found 3 more addresses that showed up in the dmesg logs:
> >
> > $ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached $addr; done
> >
> > .../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 (discriminator 1)
> > .../build/memcached-1.4.24-slab-rebal-next/items.c:312 (discriminator 1)
> > .../build/memcached-1.4.24-slab-rebal-next/items.c:1183
> >
> > I still haven't tried to attach a debugger, since the frequency of the
> > error would make it hard to catch. Is there a handler that I could add
> > in to dump the stack trace when it segfaults?
> > I'd get a core dump, but they would be HUGE and contain confidential
> > information.
> >
> > Below are the full dmesg logs. Out of 205 servers, 35 had dmesg logs after
> > a memcached crash, and only one crashed twice, both times on the original
> > segfault. Below is the full unified set of dmesg logs, from which you can
> > get a sense of frequency.
> >
> > [47992.109269] memcached[2798]: segfault at 0 ip 000000000040e007 sp 00007f4d20d25eb0 error 4 in memcached[400000+1d000]
> > [48960.851278] memcached[2805]: segfault at 0 ip 000000000040e007 sp 00007f3c30d15eb0 error 4 in memcached[400000+1d000]
> > [46421.604609] memcached[2784]: segfault at 0 ip 000000000040e007 sp 00007fdb94612eb0 error 4 in memcached[400000+1d000]
> > [48429.671534] traps: memcached[2768] general protection ip:40e013 sp:7f1c32676be0 error:0 in memcached[400000+1d000]
> > [71838.979269] memcached[2792]: segfault at 0 ip 000000000040e007 sp 00007f0162feeeb0 error 4 in memcached[400000+1d000]
> > [66763.091475] memcached[2804]: segfault at 0 ip 000000000040e007 sp 00007f8240170eb0 error 4 in memcached[400000+1d000]
> > [102544.376092] traps: memcached[2792] general protection ip:40eff4 sp:7fa58095be18 error:0 in memcached[400000+1d000]
> > [49932.757825] memcached[2777]: segfault at 0 ip 000000000040e007 sp 00007f1ff2131eb0 error 4 in memcached[400000+1d000]
> > [50400.415878] memcached[2794]: segfault at 0 ip 000000000040e007 sp 00007f11a26daeb0 error 4 in memcached[400000+1d000]
> > [48986.340345] memcached[2786]: segfault at 0 ip 000000000040e007 sp 00007f9235279eb0 error 4 in memcached[400000+1d000]
> > [44742.175894] memcached[2796]: segfault at 0 ip 000000000040e007 sp 00007eff3a0cceb0 error 4 in memcached[400000+1d000]
> > [49030.431879] memcached[2776]: segfault at 0 ip 000000000040e007 sp 00007fdef27cfbe0 error 4 in memcached[400000+1d000]
> > [50211.611439] traps: memcached[2782] general protection ip:40e013 sp:7f9ee1723be0 error:0 in memcached[400000+1d000]
> > [62534.892817] memcached[2783]: segfault at 0 ip 000000000040e007 sp 00007f37f2d4beb0 error 4 in memcached[400000+1d000]
> > [78697.201195] memcached[2801]: segfault at 0 ip 000000000040e007 sp 00007f696ef1feb0 error 4 in memcached[400000+1d000]
> > [48922.246712] memcached[2804]: segfault at 0 ip 000000000040e007 sp 00007f1ebb338eb0 error 4 in memcached[400000+1d000]
> > [52170.371014] memcached[2809]: segfault at 0 ip 000000000040e007 sp 00007f5e62fcbeb0 error 4 in memcached[400000+1d000]
> > [69531.775868] memcached[2785]: segfault at 0 ip 000000000040e007 sp 00007ff50ac2eeb0 error 4 in memcached[400000+1d000]
> > [48926.661559] memcached[2799]: segfault at 0 ip 000000000040e007 sp 00007f71e0ac6be0 error 4 in memcached[400000+1d000]
> > [49491.126885] memcached[2745]: segfault at 0 ip 000000000040e007 sp 00007f5737c4beb0 error 4 in memcached[400000+1d000]
> > [104247.724294] traps: memcached[2793] general protection ip:40f7c4 sp:7f3af8c27eb0 error:0 in memcached[400000+1d000]
> > [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
> > [71958.385432] memcached[2809]: segfault at 0 ip 000000000040e007 sp 00007f8b68cd0eb0 error 4 in memcached[400000+1d000]
> > [48934.182852] memcached[2787]: segfault at 0 ip 000000000040e007 sp 00007f0aef774eb0 error 4 in memcached[400000+1d000]
> > [104220.754195] traps: memcached[2802] general protection ip:40f7c4 sp:7ffa85a2deb0 error:0 in memcached[400000+1d000]
> > [45807.670246] memcached[2755]: segfault at 0 ip 000000000040e007 sp 00007fd74a1d0eb0 error 4 in memcached[400000+1d000]
> > [73640.102621] memcached[2802]: segfault at 0 ip 000000000040e007 sp 00007f7bb30bfeb0 error 4 in memcached[400000+1d000]
> > [67690.640196] memcached[2787]: segfault at 0 ip 000000000040e007 sp 00007f299580feb0 error 4 in memcached[400000+1d000]
> > [57729.895442] memcached[2786]: segfault at 0 ip 000000000040e007 sp 00007f204073deb0 error 4 in memcached[400000+1d000]
> > [48009.284226] memcached[2801]: segfault at 0 ip 000000000040e007 sp 00007f7b30876eb0 error 4 in memcached[400000+1d000]
> > [48198.211826] memcached[2811]: segfault at 0 ip 000000000040e007 sp 00007fd496d79eb0 error 4 in memcached[400000+1d000]
> > [84057.439927] traps: memcached[2804] general protection ip:40f7c4 sp:7fbe75fffeb0 error:0 in memcached[400000+1d000]
> > [50215.489124] memcached[2784]: segfault at 0 ip 000000000040e007 sp 00007f3234b73eb0 error 4 in memcached[400000+1d000]
> > [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
> > [102076.523474] memcached[29833]: segfault at 0 ip 000000000040e007 sp 00007f3c89b9ebe0 error 4 in memcached[400000+1d000]
> > [55537.568254] memcached[2780]: segfault at 0 ip 000000000040e007 sp 00007fc1f6005eb0 error 4 in memcached[400000+1d000]
> >
> > On Thursday, October 1, 2015 at 5:40:35 PM UTC-7, Dormando wrote:
> > > got it. that might be a decent hint actually... I had added a bugfix to
> > > the branch to not miscount the mem_requested counter, but it's not
> > > working or I missed a spot.
> > >
> > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > The number now, after maybe 90 minutes of writes, is 1,446. I think
> > > > after disabling it, a lot of the data TTL'd out. I have to disable it
> > > > for now, again (for unrelated reasons, again). The page that I
> > > > screenshotted gives real-time data, so the numbers were from right
> > > > then. Last night it should have shown better numbers in terms of
> > > > "total_pages", but I didn't get a screenshot. That number is directly
> > > > from the stats slabs output.
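The dmesg dump above boils down to a handful of distinct instruction pointers. A short Python sketch (a hypothetical helper, not part of anyone's tooling in this thread) that extracts the unique fault addresses for feeding to addr2line, handling both kernel log formats seen here:

```python
import re

# Matches both formats above:
#   "segfault at 0 ip 000000000040e007 sp ..."  and
#   "general protection ip:40e013 sp:..."
IP_RE = re.compile(r'ip[: ]0*([0-9a-f]+)')

def fault_addresses(dmesg_lines):
    """Return the unique instruction pointers, in first-seen order."""
    seen = []
    for line in dmesg_lines:
        m = IP_RE.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen

logs = [
    "[47992.109269] memcached[2798]: segfault at 0 ip 000000000040e007 sp 00007f4d20d25eb0 error 4 in memcached[400000+1d000]",
    "[48429.671534] traps: memcached[2768] general protection ip:40e013 sp:7f1c32676be0 error:0 in memcached[400000+1d000]",
    "[104247.724294] traps: memcached[2793] general protection ip:40f7c4 sp:7f3af8c27eb0 error:0 in memcached[400000+1d000]",
]
addrs = fault_addresses(logs)
# Each address then goes to addr2line against the same binary, e.g.:
#   addr2line -e memcached 40e007 40e013 40f7c4
```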
> > > > On Thursday, October 1, 2015 at 4:21:42 PM UTC-7, Dormando wrote:
> > > > > ok... slab class 12 claims to have 2 in "total_pages", yet 14g in
> > > > > mem_requested. is this stat wrong?
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > The ones that crashed (new code cluster) were set to only be written
> > > > > to from the client applications. The data is an index key and a
> > > > > series of data keys that are all written one after another. Each key
> > > > > might be hashed to a different server, though, so not all of them are
> > > > > written to the same server. I can give you a snapshot of one of the
> > > > > clusters that didn't crash (attached file). I can give more detail
> > > > > offline if you need it.
> > > >
> > > > On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando wrote:
> > > > > Any chance you could describe (perhaps privately?) in very broad
> > > > > strokes what the write load looks like? (they're getting only writes,
> > > > > too?). otherwise I'll have to devise arbitrary torture tests. I'm
> > > > > sure the bug's in there but it's not obvious yet.
> > > >
> > > > On Thu, 1 Oct 2015, dormando wrote:
> > > > > perfect, thanks! I have $dayjob as well but will look into this as
> > > > > soon as I can. my torture test machines are in a box but I'll try to
> > > > > borrow one.
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > Yes. Exact args:
> > > > >
> > > > > -p 11211 -u <omitted> -l 0.0.0.0 -c 100000 -o slab_reassign -o lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
> > > >
> > > > On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
> > > > > Were lru_maintainer/lru_crawler/etc enabled though? even if the slab
> > > > > mover is off, those two were the big changes in .24.
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > The same cluster has > 400 servers happily running 1.4.24. It's been
> > > > > our standard deployment for a while now, and we haven't seen any
> > > > > crashes. The servers in the same cluster running 1.4.24 (with the
> > > > > same write load the new build was taking) have been up for 29 days.
> > > > > The start options do not contain the slab_automove option because it
> > > > > wasn't effective for us before. The memory given is possibly slightly
> > > > > different per server, as we calculate on startup how much we give.
> > > > > It's in the same ballpark, though (~56 gigs).
> > > >
> > > > On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
> > > > > Just before I sit in and try to narrow this down: have you run any
> > > > > host on 1.4.24 mainline with those same start options? just in case
> > > > > the crash is older.
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > Another message for you:
> > > > >
> > > > > [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
> > > > >
> > > > > addr2line shows:
> > > > >
> > > > > $ addr2line -e memcached 412b9d
> > > > > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
> > > >
> > > > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
> > > > > Ok, thanks!
> > > > >
> > > > > I'll noodle this a bit... unfortunately a backtrace might be more
> > > > > helpful. will ask you to attempt to get one if I don't figure
> > > > > anything out in time. (allow it to core dump, or attach a GDB
> > > > > session, set an ignore handler for sigpipe/int/etc and run
> > > > > "continue")
> > > > >
> > > > > what were your full startup args, though?
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > The commit was the latest in slab_rebal_next at the time:
> > > > > https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
> > > > >
> > > > > addr2line gave me this output:
> > > > >
> > > > > $ addr2line -e memcached 0x40e007
> > > > > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
> > > > >
> > > > > As well, this was running with production writes, but not reads.
> > > > > Even if we had reads on with the few servers crashing, we're ok
> > > > > architecturally. That's why I can get it out there without worrying
> > > > > too much. For now, I'm going to turn it off. I had a metrics issue
> > > > > anyway that needs to get fixed. Tomorrow I'm planning to test again
> > > > > with more metrics, but I can get any new code in pretty quick.
> > > >
> > > > On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
> > > > > How many servers were you running it on? I hope it wasn't more than
> > > > > a handful. I'd recommend starting with one :P
> > > > >
> > > > > can you do an addr2line? what were your startup args, and what was
> > > > > the commit sha1 for the branch you pulled?
> > > > >
> > > > > sorry about that :/
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > A few different servers (5 / 205) experienced a segfault all within
> > > > > an hour or so. Unfortunately at this point I'm a bit out of my
> > > > > depth. I have the dmesg output, which is identical for all 5 boxes:
> > > > >
> > > > > [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
> > > > >
> > > > > I can possibly supply the binary file if needed, though we didn't do
> > > > > anything besides the standard setup and compile.
> > > >
> > > > On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
> > > > > If you look at the new branch there's a commit explaining the new
> > > > > stats. You can watch slab_reassign_evictions vs slab_reassign_saves.
> > > > > you can also test automove=1 vs automove=2 (please also turn on the
> > > > > lru_maintainer and lru_crawler).
> > > > >
> > > > > The initial branch you were running didn't add any new stats. It
> > > > > just restored an old feature.
> > > >
> > > > On Tue, 29 Sep 2015, Scott Mansfield wrote:
> > > > > An unrelated prod problem meant I had to stop after about an hour.
> > > > > I'm turning it on again tomorrow morning. Are there any new metrics
> > > > > I should be looking at? Anything new in the stats output? I'm about
> > > > > to take a look at the diffs as well.
> > > >
> > > > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
> > > > > excellent.
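The per-class numbers being debated here (2 in "total_pages" yet 14g in mem_requested) can be sanity-checked mechanically. A hypothetical Python sketch, assuming the "stats slabs" output has already been reduced to "<class>:<stat> <value>" lines, and using the 4MB page size implied by the "-I 4m" flag in this cluster's args:

```python
# A class whose mem_requested exceeds what its total_pages could possibly
# hold indicates broken accounting (like the mem_requested miscount
# discussed in this thread), since requested memory lives inside pages.
PAGE_SIZE = 4 * 1024 * 1024  # -I 4m; memcached's default is 1MB

def suspicious_classes(stats_lines, page_size=PAGE_SIZE):
    """Return slab class ids whose stats are internally impossible."""
    per_class = {}
    for line in stats_lines:
        key, value = line.split()
        clsid, stat = key.split(":")
        per_class.setdefault(int(clsid), {})[stat] = int(value)
    return [cls for cls, s in sorted(per_class.items())
            if s["mem_requested"] > s["total_pages"] * page_size]

sample = [
    "11:total_pages 100",
    "11:mem_requested 300000000",   # fits in 100 pages: fine
    "12:total_pages 2",
    "12:mem_requested 15032385536", # ~14g claimed in 2 pages: impossible
]
# suspicious_classes(sample) flags class 12
```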
> > > > > if automove=2 is too aggressive you'll see that come in as a hit
> > > > > ratio reduction.
> > > > >
> > > > > the new branch works with automove=2 as well, but it will attempt
> > > > > to rescue valid items in the old slab if possible. I'll still be
> > > > > working on it for another few hours today though. I'll mail again
> > > > > when I'm done.
> > > >
> > > > On Tue, 29 Sep 2015, Scott Mansfield wrote:
> > > > > I have the first commit (slab_automove=2) running in prod right
> > > > > now. Later today will be a full load production test of the latest
> > > > > code. I'll just let it run for a few days unless I spot any
> > > > > problems. We have good metrics for latency et al. from the client
> > > > > side, though network time normally dwarfs memcached time.
> > > >
> > > > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> > > > > That's unfortunate.
> > > > >
> > > > > I've done some more work on the branch:
> > > > > https://github.com/memcached/memcached/pull/112
> > > > >
> > > > > It's not completely likely you would see enough of an improvement
> > > > > from the new default mode. However, if your item sizes change
> > > > > gradually, items are reclaimed during expiration, or get
> > > > > overwritten (and thus freed in the old class), it should work just
> > > > > fine. I have another patch coming which should help though.
> > > > >
> > > > > Open to feedback from any interested party.
> > > >
> > > > On Fri, 25 Sep 2015, Scott Mansfield wrote:
> > > > > I have it running internally, and it runs fine under normal load.
> > > > > It's difficult to put it into the line of fire for a production
> > > > > workload because of social reasons... As well, it's a degenerate
> > > > > case that we normally don't run in to (and actively try to avoid).
> > > > > I'm going to run some heavier load tests on it today.
> > > >
> > > > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> > > > > I'm working on getting a test going internally. I'll let you know
> > > > > how it goes.
> > > > >
> > > > > Scott Mansfield
> > > >
> > > > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> > > > > Yo,
> > > > >
> > > > > https://github.com/dormando/memcached/commits/slab_rebal_next -
> > > > > would you mind playing around with the branch here? You can see the
> > > > > start options in the test.
> > > > >
> > > > > This is a dead simple modification (a restoration of a feature that
> > > > > was already there...). The test very aggressively writes and is
> > > > > able to shunt memory around appropriately.
> > > > >
> > > > > The work I'm exploring right now will allow saving items being
> > > > > rebalanced from, and increasing the aggression of page moving
> > > > > without being so brain damaged about it.
> > > > >
> > > > > But while I'm poking around with that, I'd be interested in knowing
> > > > > if this simple branch is an improvement, and if so how much.
> > > > >
> > > > > I'll push more code to the branch, but the changes should be gated
> > > > > behind a feature flag.
> > > >
> > > > On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> > > > > No worries man, you're doing us a favor. Let me know if there's
> > > > > anything you need from us, and I promise I'll be quicker this
> > > > > time :)
> > > >
> > > > On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> > > > > Hey,
> > > > >
> > > > > I'm still really interested in working on this. I'll be taking a
> > > > > careful look soon, I hope.
> > > >
> > > > On Mon, 3 Aug 2015, Scott Mansfield wrote:
> > > > > I've tweaked the program slightly, so I'm adding a new version. It
> > > > > prints more stats as it goes and runs a bit faster.
> > > >
> > > > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> > > > > Total brain fart on my part. Apparently I had memcached 1.4.13 on
> > > > > my path (who knows how...). Using the actual one that I've built
> > > > > works. Sorry for the confusion... can't believe I didn't realize
> > > > > that before. I'm testing against the compiled one now to see how it
> > > > > behaves.
> > > >
> > > > On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> > > > > You sure that's 1.4.24? None of those fail for me :(
> > > >
> > > > On Mon, 3 Aug 2015, Scott Mansfield wrote:
> > > > > The command line I've used that will start is:
> > > > >
> > > > > memcached -m 64 -o slab_reassign,slab_automove
> > > > >
> > > > > the ones that fail are:
> > > > >
> > > > > memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
> > > > > memcached -o lru_crawler
> > > > >
> > > > > I'm sure I've missed something during compile, though I just used
> > > > > ./configure and make.
> > > >
> > > > On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> > > > > I've attached a pretty simple program to connect, fill a slab with
> > > > > data, and then fill another slab slowly with data of a different
> > > > > size. I've been trying to get memcached to run with the lru_crawler
> > > > > and lru_maintainer flags, but I get 'Illegal suboption "(null)"'
> > > > > every time I try to start with either in any configuration.
> > > > >
> > > > > I haven't seen it start to move slabs automatically with a freshly
> > > > > installed 1.4.24.
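The test program described there (fill one slab class, then slowly fill another with values of a different size, to pressure the slab mover) can be sketched in a few lines of Python. The key names, item counts, and pacing below are invented; the framing follows the standard ASCII protocol "set <key> <flags> <exptime> <bytes>" form:

```python
import socket
import time

def build_set(key, value):
    """Frame an ASCII-protocol set command (flags 0, no expiry)."""
    return b"set %s 0 0 %d\r\n%s\r\n" % (key, len(value), value)

def fill(sock, prefix, value_size, count, delay=0.0):
    """Write `count` items of `value_size` bytes; the size picks the slab class."""
    value = b"x" * value_size
    for i in range(count):
        sock.sendall(build_set(b"%s:%d" % (prefix, i), value))
        sock.recv(4096)  # expect "STORED\r\n" (or an error line)
        if delay:
            time.sleep(delay)

def run(host="127.0.0.1", port=11211):
    # Only run against a disposable local instance started with
    # something like: memcached -m 64 -o slab_reassign,slab_automove
    s = socket.create_connection((host, port))
    fill(s, b"small", 100, 200_000)              # saturate one slab class...
    fill(s, b"big", 8000, 50_000, delay=0.001)   # ...then slowly pressure another
```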
> > > > On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> > > > > I realize I've not given you the tests to reproduce the behavior.
> > > > > I should be able to soon. Sorry about the delay here.
> > > > >
> > > > > In the mean time, I wanted to bring up a possible secondary use of
> > > > > the same logic that moves items on slab rebalancing. I think the
> > > > > system might benefit from using the same logic to crawl the pages
> > > > > in a slab and compact the data in the background. In the case where
> > > > > we have memory that is assigned to the slab but not being used
> > > > > because of replaced or TTL'd out data, returning the memory to a
> > > > > pool of free memory will allow a slab to grow with that memory
> > > > > first instead of waiting for an event where memory is needed at
> > > > > that instant.
> > > > >
> > > > > It's a change in approach, from reactive to proactive. What do you
> > > > > think?
> > > >
> > > > On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> > > > > > First, more detail for you:
> > > > > >
> > > > > > We are running 1.4.24 in production and haven't noticed any bugs
> > > > > > as of yet. The new LRUs seem to be working well, though we nearly
> > > > > > always run memcached scaled to hold all data without evictions.
> > > > > > Those with evictions are behaving well. Those without evictions
> > > > > > haven't seen crashing or any other noticeable bad behavior.
> > > > >
> > > > > Neat.
> > > > >
> > > > > > OK, I think I see an area where I was speculating on
> > > > > > functionality. If you have a key in slab 21 and then the same key
> > > > > > is written again at a larger size in slab 23, I assumed that the
> > > > > > space in 21 was not freed on the second write. With that
> > > > > > assumption, the LRU crawler would not free up that space. Also,
> > > > > > just by observation in the macro, the space is not freed fast
> > > > > > enough to be effective, in our use case, to accept the writes
> > > > > > that are happening. Think in the hundreds of millions of
> > > > > > "overwrites" in a 6-10 hour period across a cluster.
> > > > >
> > > > > Internally, "items" (a key/value pair) are generally immutable. The
> > > > > only time when they're not is for INCR/DECR, and an item still
> > > > > becomes immutable if two INCR/DECRs collide.
> > > > >
> > > > > What this means is that the new item is staged in a piece of free
> > > > > memory while the "upload" stage of the SET happens. When memcached
> > > > > has all of the data in memory to replace the item, it does an
> > > > > internal swap under a lock. The old item is removed from the hash
> > > > > table and LRU, and the new item gets put in its place (at the head
> > > > > of the LRU).
> > > > >
> > > > > Since items are refcounted, this means that if other users are
> > > > > downloading an item which just got replaced, their memory doesn't
> > > > > get corrupted by the item changing out from underneath them. They
> > > > > can continue to read the old item until they're done. When the
> > > > > refcount reaches zero, the old memory is reclaimed.
> > > > >
> > > > > Most of the time, the item replacement happens and then the old
> > > > > memory is immediately removed.
> > > > >
> > > > > However, this does mean that you need *one* piece of free memory to
> > > > > replace the old one. Then the old memory gets freed after that set.
> > > > >
> > > > > So if you take a memcached instance with 0 free chunks and do a
> > > > > rolling replacement of all items (within the same slab class as
> > > > > before), the first one would cause an eviction from the tail of the
> > > > > LRU to get a free chunk. Every SET after that would use the chunk
> > > > > freed from the replacement of the previous memory.
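The replace-and-refcount behavior described there can be modeled with a toy sketch (pure illustration in Python, not memcached's actual C structures): a set swaps the staged item into the table and drops only the table's reference, so an old copy stays readable until its last reader releases it:

```python
class Item:
    def __init__(self, value):
        self.value = value
        self.refcount = 1   # the hash table's own reference

class Cache:
    def __init__(self):
        self.table = {}
        self.freed = []     # stand-in for returning chunks to the slab

    def set(self, key, value):
        old = self.table.get(key)
        self.table[key] = Item(value)   # staged item swapped in under "lock"
        if old:
            self.deref(old)             # drop only the table's reference

    def acquire(self, key):
        item = self.table[key]
        item.refcount += 1              # a client starts reading this copy
        return item

    def deref(self, item):
        item.refcount -= 1
        if item.refcount == 0:
            self.freed.append(item)     # memory reclaimed only now

c = Cache()
c.set("k", "v1")
reader = c.acquire("k")   # a slow client still downloading v1
c.set("k", "v2")          # replacement: new item in table, old kept alive
old_value = reader.value  # still "v1", unchanged underneath the reader
c.deref(reader)           # last reference gone; old memory reclaimed
```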
> > > > > > After that last sentence I realized I also may not have explained
> > > > > > well enough the access pattern. The keys are all overwritten
> > > > > > every day, but it takes some time to write them all (obviously).
> > > > > > We see a huge increase in the bytes metric as if the new data for
> > > > > > the old keys was being written for the first time. Since the
> > > > > > "old" slab for the same key doesn't proactively release memory,
> > > > > > it starts to fill up the cache and then starts evicting data in
> > > > > > the new slab. Once that happens, we see evictions in the old slab
> > > > > > because of the algorithm you mentioned (random picking / freeing
> > > > > > of memory). Typically we don't see any use for "upgrading" an
> > > > > > item, as the new data would be entirely new and should wholesale
> > > > > > replace the old data for that key. More specifically, the
> > > > > > operation is always set, with different data each day.
> > > > >
> > > > > Right. Most of your problems will come from two areas. One being
> > > > > that when writing data aggressively into the new slab class
> > > > > (unless you set the rebalancer to always-replace mode), the mover
> > > > > will make memory available more slowly than you can insert. So
> > > > > you'll cause extra evictions in the new slab class.
> > > > >
> > > > > The secondary problem is the random evictions in the previous slab
> > > > > class as stuff is chucked on the floor to make memory moveable.
> > > > >
> > > > > > As for testing, we'll be able to put it under real production
> > > > > > workload. I don't know what kind of data you mean you need for
> > > > > > testing. The data stored in the caches are highly confidential. I
> > > > > > can give you all kinds of metrics, since we collect most of the
> > > > > > ones that are in the stats and some from the stats slabs output.
> > > > > > If you have some specific ones that need collecting, I'll double
> > > > > > check and make sure we can get those. Alternatively, it might be
> > > > > > most beneficial to see the metrics in person :)
> > > > >
> > > > > I just need stats snapshots here and there, and actually putting
> > > > > the thing under load. When I did the LRU work I had to beg for
> > > > > several months before anyone tested it with a production load. This
> > > > > slows things down and ...

--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.