It took a day of running torture tests that take 30-90 minutes to fail (interleaved with a bunch of house chores), but I believe I've found the problem:

https://github.com/dormando/memcached/tree/slab_rebal_next has a new commit, specifically this one: https://github.com/dormando/memcached/commit/1c32e5eeff5bd2a8cc9b652a2ed808157e4929bb

It's somewhat relieving that when I brained this super hard back in January I may have actually gotten the complex set of interactions correct; I simply failed to keep typing when converting the comments to code. So this has been broken since 1.4.24, but apparently hardly anyone uses the page mover. Once fixed, it survived a 5-hour torture test (one I wrote in 2011!), where previously it died after 30-90 minutes.

So please give this one a try and let me know how it goes. If it goes well I can merge up some other fixes from the PR list and cut a release, unless someone has feedback for something to change. Thanks!

On Thu, 1 Oct 2015, dormando wrote:

> I've seen items.c:1183 reported elsewhere in 1.4.24... so probably the bug
> was introduced when I rewrote the page mover for that.
>
> I didn't mean for you to send me a core file: I meant that if you dump the
> core you can load it in gdb and get the backtrace (bt + thread apply all bt).
>
> Don't have a handler for convenient attaching :(
>
> Didn't get a chance to poke at this today... I'll need another day to try
> it out.
>
> On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
> > Sorry for the data dumps here, but I want to give you everything I have.
> > I found 3 more addresses that showed up in the dmesg logs:
> >
> > $ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached $addr; done
> >
> > .../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 (discriminator 1)
> > .../build/memcached-1.4.24-slab-rebal-next/items.c:312 (discriminator 1)
> > .../build/memcached-1.4.24-slab-rebal-next/items.c:1183
> >
> > I still haven't tried to attach a debugger, since the frequency of the
> > error would make it hard to catch. Is there a handler that I could add
> > in to dump the stack trace when it segfaults?
> > I'd get a core dump, but they would be HUGE and contain confidential
> > information.
> >
> > Below are the full dmesg logs. Out of 205 servers, 35 had dmesg logs after
> > a memcached crash, and only one crashed twice, both times on the original
> > segfault. Below is the full unified set of dmesg logs, from which you can
> > get a sense of frequency.
> >
> > [47992.109269] memcached[2798]: segfault at 0 ip 000000000040e007 sp 00007f4d20d25eb0 error 4 in memcached[400000+1d000]
> > [48960.851278] memcached[2805]: segfault at 0 ip 000000000040e007 sp 00007f3c30d15eb0 error 4 in memcached[400000+1d000]
> > [46421.604609] memcached[2784]: segfault at 0 ip 000000000040e007 sp 00007fdb94612eb0 error 4 in memcached[400000+1d000]
> > [48429.671534] traps: memcached[2768] general protection ip:40e013 sp:7f1c32676be0 error:0 in memcached[400000+1d000]
> > [71838.979269] memcached[2792]: segfault at 0 ip 000000000040e007 sp 00007f0162feeeb0 error 4 in memcached[400000+1d000]
> > [66763.091475] memcached[2804]: segfault at 0 ip 000000000040e007 sp 00007f8240170eb0 error 4 in memcached[400000+1d000]
> > [102544.376092] traps: memcached[2792] general protection ip:40eff4 sp:7fa58095be18 error:0 in memcached[400000+1d000]
> > [49932.757825] memcached[2777]: segfault at 0 ip 000000000040e007 sp 00007f1ff2131eb0 error 4 in memcached[400000+1d000]
> > [50400.415878] memcached[2794]: segfault at 0 ip 000000000040e007 sp 00007f11a26daeb0 error 4 in memcached[400000+1d000]
> > [48986.340345] memcached[2786]: segfault at 0 ip 000000000040e007 sp 00007f9235279eb0 error 4 in memcached[400000+1d000]
> > [44742.175894] memcached[2796]: segfault at 0 ip 000000000040e007 sp 00007eff3a0cceb0 error 4 in memcached[400000+1d000]
> > [49030.431879] memcached[2776]: segfault at 0 ip 000000000040e007 sp 00007fdef27cfbe0 error 4 in memcached[400000+1d000]
> > [50211.611439] traps: memcached[2782] general protection ip:40e013 sp:7f9ee1723be0 error:0 in memcached[400000+1d000]
> > [62534.892817] memcached[2783]: segfault at 0 ip 000000000040e007 sp 00007f37f2d4beb0 error 4 in memcached[400000+1d000]
> > [78697.201195] memcached[2801]: segfault at 0 ip 000000000040e007 sp 00007f696ef1feb0 error 4 in memcached[400000+1d000]
> > [48922.246712] memcached[2804]: segfault at 0 ip 000000000040e007 sp 00007f1ebb338eb0 error 4 in memcached[400000+1d000]
> > [52170.371014] memcached[2809]: segfault at 0 ip 000000000040e007 sp 00007f5e62fcbeb0 error 4 in memcached[400000+1d000]
> > [69531.775868] memcached[2785]: segfault at 0 ip 000000000040e007 sp 00007ff50ac2eeb0 error 4 in memcached[400000+1d000]
> > [48926.661559] memcached[2799]: segfault at 0 ip 000000000040e007 sp 00007f71e0ac6be0 error 4 in memcached[400000+1d000]
> > [49491.126885] memcached[2745]: segfault at 0 ip 000000000040e007 sp 00007f5737c4beb0 error 4 in memcached[400000+1d000]
> > [104247.724294] traps: memcached[2793] general protection ip:40f7c4 sp:7f3af8c27eb0 error:0 in memcached[400000+1d000]
> > [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
> > [71958.385432] memcached[2809]: segfault at 0 ip 000000000040e007 sp 00007f8b68cd0eb0 error 4 in memcached[400000+1d000]
> > [48934.182852] memcached[2787]: segfault at 0 ip 000000000040e007 sp 00007f0aef774eb0 error 4 in memcached[400000+1d000]
> > [104220.754195] traps: memcached[2802] general protection ip:40f7c4 sp:7ffa85a2deb0 error:0 in memcached[400000+1d000]
> > [45807.670246] memcached[2755]: segfault at 0 ip 000000000040e007 sp 00007fd74a1d0eb0 error 4 in memcached[400000+1d000]
> > [73640.102621] memcached[2802]: segfault at 0 ip 000000000040e007 sp 00007f7bb30bfeb0 error 4 in memcached[400000+1d000]
> > [67690.640196] memcached[2787]: segfault at 0 ip 000000000040e007 sp 00007f299580feb0 error 4 in memcached[400000+1d000]
> > [57729.895442] memcached[2786]: segfault at 0 ip 000000000040e007 sp 00007f204073deb0 error 4 in memcached[400000+1d000]
> > [48009.284226] memcached[2801]: segfault at 0 ip 000000000040e007 sp 00007f7b30876eb0 error 4 in memcached[400000+1d000]
> > [48198.211826] memcached[2811]: segfault at 0 ip 000000000040e007 sp 00007fd496d79eb0 error 4 in memcached[400000+1d000]
> > [84057.439927] traps: memcached[2804] general protection ip:40f7c4 sp:7fbe75fffeb0 error:0 in memcached[400000+1d000]
> > [50215.489124] memcached[2784]: segfault at 0 ip 000000000040e007 sp 00007f3234b73eb0 error 4 in memcached[400000+1d000]
> > [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
> > [102076.523474] memcached[29833]: segfault at 0 ip 000000000040e007 sp 00007f3c89b9ebe0 error 4 in memcached[400000+1d000]
> > [55537.568254] memcached[2780]: segfault at 0 ip 000000000040e007 sp 00007fc1f6005eb0 error 4 in memcached[400000+1d000]
> >
> > On Thursday, October 1, 2015 at 5:40:35 PM UTC-7, Dormando wrote:
> > > got it. that might be a decent hint actually... I had added a bugfix to
> > > the branch to not miscount the mem_requested counter, but it's not
> > > working or I missed a spot.
> > >
> > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > The number now, after maybe 90 minutes of writes, is 1,446. I think
> > > > after disabling it, a lot of the data TTL'd out. I have to disable it
> > > > for now, again (for unrelated reasons, again). The page that I
> > > > screenshotted gives real-time data, so the numbers were from right
> > > > then. Last night it should have shown better numbers in terms of
> > > > "total_pages", but I didn't get a screenshot. That number is directly
> > > > from the stats slabs output.
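The dmesg dump above boils down to a handful of distinct instruction pointers. A short Python sketch (a hypothetical helper, not part of anyone's tooling in this thread) that extracts the unique fault addresses for feeding to addr2line, handling both kernel log formats seen here:

```python
import re

# Matches both formats above:
#   "segfault at 0 ip 000000000040e007 sp ..."  and
#   "general protection ip:40e013 sp:..."
IP_RE = re.compile(r'ip[: ]0*([0-9a-f]+)')

def fault_addresses(dmesg_lines):
    """Return the unique instruction pointers, in first-seen order."""
    seen = []
    for line in dmesg_lines:
        m = IP_RE.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen

logs = [
    "[47992.109269] memcached[2798]: segfault at 0 ip 000000000040e007 sp 00007f4d20d25eb0 error 4 in memcached[400000+1d000]",
    "[48429.671534] traps: memcached[2768] general protection ip:40e013 sp:7f1c32676be0 error:0 in memcached[400000+1d000]",
    "[104247.724294] traps: memcached[2793] general protection ip:40f7c4 sp:7f3af8c27eb0 error:0 in memcached[400000+1d000]",
]
addrs = fault_addresses(logs)
# Each address then goes to addr2line against the same binary, e.g.:
#   addr2line -e memcached 40e007 40e013 40f7c4
```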
> > > > On Thursday, October 1, 2015 at 4:21:42 PM UTC-7, Dormando wrote:
> > > > > ok... slab class 12 claims to have 2 in "total_pages", yet 14g in
> > > > > mem_requested. is this stat wrong?
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > The ones that crashed (new code cluster) were set to only be written
> > > > > to from the client applications. The data is an index key and a
> > > > > series of data keys that are all written one after another. Each key
> > > > > might be hashed to a different server, though, so not all of them are
> > > > > written to the same server. I can give you a snapshot of one of the
> > > > > clusters that didn't crash (attached file). I can give more detail
> > > > > offline if you need it.
> > > >
> > > > On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando wrote:
> > > > > Any chance you could describe (perhaps privately?) in very broad
> > > > > strokes what the write load looks like? (they're getting only writes,
> > > > > too?). otherwise I'll have to devise arbitrary torture tests. I'm
> > > > > sure the bug's in there but it's not obvious yet.
> > > >
> > > > On Thu, 1 Oct 2015, dormando wrote:
> > > > > perfect, thanks! I have $dayjob as well but will look into this as
> > > > > soon as I can. my torture test machines are in a box but I'll try to
> > > > > borrow one.
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > Yes. Exact args:
> > > > >
> > > > > -p 11211 -u <omitted> -l 0.0.0.0 -c 100000 -o slab_reassign -o lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
> > > >
> > > > On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
> > > > > Were lru_maintainer/lru_crawler/etc enabled though? even if the slab
> > > > > mover is off, those two were the big changes in .24.
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > The same cluster has > 400 servers happily running 1.4.24. It's been
> > > > > our standard deployment for a while now, and we haven't seen any
> > > > > crashes. The servers in the same cluster running 1.4.24 (with the
> > > > > same write load the new build was taking) have been up for 29 days.
> > > > > The start options do not contain the slab_automove option because it
> > > > > wasn't effective for us before. The memory given is possibly slightly
> > > > > different per server, as we calculate on startup how much we give.
> > > > > It's in the same ballpark, though (~56 gigs).
> > > >
> > > > On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
> > > > > Just before I sit in and try to narrow this down: have you run any
> > > > > host on 1.4.24 mainline with those same start options? just in case
> > > > > the crash is older.
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > Another message for you:
> > > > >
> > > > > [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
> > > > >
> > > > > addr2line shows:
> > > > >
> > > > > $ addr2line -e memcached 412b9d
> > > > > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
> > > >
> > > > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
> > > > > Ok, thanks!
> > > > >
> > > > > I'll noodle this a bit... unfortunately a backtrace might be more
> > > > > helpful. will ask you to attempt to get one if I don't figure
> > > > > anything out in time. (allow it to core dump, or attach a GDB
> > > > > session, set an ignore handler for sigpipe/int/etc and run
> > > > > "continue")
> > > > >
> > > > > what were your full startup args, though?
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > The commit was the latest in slab_rebal_next at the time:
> > > > > https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
> > > > >
> > > > > addr2line gave me this output:
> > > > >
> > > > > $ addr2line -e memcached 0x40e007
> > > > > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
> > > > >
> > > > > As well, this was running with production writes, but not reads.
> > > > > Even if we had reads on with the few servers crashing, we're ok
> > > > > architecturally. That's why I can get it out there without worrying
> > > > > too much. For now, I'm going to turn it off. I had a metrics issue
> > > > > anyway that needs to get fixed. Tomorrow I'm planning to test again
> > > > > with more metrics, but I can get any new code in pretty quick.
> > > >
> > > > On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
> > > > > How many servers were you running it on? I hope it wasn't more than
> > > > > a handful. I'd recommend starting with one :P
> > > > >
> > > > > can you do an addr2line? what were your startup args, and what was
> > > > > the commit sha1 for the branch you pulled?
> > > > >
> > > > > sorry about that :/
> > > >
> > > > On Thu, 1 Oct 2015, Scott Mansfield wrote:
> > > > > A few different servers (5 / 205) experienced a segfault all within
> > > > > an hour or so. Unfortunately at this point I'm a bit out of my
> > > > > depth. I have the dmesg output, which is identical for all 5 boxes:
> > > > >
> > > > > [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
> > > > >
> > > > > I can possibly supply the binary file if needed, though we didn't do
> > > > > anything besides the standard setup and compile.
> > > >
> > > > On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
> > > > > If you look at the new branch there's a commit explaining the new
> > > > > stats. You can watch slab_reassign_evictions vs slab_reassign_saves.
> > > > > you can also test automove=1 vs automove=2 (please also turn on the
> > > > > lru_maintainer and lru_crawler).
> > > > >
> > > > > The initial branch you were running didn't add any new stats. It
> > > > > just restored an old feature.
> > > >
> > > > On Tue, 29 Sep 2015, Scott Mansfield wrote:
> > > > > An unrelated prod problem meant I had to stop after about an hour.
> > > > > I'm turning it on again tomorrow morning. Are there any new metrics
> > > > > I should be looking at? Anything new in the stats output? I'm about
> > > > > to take a look at the diffs as well.
> > > >
> > > > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
> > > > > excellent.
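The per-class numbers being debated here (2 in "total_pages" yet 14g in mem_requested) can be sanity-checked mechanically. A hypothetical Python sketch, assuming the "stats slabs" output has already been reduced to "<class>:<stat> <value>" lines, and using the 4MB page size implied by the "-I 4m" flag in this cluster's args:

```python
# A class whose mem_requested exceeds what its total_pages could possibly
# hold indicates broken accounting (like the mem_requested miscount
# discussed in this thread), since requested memory lives inside pages.
PAGE_SIZE = 4 * 1024 * 1024  # -I 4m; memcached's default is 1MB

def suspicious_classes(stats_lines, page_size=PAGE_SIZE):
    """Return slab class ids whose stats are internally impossible."""
    per_class = {}
    for line in stats_lines:
        key, value = line.split()
        clsid, stat = key.split(":")
        per_class.setdefault(int(clsid), {})[stat] = int(value)
    return [cls for cls, s in sorted(per_class.items())
            if s["mem_requested"] > s["total_pages"] * page_size]

sample = [
    "11:total_pages 100",
    "11:mem_requested 300000000",   # fits in 100 pages: fine
    "12:total_pages 2",
    "12:mem_requested 15032385536", # ~14g claimed in 2 pages: impossible
]
# suspicious_classes(sample) flags class 12
```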
> > > > > if automove=2 is too aggressive you'll see that come in as a hit
> > > > > ratio reduction.
> > > > >
> > > > > the new branch works with automove=2 as well, but it will attempt
> > > > > to rescue valid items in the old slab if possible. I'll still be
> > > > > working on it for another few hours today though. I'll mail again
> > > > > when I'm done.
> > > >
> > > > On Tue, 29 Sep 2015, Scott Mansfield wrote:
> > > > > I have the first commit (slab_automove=2) running in prod right
> > > > > now. Later today will be a full load production test of the latest
> > > > > code. I'll just let it run for a few days unless I spot any
> > > > > problems. We have good metrics for latency et al. from the client
> > > > > side, though network time normally dwarfs memcached time.
> > > >
> > > > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> > > > > That's unfortunate.
> > > > >
> > > > > I've done some more work on the branch:
> > > > > https://github.com/memcached/memcached/pull/112
> > > > >
> > > > > It's not completely likely you would see enough of an improvement
> > > > > from the new default mode. However, if your item sizes change
> > > > > gradually, items are reclaimed during expiration, or get
> > > > > overwritten (and thus freed in the old class), it should work just
> > > > > fine. I have another patch coming which should help though.
> > > > >
> > > > > Open to feedback from any interested party.
> > > >
> > > > On Fri, 25 Sep 2015, Scott Mansfield wrote:
> > > > > I have it running internally, and it runs fine under normal load.
> > > > > It's difficult to put it into the line of fire for a production
> > > > > workload because of social reasons... As well, it's a degenerate
> > > > > case that we normally don't run in to (and actively try to avoid).
> > > > > I'm going to run some heavier load tests on it today.
> > > >
> > > > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> > > > > I'm working on getting a test going internally. I'll let you know
> > > > > how it goes.
> > > > >
> > > > > Scott Mansfield
> > > >
> > > > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> > > > > Yo,
> > > > >
> > > > > https://github.com/dormando/memcached/commits/slab_rebal_next -
> > > > > would you mind playing around with the branch here? You can see the
> > > > > start options in the test.
> > > > >
> > > > > This is a dead simple modification (a restoration of a feature that
> > > > > was already there...). The test very aggressively writes and is
> > > > > able to shunt memory around appropriately.
> > > > >
> > > > > The work I'm exploring right now will allow saving items being
> > > > > rebalanced from, and increasing the aggression of page moving
> > > > > without being so brain damaged about it.
> > > > >
> > > > > But while I'm poking around with that, I'd be interested in knowing
> > > > > if this simple branch is an improvement, and if so how much.
> > > > >
> > > > > I'll push more code to the branch, but the changes should be gated
> > > > > behind a feature flag.
> > > >
> > > > On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> > > > > No worries man, you're doing us a favor. Let me know if there's
> > > > > anything you need from us, and I promise I'll be quicker this
> > > > > time :)
> > > >
> > > > On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> > > > > Hey,
> > > > >
> > > > > I'm still really interested in working on this. I'll be taking a
> > > > > careful look soon, I hope.
> > > >
> > > > On Mon, 3 Aug 2015, Scott Mansfield wrote:
> > > > > I've tweaked the program slightly, so I'm adding a new version. It
> > > > > prints more stats as it goes and runs a bit faster.
> > > >
> > > > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> > > > > Total brain fart on my part. Apparently I had memcached 1.4.13 on
> > > > > my path (who knows how...). Using the actual one that I've built
> > > > > works. Sorry for the confusion... can't believe I didn't realize
> > > > > that before. I'm testing against the compiled one now to see how it
> > > > > behaves.
> > > >
> > > > On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> > > > > You sure that's 1.4.24? None of those fail for me :(
> > > >
> > > > On Mon, 3 Aug 2015, Scott Mansfield wrote:
> > > > > The command line I've used that will start is:
> > > > >
> > > > > memcached -m 64 -o slab_reassign,slab_automove
> > > > >
> > > > > the ones that fail are:
> > > > >
> > > > > memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
> > > > > memcached -o lru_crawler
> > > > >
> > > > > I'm sure I've missed something during compile, though I just used
> > > > > ./configure and make.
> > > >
> > > > On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> > > > > I've attached a pretty simple program to connect, fill a slab with
> > > > > data, and then fill another slab slowly with data of a different
> > > > > size. I've been trying to get memcached to run with the lru_crawler
> > > > > and lru_maintainer flags, but I get 'Illegal suboption "(null)"'
> > > > > every time I try to start with either in any configuration.
> > > > >
> > > > > I haven't seen it start to move slabs automatically with a freshly
> > > > > installed 1.4.24.
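The test program described there (fill one slab class, then slowly fill another with values of a different size, to pressure the slab mover) can be sketched in a few lines of Python. The key names, item counts, and pacing below are invented; the framing follows the standard ASCII protocol "set <key> <flags> <exptime> <bytes>" form:

```python
import socket
import time

def build_set(key, value):
    """Frame an ASCII-protocol set command (flags 0, no expiry)."""
    return b"set %s 0 0 %d\r\n%s\r\n" % (key, len(value), value)

def fill(sock, prefix, value_size, count, delay=0.0):
    """Write `count` items of `value_size` bytes; the size picks the slab class."""
    value = b"x" * value_size
    for i in range(count):
        sock.sendall(build_set(b"%s:%d" % (prefix, i), value))
        sock.recv(4096)  # expect "STORED\r\n" (or an error line)
        if delay:
            time.sleep(delay)

def run(host="127.0.0.1", port=11211):
    # Only run against a disposable local instance started with
    # something like: memcached -m 64 -o slab_reassign,slab_automove
    s = socket.create_connection((host, port))
    fill(s, b"small", 100, 200_000)              # saturate one slab class...
    fill(s, b"big", 8000, 50_000, delay=0.001)   # ...then slowly pressure another
```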
> > > > On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> > > > > I realize I've not given you the tests to reproduce the behavior.
> > > > > I should be able to soon. Sorry about the delay here.
> > > > >
> > > > > In the mean time, I wanted to bring up a possible secondary use of
> > > > > the same logic that moves items on slab rebalancing. I think the
> > > > > system might benefit from using the same logic to crawl the pages
> > > > > in a slab and compact the data in the background. In the case where
> > > > > we have memory that is assigned to the slab but not being used
> > > > > because of replaced or TTL'd out data, returning the memory to a
> > > > > pool of free memory will allow a slab to grow with that memory
> > > > > first instead of waiting for an event where memory is needed at
> > > > > that instant.
> > > > >
> > > > > It's a change in approach, from reactive to proactive. What do you
> > > > > think?
> > > >
> > > > On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> > > > > > First, more detail for you:
> > > > > >
> > > > > > We are running 1.4.24 in production and haven't noticed any bugs
> > > > > > as of yet. The new LRUs seem to be working well, though we nearly
> > > > > > always run memcached scaled to hold all data without evictions.
> > > > > > Those with evictions are behaving well. Those without evictions
> > > > > > haven't seen crashing or any other noticeable bad behavior.
> > > > >
> > > > > Neat.
> > > > >
> > > > > > OK, I think I see an area where I was speculating on
> > > > > > functionality. If you have a key in slab 21 and then the same key
> > > > > > is written again at a larger size in slab 23, I assumed that the
> > > > > > space in 21 was not freed on the second write. With that
> > > > > > assumption, the LRU crawler would not free up that space. Also,
> > > > > > just by observation in the macro, the space is not freed fast
> > > > > > enough to be effective, in our use case, to accept the writes
> > > > > > that are happening. Think in the hundreds of millions of
> > > > > > "overwrites" in a 6-10 hour period across a cluster.
> > > > >
> > > > > Internally, "items" (a key/value pair) are generally immutable. The
> > > > > only time when they're not is for INCR/DECR, and an item still
> > > > > becomes immutable if two INCR/DECRs collide.
> > > > >
> > > > > What this means is that the new item is staged in a piece of free
> > > > > memory while the "upload" stage of the SET happens. When memcached
> > > > > has all of the data in memory to replace the item, it does an
> > > > > internal swap under a lock. The old item is removed from the hash
> > > > > table and LRU, and the new item gets put in its place (at the head
> > > > > of the LRU).
> > > > >
> > > > > Since items are refcounted, this means that if other users are
> > > > > downloading an item which just got replaced, their memory doesn't
> > > > > get corrupted by the item changing out from underneath them. They
> > > > > can continue to read the old item until they're done. When the
> > > > > refcount reaches zero, the old memory is reclaimed.
> > > > >
> > > > > Most of the time, the item replacement happens and then the old
> > > > > memory is immediately removed.
> > > > >
> > > > > However, this does mean that you need *one* piece of free memory to
> > > > > replace the old one. Then the old memory gets freed after that set.
> > > > >
> > > > > So if you take a memcached instance with 0 free chunks and do a
> > > > > rolling replacement of all items (within the same slab class as
> > > > > before), the first one would cause an eviction from the tail of the
> > > > > LRU to get a free chunk. Every SET after that would use the chunk
> > > > > freed from the replacement of the previous memory.
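The replace-and-refcount behavior described there can be modeled with a toy sketch (pure illustration in Python, not memcached's actual C structures): a set swaps the staged item into the table and drops only the table's reference, so an old copy stays readable until its last reader releases it:

```python
class Item:
    def __init__(self, value):
        self.value = value
        self.refcount = 1   # the hash table's own reference

class Cache:
    def __init__(self):
        self.table = {}
        self.freed = []     # stand-in for returning chunks to the slab

    def set(self, key, value):
        old = self.table.get(key)
        self.table[key] = Item(value)   # staged item swapped in under "lock"
        if old:
            self.deref(old)             # drop only the table's reference

    def acquire(self, key):
        item = self.table[key]
        item.refcount += 1              # a client starts reading this copy
        return item

    def deref(self, item):
        item.refcount -= 1
        if item.refcount == 0:
            self.freed.append(item)     # memory reclaimed only now

c = Cache()
c.set("k", "v1")
reader = c.acquire("k")   # a slow client still downloading v1
c.set("k", "v2")          # replacement: new item in table, old kept alive
old_value = reader.value  # still "v1", unchanged underneath the reader
c.deref(reader)           # last reference gone; old memory reclaimed
```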
> > > > > > After that last sentence I realized I also may not have explained
> > > > > > well enough the access pattern. The keys are all overwritten
> > > > > > every day, but it takes some time to write them all (obviously).
> > > > > > We see a huge increase in the bytes metric as if the new data for
> > > > > > the old keys was being written for the first time. Since the
> > > > > > "old" slab for the same key doesn't proactively release memory,
> > > > > > it starts to fill up the cache and then starts evicting data in
> > > > > > the new slab. Once that happens, we see evictions in the old slab
> > > > > > because of the algorithm you mentioned (random picking / freeing
> > > > > > of memory). Typically we don't see any use for "upgrading" an
> > > > > > item, as the new data would be entirely new and should wholesale
> > > > > > replace the old data for that key. More specifically, the
> > > > > > operation is always set, with different data each day.
> > > > >
> > > > > Right. Most of your problems will come from two areas. One being
> > > > > that when writing data aggressively into the new slab class
> > > > > (unless you set the rebalancer to always-replace mode), the mover
> > > > > will make memory available more slowly than you can insert. So
> > > > > you'll cause extra evictions in the new slab class.
> > > > >
> > > > > The secondary problem is the random evictions in the previous slab
> > > > > class as stuff is chucked on the floor to make memory moveable.
> > > > >
> > > > > > As for testing, we'll be able to put it under real production
> > > > > > workload. I don't know what kind of data you mean you need for
> > > > > > testing. The data stored in the caches are highly confidential. I
> > > > > > can give you all kinds of metrics, since we collect most of the
> > > > > > ones that are in the stats and some from the stats slabs output.
> > > > > > If you have some specific ones that need collecting, I'll double
> > > > > > check and make sure we can get those. Alternatively, it might be
> > > > > > most beneficial to see the metrics in person :)
> > > > >
> > > > > I just need stats snapshots here and there, and actually putting
> > > > > the thing under load. When I did the LRU work I had to beg for
> > > > > several months before anyone tested it with a production load. This
> > > > > slows things down and ...

--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.