ok... slab class 12 claims to have 2 in "total_pages", yet 14g in mem_requested. is this stat wrong?
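(For context: with the -I 4m / slab_reassign options used in this thread, a slab page is item_size_max bytes, i.e. 4MB, so those two numbers can't both be right. A hypothetical excerpt of such a snapshot, values illustrative:

    stats slabs
    ...
    STAT 12:total_pages 2
    STAT 12:mem_requested 15032385536
    ...
    END

Two 4MB pages can hold at most 8MB, roughly 1800x less than the ~14GB "requested" — which points at an accounting bug, e.g. mem_requested not being decremented somewhere, rather than real memory use.)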
On Thu, 1 Oct 2015, Scott Mansfield wrote:

The ones that crashed (new code cluster) were set to only be written to from the client applications. The data is an index key and a series of data keys that are all written one after another. Each key might be hashed to a different server, though, so not all of them are written to the same server. I can give you a snapshot of one of the clusters that didn't crash (attached file). I can give more detail offline if you need it.

On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando wrote:

Any chance you could describe (perhaps privately?) in very broad strokes what the write load looks like? (They're getting only writes, too?) Otherwise I'll have to devise arbitrary torture tests. I'm sure the bug's in there, but it's not obvious yet.

On Thu, 1 Oct 2015, dormando wrote:

Perfect, thanks! I have $dayjob as well but will look into this as soon as I can. My torture test machines are in a box, but I'll try to borrow one.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

Yes. Exact args:

    -p 11211 -u <omitted> -l 0.0.0.0 -c 100000 -o slab_reassign -o lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253

On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:

Were lru_maintainer/lru_crawler/etc enabled, though? Even if the slab mover is off, those two were the big changes in .24.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

The same cluster has > 400 servers happily running 1.4.24. It's been our standard deployment for a while now, and we haven't seen any crashes. The servers in the same cluster running 1.4.24 (with the same write load the new build was taking) have been up for 29 days. The start options do not contain the slab_automove option because it wasn't effective for us before. The memory given is possibly slightly different per server, as we calculate on startup how much we give. It's in the same ballpark, though (~56 gigs).

On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:

Just before I sit in and try to narrow this down: have you run any host on 1.4.24 mainline with those same start options? Just in case the crash is older.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

Another message for you:

    [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]

addr2line shows:

    $ addr2line -e memcached 412b9d
    /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119

On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:

Ok, thanks! I'll noodle this a bit... unfortunately a backtrace might be more helpful. Will ask you to attempt to get one if I don't figure anything out in time. (Allow it to core dump, or attach a GDB session, set an ignore handler for sigpipe/int/etc, and run "continue".)

What were your full startup args, though?
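(That gdb recipe spelled out — a minimal sketch; the commands are standard gdb, the pid lookup is an assumption about the host setup:

    $ gdb -p $(pidof memcached)
    (gdb) handle SIGPIPE nostop noprint pass
    (gdb) handle SIGINT nostop noprint pass
    (gdb) continue
    ... wait for the crash ...
    (gdb) thread apply all bt

Alternatively, `ulimit -c unlimited` before starting memcached lets it write a core you can open later with `gdb memcached core`.)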
On Thu, 1 Oct 2015, Scott Mansfield wrote:

The commit was the latest in slab_rebal_next at the time: https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a

addr2line gave me this output:

    $ addr2line -e memcached 0x40e007
    /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264

As well, this was running with production writes, but not reads. Even if we had reads on with the few servers crashing, we're OK architecturally. That's why I can get it out there without worrying too much. For now, I'm going to turn it off. I had a metrics issue anyway that needs to get fixed. Tomorrow I'm planning to test again with more metrics, but I can get any new code in pretty quickly.

On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:

How many servers were you running it on? I hope it wasn't more than a handful. I'd recommend starting with one :P

Can you do an addr2line? What were your startup args, and what was the commit sha1 for the branch you pulled?

Sorry about that :/

On Thu, 1 Oct 2015, Scott Mansfield wrote:

A few different servers (5 / 205) experienced a segfault, all within an hour or so. Unfortunately, at this point I'm a bit out of my depth. I have the dmesg output, which is identical for all 5 boxes:

    [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]

I can possibly supply the binary file if needed, though we didn't do anything besides the standard setup and compile.

On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:

If you look at the new branch there's a commit explaining the new stats. You can watch slab_reassign_evictions vs slab_reassign_saves. You can also test automove=1 vs automove=2 (please also turn on the lru_maintainer and lru_crawler).

The initial branch you were running didn't add any new stats. It just restored an old feature.

On Tue, 29 Sep 2015, Scott Mansfield wrote:

An unrelated prod problem meant I had to stop after about an hour. I'm turning it on again tomorrow morning. Are there any new metrics I should be looking at? Anything new in the stats output? I'm about to take a look at the diffs as well.

On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:

Excellent. If automove=2 is too aggressive you'll see that show up as a hit ratio reduction.

The new branch works with automove=2 as well, but it will attempt to rescue valid items in the old slab if possible. I'll still be working on it for another few hours today, though. I'll mail again when I'm done.
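(One way to keep an eye on those two counters, assuming the branch exposes them under these names in the general `stats` output; some nc builds may need a -q/-w timeout flag:

    $ watch -n5 "printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 \
        | egrep 'slab_reassign_(evictions|saves)'"

A rising saves count with flat evictions would suggest the rescue path is doing its job.)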
On Tue, 29 Sep 2015, Scott Mansfield wrote:

I have the first commit (slab_automove=2) running in prod right now. Later today will be a full load production test of the latest code. I'll just let it run for a few days unless I spot any problems. We have good metrics for latency et al. from the client side, though network normally dwarfs memcached time.

On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:

That's unfortunate.

I've done some more work on the branch: https://github.com/memcached/memcached/pull/112

It's not completely likely you would see enough of an improvement from the new default mode. However, if your item sizes change gradually, items are reclaimed during expiration, or items get overwritten (and thus freed in the old class), it should work just fine. I have another patch coming which should help, though.

Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:

I have it running internally, and it runs fine under normal load. It's difficult to put it into the line of fire for a production workload because of social reasons... As well, it's a degenerate case that we normally don't run into (and actively try to avoid). I'm going to run some heavier load tests on it today.

On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:

I'm working on getting a test going internally. I'll let you know how it goes.

Scott Mansfield

On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:

Yo,

https://github.com/dormando/memcached/commits/slab_rebal_next - would you mind playing around with the branch here? You can see the start options in the test.

This is a dead simple modification (a restoration of a feature that was already there...). The test very aggressively writes and is able to shunt memory around appropriately.

The work I'm exploring right now will allow saving items out of pages being rebalanced, and increasing the aggression of page moving without being so brain damaged about it.

But while I'm poking around with that, I'd be interested in knowing if this simple branch is an improvement, and if so how much.

I'll push more code to the branch, but the changes should be gated behind a feature flag.
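(Since automove=1 vs automove=2 keeps coming up: the mode can also be flipped on a running instance, which makes comparing the two easier. The stock protocol documents `slabs automove 0|1`; whether a given build accepts level 2 here is an assumption:

    $ printf 'slabs automove 2\r\nquit\r\n' | nc 127.0.0.1 11211
    OK
)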
On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:

No worries man, you're doing us a favor. Let me know if there's anything you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:

Hey,

I'm still really interested in working on this. I'll be taking a careful look soon, I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:

I've tweaked the program slightly, so I'm adding a new version. It prints more stats as it goes and runs a bit faster.

On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:

Total brain fart on my part. Apparently I had memcached 1.4.13 on my path (who knows how...). Using the actual one that I've built works. Sorry for the confusion... can't believe I didn't realize that before. I'm testing against the compiled one now to see how it behaves.

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:

You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:

The command line I've used that will start is:

    memcached -m 64 -o slab_reassign,slab_automove

The ones that fail are:

    memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
    memcached -o lru_crawler

I'm sure I've missed something during compile, though I just used ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:

I've attached a pretty simple program to connect, fill a slab with data, and then fill another slab slowly with data of a different size. I've been trying to get memcached to run with the lru_crawler and lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time I try to start with either in any configuration.

I haven't seen it start to move slabs automatically with a freshly installed 1.4.24.
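(The stale-binary mixup above is easy to hit; a quick sanity check before testing — output shown is illustrative:

    $ which memcached
    /usr/bin/memcached
    $ memcached -h | head -n1
    memcached 1.4.13

An old 1.4.13 on $PATH predates the lru_crawler/lru_maintainer options, which is consistent with them being rejected as unknown suboptions.)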
On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:

I realize I've not given you the tests to reproduce the behavior. I should be able to soon. Sorry about the delay here.

In the meantime, I wanted to bring up a possible secondary use of the same logic that moves items on slab rebalancing. I think the system might benefit from using the same logic to crawl the pages in a slab and compact the data in the background. In the case where we have memory that is assigned to the slab but not being used, because of replaced or TTL'd-out data, returning that memory to a pool of free memory would allow a slab to grow with it first, instead of waiting for an event where memory is needed at that instant.

It's a change in approach, from reactive to proactive. What do you think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:

> First, more detail for you:
>
> We are running 1.4.24 in production and haven't noticed any bugs as of yet. The new LRUs seem to be working well, though we nearly always run memcached scaled to hold all data without evictions. Those with evictions are behaving well. Those without evictions haven't seen crashing or any other noticeable bad behavior.

Neat.

> OK, I think I see an area where I was speculating on functionality. If you have a key in slab 21 and then the same key is written again at a larger size in slab 23, I assumed that the space in 21 was not freed on the second write. With that assumption, the LRU crawler would not free up that space. Also, just by observation in the macro, the space is not freed fast enough to be effective, in our use case, to accept the writes that are happening. Think in the hundreds of millions of "overwrites" in a 6 - 10 hour period across a cluster.

Internally, "items" (a key/value pair) are generally immutable. The only time they're not is for INCR/DECR, and an item still becomes immutable if two INCR/DECRs collide.

What this means is that the new item is staged in a piece of free memory while the "upload" stage of the SET happens. When memcached has all of the data in memory to replace the item, it does an internal swap under a lock. The old item is removed from the hash table and LRU, and the new item gets put in its place (at the head of the LRU).

Since items are refcounted, this means that if other users are downloading an item which just got replaced, their memory doesn't get corrupted by the item changing out from underneath them. They can continue to read the old item until they're done. When the refcount reaches zero, the old memory is reclaimed.
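(A toy sketch of that swap-under-lock-plus-refcount scheme, for illustration only — a single table slot stands in for memcached's hash table; this is not the actual code:

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct item {
        int refcount;               /* one ref held by the table, plus readers */
        /* key/value bytes omitted */
    } item;

    static item *table_slot;        /* stand-in for the hash table entry */
    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Readers call this when they finish streaming an item out. */
    static void item_release(item *it) {
        pthread_mutex_lock(&cache_lock);
        int dead = (--it->refcount == 0);
        pthread_mutex_unlock(&cache_lock);
        if (dead) free(it);         /* memory reclaimed only at refcount 0 */
    }

    /* SET path: new_it was fully staged in a free chunk beforehand,
       so the swap itself is a brief critical section. */
    static void item_replace(item *old_it, item *new_it) {
        pthread_mutex_lock(&cache_lock);
        new_it->refcount = 1;       /* the table's reference */
        table_slot = new_it;        /* old item leaves the table (and LRU) */
        pthread_mutex_unlock(&cache_lock);
        item_release(old_it);       /* drop the table's ref; in-flight
                                       readers keep the old item alive */
    }
)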
Most of the time, the item replacement happens and then the old memory is immediately removed.

However, this does mean that you need *one* piece of free memory to replace the old one. Then the old memory gets freed after that set.

So if you take a memcached instance with 0 free chunks and do a rolling replacement of all items (within the same slab class as before), the first one would cause an eviction from the tail of the LRU to get a free chunk. Every SET after that would use the chunk freed by the replacement of the previous item.

> After that last sentence I realized I also may not have explained the access pattern well enough. The keys are all overwritten every day, but it takes some time to write them all (obviously). We see a huge increase in the bytes metric, as if the new data for the old keys was being written for the first time. Since the "old" slab for the same key doesn't proactively release memory, it starts to fill up the cache and then starts evicting data in the new slab. Once that happens, we see evictions in the old slab because of the algorithm you mentioned (random picking / freeing of memory). Typically we don't see any use for "upgrading" an item, as the new data would be entirely new and should wholesale replace the old data for that key. More specifically, the operation is always a set, with different data each day.

Right. Most of your problems will come from two areas. One is that when you write data aggressively into the new slab class (unless you set the rebalancer to always-replace mode), the mover makes memory available more slowly than you can insert, so you'll cause extra evictions in the new slab class.

The secondary problem is the random evictions in the previous slab class as stuff is chucked on the floor to make memory moveable.
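(To make the rolling-replacement arithmetic above concrete, a toy sequence for a full class with zero free chunks — illustrative:

    set k1' -> no free chunk: evict LRU tail -> store k1' -> free old k1
    set k2' -> reuse the chunk from old k1   -> store k2' -> free old k2
    set k3' -> reuse the chunk from old k2   -> ...

A same-class rolling rewrite therefore costs one eviction total, not one per key; the trouble described in this thread starts when the rewrite crosses into a different slab class.)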
> As for testing, we'll be able to put it under real production workload. I don't know what kind of data you mean you need for testing. The data stored in the caches is highly confidential. I can give you all kinds of metrics, since we collect most of the ones that are in the stats output and some from the stats slabs output. If you have some specific ones that need collecting, I'll double-check and make sure we can get those. Alternatively, it might be most beneficial to see the metrics in person :)

I just need stats snapshots here and there, and actually putting the thing under load. When I did the LRU work I had to beg for several months before anyone tested it with a production load. This slows things down and demotivates me from working on the project.

Unfortunately my dayjob keeps me pretty busy, so ~internet~ would probably be best.

> I can create a driver program to reproduce the behavior on a smaller scale. It would write e.g. 10k keys of 10k size, then rewrite the same keys with different-size data. I'll work on that and post it to this thread when I can reproduce the behavior locally.

Ok. There are slab rebalance unit tests in the t/ directory which do things like this, and I've used mc-crusher to slam the rebalancer. It's pretty easy to run one config to load up 10k objects, then flip to the other using the same key namespace.

> Thanks,
> Scott
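(A minimal sketch of the kind of driver Scott describes, speaking the ASCII protocol directly; host, port, key count, value sizes, and the 7-day TTL are taken from the thread, and the response handling is deliberately crude:

    /* Fill 10k keys at ~10KB, then rewrite the same keys at ~15KB so
       they land in a larger slab class. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void fill(int fd, size_t vlen) {
        static char buf[1 << 16];
        char head[64], resp[32];
        memset(buf, 'x', vlen);
        memcpy(buf + vlen, "\r\n", 2);
        for (int i = 0; i < 10000; i++) {
            int n = snprintf(head, sizeof(head),
                             "set key:%d 0 604800 %zu\r\n", i, vlen);
            write(fd, head, n);
            write(fd, buf, vlen + 2);
            read(fd, resp, sizeof(resp));  /* expect "STORED\r\n"; a real
                                              driver would parse this */
        }
    }

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a = { .sin_family = AF_INET,
                                 .sin_port = htons(11211) };
        inet_pton(AF_INET, "127.0.0.1", &a.sin_addr);
        if (connect(fd, (struct sockaddr *)&a, sizeof(a)) != 0) return 1;
        fill(fd, 10 * 1024);   /* first pass: e.g. slab class 21 */
        fill(fd, 15 * 1024);   /* rewrite:    e.g. slab class 23 */
        close(fd);
        return 0;
    }
)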
On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:

Hey,

On Fri, 10 Jul 2015, Scott Mansfield wrote:

> We've seen issues recently where we run a cluster that typically has the majority of items overwritten in the same slab every day, and a sudden change in data size evicts a ton of data, affecting downstream systems. To be clear, that is our problem, but I think there's a tweak in memcached that might be useful, and another possible feature that would be even better.
>
> The data that is written to this cache is overwritten every day, though the TTL is 7 days. One slab takes up the majority of the space in the cache. The application wrote e.g. 10KB (slab 21) every day for each key, consistently. One day, a change occurred where it started writing 15KB (slab 23), causing a migration of data from one slab to another. We had -o slab_reassign,slab_automove=1 set on the server, causing large numbers of evictions on the initial slab. Let's say the cache could hold the data at 15KB per key, but the old data was not technically TTL'd out in its old slab. This means that memory was not being freed by the LRU crawler thread (I think) because its expiry had not come around.
>
> Lines 1199 and 1200 in items.c:
>
>     if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
>
> If there was a check to see if this data was "orphaned," i.e. that the key, if accessed, would map to a different slab than the current one, then these orphans could be reclaimed as free memory. I am working on a patch to do this, though I have reservations about performing a hash on the key on the LRU crawler ...
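(For illustration, the extra condition Scott is sketching might look like this inside the crawler's reclaim test. Names such as assoc_find, hash, and ITEM_key follow memcached's internals, but this is an untested fragment built on his stated assumption, not a real patch:

    /* hypothetical "orphan" check: if the hash table no longer resolves
       this key to this item, a newer copy has replaced it elsewhere and
       this chunk could be reclaimed. */
    uint32_t hv = hash(ITEM_key(search), search->nkey);
    item *live = assoc_find(ITEM_key(search), search->nkey, hv);
    if ((search->exptime != 0 && search->exptime < current_time)
            || is_flushed(search)
            || live != search) {
        /* ... unlink and free, as the crawler already does ... */
    }

As dormando's July 13 reply above explains, a replaced item is actually unlinked from the hash table at SET time, so such an orphan should not normally remain reachable; the check only makes sense under the assumption Scott describes.)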