Another message for you:

[78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]

addr2line shows:

$ addr2line -e memcached 412b9d
/mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119

On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
> Ok, thanks!
>
> I'll noodle this a bit... unfortunately a backtrace might be more
> helpful. Will ask you to attempt to get one if I don't figure anything
> out in time.
>
> (allow it to core dump, or attach a gdb session and set an ignore
> handler for sigpipe/int/etc and run "continue")
>
> what were your full startup args, though?
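(For reference, that procedure looks roughly like the following. The binary path and pid are made-up examples; the gdb commands themselves are standard, and memcached's -r flag raises the core file limit for the core-dump route.)

    $ ulimit -c unlimited            # or start memcached with -r to allow core dumps
    $ gdb /usr/bin/memcached 2757    # attach to the running process by pid
    (gdb) handle SIGPIPE nostop noprint pass
    (gdb) handle SIGINT nostop noprint pass
    (gdb) continue
    ... wait for the crash ...
    (gdb) thread apply all bt        # grab a backtrace from every thread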
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The commit was the latest in slab_rebal_next at the time:
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>
> addr2line gave me this output:
>
> $ addr2line -e memcached 0x40e007
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>
> As well, this was running with production writes, but not reads. Even if
> we had reads on with the few servers crashing, we're ok architecturally.
> That's why I can get it out there without worrying too much. For now,
> I'm going to turn it off. I had a metrics issue anyway that needs to get
> fixed. Tomorrow I'm planning to test again with more metrics, but I can
> get any new code in pretty quick.

On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
> How many servers were you running it on? I hope it wasn't more than a
> handful. I'd recommend starting with one :P
>
> can you do an addr2line? what were your startup args, and what was the
> commit sha1 for the branch you pulled?
>
> sorry about that :/

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> A few different servers (5 / 205) experienced a segfault, all within an
> hour or so. Unfortunately at this point I'm a bit out of my depth. I
> have the dmesg output, which is identical for all 5 boxes:
>
> [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
>
> I can possibly supply the binary file if needed, though we didn't do
> anything besides the standard setup and compile.

On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
> If you look at the new branch there's a commit explaining the new stats.
>
> You can watch slab_reassign_evictions vs slab_reassign_saves. you can
> also test automove=1 vs automove=2 (please also turn on the
> lru_maintainer and lru_crawler).
>
> The initial branch you were running didn't add any new stats. It just
> restored an old feature.
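(Those counters come back in the plain "stats" output, so they can be polled with any client; nc is one convenient way:)

    $ echo stats | nc 127.0.0.1 11211 | grep slab_reassign

If slab_reassign_evictions climbs much faster than slab_reassign_saves, the mover is throwing away far more items than it rescues.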
On Tue, 29 Sep 2015, Scott Mansfield wrote:
> An unrelated prod problem meant I had to stop after about an hour. I'm
> turning it on again tomorrow morning. Are there any new metrics I should
> be looking at? Anything new in the stats output? I'm about to take a
> look at the diffs as well.

On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
> excellent. if automove=2 is too aggressive you'll see that come in as a
> hit ratio reduction.
>
> the new branch works with automove=2 as well, but it will attempt to
> rescue valid items in the old slab if possible. I'll still be working on
> it for another few hours today though. I'll mail again when I'm done.

On Tue, 29 Sep 2015, Scott Mansfield wrote:
> I have the first commit (slab_automove=2) running in prod right now.
> Later today will be a full-load production test of the latest code. I'll
> just let it run for a few days unless I spot any problems. We have good
> metrics for latency et al. from the client side, though network normally
> dwarfs memcached time.

On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> That's unfortunate.
>
> I've done some more work on the branch:
> https://github.com/memcached/memcached/pull/112
>
> It's not completely likely you would see enough of an improvement from
> the new default mode. However, if your item sizes change gradually,
> items are reclaimed during expiration, or get overwritten (and thus
> freed in the old class), it should work just fine. I have another patch
> coming which should help though.
>
> Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:
> I have it running internally, and it runs fine under normal load. It's
> difficult to put it into the line of fire for a production workload
> because of social reasons... As well, it's a degenerate case that we
> normally don't run into (and actively try to avoid). I'm going to run
> some heavier load tests on it today.

On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> I'm working on getting a test going internally. I'll let you know how it
> goes.

On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> Yo,
>
> https://github.com/dormando/memcached/commits/slab_rebal_next - would
> you mind playing around with the branch here? You can see the start
> options in the test.
>
> This is a dead simple modification (a restoration of a feature that was
> already there...). The test very aggressively writes and is able to
> shunt memory around appropriately.
>
> The work I'm exploring right now will allow saving of items being
> rebalanced from, and increasing the aggression of page moving without
> being so brain damaged about it.
>
> But while I'm poking around with that, I'd be interested in knowing if
> this simple branch is an improvement, and if so how much.
>
> I'll push more code to the branch, but the changes should be gated
> behind a feature flag.

On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> No worries man, you're doing us a favor. Let me know if there's anything
> you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> Hey,
>
> I'm still really interested in working on this. I'll be taking a careful
> look soon I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> I've tweaked the program slightly, so I'm adding a new version. It
> prints more stats as it goes and runs a bit faster.
On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> Total brain fart on my part. Apparently I had memcached 1.4.13 on my
> path (who knows how...). Using the actual one that I've built works.
> Sorry for the confusion... can't believe I didn't realize that before.
> I'm testing against the compiled one now to see how it behaves.

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> The command line I've used that will start is:
>
> memcached -m 64 -o slab_reassign,slab_automove
>
> the ones that fail are:
>
> memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
> memcached -o lru_crawler
>
> I'm sure I've missed something during compile, though I just used
> ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> I've attached a pretty simple program to connect, fill a slab with data,
> and then fill another slab slowly with data of a different size. I've
> been trying to get memcached to run with the lru_crawler and
> lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time
> I try to start with either in any configuration.
>
> I haven't seen it start to move slabs automatically with a freshly
> installed 1.4.24.

On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> I realize I've not given you the tests to reproduce the behavior. I
> should be able to soon. Sorry about the delay here.
>
> In the meantime, I wanted to bring up a possible secondary use of the
> same logic to move items on slab rebalancing. I think the system might
> benefit from using the same logic to crawl the pages in a slab and
> compact the data in the background. In the case where we have memory
> that is assigned to the slab but not being used, because of replaced or
> TTL'd out data, returning the memory to a pool of free memory will allow
> a slab to grow with that memory first, instead of waiting for an event
> where memory is needed at that instant.
>
> It's a change in approach, from reactive to proactive. What do you
> think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> > First, more detail for you:
> >
> > We are running 1.4.24 in production and haven't noticed any bugs as of
> > yet. The new LRUs seem to be working well, though we nearly always run
> > memcached scaled to hold all data without evictions. Those with
> > evictions are behaving well. Those without evictions haven't seen
> > crashing or any other noticeable bad behavior.
>
> Neat.
>
> > OK, I think I see an area where I was speculating on functionality. If
> > you have a key in slab 21 and then the same key is written again at a
> > larger size in slab 23, I assumed that the space in 21 was not freed
> > on the second write. With that assumption, the LRU crawler would not
> > free up that space. Also, just by observation in the macro, the space
> > is not freed fast enough to be effective, in our use case, to accept
> > the writes that are happening. Think in the hundreds of millions of
> > "overwrites" in a 6-10 hour period across a cluster.
>
> Internally, "items" (a key/value pair) are generally immutable. The only
> time when it's not is for INCR/DECR, and it still becomes immutable if
> two INCR/DECRs collide.
>
> What this means is that the new item is staged in a piece of free memory
> while the "upload" stage of the SET happens. When memcached has all of
> the data in memory to replace the item, it does an internal swap under a
> lock. The old item is removed from the hash table and LRU, and the new
> item gets put in its place (at the head of the LRU).
>
> Since items are refcounted, this means that if other users are
> downloading an item which just got replaced, their memory doesn't get
> corrupted by the item changing out from underneath them. They can
> continue to read the old item until they're done. When the refcount
> reaches zero the old memory is reclaimed.
>
> Most of the time, the item replacement happens and then the old memory
> is immediately removed.
>
> However, this does mean that you need *one* piece of free memory to
> replace the old one. Then the old memory gets freed after that set.
>
> So if you take a memcached instance with 0 free chunks, and do a rolling
> replacement of all items (within the same slab class as before), the
> first one would cause an eviction from the tail of the LRU to get a free
> chunk. Every SET after that would use the chunk freed from the
> replacement of the previous memory.
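(To make those mechanics concrete, here is a toy, single-threaded model of replace-by-swap with refcounts. All names are illustrative, not memcached's actual internals; the real swap happens under a lock, with the item linked into a hash table and LRU.)

    #include <stdlib.h>
    #include <string.h>

    /* Toy sketch: one key, refcounted values. */
    typedef struct item {
        int refcount;     /* the table's reference plus any active readers */
        size_t nbytes;
        char *data;
    } item;

    static item *slot;    /* stands in for this key's hash-table entry */

    static void item_release(item *it) {
        if (--it->refcount == 0) {   /* last holder lets go */
            free(it->data);
            free(it);
        }
    }

    /* A SET stages the new value in fresh memory (the "one free chunk"
     * requirement), then swaps it into place. Readers still holding the
     * old item keep reading it safely; its memory is freed at refcount 0. */
    static void set_value(const char *val, size_t len) {
        item *new_it = malloc(sizeof(*new_it));
        new_it->refcount = 1;
        new_it->nbytes = len;
        new_it->data = malloc(len);
        memcpy(new_it->data, val, len);

        item *old_it = slot;
        slot = new_it;               /* the swap; locked in the real thing */
        if (old_it)
            item_release(old_it);    /* old memory reclaimed only now */
    }

    int main(void) {
        set_value("day-one-data", 12);
        set_value("day-two-data-bigger", 19);  /* old copy freed in the swap */
        item_release(slot);                    /* drop the table's reference */
        return 0;
    }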
>
> > After that last sentence I realized I also may not have explained well
> > enough the access pattern. The keys are all overwritten every day, but
> > it takes some time to write them all (obviously). We see a huge
> > increase in the bytes metric, as if the new data for the old keys was
> > being written for the first time. Since the "old" slab for the same
> > key doesn't proactively release memory, it starts to fill up the cache
> > and then starts evicting data in the new slab. Once that happens, we
> > see evictions in the old slab because of the algorithm you mentioned
> > (random picking / freeing of memory).
> >
> > Typically we don't see any use for "upgrading" an item, as the new
> > data would be entirely new and should wholesale replace the old data
> > for that key. More specifically, the operation is always set, with
> > different data each day.
>
> Right. Most of your problems will come from two areas. One being that,
> writing data aggressively into the new slab class (unless you set the
> rebalancer to always-replace mode), the mover will make memory available
> more slowly than you can insert. So you'll cause extra evictions in the
> new slab class.
>
> The secondary problem is from the random evictions in the previous slab
> class as stuff is chucked on the floor to make memory moveable.
>
> > As for testing, we'll be able to put it under real production
> > workload. I don't know what kind of data you mean you need for
> > testing. The data stored in the caches are highly confidential. I can
> > give you all kinds of metrics, since we collect most of the ones that
> > are in the stats output and some from the stats slabs output. If you
> > have some specific ones that need collecting, I'll double check and
> > make sure we can get those. Alternatively, it might be most beneficial
> > to see the metrics in person :)
>
> I just need stats snapshots here and there, and actually putting the
> thing under load. When I did the LRU work I had to beg for several
> months before anyone tested it with a production load. This slows things
> down and demotivates me from working on the project.
>
> Unfortunately my dayjob keeps me pretty busy, so ~internet~ would
> probably be best.
>
> > I can create a driver program to reproduce the behavior on a smaller
> > scale. It would write e.g. 10k keys of 10k size, then rewrite the same
> > keys with different size data. I'll work on that and post it to this
> > thread when I can reproduce the behavior locally.
>
> Ok. There're slab rebalance unit tests in the t/ directory which do
> things like this, and I've used mc-crusher to slam the rebalancer. It's
> pretty easy to run one config to load up 10k objects, then flip to the
> other using the same key namespace.
>
> > Thanks,
> > Scott
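(Scott's attached program isn't preserved in the archive, but a driver of the kind he describes is small. Below is a sketch, not his actual code: it assumes a memcached on 127.0.0.1:11211, speaks the plain text protocol, and glosses over error handling and partial reads.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    /* Fill one slab class with 10k keys, then overwrite the same keys at
     * a larger size so they land in a different class. */
    static void set_keys(int fd, size_t vsize) {
        char *val = malloc(vsize);
        char hdr[64], reply[64];
        memset(val, 'x', vsize);
        for (int i = 0; i < 10000; i++) {
            /* set <key> <flags> <exptime> <bytes>; 604800s = the 7-day TTL */
            int n = snprintf(hdr, sizeof(hdr), "set key%d 0 604800 %zu\r\n",
                             i, vsize);
            write(fd, hdr, n);
            write(fd, val, vsize);
            write(fd, "\r\n", 2);
            read(fd, reply, sizeof(reply));  /* expect "STORED\r\n" */
        }
        free(val);
    }

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sa = {0};
        sa.sin_family = AF_INET;
        sa.sin_port = htons(11211);
        inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);
        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0)
            return 1;
        set_keys(fd, 10240);   /* ~10KB values: one slab class */
        set_keys(fd, 15360);   /* ~15KB values: a larger class */
        close(fd);
        return 0;
    }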
On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
> Hey,
>
> On Fri, 10 Jul 2015, Scott Mansfield wrote:
>
> > We've seen issues recently where we run a cluster that typically has
> > the majority of items overwritten in the same slab every day, and a
> > sudden change in data size evicts a ton of data, affecting downstream
> > systems. To be clear, that is our problem, but I think there's a tweak
> > in memcached that might be useful, and another possible feature that
> > would be even better.
> >
> > The data that is written to this cache is overwritten every day,
> > though the TTL is 7 days. One slab takes up the majority of the space
> > in the cache. The application wrote e.g. 10KB (slab 21) every day for
> > each key consistently. One day, a change occurred where it started
> > writing 15KB (slab 23), causing a migration of data from one slab to
> > another. We had -o slab_reassign,slab_automove=1 set on the server,
> > causing large numbers of evictions on the initial slab. Let's say the
> > cache could hold the data at 15KB per key, but the old data was not
> > technically TTL'd out in its old slab. This means that memory was not
> > being freed by the lru crawler thread (I think) because its expiry had
> > not come around.
> >
> > Lines 1199 and 1200 in items.c:
> >
> > if ((search->exptime != 0 && search->exptime < current_time) ||
> >     is_flushed(search)) {
> >
> > If there was a check to see if this data was "orphaned," i.e. that the
> > key, if accessed, would map to a different slab than the current one,
> > then these orphans could be reclaimed as free memory. I am working on
> > a patch to do this, though I have reservations about performing a hash
> > on the key on the lru crawler thread (if the hash is not already
> > available).
> >
> > I have very little experience in the memcached codebase, so I don't
> > know the most efficient way to do this. Any help would be appreciated.
>
> There seems to be a misconception about how the slab classes work. A
> key, if already existing in a slab, will always map to the slab class it
> currently fits into. The slab classes always exist, but the amount of
> memory reserved for each of them will shift with slab_reassign. ie: 10
> pages in slab class 21, then memory pressure on 23 causes it to move
> over.
>
> So if you examine a key that still exists in slab class 21, it has no
> reason to move up or down the slab classes.
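(The rule being described is simple: an item is placed in the smallest class whose chunk size fits it, and an existing item never moves; an overwrite just stages a new copy in whatever class the new size maps to, after which the old chunk returns to its own class's free list. A simplified sketch of the lookup, loosely modeled on slabs_clsid() in slabs.c; names and sizing here are illustrative:)

    #include <stddef.h>

    #define MAX_CLASSES 64

    static size_t chunk_size[MAX_CLASSES]; /* filled at startup from the
                                              chunk growth factor */
    static int num_classes;

    /* Return the smallest slab class whose chunks can hold the item. */
    static int clsid_for(size_t item_size) {
        for (int i = 0; i < num_classes; i++)
            if (item_size <= chunk_size[i])
                return i;
        return -1;   /* larger than the biggest chunk: not storable */
    }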
>
> > Alternatively, and possibly more beneficial, is compaction of data in
> > a slab using the same set of criteria as lru crawling. Understandably,
> > compaction is a very difficult problem to solve, since moving the data
> > would be a pain in the ass. I saw a couple of discussions about this
> > in the mailing list, though I didn't see any firm thoughts about it. I
> > think it can probably be done in O(1) like the lru crawler by limiting
> > the number of items it touches each time. Writing and reading are
> > doable in O(1) so moving should be as well. Has anyone given more
> > thought on compaction?
>
> I'd be interested in hacking this up for you folks if you can provide me
> testing and some data to work with. With all of the LRU work I did in
> 1.4.24, the next thing I wanted to do is a big improvement on the slab
> reassignment code.
>
> Currently it picks essentially a random slab page, empties it, and moves
> the slab page into the class under pressure.
>
> One thing we can do is first examine for free memory in the existing
> slab, IE:
>
> - Take a page from slab 21
> - Scan the page for valid items which need to be moved
> - Pull free memory from slab 21, migrate the item (moderately
>   complicated)
> - When the page is empty, move it (or give up if you run out of free
>   chunks).
>
> The next step is to pull from the LRU on slab 21:
>
> - Take page from slab 21
> - Scan page for valid items
> - Pull free memory from slab 21, migrate the item
> - If no memory free, evict tail of slab 21. Use that chunk.
> - When the page is empty, move it.
>
> Then, when you hit this condition, your least-recently-used data gets
> culled as new data migrates your page class. This should match a natural
> occurrence, if you would already be evicting valid (but old) items to
> make room for new items.
>
> A bonus to using the free memory trick is that I can use the amount of
> free space in a slab class as a heuristic to more quickly move slab
> pages around.
>
> If it's still necessary from there, we can explore "upgrading" items to
> a new slab class, but that is much, much more complicated, since the
> item has to shift LRUs. Do you put it at the head, the tail, the middle,
> etc? It might be impossible to make a good generic decision there.
>
> What version are you currently on? If 1.4.24, have you seen any
> instability? I'm currently torn between fighting a few bugs and starting
> on improving the slab rebalancer.
>
> -Dormando
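(A rough sketch of the two page-move variants listed above, as pseudocode-level C. Every helper name here is hypothetical; the real rebalancer in slabs.c also deals with locking, refcounts, and busy items, all omitted here.)

    typedef struct item item;

    /* Hypothetical helpers standing in for real slab/LRU internals. */
    extern void *pick_page(int clsid);
    extern item *first_item(void *page);
    extern item *next_item(void *page, item *it);
    extern int   item_is_valid(item *it);
    extern void *pop_free_chunk(int clsid);
    extern void *evict_lru_tail(int clsid);            /* frees and returns the tail's chunk */
    extern void  migrate_item(item *it, void *chunk);  /* copy, then relink hash/LRU */
    extern void  reassign_page(void *page, int dst_clsid);

    /* Move one page out of src: rescue live items into free chunks of the
     * same class first; optionally evict the LRU tail when none are free. */
    static int move_one_page(int src, int dst, int may_evict) {
        void *page = pick_page(src);                 /* take a page from slab 21 */
        for (item *it = first_item(page); it != NULL; it = next_item(page, it)) {
            if (!item_is_valid(it))
                continue;                            /* dead/expired: nothing to save */
            void *chunk = pop_free_chunk(src);       /* free memory from slab 21 */
            if (chunk == NULL) {
                if (!may_evict)
                    return -1;                       /* variant 1: give up */
                chunk = evict_lru_tail(src);         /* variant 2: cull the LRU tail */
            }
            migrate_item(it, chunk);
        }
        reassign_page(page, dst);                    /* page now empty: hand it over */
        return 0;
    }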