Were lru_maintainer/lru_crawler/etc enabled, though? Even if the slab mover is
off, those two were the big changes in .24.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> The same cluster has > 400 servers happily running 1.4.24. It's been our 
> standard deployment for a while now, and we haven't seen any crashes. The 
> servers in the same cluster running 1.4.24 (with the same write load the new 
> build was taking) have been up for 29 days. The start options do not contain 
> the slab_automove option because it wasn't effective for
> us before. The memory given is possibly slightly different per server, as we 
> calculate on startup how much we give. It's in the same ballpark, though (~56 
> gigs).
>
> On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
>       Just before I sit in and try to narrow this down: have you run any host on
>       1.4.24 mainline with those same start options? Just in case the crash is
>       older.
>
>       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>       > Another message for you:
>       > [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
>       >
>       >
>       > addr2line shows:
>       >
>       > $ addr2line -e memcached 412b9d
>       >
>       > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
>       >
>       >
>       >
>       > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
>       >       Ok, thanks!
>       >
>       >       I'll noodle this a bit... unfortunately a backtrace might be more
>       >       helpful. Will ask you to attempt to get one if I don't figure
>       >       anything out in time.
>       >
>       >       (allow it to core dump, or attach a GDB session and set an ignore
>       >       handler for sigpipe/int/etc and run "continue")
>       >
>       >       what were your full startup args, though?
>       >
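Spelled out, that GDB recipe is roughly the following (the attach target is
illustrative; point gdb at the real PID):

    $ gdb -p $(pidof memcached)
    (gdb) handle SIGPIPE nostop noprint pass
    (gdb) handle SIGINT nostop noprint pass
    (gdb) continue
    ... wait for the crash ...
    (gdb) bt full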
>       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>       >
>       >       > The commit was the latest in slab_rebal_next at the time:
>       >       > https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>       >       >
>       >       > addr2line gave me this output:
>       >       >
>       >       > $ addr2line -e memcached 0x40e007
>       >       > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>       >       >
>       >       > As well, this was running with production writes, but not reads.
>       >       > Even if we had reads on with the few servers crashing, we're ok
>       >       > architecturally. That's why I can get it out there without
>       >       > worrying too much. For now, I'm going to turn it off. I had a
>       >       > metrics issue anyway that needs to get fixed. Tomorrow I'm
>       >       > planning to test again with more metrics, but I can get any new
>       >       > code in pretty quick.
>       >       >
>       >       >
>       >       > On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
>       >       >       How many servers were you running it on? I hope it wasn't
>       >       >       more than a handful. I'd recommend starting with one :P
>       >       >
>       >       >       can you do an addr2line? what were your startup args, and
>       >       >       what was the commit sha1 for the branch you pulled?
>       >       >
>       >       >       sorry about that :/
>       >       >
>       >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>       >       >
>       >       >       > A few different servers (5 / 205) experienced a segfault
>       >       >       > all within an hour or so. Unfortunately at this point I'm
>       >       >       > a bit out of my depth. I have the dmesg output, which is
>       >       >       > identical for all 5 boxes:
>       >       >       >
>       >       >       > [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
>       >       >       >
>       >       >       > I can possibly supply the binary file if needed, though we
>       >       >       > didn't do anything besides the standard setup and compile.
>       >       >       >
>       >       >       >
>       >       >       >
>       >       >       > On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
>       >       >       >       If you look at the new branch there's a commit
>       >       >       >       explaining the new stats.
>       >       >       >
>       >       >       >       You can watch slab_reassign_evictions vs
>       >       >       >       slab_reassign_saves. You can also test automove=1 vs
>       >       >       >       automove=2 (please also turn on the lru_maintainer
>       >       >       >       and lru_crawler).
>       >       >       >
>       >       >       >       The initial branch you were running didn't add any
>       >       >       >       new stats. It just restored an old feature.
>       >       >       >
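A quick way to poll those two counters (assuming a local instance on the
default port; the counter names are the ones given above):

    $ printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 | \
          egrep 'slab_reassign_(evictions|saves)'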
>       >       >       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>       >       >       >
>       >       >       >       > An unrelated prod problem meant I had to stop
>       >       >       >       > after about an hour. I'm turning it on again
>       >       >       >       > tomorrow morning. Are there any new metrics I
>       >       >       >       > should be looking at? Anything new in the stats
>       >       >       >       > output? I'm about to take a look at the diffs as
>       >       >       >       > well.
>       >       >       >       >
>       >       >       >       > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
>       >       >       >       >       excellent. if automove=2 is too aggressive
>       >       >       >       >       you'll see that come in in a hit ratio
>       >       >       >       >       reduction.
>       >       >       >       >
>       >       >       >       >       the new branch works with automove=2 as
>       >       >       >       >       well, but it will attempt to rescue valid
>       >       >       >       >       items in the old slab if possible. I'll
>       >       >       >       >       still be working on it for another few hours
>       >       >       >       >       today though. I'll mail again when I'm done.
>       >       >       >       >
>       >       >       >       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>       >       >       >       >
>       >       >       >       >       > I have the first commit (slab_automove=2)
>       >       >       >       >       > running in prod right now. Later today
>       >       >       >       >       > will be a full load production test of the
>       >       >       >       >       > latest code. I'll just let it run for a
>       >       >       >       >       > few days unless I spot any problems. We
>       >       >       >       >       > have good metrics for latency et al. from
>       >       >       >       >       > the client side, though network normally
>       >       >       >       >       > dwarfs memcached time.
>       >       >       >       >       >
>       >       >       >       >       > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
>       >       >       >       >       >       That's unfortunate.
>       >       >       >       >       >
>       >       >       >       >       >       I've done some more work on the branch:
>       >       >       >       >       >       https://github.com/memcached/memcached/pull/112
>       >       >       >       >       >
>       >       >       >       >       >       It's not completely likely you would
>       >       >       >       >       >       see enough of an improvement from the
>       >       >       >       >       >       new default mode. However if your
>       >       >       >       >       >       item sizes change gradually, items
>       >       >       >       >       >       are reclaimed during expiration, or
>       >       >       >       >       >       get overwritten (and thus freed in
>       >       >       >       >       >       the old class), it should work just
>       >       >       >       >       >       fine. I have another patch coming
>       >       >       >       >       >       which should help though.
>       >       >       >       >       >
>       >       >       >       >       >       Open to feedback from any interested party.
>       >       >       >       >       >
>       >       >       >       >       >       On Fri, 25 Sep 2015, Scott Mansfield wrote:
>       >       >       >       >       >
>       >       >       >       >       >       > I have it running internally, and
>       >       >       >       >       >       > it runs fine under normal load.
>       >       >       >       >       >       > It's difficult to put it into the
>       >       >       >       >       >       > line of fire for a production
>       >       >       >       >       >       > workload because of social
>       >       >       >       >       >       > reasons... As well, it's a
>       >       >       >       >       >       > degenerate case that we normally
>       >       >       >       >       >       > don't run into (and actively try
>       >       >       >       >       >       > to avoid). I'm going to run some
>       >       >       >       >       >       > heavier load tests on it today.
>       >       >       >       >       >       >
>       >       >       >       >       >       > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       >       >       I'm working on getting a
>       >       >       >       >       >       >       test going internally. I'll
>       >       >       >       >       >       >       let you know how it goes.
>       >       >       >       >       >       >
>       >       >       >       >       >       >
>       >       >       >       >       >       > Scott Mansfield
>       >       >       >       >       >       > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
>       >       >       >       >       >       >       Yo,
>       >       >       >       >       >       >
>       >       >       >       >       >       >       https://github.com/dormando/memcached/commits/slab_rebal_next - would you
>       >       >       >       >       >       >       mind playing around with the branch here? You can see the start options in
>       >       >       >       >       >       >       the test.
>       >       >       >       >       >       >
>       >       >       >       >       >       >       This is a dead simple modification (a restoration of a feature that was
>       >       >       >       >       >       >       already there...). The test very aggressively writes and is able to shunt
>       >       >       >       >       >       >       memory around appropriately.
>       >       >       >       >       >       >
>       >       >       >       >       >       >       The work I'm exploring right now will allow savings of items being
>       >       >       >       >       >       >       rebalanced from, and increasing the aggression of page moving without
>       >       >       >       >       >       >       being so brain damaged about it.
>       >       >       >       >       >       >
>       >       >       >       >       >       >       But while I'm poking around with that, I'd be interested in knowing if
>       >       >       >       >       >       >       this simple branch is an improvement, and if so how much.
>       >       >       >       >       >       >
>       >       >       >       >       >       >       I'll push more code to the branch, but the changes should be gated behind
>       >       >       >       >       >       >       a feature flag.
>       >       >       >       >       >       >
>       >       >       >       >       >       >       On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
>       >       >       >       >       >       >
>       >       >       >       >       >       >       > No worries man, you're doing us a favor. Let me know if there's
>       >       >       >       >       >       >       > anything you need from us, and I promise I'll be quicker this time :)
>       >       >       >       >       >       >       >
>       >       >       >       >       >       >       > On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
>       >       >       >       >       >       >       >       Hey,
>       >       >       >       >       >       >       >
>       >       >       >       >       >       >       >       I'm still really interested in working on this. I'll be taking a
>       >       >       >       >       >       >       >       careful look soon I hope.
>       >       >       >       >       >       >       >
>       >       >       >       >       >       >       >       On Mon, 3 Aug 2015, Scott Mansfield wrote:
>       >       >       >       >       >       >       >
>       >       >       >       >       >       >       >       > I've tweaked the program slightly, so I'm adding a new version.
>       >       >       >       >       >       >       >       > It prints more stats as it goes and runs a bit faster.
>       >       >       >       >       >       >       >       >
>       >       >       >       >       >       >       >       > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       >       >       >       >       Total brain fart on my part. Apparently I had memcached
>       >       >       >       >       >       >       >       >       1.4.13 on my path (who knows how...). Using the actual one
>       >       >       >       >       >       >       >       >       that I've built works. Sorry for the confusion... can't
>       >       >       >       >       >       >       >       >       believe I didn't realize that before. I'm testing against
>       >       >       >       >       >       >       >       >       the compiled one now to see how it behaves.
>       >       >       >       >       >       >       >       >
>       >       >       >       >       >       >       >       >       On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
>       >       >       >       >       >       >       >       >             You sure that's 1.4.24? None of those fail for me :(
>       >       >       >       >       >       >       >       >
>       >       >       >       >       >       >       >       >             On Mon, 3 Aug 2015, Scott Mansfield wrote:
>       >       >       >       >       >       >       >       >
>       >       >       >       >       >       >       >       >             > The command line I've used that will start is:
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             > memcached -m 64 -o slab_reassign,slab_automove
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             > the ones that fail are:
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             > memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
>       >       >       >       >       >       >       >       >             > memcached -o lru_crawler
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             > I'm sure I've missed something during compile, though
>       >       >       >       >       >       >       >       >             > I just used ./configure and make.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             > On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       >       >       >       >             >       I've attached a pretty simple program to connect,
>       >       >       >       >       >       >       >       >             >       fill a slab with data, and then fill another slab
>       >       >       >       >       >       >       >       >             >       slowly with data of a different size. I've been
>       >       >       >       >       >       >       >       >             >       trying to get memcached to run with the lru_crawler
>       >       >       >       >       >       >       >       >             >       and lru_maintainer flags, but I get 'Illegal
>       >       >       >       >       >       >       >       >             >       suboption "(null)"' every time I try to start with
>       >       >       >       >       >       >       >       >             >       either in any configuration.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       I haven't seen it start to move slabs automatically
>       >       >       >       >       >       >       >       >             >       with a freshly installed 1.4.24.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       >       >       >       >             >             I realize I've not given you the tests to
>       >       >       >       >       >       >       >       >             >             reproduce the behavior. I should be able to
>       >       >       >       >       >       >       >       >             >             soon. Sorry about the delay here.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >             In the meantime, I wanted to bring up a
>       >       >       >       >       >       >       >       >             >             possible secondary use of the same logic to
>       >       >       >       >       >       >       >       >             >             move items on slab rebalancing. I think the
>       >       >       >       >       >       >       >       >             >             system might benefit from using the same logic
>       >       >       >       >       >       >       >       >             >             to crawl the pages in a slab and compact the
>       >       >       >       >       >       >       >       >             >             data in the background. In the case where we
>       >       >       >       >       >       >       >       >             >             have memory that is assigned to the slab but
>       >       >       >       >       >       >       >       >             >             not being used because of replaced or TTL'd
>       >       >       >       >       >       >       >       >             >             out data, returning the memory to a pool of
>       >       >       >       >       >       >       >       >             >             free memory will allow a slab to grow with
>       >       >       >       >       >       >       >       >             >             that memory first instead of waiting for an
>       >       >       >       >       >       >       >       >             >             event where memory is needed at that instant.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >             It's a change in approach, from reactive to
>       >       >       >       >       >       >       >       >             >             proactive. What do you think?
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
>       >       >       >       >       >       >       >       >             >       > First, more detail for you:
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       > We are running 1.4.24 in production and haven't
>       >       >       >       >       >       >       >       >             >       > noticed any bugs as of yet. The new LRUs seem to be
>       >       >       >       >       >       >       >       >             >       > working well, though we nearly always run memcached
>       >       >       >       >       >       >       >       >             >       > scaled to hold all data without evictions. Those
>       >       >       >       >       >       >       >       >             >       > with evictions are behaving well. Those without
>       >       >       >       >       >       >       >       >             >       > evictions haven't seen crashing or any other
>       >       >       >       >       >       >       >       >             >       > noticeable bad behavior.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       Neat.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       > OK, I think I see an area where I was speculating on
>       >       >       >       >       >       >       >       >             >       > functionality. If you have a key in slab 21 and then
>       >       >       >       >       >       >       >       >             >       > the same key is written again at a larger size in
>       >       >       >       >       >       >       >       >             >       > slab 23, I assumed that the space in 21 was not
>       >       >       >       >       >       >       >       >             >       > freed on the second write. With that assumption, the
>       >       >       >       >       >       >       >       >             >       > LRU crawler would not free up that space. Also just
>       >       >       >       >       >       >       >       >             >       > by observation in the macro, the space is not freed
>       >       >       >       >       >       >       >       >             >       > fast enough to be effective, in our use case, to
>       >       >       >       >       >       >       >       >             >       > accept the writes that are happening. Think in the
>       >       >       >       >       >       >       >       >             >       > hundreds of millions of "overwrites" in a 6 - 10
>       >       >       >       >       >       >       >       >             >       > hour period across a cluster.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       Internally, "items" (a key/value pair) are generally
>       >       >       >       >       >       >       >       >             >       immutable. The only time when it's not is for
>       >       >       >       >       >       >       >       >             >       INCR/DECR, and it still becomes immutable if two
>       >       >       >       >       >       >       >       >             >       INCR/DECR's collide.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       What this means is that the new item is staged in a
>       >       >       >       >       >       >       >       >             >       piece of free memory while the "upload" stage of the
>       >       >       >       >       >       >       >       >             >       SET happens. When memcached has all of the data in
>       >       >       >       >       >       >       >       >             >       memory to replace the item, it does an internal swap
>       >       >       >       >       >       >       >       >             >       under a lock. The old item is removed from the hash
>       >       >       >       >       >       >       >       >             >       table and LRU, and the new item gets put in its place
>       >       >       >       >       >       >       >       >             >       (at the head of the LRU).
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       Since items are refcounted, this means that if other
>       >       >       >       >       >       >       >       >             >       users are downloading an item which just got
>       >       >       >       >       >       >       >       >             >       replaced, their memory doesn't get corrupted by the
>       >       >       >       >       >       >       >       >             >       item changing out from underneath them. They can
>       >       >       >       >       >       >       >       >             >       continue to read the old item until they're done.
>       >       >       >       >       >       >       >       >             >       When the refcount reaches zero the old memory is
>       >       >       >       >       >       >       >       >             >       reclaimed.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       Most of the time, the item replacement happens then
>       >       >       >       >       >       >       >       >             >       the old memory is immediately removed.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       However, this does mean that you need *one* piece of
>       >       >       >       >       >       >       >       >             >       free memory to replace the old one. Then the old
>       >       >       >       >       >       >       >       >             >       memory gets freed after that set.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       So if you take a memcached instance with 0 free
>       >       >       >       >       >       >       >       >             >       chunks, and do a rolling replacement of all items
>       >       >       >       >       >       >       >       >             >       (within the same slab class as before), the first one
>       >       >       >       >       >       >       >       >             >       would cause an eviction from the tail of the LRU to
>       >       >       >       >       >       >       >       >             >       get a free chunk. Every SET after that would use the
>       >       >       >       >       >       >       >       >             >       chunk freed from the replacement of the previous
>       >       >       >       >       >       >       >       >             >       memory.
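To make the swap-under-lock mechanics above concrete, here is a rough C
sketch. The types and helpers (cache_item, slab_alloc, hash_link, and so on)
are invented for illustration; they are not memcached's actual internals:

    #include <pthread.h>
    #include <stddef.h>

    typedef struct cache_item {
        int refcount;
        /* key, value, hash-chain and LRU links elided */
    } cache_item;

    /* Hypothetical helpers standing in for the real machinery: */
    extern cache_item *slab_alloc(size_t sz);      /* may evict the LRU tail */
    extern void slab_free(cache_item *it);
    extern void fill_from_network(cache_item *it); /* the SET "upload" stage */
    extern cache_item *hash_find(const char *key);
    extern void hash_unlink(cache_item *it);
    extern void hash_link(const char *key, cache_item *it);
    extern void lru_unlink(cache_item *it);
    extern void lru_link_head(cache_item *it);

    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

    void item_replace(const char *key, size_t total_size) {
        /* Stage the new data in free memory first: this is the ONE free
         * chunk the replacement needs. */
        cache_item *staged = slab_alloc(total_size);
        fill_from_network(staged);

        /* The swap happens atomically under the lock. */
        pthread_mutex_lock(&cache_lock);
        cache_item *old = hash_find(key);
        hash_unlink(old);
        lru_unlink(old);
        hash_link(key, staged);
        lru_link_head(staged);            /* new item at the head of the LRU */
        int remaining = --old->refcount;  /* drop the table's reference */
        pthread_mutex_unlock(&cache_lock);

        /* Readers mid-download still hold their own references and keep
         * reading the old copy safely; the chunk is reclaimed only when
         * the last reference is dropped. */
        if (remaining == 0)
            slab_free(old);
    }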
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       > After that last sentence I realized I also may not
>       >       >       >       >       >       >       >       >             >       > have explained well enough the access pattern. The
>       >       >       >       >       >       >       >       >             >       > keys are all overwritten every day, but it takes
>       >       >       >       >       >       >       >       >             >       > some time to write them all (obviously). We see a
>       >       >       >       >       >       >       >       >             >       > huge increase in the bytes metric as if the new data
>       >       >       >       >       >       >       >       >             >       > for the old keys was being written for the first
>       >       >       >       >       >       >       >       >             >       > time. Since the "old" slab for the same key doesn't
>       >       >       >       >       >       >       >       >             >       > proactively release memory, it starts to fill up the
>       >       >       >       >       >       >       >       >             >       > cache and then start evicting data in the new slab.
>       >       >       >       >       >       >       >       >             >       > Once that happens, we see evictions in the old slab
>       >       >       >       >       >       >       >       >             >       > because of the algorithm you mentioned (random
>       >       >       >       >       >       >       >       >             >       > picking / freeing of memory). Typically we don't see
>       >       >       >       >       >       >       >       >             >       > any use for "upgrading" an item as the new data
>       >       >       >       >       >       >       >       >             >       > would be entirely new and should wholesale replace
>       >       >       >       >       >       >       >       >             >       > the old data for that key. More specifically, the
>       >       >       >       >       >       >       >       >             >       > operation is always set, with different data each
>       >       >       >       >       >       >       >       >             >       > day.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       Right. Most of your problems will come from two
>       >       >       >       >       >       >       >       >             >       areas. One being that when writing data aggressively
>       >       >       >       >       >       >       >       >             >       into the new slab class (unless you set the
>       >       >       >       >       >       >       >       >             >       rebalancer to always-replace mode), the mover will
>       >       >       >       >       >       >       >       >             >       make memory available more slowly than you can
>       >       >       >       >       >       >       >       >             >       insert. So you'll cause extra evictions in the new
>       >       >       >       >       >       >       >       >             >       slab class.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       The secondary problem is from the random evictions in
>       >       >       >       >       >       >       >       >             >       the previous slab class as stuff is chucked on the
>       >       >       >       >       >       >       >       >             >       floor to make memory moveable.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       > As for testing, we'll be able to put it under real
>       >       >       >       >       >       >       >       >             >       > production workload. I don't know what kind of data
>       >       >       >       >       >       >       >       >             >       > you mean you need for testing. The data stored in
>       >       >       >       >       >       >       >       >             >       > the caches are highly confidential. I can give you
>       >       >       >       >       >       >       >       >             >       > all kinds of metrics, since we collect most of the
>       >       >       >       >       >       >       >       >             >       > ones that are in the stats and some from the stats
>       >       >       >       >       >       >       >       >             >       > slabs output. If you have some specific ones that
>       >       >       >       >       >       >       >       >             >       > need collecting, I'll double check and make sure we
>       >       >       >       >       >       >       >       >             >       > can get those. Alternatively, it might be most
>       >       >       >       >       >       >       >       >             >       > beneficial to see the metrics in person :)
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       I just need stats snapshots here and there, and
>       >       >       >       >       >       >       >       >             >       actually putting the thing under load. When I did the
>       >       >       >       >       >       >       >       >             >       LRU work I had to beg for several months before
>       >       >       >       >       >       >       >       >             >       anyone tested it with a production load. This slows
>       >       >       >       >       >       >       >       >             >       things down and demotivates me from working on the
>       >       >       >       >       >       >       >       >             >       project.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       Unfortunately my dayjob keeps me pretty busy so
>       >       >       >       >       >       >       >       >             >       ~internet~ would probably be best.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       > I can create a driver program to reproduce the
>       >       >       >       >       >       >       >       >             >       > behavior on a smaller scale. It would write e.g. 10k
>       >       >       >       >       >       >       >       >             >       > keys of 10k size, then rewrite the same keys with
>       >       >       >       >       >       >       >       >             >       > different size data. I'll work on that and post it
>       >       >       >       >       >       >       >       >             >       > to this thread when I can reproduce the behavior
>       >       >       >       >       >       >       >       >             >       > locally.
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       Ok. There're slab rebalance unit tests in the t/
>       >       >       >       >       >       >       >       >             >       directory which do things like this, and I've used
>       >       >       >       >       >       >       >       >             >       mc-crusher to slam the rebalancer. It's pretty easy
>       >       >       >       >       >       >       >       >             >       to run one config to load up 10k objects, then flip
>       >       >       >       >       >       >       >       >             >       to the other using the same key namespace.
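The driver Scott describes can be tiny. A sketch in C against a local
instance on the default port, using the plain text protocol (the 7-day TTL
matches the workload described elsewhere in the thread; error handling is
elided for brevity):

    #include <arpa/inet.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Write 10k keys at a given value size; rewriting the same keys at a
     * different size lands them in a neighboring slab class. */
    static void set_keys(int fd, size_t vlen) {
        char buf[32768], val[16384], reply[64];
        memset(val, 'x', vlen);
        for (int i = 0; i < 10000; i++) {
            int n = snprintf(buf, sizeof(buf),
                             "set key%d 0 604800 %zu\r\n", i, vlen);
            memcpy(buf + n, val, vlen);
            memcpy(buf + n + vlen, "\r\n", 2);
            write(fd, buf, n + vlen + 2);
            read(fd, reply, sizeof(reply));  /* expect "STORED\r\n" */
        }
    }

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sa = { .sin_family = AF_INET,
                                  .sin_port = htons(11211) };
        inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);
        connect(fd, (struct sockaddr *)&sa, sizeof(sa));
        set_keys(fd, 10000);  /* ~10KB values fill one slab class */
        set_keys(fd, 15000);  /* rewrite at ~15KB: pressure on another */
        close(fd);
        return 0;
    }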
>       >       >       >       >       >       >       >       >             >
>       >       >       >       >       >       >       >       >             >       > Thanks,
>       >       >       >       >       >       >       >       >             >       > Scott
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       > On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
>       >       >       >       >       >       >       >       >             >       >       Hey,
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       On Fri, 10 Jul 2015, Scott Mansfield wrote:
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       > We've seen issues recently where we run a cluster
>       >       >       >       >       >       >       >       >             >       >       > that typically has the majority of items
>       >       >       >       >       >       >       >       >             >       >       > overwritten in the same slab every day, and a
>       >       >       >       >       >       >       >       >             >       >       > sudden change in data size evicts a ton of data,
>       >       >       >       >       >       >       >       >             >       >       > affecting downstream systems. To be clear that is
>       >       >       >       >       >       >       >       >             >       >       > our problem, but I think there's a tweak in
>       >       >       >       >       >       >       >       >             >       >       > memcached that might be useful and another
>       >       >       >       >       >       >       >       >             >       >       > possible feature that would be even better.
>       >       >       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >       >       >             >       >       > The data that is written to this cache is
>       >       >       >       >       >       >       >       >             >       >       > overwritten every day, though the TTL is 7 days.
>       >       >       >       >       >       >       >       >             >       >       > One slab takes up the majority of the space in
>       >       >       >       >       >       >       >       >             >       >       > the cache. The application wrote e.g. 10KB (slab
>       >       >       >       >       >       >       >       >             >       >       > 21) every day for each key consistently. One day,
>       >       >       >       >       >       >       >       >             >       >       > a change occurred where it started writing 15KB
>       >       >       >       >       >       >       >       >             >       >       > (slab 23), causing a migration of data from one
>       >       >       >       >       >       >       >       >             >       >       > slab to another. We had -o
>       >       >       >       >       >       >       >       >             >       >       > slab_reassign,slab_automove=1 set on the server,
>       >       >       >       >       >       >       >       >             >       >       > causing large numbers of evictions on the initial
>       >       >       >       >       >       >       >       >             >       >       > slab. Let's say the cache could hold the data at
>       >       >       >       >       >       >       >       >             >       >       > 15KB per key, but the old data was not
>       >       >       >       >       >       >       >       >             >       >       > technically TTL'd out in its old slab. This means
>       >       >       >       >       >       >       >       >             >       >       > that memory was not being freed by the lru
>       >       >       >       >       >       >       >       >             >       >       > crawler thread (I think) because its expiry had
>       >       >       >       >       >       >       >       >             >       >       > not come around.
>       >       >       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >       >       >             >       >       > lines 1199 and 1200 in items.c:
>       >       >       >       >       >       >       >       >             >       >       > if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
>       >       >       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >       >       >             >       >       > If there was a check to see if this data was
>       >       >       >       >       >       >       >       >             >       >       > "orphaned," i.e. that the key, if accessed, would
>       >       >       >       >       >       >       >       >             >       >       > map to a different slab than the current one,
>       >       >       >       >       >       >       >       >             >       >       > then these orphans could be reclaimed as free
>       >       >       >       >       >       >       >       >             >       >       > memory. I am working on a patch to do this,
>       >       >       >       >       >       >       >       >             >       >       > though I have reservations about performing a
>       >       >       >       >       >       >       >       >             >       >       > hash on the key on the lru crawler thread (if the
>       >       >       >       >       >       >       >       >             >       >       > hash is not already available).
>       >       >       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >       >       >             >       >       > I have very little experience in the memcached
>       >       >       >       >       >       >       >       >             >       >       > codebase so I don't know the most efficient way
>       >       >       >       >       >       >       >       >             >       >       > to do this. Any help would be appreciated.
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       There seems to be a misconception about how the slab
>       >       >       >       >       >       >       >       >             >       >       classes work. A key, if already existing in a slab,
>       >       >       >       >       >       >       >       >             >       >       will always map to the slab class it currently fits
>       >       >       >       >       >       >       >       >             >       >       into. The slab classes always exist, but the amount
>       >       >       >       >       >       >       >       >             >       >       of memory reserved for each of them will shift with
>       >       >       >       >       >       >       >       >             >       >       slab_reassign. ie: 10 pages in slab class 21, then
>       >       >       >       >       >       >       >       >             >       >       memory pressure on 23 causes it to move over.
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       So if you examine a key that still exists in slab
>       >       >       >       >       >       >       >       >             >       >       class 21, it has no reason to move up or down the
>       >       >       >       >       >       >       >       >             >       >       slab classes.
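That per-class shift is visible in "stats slabs" output; chunk_size and
total_pages are standard fields there (local instance assumed):

    $ printf 'stats slabs\r\nquit\r\n' | nc 127.0.0.1 11211 | \
          egrep ':(chunk_size|total_pages) '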
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       > Alternatively, and possibly more beneficial, is
>       >       >       >       >       >       >       >       >             >       >       > compaction of data in a slab using the same set
>       >       >       >       >       >       >       >       >             >       >       > of criteria as lru crawling. Understandably,
>       >       >       >       >       >       >       >       >             >       >       > compaction is a very difficult problem to solve
>       >       >       >       >       >       >       >       >             >       >       > since moving the data would be a pain in the ass.
>       >       >       >       >       >       >       >       >             >       >       > I saw a couple of discussions about this in the
>       >       >       >       >       >       >       >       >             >       >       > mailing list, though I didn't see any firm
>       >       >       >       >       >       >       >       >             >       >       > thoughts about it. I think it can probably be
>       >       >       >       >       >       >       >       >             >       >       > done in O(1) like the lru crawler by limiting the
>       >       >       >       >       >       >       >       >             >       >       > number of items it touches each time. Writing and
>       >       >       >       >       >       >       >       >             >       >       > reading are doable in O(1) so moving should be as
>       >       >       >       >       >       >       >       >             >       >       > well. Has anyone given more thought to compaction?
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       I'd be interested in hacking this up for you folks
>       >       >       >       >       >       >       >       >             >       >       if you can provide me testing and some data to work
>       >       >       >       >       >       >       >       >             >       >       with. With all of the LRU work I did in 1.4.24, the
>       >       >       >       >       >       >       >       >             >       >       next thing I wanted to do is a big improvement on
>       >       >       >       >       >       >       >       >             >       >       the slab reassignment code.
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       Currently it picks essentially a random slab page,
>       >       >       >       >       >       >       >       >             >       >       empties it, and moves the slab page into the class
>       >       >       >       >       >       >       >       >             >       >       under pressure.
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       One thing we can do is first examine for free memory
>       >       >       >       >       >       >       >       >             >       >       in the existing slab, IE:
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       - Take a page from slab 21
>       >       >       >       >       >       >       >       >             >       >       - Scan the page for valid items which need to be moved
>       >       >       >       >       >       >       >       >             >       >       - Pull free memory from slab 21, migrate the item (moderately complicated)
>       >       >       >       >       >       >       >       >             >       >       - When the page is empty, move it (or give up if you run out of free chunks).
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       The next step is to pull from the LRU on slab 21:
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       - Take page from slab 21
>       >       >       >       >       >       >       >       >             >       >       - Scan page for valid items
>       >       >       >       >       >       >       >       >             >       >       - Pull free memory from slab 21, migrate the item
>       >       >       >       >       >       >       >       >             >       >         - If no memory free, evict tail of slab 21. Use that chunk.
>       >       >       >       >       >       >       >       >             >       >       - When the page is empty, move it.
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       Then, when you hit this condition your
>       >       >       >       >       >       >       >       >             >       >       least-recently-used data gets culled as new data
>       >       >       >       >       >       >       >       >             >       >       migrates your page class. This should match a
>       >       >       >       >       >       >       >       >             >       >       natural occurrence if you would already be evicting
>       >       >       >       >       >       >       >       >             >       >       valid (but old) items to make room for new items.
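In code, the second variant might look roughly like this. It is a sketch
only, with invented names (page_first, item_is_valid, lru_evict_tail, and
so on), not the branch's actual implementation:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct item item;
    typedef struct page page;

    /* Hypothetical stand-ins for the real slab machinery: */
    extern item *page_first(page *p);
    extern item *page_next(page *p, item *it);
    extern bool item_is_valid(const item *it);       /* live, unexpired */
    extern item *freelist_pop(int slab_class);       /* NULL if none free */
    extern item *lru_evict_tail(int slab_class);     /* frees one chunk */
    extern void item_migrate(item *dst, item *src);  /* copy + relink */
    extern void move_page_to_class(page *p, int dst_class);

    /* Empty one page of src_class, rescuing valid items into other chunks
     * of the same class, then hand the empty page to the class under
     * pressure. */
    void rebalance_page(page *p, int src_class, int dst_class) {
        for (item *it = page_first(p); it != NULL; it = page_next(p, it)) {
            if (!item_is_valid(it))
                continue;                        /* dead chunk: nothing to save */
            item *dst = freelist_pop(src_class);
            if (dst == NULL)
                dst = lru_evict_tail(src_class); /* cull old data for room */
            item_migrate(dst, it);               /* swap links under a lock */
        }
        move_page_to_class(p, dst_class);        /* page now empty: move it */
    }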
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       A bonus to using the free memory trick is that I can
>       >       >       >       >       >       >       >       >             >       >       use the amount of free space in a slab class as a
>       >       >       >       >       >       >       >       >             >       >       heuristic to more quickly move slab pages around.
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       If it's still necessary from there, we can explore
>       >       >       >       >       >       >       >       >             >       >       "upgrading" items to a new slab class, but that is
>       >       >       >       >       >       >       >       >             >       >       much much more complicated since the item has to
>       >       >       >       >       >       >       >       >             >       >       shift LRU's. Do you put it at the head, the tail,
>       >       >       >       >       >       >       >       >             >       >       the middle, etc? It might be impossible to make a
>       >       >       >       >       >       >       >       >             >       >       good generic decision there.
>       >       >       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >       >       >             >       >       What version are you currently on? If 1.4.24, have
>       >       >       >       >       >       >       >       >             >       >       you seen any instability? I'm currently torn between
>       >       >       >       >       >       >       >       >             >       >       fighting a few bugs and start on
>       >       >       >       >       >       >       >       >             > ...
>