Re: Check for orphaned items in lru crawler thread

dormando Thu, 01 Oct 2015 01:42:08 -0700

Ok, thanks!

I'll noodle this a bit... unfortunately a backtrace might be more helpful.
will ask you to attempt to get one if I don't figure anything out in time.


(allow it to core dump or attach a GDB session and set an ignore handler
for sigpipe/int/etc and run "continue")

what were your full startup args, though?

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> The commit was the latest in slab_rebal_next at the time:
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>
> addr2line gave me this output:
>
> $ addr2line -e memcached 0x40e007
>
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>
>
> As well, this was running with production writes, but not reads. Even if we 
> had reads on with the few servers crashing, we're ok architecturally. That's 
> why I can get it out there without worrying too much. For now, I'm going to 
> turn it off. I had a metrics issue anyway that needs to get fixed. Tomorrow 
> I'm planning to test again with more metrics, but I
> can get any new code in pretty quick.
>
>
> On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
>       How many servers were you running it on? I hope it wasn't more than a
>       handful. I'd recommend starting with one :P
>
>       can you do an addr2line? what were your startup args, and what was the
>       commit sha1 for the branch you pulled?
>
>       sorry about that :/
>
>       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>       > A few different servers (5 / 205) experienced a segfault all within 
> an hour or so. Unfortunately at this point I'm a bit out of my depth. I have 
> the dmesg output, which is identical for all 5 boxes:
>       >
>       > [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 
> 00007f362ceedeb0 error 4 in memcached[400000+1d000]
>       >
>       >
>       > I can possibly supply the binary file if needed, though we didn't do 
> anything besides the standard setup and compile.
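A small sketch of how the dmesg line above can be picked apart, assuming the usual x86 Linux "segfault at ..." message layout. The function name and the returned field names are made up for illustration. It only restates what the log already says: the faulting address is 0 (consistent with a NULL pointer dereference), and the instruction pointer sits at offset 0xe007 inside the mapping that starts at 0x400000.

```python
import re

# The dmesg line quoted above, joined back onto one line.
DMESG_LINE = (
    "[46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp "
    "00007f362ceedeb0 error 4 in memcached[400000+1d000]"
)

# Assumed layout of the kernel's x86 user-space segfault message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the interesting fields of a segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "fault_addr": int(m["addr"], 16),      # address that was accessed
        "ip": ip,                              # instruction that faulted
        "object": m["obj"],                    # mapped file containing ip
        "offset_in_mapping": ip - base,        # ip relative to mapping start
        "ip_inside_mapping": base <= ip < base + size,
    }


info = parse_segfault(DMESG_LINE)
```

Because the mapping starts at 0x400000, the ip value can be handed to addr2line unchanged, which matches the addr2line invocation quoted earlier in the thread.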
>       >
>       >
>       >
>       > On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
>       >       If you look at the new branch there's a commit explaining the 
> new stats.
>       >
>       >       You can watch slab_reassign_evictions vs slab_reassign_saves. 
> you can also
>       >       test automove=1 vs automove=2 (please also turn on the 
> lru_maintainer and
>       >       lru_crawler).
>       >
>       >       The initial branch you were running didn't add any new stats. 
> It just
>       >       restored an old feature.
>       >
>       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>       >
>       >       > An unrelated prod problem meant I had to stop after about an 
> hour. I'm turning it on again tomorrow morning.
>       >       > Are there any new metrics I should be looking at? Anything 
> new in the stats output? I'm about to take a look at the diffs as well.
>       >       >
>       >       > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando 
> wrote:
>       >       >       excellent. if automove=2 is too aggressive you'll see 
> that show up as a
>       >       >       hit ratio reduction.
>       >       >
>       >       >       the new branch works with automove=2 as well, but it 
> will attempt to
>       >       >       rescue valid items in the old slab if possible. I'll 
> still be working on
>       >       >       it for another few hours today though. I'll mail again 
> when I'm done.
>       >       >
>       >       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>       >       >
>       >       >       > I have the first commit (slab_automove=2) running in 
> prod right now. Later today will be a full load production test of the latest 
> code. I'll just let it run for a few days unless I spot any problems. We have 
> good metrics for latency et al. from the client side, though network 
> normally dwarfs memcached time.
>       >       >       >
>       >       >       > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, 
> Dormando wrote:
>       >       >       >       That's unfortunate.
>       >       >       >
>       >       >       >       I've done some more work on the branch:
>       >       >       >       https://github.com/memcached/memcached/pull/112
>       >       >       >
>       >       >       >       It's not completely likely you would see enough 
> of an improvement from the
>       >       >       >       new default mode. However if your item sizes 
> change gradually, items are
>       >       >       >       reclaimed during expiration, or get overwritten 
> (and thus freed in the old
>       >       >       >       class), it should work just fine. I have 
> another patch coming which should
>       >       >       >       help though.
>       >       >       >
>       >       >       >       Open to feedback from any interested party.
>       >       >       >
>       >       >       >       On Fri, 25 Sep 2015, Scott Mansfield wrote:
>       >       >       >
>       >       >       >       > I have it running internally, and it runs 
> fine under normal load. It's difficult to put it into the line of fire for a 
> production workload because of social reasons... As well it's a degenerate 
> case that we normally don't run into (and actively try to avoid). I'm going 
> to run some heavier load tests on it today. 
>       >       >       >       >
>       >       >       >       > On Wednesday, September 9, 2015 at 10:23:32 
> AM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       I'm working on getting a test going 
> internally. I'll let you know how it goes. 
>       >       >       >       >
>       >       >       >       >
>       >       >       >       > Scott Mansfield
>       >       >       >       > On Mon, Sep 7, 2015 at 2:33 PM, dormando 
> wrote:
>       >       >       >       >       Yo,
>       >       >       >       >
>       >       >       >       >       
> https://github.com/dormando/memcached/commits/slab_rebal_next - would you
>       >       >       >       >       mind playing around with the branch 
> here? You can see the start options in
>       >       >       >       >       the test.
>       >       >       >       >
>       >       >       >       >       This is a dead simple modification (a 
> restoration of a feature that was
>       >       >       >       >       already there...). The test very 
> aggressively writes and is able to shunt
>       >       >       >       >       memory around appropriately.
>       >       >       >       >
>       >       >       >       >       The work I'm exploring right now will 
> allow savings of items being
>       >       >       >       >       rebalanced from, and increasing the 
> aggression of page moving without
>       >       >       >       >       being so brain damaged about it.
>       >       >       >       >
>       >       >       >       >       But while I'm poking around with that, 
> I'd be interested in knowing if
>       >       >       >       >       this simple branch is an improvement, 
> and if so how much.
>       >       >       >       >
>       >       >       >       >       I'll push more code to the branch, but 
> the changes should be gated behind
>       >       >       >       >       a feature flag.
>       >       >       >       >
>       >       >       >       >       On Tue, 18 Aug 2015, 'Scott Mansfield' 
> via memcached wrote:
>       >       >       >       >
>       >       >       >       >       >
>       >       >       >       >       > No worries man, you're doing us a 
> favor. Let me know if there's anything you need from us, and I promise I'll 
> be quicker this time :)
>       >       >       >       >       >
>       >       >       >       >       > On Aug 18, 2015 12:01 AM, "dormando" 
> <dorm...@rydia.net> wrote:
>       >       >       >       >       >       Hey,
>       >       >       >       >       >
>       >       >       >       >       >       I'm still really interested in 
> working on this. I'll be taking a careful
>       >       >       >       >       >       look soon I hope.
>       >       >       >       >       >
>       >       >       >       >       >       On Mon, 3 Aug 2015, Scott 
> Mansfield wrote:
>       >       >       >       >       >
>       >       >       >       >       >       > I've tweaked the program 
> slightly, so I'm adding a new version. It prints more stats as it goes and 
> runs a bit faster.
>       >       >       >       >       >       >
>       >       >       >       >       >       > On Monday, August 3, 2015 at 
> 1:20:37 AM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       >       >       Total brain fart on my 
> part. Apparently I had memcached 1.4.13 on my path (who knows how...) Using 
> the actual one that I've built works. Sorry for the confusion... can't 
> believe I didn't realize that before. I'm testing against the compiled one 
> now to see how it behaves.
>       >       >       >       >       >       >       On Monday, August 3, 
> 2015 at 1:15:06 AM UTC-7, Dormando wrote:
>       >       >       >       >       >       >             You sure that's 
> 1.4.24? None of those fail for me :(
>       >       >       >       >       >       >
>       >       >       >       >       >       >             On Mon, 3 Aug 
> 2015, Scott Mansfield wrote:
>       >       >       >       >       >       >
>       >       >       >       >       >       >             > The command 
> line I've used that will start is:
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > memcached -m 64 
> -o slab_reassign,slab_automove
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > the ones that 
> fail are:
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > memcached -m 64 
> -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > memcached -o 
> lru_crawler
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > I'm sure I've 
> missed something during compile, though I just used ./configure and make.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > On Monday, 
> August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       >       >             >       I've 
> attached a pretty simple program to connect, fill a slab with data, and then 
> fill another slab slowly with data of a different size. I've been trying to 
> get memcached to run with the lru_crawler and lru_maintainer flags, but I get 
> 'Illegal suboption "(null)"' every time I try to start with either in any 
> configuration.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       I haven't 
> seen it start to move slabs automatically with a freshly installed 1.4.24.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       On 
> Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
>       >       >       >       >       >       >             >             I 
> realize I've not given you the tests to reproduce the behavior. I should be 
> able to soon. Sorry about the delay here.
>       >       >       >       >       >       >             > In the mean 
> time, I wanted to bring up a possible secondary use of the same logic to move 
> items on slab rebalancing. I think the system might benefit from using the 
> same logic to crawl the pages in a slab and compact the data in the 
> background. In the case where we have memory that is assigned to the slab but 
> not being used because of replaced or TTL'd out 
> data, returning the memory to a pool of free memory will allow a slab to grow 
> with that memory first instead of waiting for an event where memory is needed 
> at that instant.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > It's a change 
> in approach, from reactive to proactive. What do you think?
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             > On Monday, July 
> 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
>       >       >       >       >       >       >             >       > First, 
> more detail for you:
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       > We are 
> running 1.4.24 in production and haven't noticed any bugs as of yet. The new 
> LRUs seem to be working well, though we nearly always run memcached scaled to 
> hold all data without evictions. Those with evictions are behaving well. 
> Those without evictions haven't seen crashing or any other noticeable bad 
> behavior.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       Neat.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       > OK, I 
> think I see an area where I was speculating on functionality. If you have a 
> key in slab 21 and then the same key is written again at a larger size in 
> slab 23 I assumed that the space in 21 was not freed on the second write. 
> With that assumption, the LRU crawler would not free up that space. Also just 
> by observation in the macro, the space is not freed fast 
> enough to be effective, in our use case, to accept the writes that are 
> happening. Think in the hundreds of millions of "overwrites" in a 6 - 10 hour 
> period across a cluster.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       
> Internally, "items" (a key/value pair) are generally immutable. The only
>       >       >       >       >       >       >             >       time when 
> it's not is for INCR/DECR, and it still becomes immutable if two
>       >       >       >       >       >       >             >       
> INCR/DECR's collide.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       What this 
> means, is that the new item is staged in a piece of free memory
>       >       >       >       >       >       >             >       while the 
> "upload" stage of the SET happens. When memcached has all of the
>       >       >       >       >       >       >             >       data in 
> memory to replace the item, it does an internal swap under a lock.
>       >       >       >       >       >       >             >       The old 
> item is removed from the hash table and LRU, and the new item gets
>       >       >       >       >       >       >             >       put in 
> its place (at the head of the LRU).
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       Since 
> items are refcounted, this means that if other users are downloading
>       >       >       >       >       >       >             >       an item 
> which just got replaced, their memory doesn't get corrupted by the
>       >       >       >       >       >       >             >       item 
> changing out from underneath them. They can continue to read the old
>       >       >       >       >       >       >             >       item 
> until they're done. When the refcount reaches zero the old memory is
>       >       >       >       >       >       >             >       reclaimed.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       Most of 
> the time, the item replacement happens then the old memory is
>       >       >       >       >       >       >             >       
> immediately removed.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       However, 
> this does mean that you need *one* piece of free memory to
>       >       >       >       >       >       >             >       replace 
> the old one. Then the old memory gets freed after that set.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       So if you 
> take a memcached instance with 0 free chunks, and do a rolling
>       >       >       >       >       >       >             >       
> replacement of all items (within the same slab class as before), the first
>       >       >       >       >       >       >             >       one would 
> cause an eviction from the tail of the LRU to get a free chunk.
>       >       >       >       >       >       >             >       Every SET 
> after that would use the chunk freed from the replacement of the
>       >       >       >       >       >       >             >       previous 
> memory.
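A toy model of the replacement behavior described above, not memcached's actual code: one slab class with a fixed number of chunks, an LRU, and a SET that stages the new item in a free chunk before the old one is unlinked and freed. The class name `SlabClassModel` and the helper `rolling_replace` are hypothetical, and locking and refcounts are left out entirely.

```python
from collections import OrderedDict


class SlabClassModel:
    """Toy model of one slab class: fixed chunk count plus an LRU."""

    def __init__(self, total_chunks):
        self.free_chunks = total_chunks
        self.lru = OrderedDict()  # first entry = LRU tail, last = LRU head
        self.evictions = 0

    def set(self, key, value):
        # The new item needs one free chunk to be staged in.
        if self.free_chunks == 0:
            self.lru.popitem(last=False)  # evict from the tail of the LRU
            self.evictions += 1
            self.free_chunks += 1
        self.free_chunks -= 1
        # Swap: unlink the old item for this key (if any) and free its chunk.
        if key in self.lru:
            del self.lru[key]
            self.free_chunks += 1
        self.lru[key] = value  # the new item lands at the head of the LRU


def rolling_replace(total_chunks):
    """Fill the class completely, rewrite every key once, count evictions."""
    cache = SlabClassModel(total_chunks)
    for key in range(total_chunks):
        cache.set(key, "old")
    # Rewrite newest-first so the LRU tail is not the key being replaced.
    for key in reversed(range(total_chunks)):
        cache.set(key, "new")
    return cache


cache = rolling_replace(100)
```

In this model only the first rewrite has to evict; every later SET reuses the chunk freed by the previous replacement. One caveat that the model makes visible: if the item at the LRU tail happens to be the very key being rewritten, the eviction frees nothing extra and the next SET has to evict again, which is why the sketch rewrites keys in newest-first order.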
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       > After 
> that last sentence I realized I also may not have explained well enough the 
> access pattern. The keys are all overwritten every day, but it takes some 
> time to write them all (obviously). We see a huge increase in the bytes 
> metric as if the new data for the old keys was being written for the first 
> time. Since the "old" slab for the same key doesn't 
> proactively release memory, it starts to fill up the cache and then starts 
> evicting data in the new slab. Once that happens, we see evictions in the old 
> slab because of the algorithm you mentioned (random picking / freeing of 
> memory). Typically we don't see any use for "upgrading" an item as the new 
> data would be entirely new and should wholesale replace the old 
> data for that key. More specifically, the operation is always set, with 
> different data each day.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       Right. 
> Most of your problems will come from two areas. One is that, when
>       >       >       >       >       >       >             >       writing 
> data aggressively into the new slab class (unless you set the
>       >       >       >       >       >       >             >       
> rebalancer to always-replace mode), the mover will make memory available
>       >       >       >       >       >       >             >       more 
> slowly than you can insert. So you'll cause extra evictions in the
>       >       >       >       >       >       >             >       new slab 
> class.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       The 
> secondary problem is from the random evictions in the previous slab
>       >       >       >       >       >       >             >       class as 
> stuff is chucked on the floor to make memory moveable.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       > As for 
> testing, we'll be able to put it under real production workload. I don't know 
> what kind of data you mean you need for testing. The data stored in the 
> caches are highly confidential. I can give you all kinds of metrics, since we 
> collect most of the ones that are in the stats and some from the stats 
> slabs output. If you have some specific ones that need 
> collecting, I'll double check and make sure we can get those. Alternatively, 
> it might be most beneficial to see the metrics in person :)
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       I just 
> need stats snapshots here and there, and actually putting the thing
>       >       >       >       >       >       >             >       under 
> load. When I did the LRU work I had to beg for several months
>       >       >       >       >       >       >             >       before 
> anyone tested it with a production load. This slows things down and
>       >       >       >       >       >       >             >       
> demotivates me from working on the project.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       
> Unfortunately my dayjob keeps me pretty busy so ~internet~ would probably
>       >       >       >       >       >       >             >       be best.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       > I can 
> create a driver program to reproduce the behavior on a smaller scale. It 
> would write e.g. 10k keys of 10k size, then rewrite the same keys with 
> different size data. I'll work on that and post it to this thread when I can 
> reproduce the behavior locally.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       Ok. 
> There're slab rebalance unit tests in the t/ directory which do things
>       >       >       >       >       >       >             >       like 
> this, and I've used mc-crusher to slam the rebalancer. It's pretty
>       >       >       >       >       >       >             >       easy to 
> run one config to load up 10k objects, then flip to the other
>       >       >       >       >       >       >             >       using the 
> same key namespace.
>       >       >       >       >       >       >             >
>       >       >       >       >       >       >             >       > Thanks,
>       >       >       >       >       >       >             >       > Scott
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       > On 
> Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
>       >       >       >       >       >       >             >       >       
> Hey,
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> On Fri, 10 Jul 2015, Scott Mansfield wrote:
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       > 
> We've seen issues recently where we run a cluster that typically has the 
> majority of items overwritten in the same slab every day and a sudden change 
> in data size evicts a ton of data, affecting downstream systems. To be clear 
> that is our problem, but I think there's a tweak in memcached that might 
> be useful and another possible feature that would be even better.
>       >       >       >       >       >       >             >       >       > 
> The data that is written to this cache is overwritten every day, though the 
> TTL is 7 days. One slab takes up the majority of the space in the cache. The 
> application wrote e.g. 10KB (slab 21) every day for each key consistently. 
> One day, a change occurred where it started writing 15KB (slab 23), causing a 
> migration of data from one slab to another. We had -o 
> slab_reassign,slab_automove=1 set on the server, causing large numbers of 
> evictions on the initial slab. Let's say the cache could hold the data at 
> 15KB per key, but the old data was not technically TTL'd out in its old 
> slab. This means that memory was not being freed by the lru crawler thread (I 
> think) because its expiry had not come around.
>       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >             >       >       > 
> lines 1199 and 1200 in items.c:
>       >       >       >       >       >       >             >       >       > 
> if ((search->exptime != 0 && search->exptime < current_time) || 
> is_flushed(search)) {
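For readers who do not have items.c open, the quoted condition can be rendered roughly as below. This is only a Python paraphrase of the one line shown; the function name is invented, and `flushed` is a stand-in for whatever `is_flushed(search)` returns, whose internals are not modeled.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawler_would_reclaim(exptime, current_time, flushed=False):
    """Paraphrase of the quoted check: expired (with a TTL set) or flushed.

    exptime == 0 means the item never expires, so only a flush reclaims it.
    """
    return (exptime != 0 and exptime < current_time) or flushed


# An item written with a 7 day TTL, looked at one day and eight days later.
written_at = 1_000_000
exptime = written_at + 7 * SECONDS_PER_DAY
after_one_day = crawler_would_reclaim(exptime, written_at + SECONDS_PER_DAY)
after_eight_days = crawler_would_reclaim(exptime, written_at + 8 * SECONDS_PER_DAY)
```

With a 7 day TTL the check stays false one day after the write, so an item that is still linked is left alone by this condition until its expiry passes.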
>       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >             >       >       > 
> If there was a check to see if this data was "orphaned," i.e. that the key, 
> if accessed, would map to a different slab than the current one, then these 
> orphans could be reclaimed as free memory. I am working on a patch to do 
> this, though I have reservations about performing a hash on the key on the 
> lru crawler thread (if the hash is not already available).
>       >       >       >       >       >       >             >       >       > 
> I have very little experience in the memcached codebase so I don't know the 
> most efficient way to do this. Any help would be appreciated.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> There seems to be a misconception about how the slab classes work. A key,
>       >       >       >       >       >       >             >       >       
> if already existing in a slab, will always map to the slab class it
>       >       >       >       >       >       >             >       >       
> currently fits into. The slab classes always exist, but the amount of
>       >       >       >       >       >       >             >       >       
> memory reserved for each of them will shift with the slab_reassign. ie: 10
>       >       >       >       >       >       >             >       >       
> pages in slab class 21, then memory pressure on 23 causes it to move over.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> So if you examine a key that still exists in slab class 21, it has no
>       >       >       >       >       >       >             >       >       
> reason to move up or down the slab classes.
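To illustrate the point that the slab class follows from the size of the stored item rather than from the key, here is a rough sketch. The chunk-size table is an assumption (smallest chunk 96 bytes, growth factor 1.25, 1MB cap) and is not claimed to reproduce memcached's real table, so the class numbers it yields need not match the 21 and 23 mentioned in the thread. Both function names are hypothetical.

```python
def build_chunk_sizes(smallest=96, factor=1.25, largest=1024 * 1024):
    """Build an illustrative list of chunk sizes, one per slab class."""
    sizes = []
    size = smallest
    while size < largest:
        sizes.append(size)
        size = int(size * factor)
    sizes.append(largest)  # final class holds items up to the cap
    return sizes


def slab_class_for(item_size, chunk_sizes):
    """Return the 1-based id of the smallest class whose chunk fits the item."""
    for class_id, chunk in enumerate(chunk_sizes, start=1):
        if item_size <= chunk:
            return class_id
    return None  # too large for any class in this table


chunk_sizes = build_chunk_sizes()
class_10k = slab_class_for(10 * 1024, chunk_sizes)
class_15k = slab_class_for(15 * 1024, chunk_sizes)
```

The key never enters the calculation: a 10KB item and a 15KB item land in different classes purely because of their sizes, and an item that already sits in a class has no other class it would "map" to.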
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       > 
> Alternatively, and possibly more beneficial is compaction of data in a slab 
> using the same set of criteria as lru crawling. Understandably, compaction is 
> a very difficult problem to solve since moving the data would be a pain in 
> the ass. I saw a couple of discussions about this in the mailing list, 
> though I didn't see any firm thoughts about it. I think it 
> can probably be done in O(1) like the lru crawler by limiting the number of 
> items it touches each time. Writing and reading are doable in O(1) so moving 
> should be as well. Has anyone given more thought on compaction?
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> I'd be interested in hacking this up for you folks if you can provide me
>       >       >       >       >       >       >             >       >       
> testing and some data to work with. With all of the LRU work I did in
>       >       >       >       >       >       >             >       >       
> 1.4.24, the next things I wanted to do is a big improvement on the slab
>       >       >       >       >       >       >             >       >       
> reassignment code.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> Currently it picks essentially a random slab page, empties it, and moves
>       >       >       >       >       >       >             >       >       
> the slab page into the class under pressure.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> One thing we can do is first examine for free memory in the existing slab,
>       >       >       >       >       >       >             >       >       
> IE:
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       - 
> Take a page from slab 21
>       >       >       >       >       >       >             >       >       - 
> Scan the page for valid items which need to be moved
>       >       >       >       >       >       >             >       >       - 
> Pull free memory from slab 21, migrate the item (moderately complicated)
>       >       >       >       >       >       >             >       >       - 
> When the page is empty, move it (or give up if you run out of free
>       >       >       >       >       >       >             >       >       
> chunks).
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> The next step is to pull from the LRU on slab 21:
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       - 
> Take page from slab 21
>       >       >       >       >       >       >             >       >       - 
> Scan page for valid items
>       >       >       >       >       >       >             >       >       - 
> Pull free memory from slab 21, migrate the item
>       >       >       >       >       >       >             >       >         
> - If no memory free, evict tail of slab 21. use that chunk.
>       >       >       >       >       >       >             >       >       - 
> When the page is empty, move it.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> Then, when you hit this condition your least-recently-used data gets
>       >       >       >       >       >       >             >       >       
> culled as new data migrates your page class. This should match a natural
>       >       >       >       >       >       >             >       >       
> occurrence if you would already be evicting valid (but old) items to make
>       >       >       >       >       >       >             >       >       
> room for new items.
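
The two passes described above (migrate into free chunks first, then fall back to evicting the LRU tail) can be sketched roughly as below. This is only an illustrative model, not the rebalancer's code: `slab_class`, `slab_page`, `evacuate_page` and `CHUNKS_PER_PAGE` are hypothetical names, and a class is reduced to a few counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNKS_PER_PAGE 4   /* hypothetical; real pages hold many more chunks */

/* Hypothetical, simplified model of one slab class (not memcached's structs). */
typedef struct {
    int free_chunks;   /* free chunks on the class's other pages */
    int lru_items;     /* items on other pages, evictable from the LRU tail */
    int evictions;     /* tail evictions caused by rebalancing */
} slab_class;

typedef struct {
    bool valid[CHUNKS_PER_PAGE];  /* true if the chunk holds a live item */
} slab_page;

/*
 * Try to empty `page` so it can be handed to another class.
 * Each live item is migrated into a free chunk of the same class. If no
 * chunk is free and allow_evict is set, the LRU tail is evicted and its
 * chunk is reused. Returns true when the page is empty and can be moved.
 */
static bool evacuate_page(slab_class *cls, slab_page *page, bool allow_evict)
{
    assert(cls != NULL && page != NULL);
    for (size_t i = 0; i < CHUNKS_PER_PAGE; i++) {
        if (!page->valid[i])
            continue;                 /* nothing to move from this chunk */
        if (cls->free_chunks == 0) {
            if (!allow_evict || cls->lru_items == 0)
                return false;         /* give up: out of free chunks */
            cls->lru_items--;         /* evict the tail of the LRU */
            cls->evictions++;
            cls->free_chunks++;       /* its chunk becomes the destination */
        }
        cls->free_chunks--;           /* destination chunk is now occupied */
        cls->lru_items++;             /* migrated item now lives elsewhere */
        page->valid[i] = false;       /* source chunk is now empty */
    }
    return true;
}
```

Calling `evacuate_page(&cls, &page, false)` models the first pass, which gives up when free chunks run out; calling it again with `true` models the second pass, where old items are evicted to make room.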
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       A 
> bonus to using the free memory trick, is that I can use the amount of
>       >       >       >       >       >       >             >       >       
> free space in a slab class as a heuristic to more quickly move slab pages
>       >       >       >       >       >       >             >       >       
> around.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> If it's still necessary from there, we can explore "upgrading" items to a
>       >       >       >       >       >       >             >       >       
> new slab class, but that is much much more complicated since the item has
>       >       >       >       >       >       >             >       >       
> to shift LRUs. Do you put it at the head, the tail, the middle, etc? It
>       >       >       >       >       >       >             >       >       
> might be impossible to make a good generic decision there.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> What version are you currently on? If 1.4.24, have you seen any
>       >       >       >       >       >       >             >       >       
> instability? I'm currently torn between fighting a few bugs and starting on
>       >       >       >       >       >       >             >       >       
> improving the slab rebalancer.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> -Dormando
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       > On 
> Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
>       >       >       >       >       >       >             >       >       
> Hey,
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> On Fri, 10 Jul 2015, Scott Mansfield wrote:
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       > 
> We've seen issues recently where we run a cluster that typically has the
> majority of items overwritten in the same slab every day and a sudden change
> in data size evicts a ton of data, affecting downstream systems. To be clear
> that is our problem, but I think there's a tweak in memcached that might be
> useful and another possible feature that would be even better.
>       >       >       >       >       >       >             >       >       > 
> The data that is written to this cache is overwritten every day, though the
> TTL is 7 days. One slab takes up the majority of the space in the cache. The
> application wrote e.g. 10KB (slab 21) every day for each key consistently.
> One day, a change occurred where it started writing 15KB (slab 23), causing a
> migration of data from one slab to another. We had -o
> slab_reassign,slab_automove=1 set on the server, causing large numbers of
> evictions on the initial slab. Let's say the cache could hold the data at
> 15KB per key, but the old data was not technically TTL'd out in its old
> slab. This means that memory was not being freed by the lru crawler thread (I
> think) because its expiry had not come around.
>       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >             >       >       > 
> lines 1199 and 1200 in items.c:
>       >       >       >       >       >       >             >       >       > 
> if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
>       >       >       >       >       >       >             >       >       >
>       >       >       >       >       >       >             >       >       > 
> If there was a check to see if this data was "orphaned," i.e. that the key,
> if accessed, would map to a different slab than the current one, then these
> orphans could be reclaimed as free memory. I am working on a patch to do
> this, though I have reservations about performing a hash on the key on the
> lru crawler thread (if the hash is not already available).
>       >       >       >       >       >       >             >       >       > 
> I have very little experience in the memcached codebase so I don't know the 
> most efficient way to do this. Any help would be appreciated.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> There seems to be a misconception about how the slab classes work. A key,
>       >       >       >       >       >       >             >       >       
> if already existing in a slab, will always map to the slab class it
>       >       >       >       >       >       >             >       >       
> currently fits into. The slab classes always exist, but the amount of
>       >       >       >       >       >       >             >       >       
> memory reserved for each of them will shift with the slab_reassign. ie: 10
>       >       >       >       >       >       >             >       >       
> pages in slab class 21, then memory pressure on 23 causes it to move over.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> So if you examine a key that still exists in slab class 21, it has no
>       >       >       >       >       >       >             >       >       
> reason to move up or down the slab classes.
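
To illustrate the point above, here is a minimal sketch of size-based class selection: an item's class follows from its size alone, so an old 10KB item stays where it is while a 15KB rewrite of the same key lands in a larger class. The helper names (`build_classes`, `class_for`) and the constants are hypothetical and for illustration only, not memcached's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 64   /* hypothetical cap for this sketch */
#define ALIGN_BYTES 8    /* chunk sizes are rounded up to this multiple */

/*
 * Fill sizes[1..n] with geometrically growing chunk sizes and return n.
 * base, factor and max_size are parameters of the sketch, not values
 * taken from the thread.
 */
static int build_classes(size_t sizes[MAX_CLASSES], size_t base,
                         double factor, size_t max_size)
{
    int n = 0;
    size_t size = base;
    assert(base > 0 && factor > 1.0);
    while (n + 1 < MAX_CLASSES && size <= max_size) {
        if (size % ALIGN_BYTES)
            size += ALIGN_BYTES - (size % ALIGN_BYTES);
        sizes[++n] = size;
        size = (size_t)(size * factor);
    }
    return n;
}

/* Smallest class whose chunk fits item_size, or 0 if none does. */
static int class_for(const size_t sizes[MAX_CLASSES], int n, size_t item_size)
{
    for (int i = 1; i <= n; i++)
        if (item_size <= sizes[i])
            return i;
    return 0;
}
```

With an assumed 96-byte base chunk and a 1.25 growth factor, `class_for` returns a higher class for a 15KB item than for a 10KB one, and always returns the same class for the same size regardless of how pages are distributed.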
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       > 
> Alternatively, and possibly more beneficial is compaction of data in a slab
> using the same set of criteria as lru crawling. Understandably, compaction is
> a very difficult problem to solve since moving the data would be a pain in
> the ass. I saw a couple of discussions about this in the mailing list, though
> I didn't see any firm thoughts about it. I think it can probably be done in
> O(1) like the lru crawler by limiting the number of items it touches each
> time. Writing and reading are doable in O(1) so moving should be as well.
> Has anyone given more thought on compaction?
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> I'd be interested in hacking this up for you folks if you can provide me
>       >       >       >       >       >       >             >       >       
> testing and some data to work with. With all of the LRU work I did in
>       >       >       >       >       >       >             >       >       
> 1.4.24, the next thing I wanted to do is a big improvement on the slab
>       >       >       >       >       >       >             >       >       
> reassignment code.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> Currently it picks essentially a random slab page, empties it, and moves
>       >       >       >       >       >       >             >       >       
> the slab page into the class under pressure.
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       
> One thing we can do is first examine for free memory in the existing slab,
>       >       >       >       >       >       >             >       >       
> IE:
>       >       >       >       >       >       >             >       >
>       >       >       >       >       >       >             >       >       - 
> Take a page from slab 21
>       >       >       >       >       >       >             >       >       - 
> Scan the page for valid items which need to be moved
>       >       >       >       >     ...
