Re: Check for orphaned items in lru crawler thread

'Scott Mansfield' via memcached Wed, 09 Sep 2015 10:24:19 -0700

I'm working on getting a test going internally. I'll let you know how it
goes.



*Scott Mansfield*

Product Eng > Consumer Science Eng > Sr. Software Eng
{
  M: 352-514-9452
  E: smansfi...@netflix.com
  K: {M: mobile, E: email, K: key}
}

On Mon, Sep 7, 2015 at 2:33 PM, dormando <dorma...@rydia.net> wrote:

> Yo,
>
> https://github.com/dormando/memcached/commits/slab_rebal_next - would you
> mind playing around with the branch here? You can see the start options in
> the test.
>
> This is a dead simple modification (a restoration of a feature that was
> already there...). The test very aggressively writes and is able to shunt
> memory around appropriately.
>
> The work I'm exploring right now will allow saving the items in a page
> that is being rebalanced away, and increasing the aggression of page
> moving without being so brain damaged about it.
>
> But while I'm poking around with that, I'd be interested in knowing if
> this simple branch is an improvement, and if so how much.
>
> I'll push more code to the branch, but the changes should be gated behind
> a feature flag.
>
> On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
>
> >
> > No worries man, you're doing us a favor. Let me know if there's anything
> you need from us, and I promise I'll be quicker this time :)
> >
> > On Aug 18, 2015 12:01 AM, "dormando" <dorma...@rydia.net> wrote:
> >       Hey,
> >
> >       I'm still really interested in working on this. I'll be taking a
> careful
> >       look soon I hope.
> >
> >       On Mon, 3 Aug 2015, Scott Mansfield wrote:
> >
> >       > I've tweaked the program slightly, so I'm adding a new version.
> It prints more stats as it goes and runs a bit faster.
> >       >
> >       > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield
> wrote:
> >       >       Total brain fart on my part. Apparently I had memcached
> 1.4.13 on my path (who knows how...) Using the actual one that I've built
> works. Sorry for the confusion... can't believe I didn't realize that
> before. I'm testing against the compiled one now to see how it behaves.
> >       >       On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando
> wrote:
> >       >             You sure that's 1.4.24? None of those fail for me :(
> >       >
> >       >             On Mon, 3 Aug 2015, Scott Mansfield wrote:
> >       >
> >       >             > The command line I've used that will start is:
> >       >             >
> >       >             > memcached -m 64 -o slab_reassign,slab_automove
> >       >             >
> >       >             >
> >       >             > the ones that fail are:
> >       >             >
> >       >             >
> >       >             > memcached -m 64 -o
> slab_reassign,slab_automove,lru_crawler,lru_maintainer
> >       >             >
> >       >             > memcached -o lru_crawler
> >       >             >
> >       >             >
> >       >             > I'm sure I've missed something during compile,
> though I just used ./configure and make.
> >       >             >
> >       >             >
> >       >             > On Monday, August 3, 2015 at 12:22:33 AM UTC-7,
> Scott Mansfield wrote:
> >       >             >       I've attached a pretty simple program to
> connect, fill a slab with data, and then fill another slab slowly with data
> of a different size. I've been trying to get memcached to run with the
> lru_crawler and lru_maintainer flags, but I get '
> >       >             >
> >       >             >       Illegal suboption "(null)"' every time I try
> to start with either in any configuration.
> >       >             >
> >       >             >
> >       >             >       I haven't seen it start to move slabs
> automatically with a freshly installed 1.4.24.
> >       >             >
> >       >             >
> >       >             >       On Tuesday, July 21, 2015 at 4:55:17 PM
> UTC-7, Scott Mansfield wrote:
> >       >             >             I realize I've not given you the tests
> to reproduce the behavior. I should be able to soon. Sorry about the delay
> here.
> >       >             > In the meantime, I wanted to bring up a possible
> secondary use of the same logic to move items on slab rebalancing. I think
> the system might benefit from using the same logic to crawl the pages in a
> slab and compact the data in the background. In the case where we have
> memory that is assigned to the slab but not being used because of replaced
> or TTL'd out data, returning the memory to a pool
> of free memory will allow a slab to grow with that memory first instead of
> waiting for an event where memory is needed at that instant.
> >       >             >
> >       >             > It's a change in approach, from reactive to
> proactive. What do you think?
> >       >             >
> >       >             > On Monday, July 13, 2015 at 5:54:11 PM UTC-7,
> Dormando wrote:
> >       >             >       > First, more detail for you:
> >       >             >       >
> >       >             >       > We are running 1.4.24 in production and
> haven't noticed any bugs as of yet. The new LRUs seem to be working well,
> though we nearly always run memcached scaled to hold all data without
> evictions. Those with evictions are behaving well. Those without evictions
> haven't seen crashing or any other noticeable bad behavior.
> >       >             >
> >       >             >       Neat.
> >       >             >
> >       >             >       >
> >       >             >       > OK, I think I see an area where I was
> speculating on functionality. If you have a key in slab 21 and then the
> same key is written again at a larger size in slab 23 I assumed that the
> space in 21 was not freed on the second write. With that assumption, the
> LRU crawler would not free up that space. Also just by observation in the
> macro, the space is not freed fast enough to be effective, in our use
> case, to accept the writes that are happening. Think in the hundreds of
> millions of "overwrites" in a 6 - 10 hour period across a cluster.
> >       >             >
> >       >             >       Internally, "items" (a key/value pair) are
> generally immutable. The only
> >       >             >       time when it's not is for INCR/DECR, and it
> still becomes immutable if two
> >       >             >       INCR/DECR's collide.
> >       >             >
> >       >             >       What this means, is that the new item is
> staged in a piece of free memory
> >       >             >       while the "upload" stage of the SET happens.
> When memcached has all of the
> >       >             >       data in memory to replace the item, it does
> an internal swap under a lock.
> >       >             >       The old item is removed from the hash table
> and LRU, and the new item gets
> >       >             >       put in its place (at the head of the LRU).
> >       >             >
> >       >             >       Since items are refcounted, this means that
> if other users are downloading
> >       >             >       an item which just got replaced, their
> memory doesn't get corrupted by the
> >       >             >       item changing out from underneath them. They
> can continue to read the old
> >       >             >       item until they're done. When the refcount
> reaches zero the old memory is
> >       >             >       reclaimed.
> >       >             >
> >       >             >       Most of the time, the item replacement
> happens and then the old memory is
> >       >             >       immediately reclaimed.
> >       >             >
> >       >             >       However, this does mean that you need *one*
> piece of free memory to
> >       >             >       replace the old one. Then the old memory
> gets freed after that set.
> >       >             >
> >       >             >       So if you take a memcached instance with 0
> free chunks, and do a rolling
> >       >             >       replacement of all items (within the same
> slab class as before), the first
> >       >             >       one would cause an eviction from the tail of
> the LRU to get a free chunk.
> >       >             >       Every SET after that would use the chunk
> freed from the replacement of the
> >       >             >       previous memory.
> >       >             >
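For illustration, the rolling replacement described above can be played out in a toy model. This is only a sketch, not memcached code: `rolling_replace_evictions` and its array-based "slab class" are hypothetical, and it assumes nobody holds a reference to the old item (so its chunk is freed right after the swap) and that the keys are overwritten in the reverse of the order they were first written, so the LRU tail is never the key currently being replaced.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (not memcached code): one slab class with n_chunks chunks,
 * every chunk holding a live item, zero free chunks. Each key is then
 * overwritten once with a value that fits the same slab class.
 * Returns the number of evictions needed, or -1 on allocation failure. */
int rolling_replace_evictions(int n_chunks)
{
    assert(n_chunks > 1);
    int *live = malloc(sizeof *live * (size_t)n_chunks);
    long *stamp = malloc(sizeof *stamp * (size_t)n_chunks);
    if (live == NULL || stamp == NULL) {
        free(live);
        free(stamp);
        return -1;
    }

    long tick = 0;
    /* Initial fill: key n-1 is written first (LRU tail), key 0 last (head). */
    for (int k = n_chunks - 1; k >= 0; k--) {
        live[k] = 1;
        stamp[k] = tick++;
    }

    int free_chunks = 0;
    int evictions = 0;
    for (int k = 0; k < n_chunks; k++) {
        if (free_chunks == 0) {
            /* No free chunk: evict the LRU tail (oldest live item). */
            int victim = -1;
            for (int j = 0; j < n_chunks; j++)
                if (live[j] && (victim < 0 || stamp[j] < stamp[victim]))
                    victim = j;
            assert(victim >= 0);
            live[victim] = 0;
            free_chunks++;
            evictions++;
        }
        free_chunks--;          /* the new item is staged in a free chunk */
        if (live[k])
            free_chunks++;      /* old item unlinked, its chunk is freed */
        live[k] = 1;
        stamp[k] = tick++;      /* new item goes to the head of the LRU */
    }

    free(live);
    free(stamp);
    return evictions;
}
```

Under those assumptions the function returns 1 for any number of chunks: only the first SET has to evict, and every later SET reuses the chunk freed by the previous replacement.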
> >       >             >       > After that last sentence I realized I also
> may not have explained the access pattern well enough. The keys are all
> overwritten every day, but it takes some time to write them all
> (obviously). We see a huge increase in the bytes metric as if the new data
> for the old keys was being written for the first time. Since the "old"
> slab for the same key doesn't proactively release memory, it starts to
> fill up the cache and then starts evicting data in the new slab. Once that
> happens, we see evictions in the old slab because of the algorithm you
> mentioned (random picking / freeing of memory). Typically we don't see any
> use for "upgrading" an item as the new data would be entirely new and
> should wholesale replace the old data for that key. More specifically,
> the operation is always set, with different data each day.
> >       >             >
> >       >             >       Right. Most of your problems will come from
> two areas. One is that when you're
> >       >             >       writing data aggressively into the new slab
> class (unless you set the
> >       >             >       rebalancer to always-replace mode), the
> mover will make memory available
> >       >             >       more slowly than you can insert. So you'll
> cause extra evictions in the
> >       >             >       new slab class.
> >       >             >
> >       >             >       The secondary problem is from the random
> evictions in the previous slab
> >       >             >       class as stuff is chucked on the floor to
> make memory moveable.
> >       >             >
> >       >             >       > As for testing, we'll be able to put it
> under real production workload. I don't know what kind of data you mean you
> need for testing. The data stored in the caches are highly confidential. I
> can give you all kinds of metrics, since we collect most of the ones that
> are in the stats and some from the stats slabs output. If you have some
> specific ones that need collecting, I'll double check and
> make sure we can get those. Alternatively, it might be most beneficial to
> see the metrics in person :)
> >       >             >
> >       >             >       I just need stats snapshots here and there,
> and actually putting the thing
> >       >             >       under load. When I did the LRU work I had to
> beg for several months
> >       >             >       before anyone tested it with a production
> load. This slows things down and
> >       >             >       demotivates me from working on the project.
> >       >             >
> >       >             >       Unfortunately my dayjob keeps me pretty busy
> so ~internet~ would probably
> >       >             >       be best.
> >       >             >
> >       >             >       > I can create a driver program to reproduce
> the behavior on a smaller scale. It would write e.g. 10k keys of 10k size,
> then rewrite the same keys with different size data. I'll work on that and
> post it to this thread when I can reproduce the behavior locally.
> >       >             >
> >       >             >       Ok. There're slab rebalance unit tests in
> the t/ directory which do things
> >       >             >       like this, and I've used mc-crusher to slam
> the rebalancer. It's pretty
> >       >             >       easy to run one config to load up 10k
> objects, then flip to the other
> >       >             >       using the same key namespace.
> >       >             >
> >       >             >       > Thanks,
> >       >             >       > Scott
> >       >             >       >
> >       >             >       > On Saturday, July 11, 2015 at 12:05:54 PM
> UTC-7, Dormando wrote:
> >       >             >       >       Hey,
> >       >             >       >
> >       >             >       >       On Fri, 10 Jul 2015, Scott Mansfield
> wrote:
> >       >             >       >
> >       >             >       >       > We've seen issues recently where
> we run a cluster that typically has the majority of items overwritten in
> the same slab every day and a sudden change in data size evicts a ton of
> data, affecting downstream systems. To be clear that is our problem, but I
> think there's a tweak in memcached that might be useful and another
> possible feature that would be even better.
> >       >             >       >       > The data that is written to this
> cache is overwritten every day, though the TTL is 7 days. One slab takes up
> the majority of the space in the cache. The application wrote e.g. 10KB
> (slab 21) every day for each key consistently. One day, a change occurred
> where it started writing 15KB (slab 23), causing a migration of data from
> one slab to another. We had -o slab_reassign,slab_automove=1 set
> on the server, causing large numbers of evictions on the initial slab.
> Let's say the cache could hold the data at 15KB per key, but the old data
> was not technically TTL'd out in its old slab. This means that memory was
> not being freed by the lru crawler thread (I think) because its expiry
> had not come around.
> >       >             >       >       >
> >       >             >       >       > lines 1199 and 1200 in items.c:
> >       >             >       >       > if ((search->exptime != 0 &&
> search->exptime < current_time) || is_flushed(search)) {
> >       >             >       >       >
> >       >             >       >       > If there was a check to see if
> this data was "orphaned," i.e. that the key, if accessed, would map to a
> different slab than the current one, then these orphans could be reclaimed
> as free memory. I am working on a patch to do this, though I have
> reservations about performing a hash on the key on the lru crawler
> thread (if the hash is not already available).
> >       >             >       >       > I have very little experience in
> the memcached codebase so I don't know the most efficient way to do this.
> Any help would be appreciated.
> >       >             >       >
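For illustration, the expiry test quoted from items.c above can be restated as a standalone predicate. This is only a sketch: `toy_item`, `toy_time_t` and `crawler_would_reclaim` are hypothetical names, and because the body of `is_flushed()` is not shown in this thread, the `flush_before` parameter is an assumed stand-in for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int toy_time_t;    /* hypothetical: seconds on the server clock */

struct toy_item {
    toy_time_t exptime;             /* 0 means the item never expires */
    toy_time_t time;                /* when the item was last written */
};

/* Mirrors the shape of the quoted condition:
 *   (exptime != 0 && exptime < current_time) || is_flushed(item)
 * flush_before is an assumed model of a flush: 0 means no flush has
 * happened, otherwise items last written at or before it count as dead. */
bool crawler_would_reclaim(const struct toy_item *it, toy_time_t now,
                           toy_time_t flush_before)
{
    assert(it != NULL);
    bool expired = it->exptime != 0 && it->exptime < now;
    bool flushed = flush_before != 0 && it->time <= flush_before;
    return expired || flushed;
}
```

With a 7 day TTL (604800 seconds), an item written at time 0 is not reclaimable one day later (`now = 86400`), which is the situation described above: the crawler only sees it as dead once `now` passes 604800.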
> >       >             >       >       There seems to be a misconception
> about how the slab classes work. A key,
> >       >             >       >       if already existing in a slab, will
> always map to the slab class it
> >       >             >       >       currently fits into. The slab
> classes always exist, but the amount of
> >       >             >       >       memory reserved for each of them
> will shift with the slab_reassign. ie: 10
> >       >             >       >       pages in slab class 21, then memory
> pressure on 23 causes it to move over.
> >       >             >       >
> >       >             >       >       So if you examine a key that still
> exists in slab class 21, it has no
> >       >             >       >       reason to move up or down the slab
> classes.
> >       >             >       >
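For illustration, the way an item's size picks its slab class can be sketched as a geometric size table. This is a simplified model, not memcached's real table: `class_for_size`, `MAX_CLASSES` and all parameter values are hypothetical, and no alignment or header overhead is modelled, so the class numbers it produces are not expected to match the 21 and 23 mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 64  /* hypothetical cap on the number of classes */

/* Chunk sizes grow geometrically: base_chunk, base_chunk * factor, ...
 * Returns the 1-based index of the smallest class whose chunk size is
 * >= item_size, or 0 if no class up to max_chunk can hold the item. */
int class_for_size(size_t item_size, size_t base_chunk, double factor,
                   size_t max_chunk)
{
    assert(base_chunk > 0 && factor > 1.0);
    double chunk = (double)base_chunk;
    for (int cls = 1; cls <= MAX_CLASSES && chunk <= (double)max_chunk; cls++) {
        if ((double)item_size <= chunk)
            return cls;         /* first class that is big enough */
        chunk *= factor;        /* next class is factor times larger */
    }
    return 0;
}
```

The class is a function of the item's size only. A 10KB item that already sits in its class stays there, and the 15KB rewrite of the same key is a new item that lands in a larger class.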
> >       >             >       >       > Alternatively, and possibly more
> beneficial is compaction of data in a slab using the same set of criteria
> as lru crawling. Understandably, compaction is a very difficult problem to
> solve since moving the data would be a pain in the ass. I saw a couple of
> discussions about this in the mailing list, though I didn't see any
> firm thoughts about it. I think it can probably be done in O(1) like
> the lru crawler by limiting the number of items it touches each time.
> Writing and reading are doable in O(1) so moving should be as well. Has
> anyone given more thought on compaction?
> >       >             >       >
> >       >             >       >       I'd be interested in hacking this up
> for you folks if you can provide me
> >       >             >       >       testing and some data to work with.
> With all of the LRU work I did in
> >       >             >       >       1.4.24, the next thing I wanted to
> do is a big improvement on the slab
> >       >             >       >       reassignment code.
> >       >             >       >
> >       >             >       >       Currently it picks essentially a
> random slab page, empties it, and moves
> >       >             >       >       the slab page into the class under
> pressure.
> >       >             >       >
> >       >             >       >       One thing we can do is first examine
> for free memory in the existing slab,
> >       >             >       >       IE:
> >       >             >       >
> >       >             >       >       - Take a page from slab 21
> >       >             >       >       - Scan the page for valid items
> which need to be moved
> >       >             >       >       - Pull free memory from slab 21,
> migrate the item (moderately complicated)
> >       >             >       >       - When the page is empty, move it
> (or give up if you run out of free
> >       >             >       >       chunks).
> >       >             >       >
> >       >             >       >       The next step is to pull from the
> LRU on slab 21:
> >       >             >       >
> >       >             >       >       - Take page from slab 21
> >       >             >       >       - Scan page for valid items
> >       >             >       >       - Pull free memory from slab 21,
> migrate the item
> >       >             >       >         - If no memory free, evict tail of
> slab 21. use that chunk.
> >       >             >       >       - When the page is empty, move it.
> >       >             >       >
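For illustration, the two page-emptying variants listed above can be sketched over a flat array of chunks. This is a toy model under stated assumptions, not the rebalancer's code: `evacuate_page`, the `slots` layout and the `allow_evict` switch are hypothetical, and the eviction victim is chosen only among chunks outside the page being emptied.

```c
#include <assert.h>
#include <stddef.h>

/* slots[p * chunks_per_page + c] describes chunk c of page p:
 * 0 = free chunk, >0 = last-access stamp of the item stored there
 * (smaller = older). Empties `page` by moving its items into other pages
 * of the same class.
 * allow_evict == 0: only free chunks are used; give up (-1) when none is
 *                   left. Items moved before giving up stay moved.
 * allow_evict != 0: when no chunk is free, overwrite the oldest item
 *                   outside the page (a stand-in for the LRU tail).
 * Returns the number of evictions, or -1 if it gave up. */
int evacuate_page(int *slots, int n_pages, int chunks_per_page,
                  int page, int allow_evict)
{
    assert(slots != NULL && n_pages > 1 && chunks_per_page > 0);
    assert(page >= 0 && page < n_pages);
    int total = n_pages * chunks_per_page;
    int first = page * chunks_per_page;
    int last = first + chunks_per_page;     /* one past the page */
    int evictions = 0;

    for (int src = first; src < last; src++) {
        if (slots[src] == 0)
            continue;                       /* free chunk, nothing to move */
        int dst = -1, tail = -1;
        for (int i = 0; i < total; i++) {
            if (i >= first && i < last)
                continue;                   /* skip the page being emptied */
            if (slots[i] == 0) {
                dst = i;                    /* found free memory */
                break;
            }
            if (tail < 0 || slots[i] < slots[tail])
                tail = i;                   /* oldest item seen so far */
        }
        if (dst < 0) {
            if (!allow_evict)
                return -1;                  /* ran out of free chunks */
            dst = tail;                     /* evict the tail, reuse its chunk */
            evictions++;
        }
        slots[dst] = slots[src];            /* migrate the item */
        slots[src] = 0;
    }
    return evictions;                       /* page is now empty and movable */
}
```

A real implementation would also have to deal with the hash table, refcounts and locking, which this sketch ignores entirely.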
> >       >             >       >       Then, when you hit this condition
> your least-recently-used data gets
> >       >             >       >       culled as new data migrates your
> page class. This should match a natural
> >       >             >       >       occurrence if you would already be
> evicting valid (but old) items to make
> >       >             >       >       room for new items.
> >       >             >       >
> >       >             >       >       A bonus to using the free memory
> trick, is that I can use the amount of
> >       >             >       >       free space in a slab class as a
> heuristic to more quickly move slab pages
> >       >             >       >       around.
> >       >             >       >
> >       >             >       >       If it's still necessary from there,
> we can explore "upgrading" items to a
> >       >             >       >       new slab class, but that is much
> much more complicated since the item has
> >       >             >       >       to shift LRU's. Do you put it at the
> head, the tail, the middle, etc? It
> >       >             >       >       might be impossible to make a good
> generic decision there.
> >       >             >       >
> >       >             >       >       What version are you currently on?
> If 1.4.24, have you seen any
> >       >             >       >       instability? I'm currently torn
> between fighting a few bugs and starting on
> >       >             >       >       improving the slab rebalancer.
> >       >             >       >
> >       >             >       >       -Dormando
> >       >             >       >
> >       >             >       >
> >       >             >       > --
> >       >             >       >
> >       >             >       > ---
> >       >             >       > You received this message because you are
> subscribed to the Google Groups "memcached" group.
> >       >             >       > To unsubscribe from this group and stop
> receiving emails from it, send an email to memcached+...@googlegroups.com.
> >       >             >       > For more options, visit
> https://groups.google.com/d/optout.
> >       >             >       >
> >       >             >       >
> >       >             >
>
