I have the first commit (slab_automove=2) running in prod right now. Later today I'll run a full-load production test of the latest code. I'll just let it run for a few days unless I spot any problems. We have good metrics for latency et al. from the client side, though network time normally dwarfs memcached time.
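For readers who want to recreate the failure mode discussed later in this thread (write every key at one size, then overwrite the same keys at a larger size so a second slab class comes under pressure), a minimal driver can be sketched as below. This only *builds* memcached text-protocol commands; the key names, counts, and value sizes are illustrative, and you would send the output to a real server over a socket yourself.

```python
# Sketch of an overwrite-pattern driver: fill one slab class with
# fixed-size values, then rewrite the same keys at a larger size so a
# different slab class comes under pressure. Builds memcached
# text-protocol commands only; sizes and key count are illustrative.

def set_cmd(key, value, flags=0, exptime=0):
    """Build one memcached text-protocol 'set' command."""
    return (f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
            + value + b"\r\n")

def overwrite_workload(n_keys=10_000, first_size=10_240, second_size=15_360):
    # Phase 1: every key lands in the slab class that fits first_size.
    for i in range(n_keys):
        yield set_cmd(f"key:{i}", b"a" * first_size)
    # Phase 2: same keys, larger values -> a different slab class.
    for i in range(n_keys):
        yield set_cmd(f"key:{i}", b"b" * second_size)

cmds = list(overwrite_workload(n_keys=2, first_size=8, second_size=16))
print(cmds[0])  # b'set key:0 0 0 8\r\naaaaaaaa\r\n'
```

Piping a phase-1 pass, then a phase-2 pass, at a server started with `-o slab_reassign,slab_automove` is roughly the scenario under discussion.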
On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> That's unfortunate.
>
> I've done some more work on the branch:
> https://github.com/memcached/memcached/pull/112
>
> It's not completely likely you would see enough of an improvement from the
> new default mode. However, if your item sizes change gradually, items are
> reclaimed during expiration, or get overwritten (and thus freed in the old
> class), it should work just fine. I have another patch coming which should
> help though.
>
> Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:
> I have it running internally, and it runs fine under normal load. It's
> difficult to put it into the line of fire for a production workload
> because of social reasons... As well, it's a degenerate case that we
> normally don't run into (and actively try to avoid). I'm going to run
> some heavier load tests on it today.

On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> I'm working on getting a test going internally. I'll let you know how it
> goes.
>
> Scott Mansfield

On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> Yo,
>
> https://github.com/dormando/memcached/commits/slab_rebal_next -
> would you mind playing around with the branch here? You can see the start
> options in the test.
>
> This is a dead simple modification (a restoration of a feature that was
> already there...). The test very aggressively writes and is able to shunt
> memory around appropriately.
>
> The work I'm exploring right now will allow savings of items being
> rebalanced from, and increasing the aggression of page moving without
> being so brain damaged about it.
>
> But while I'm poking around with that, I'd be interested in knowing if
> this simple branch is an improvement, and if so how much.
> I'll push more code to the branch, but the changes should be gated behind
> a feature flag.

On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> No worries man, you're doing us a favor. Let me know if there's anything
> you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> Hey,
>
> I'm still really interested in working on this. I'll be taking a careful
> look soon, I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> I've tweaked the program slightly, so I'm adding a new version. It prints
> more stats as it goes and runs a bit faster.

On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> Total brain fart on my part. Apparently I had memcached 1.4.13 on my path
> (who knows how...). Using the actual one that I've built works. Sorry for
> the confusion... can't believe I didn't realize that before. I'm testing
> against the compiled one now to see how it behaves.

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> The command line I've used that will start is:
>
>     memcached -m 64 -o slab_reassign,slab_automove
>
> The ones that fail are:
>
>     memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
>     memcached -o lru_crawler
>
> I'm sure I've missed something during compile, though I just used
> ./configure and make.
On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> I've attached a pretty simple program to connect, fill a slab with data,
> and then fill another slab slowly with data of a different size. I've
> been trying to get memcached to run with the lru_crawler and
> lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time I
> try to start with either in any configuration.
>
> I haven't seen it start to move slabs automatically with a freshly
> installed 1.4.24.

On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> I realize I've not given you the tests to reproduce the behavior. I
> should be able to soon. Sorry about the delay here.
>
> In the meantime, I wanted to bring up a possible secondary use of the
> same logic that moves items on slab rebalancing. I think the system might
> benefit from using the same logic to crawl the pages in a slab and
> compact the data in the background. In the case where we have memory that
> is assigned to the slab but not being used, because of replaced or TTL'd
> out data, returning the memory to a pool of free memory will allow a slab
> to grow with that memory first instead of waiting for an event where
> memory is needed at that instant.
>
> It's a change in approach, from reactive to proactive. What do you think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> > First, more detail for you:
> >
> > We are running 1.4.24 in production and haven't noticed any bugs as of
> > yet. The new LRUs seem to be working well, though we nearly always run
> > memcached scaled to hold all data without evictions. Those with
> > evictions are behaving well. Those without evictions haven't seen
> > crashing or any other noticeable bad behavior.
> Neat.
>
> > OK, I think I see an area where I was speculating on functionality. If
> > you have a key in slab 21 and then the same key is written again at a
> > larger size in slab 23, I assumed that the space in 21 was not freed on
> > the second write. With that assumption, the LRU crawler would not free
> > up that space. Also, just by observation in the macro, the space is not
> > freed fast enough to be effective, in our use case, to accept the
> > writes that are happening. Think in the hundreds of millions of
> > "overwrites" in a 6-10 hour period across a cluster.
>
> Internally, "items" (a key/value pair) are generally immutable. The only
> time they're not is for INCR/DECR, and an item still becomes immutable if
> two INCR/DECRs collide.
>
> What this means is that the new item is staged in a piece of free memory
> while the "upload" stage of the SET happens. When memcached has all of
> the data in memory to replace the item, it does an internal swap under a
> lock. The old item is removed from the hash table and LRU, and the new
> item gets put in its place (at the head of the LRU).
>
> Since items are refcounted, this means that if other users are
> downloading an item which just got replaced, their memory doesn't get
> corrupted by the item changing out from underneath them. They can
> continue to read the old item until they're done. When the refcount
> reaches zero, the old memory is reclaimed.
>
> Most of the time, the item replacement happens and then the old memory is
> immediately removed.
>
> However, this does mean that you need *one* piece of free memory to
> replace the old one. The old memory gets freed after that set.
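That replace-then-free sequence can be modeled in a few lines. This is an illustrative sketch only: memcached's real items are C structs guarded by locks, and every name below is invented for the model.

```python
# Toy model of memcached's item replacement: items are immutable and
# refcounted; a SET stages the new item in a free chunk, swaps it into
# the hash table, and the old chunk is only reclaimed once its refcount
# drops to zero (so concurrent readers never see corrupted memory).

class Item:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.refcount = 1  # the hash table's own reference

free_chunks = []   # reclaimed memory, reusable by later SETs
hash_table = {}

def release(item):
    item.refcount -= 1
    if item.refcount == 0:
        free_chunks.append(item)  # chunk returns to the free list

def set_item(key, value):
    new = Item(key, value)        # staged in a piece of free memory
    old = hash_table.get(key)
    hash_table[key] = new         # the internal swap (under a lock in C)
    if old is not None:
        release(old)              # old memory freed once readers finish

def get_ref(key):
    item = hash_table[key]
    item.refcount += 1            # a reader holds its own reference
    return item

set_item("a", "v1")
reader = get_ref("a")             # a concurrent reader of v1
set_item("a", "v2")               # swap happens; v1 not yet reclaimed
assert reader.value == "v1" and len(free_chunks) == 0
release(reader)                   # reader done; v1's chunk is now freed
assert len(free_chunks) == 1
```

The last four lines show the key property: the replaced item's memory only becomes free after the last reader releases it, which is why a SET over an existing key still needs one spare chunk up front.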
> So if you take a memcached instance with 0 free chunks and do a rolling
> replacement of all items (within the same slab class as before), the
> first one would cause an eviction from the tail of the LRU to get a free
> chunk. Every SET after that would use the chunk freed by the replacement
> of the previous item.
>
> > After that last sentence I realized I also may not have explained well
> > enough the access pattern. The keys are all overwritten every day, but
> > it takes some time to write them all (obviously). We see a huge
> > increase in the bytes metric, as if the new data for the old keys was
> > being written for the first time. Since the "old" slab for the same key
> > doesn't proactively release memory, it starts to fill up the cache and
> > then start evicting data in the new slab. Once that happens, we see
> > evictions in the old slab because of the algorithm you mentioned
> > (random picking / freeing of memory). Typically we don't see any use
> > for "upgrading" an item, as the new data would be entirely new and
> > should wholesale replace the old data for that key. More specifically,
> > the operation is always set, with different data each day.
>
> Right. Most of your problems will come from two areas. One is that when
> you write data aggressively into the new slab class (unless you set the
> rebalancer to always-replace mode), the mover will make memory available
> more slowly than you can insert, so you'll cause extra evictions in the
> new slab class.
>
> The secondary problem is the random evictions in the previous slab class
> as stuff is chucked on the floor to make memory moveable.
>
> > As for testing, we'll be able to put it under real production workload.
> > I don't know what kind of data you mean you need for testing. The data
> > stored in the caches are highly confidential. I can give you all kinds
> > of metrics, since we collect most of the ones that are in the stats
> > output and some from the stats slabs output. If you have some specific
> > ones that need collecting, I'll double check and make sure we can get
> > those. Alternatively, it might be most beneficial to see the metrics in
> > person :)
>
> I just need stats snapshots here and there, and actually putting the
> thing under load. When I did the LRU work I had to beg for several months
> before anyone tested it with a production load. This slows things down
> and demotivates me from working on the project.
>
> Unfortunately my dayjob keeps me pretty busy, so ~internet~ would
> probably be best.
>
> > I can create a driver program to reproduce the behavior on a smaller
> > scale. It would write e.g. 10k keys of 10k size, then rewrite the same
> > keys with different size data. I'll work on that and post it to this
> > thread when I can reproduce the behavior locally.
>
> Ok. There're slab rebalance unit tests in the t/ directory which do
> things like this, and I've used mc-crusher to slam the rebalancer. It's
> pretty easy to run one config to load up 10k objects, then flip to the
> other using the same key namespace.
>
> > Thanks,
> > Scott

On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
> Hey,
>
> On Fri, 10 Jul 2015, Scott Mansfield wrote:
>
> > We've seen issues recently where we run a cluster that typically has
> > the majority of items overwritten in the same slab every day, and a
> > sudden change in data size evicts a ton of data, affecting downstream
> > systems.
> > To be clear, that is our problem, but I think there's a tweak in
> > memcached that might be useful, and another possible feature that
> > would be even better.
> >
> > The data that is written to this cache is overwritten every day,
> > though the TTL is 7 days. One slab takes up the majority of the space
> > in the cache. The application wrote e.g. 10KB (slab 21) every day for
> > each key consistently. One day, a change occurred where it started
> > writing 15KB (slab 23), causing a migration of data from one slab to
> > another. We had -o slab_reassign,slab_automove=1 set on the server,
> > causing large numbers of evictions on the initial slab. Let's say the
> > cache could hold the data at 15KB per key, but the old data was not
> > technically TTL'd out in its old slab. This means that memory was not
> > being freed by the lru crawler thread (I think) because its expiry had
> > not come around.
> >
> > Lines 1199 and 1200 in items.c:
> >
> >     if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
> >
> > If there was a check to see if this data was "orphaned," i.e. that the
> > key, if accessed, would map to a different slab than the current one,
> > then these orphans could be reclaimed as free memory. I am working on
> > a patch to do this, though I have reservations about performing a hash
> > on the key on the lru crawler thread (if the hash is not already
> > available). I have very little experience in the memcached codebase,
> > so I don't know the most efficient way to do this. Any help would be
> > appreciated.
>
> There seems to be a misconception about how the slab classes work. A
> key, if already existing in a slab, will always map to the slab class it
> currently fits into.
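The size-to-class mapping being discussed can be approximated with a short model. The base chunk size of 96 bytes, growth factor of 1.25, and 8-byte alignment below are memcached 1.4.x defaults; exact class numbers depend on build options and `-n`/`-f` settings, so check `stats slabs` on a real server rather than trusting this sketch.

```python
# Rough model of memcached's default slab class sizing: class 1 starts
# at a 96-byte chunk, and each class grows by factor 1.25, rounded up to
# 8-byte alignment. An approximation for illustration only.

def slab_chunk_sizes(base=96, factor=1.25, max_chunk=1024 * 1024):
    sizes = []
    size = base
    while size <= max_chunk / factor:
        size = (size + 7) // 8 * 8   # round up to 8-byte alignment
        sizes.append(size)
        size = int(size * factor)
    return sizes

def slab_class_for(item_size, sizes):
    """Return the 1-based slab class whose chunk first fits item_size."""
    for i, chunk in enumerate(sizes, start=1):
        if item_size <= chunk:
            return i
    return None  # larger than the biggest class

sizes = slab_chunk_sizes()
print(sizes[:6])  # [96, 120, 152, 192, 240, 304]

# An item's footprint includes the key and ~48 bytes of header, so a
# "10KB value" lands a class or two above what the raw size suggests.
# The point of the thread: these two sizes map to *different* classes.
assert slab_class_for(10 * 1024, sizes) != slab_class_for(15 * 1024, sizes)
```

An item never moves between classes on its own; the class is fixed by the chunk it was stored in, which is why a re-write at a new size is a brand-new allocation in a different class.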
> The slab classes always exist, but the amount of memory reserved for
> each of them will shift with slab_reassign. ie: 10 pages in slab class
> 21, then memory pressure on 23 causes memory to move over.
>
> So if you examine a key that still exists in slab class 21, it has no
> reason to move up or down the slab classes.
>
> > Alternatively, and possibly more beneficial, is compaction of data in
> > a slab using the same set of criteria as lru crawling. Understandably,
> > compaction is a very difficult problem to solve, since moving the data
> > would be a pain in the ass. I saw a couple of discussions about this
> > in the mailing list, though I didn't see any firm thoughts about it. I
> > think it can probably be done in O(1) like the lru crawler by limiting
> > the number of items it touches each time. Writing and reading are
> > doable in O(1), so moving should be as well. Has anyone given more
> > thought to compaction?
>
> I'd be interested in hacking this up for you folks if you can provide me
> testing and some data to work with. With all of the LRU work I did in
> 1.4.24, the next thing I wanted to do is a big improvement to the slab
> reassignment code.
>
> Currently it picks an essentially random slab page, empties it, and
> moves the slab page into the class under pressure.
>
> One thing we can do is first examine for free memory in the existing
> slab, ie:
>
> - Take a page from slab 21
> - Scan the page for valid items which need to be moved
> - Pull free memory from slab 21, migrate the item (moderately
>   complicated)
> - When the page is empty, move it (or give up if you run out of free
>   chunks).
> The next step is to pull from the LRU on slab 21:
>
> - Take a page from slab 21
> - Scan the page for valid items
> - Pull free memory from slab 21, migrate the item
> - If no memory is free, evict the tail of slab 21 and use that chunk.
> - When the page is empty, move it.
>
> Then, when you hit this condition, your least-recently-used data gets
> culled as new data migrates into your slab class. This should match a
> natural occurrence if you would already be evicting valid (but old)
> items to make room for new items.
>
> A bonus to using the free memory trick is that I can use the amount of
> free space in a slab class as a heuristic to more quickly move slab
> pages around.
>
> If it's still necessary from there, we can explore "upgrading" items to
> a new slab class, but that is much, much more complicated, since the
> item has to shift LRUs. Do you put it at the head, the tail, the middle,
> etc? It might be impossible to make a good generic decision there.
>
> What version are you currently on? If 1.4.24, have you seen any
> instability? I'm currently torn between fighting a few bugs and starting
> on improving the slab rebalancer.
>
> -Dormando
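The two-pass page-move plan above can be modeled in miniature. This is an illustrative sketch with invented names; the real code in slabs.c and items.c deals with refcounts, locks, and chunk headers that this ignores.

```python
# Rough model of the page-move strategy described above: to vacate a
# page in the source slab class, migrate each still-valid item into
# another free chunk of the same class; if no chunk is free, evict from
# the LRU tail to make one. A "page" here is just a list of chunk slots.

from collections import deque

def empty_page(page, free_chunks, lru):
    """Vacate every live item from `page` into other chunks of the class."""
    for slot, item in enumerate(page):
        if item is None:
            continue                    # slot already free
        if not free_chunks:
            victim = lru.popleft()      # step 2: evict the LRU tail...
            victim["item"] = None
            free_chunks.append(victim)  # ...to manufacture a free chunk
        dest = free_chunks.pop()
        dest["item"] = item             # migrate the live item
        page[slot] = None               # source chunk is now empty

page = ["A", "B", "C"]                  # three live items on the page
free_chunks = [{"item": None}]          # one free chunk in the class
lru = deque([{"item": "old1"}, {"item": "old2"}])  # LRU tail first
empty_page(page, free_chunks, lru)
assert page == [None, None, None]       # page vacated, ready to reassign
assert len(lru) == 0                    # two evictions were needed
```

The example shows the free-memory trick and its fallback in one pass: the first item moves into the existing free chunk, and the remaining two each cost an eviction from the LRU tail, which is exactly the "cull old data as new data migrates in" behavior described above.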
--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.