I've tweaked the program slightly, so I'm adding a new version. It prints more stats as it goes and runs a bit faster.
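For reference, the kind of stats polling the updated driver might do between writes can be approximated with the text protocol's "stats" command. This is a rough, illustrative sketch (the address and the choice of counters are arbitrary), not the attached program itself:

// Sketch: poll memcached's plain-text "stats" command and print a couple
// of counters. Address and counter selection are illustrative.
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

func dumpStats(addr string) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()

	// The text protocol replies with "STAT <name> <value>" lines, then "END".
	if _, err := fmt.Fprintf(conn, "stats\r\n"); err != nil {
		return err
	}
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		line := sc.Text()
		if line == "END" {
			break
		}
		if strings.HasPrefix(line, "STAT bytes ") ||
			strings.HasPrefix(line, "STAT evictions ") {
			fmt.Println(line)
		}
	}
	return sc.Err()
}

func main() {
	if err := dumpStats("127.0.0.1:11211"); err != nil {
		fmt.Println("stats error:", err)
	}
}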
On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> Total brain fart on my part. Apparently I had memcached 1.4.13 on my path
> (who knows how...). Using the actual one that I've built works. Sorry for
> the confusion... can't believe I didn't realize that before. I'm testing
> against the compiled one now to see how it behaves.
>
> On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
>> You sure that's 1.4.24? None of those fail for me :(
>>
>> On Mon, 3 Aug 2015, Scott Mansfield wrote:
>>> The command line I've used that will start is:
>>>
>>>   memcached -m 64 -o slab_reassign,slab_automove
>>>
>>> The ones that fail are:
>>>
>>>   memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
>>>   memcached -o lru_crawler
>>>
>>> I'm sure I've missed something during compile, though I just used
>>> ./configure and make.
>>>
>>> On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
>>> I've attached a pretty simple program to connect, fill a slab with data,
>>> and then fill another slab slowly with data of a different size. I've
>>> been trying to get memcached to run with the lru_crawler and
>>> lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time I
>>> try to start with either in any configuration.
>>>
>>> I haven't seen it start to move slabs automatically with a freshly
>>> installed 1.4.24.
>>>
>>> On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
>>> I realize I've not given you the tests to reproduce the behavior. I
>>> should be able to soon. Sorry about the delay here.
>>>
>>> In the meantime, I wanted to bring up a possible secondary use of the
>>> same logic to move items on slab rebalancing. I think the system might
>>> benefit from using the same logic to crawl the pages in a slab and
>>> compact the data in the background. In the case where we have memory that
>>> is assigned to the slab but not being used, because of replaced or TTL'd
>>> out data, returning the memory to a pool of free memory will allow a slab
>>> to grow with that memory first instead of waiting for an event where
>>> memory is needed at that instant.
>>>
>>> It's a change in approach, from reactive to proactive. What do you think?
>>>
>>> On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
>>>> First, more detail for you:
>>>>
>>>> We are running 1.4.24 in production and haven't noticed any bugs as of
>>>> yet. The new LRUs seem to be working well, though we nearly always run
>>>> memcached scaled to hold all data without evictions. Those with
>>>> evictions are behaving well. Those without evictions haven't seen
>>>> crashing or any other noticeable bad behavior.
>>>
>>> Neat.
>>>
>>>> OK, I think I see an area where I was speculating on functionality. If
>>>> you have a key in slab 21 and then the same key is written again at a
>>>> larger size in slab 23, I assumed that the space in 21 was not freed on
>>>> the second write. With that assumption, the LRU crawler would not free
>>>> up that space. Also, just by observation at the macro level, the space
>>>> is not freed fast enough to be effective, in our use case, to accept the
>>>> writes that are happening. Think in the hundreds of millions of
>>>> "overwrites" in a 6-10 hour period across a cluster.
>>>
>>> Internally, "items" (a key/value pair) are generally immutable.
>>> The only time when it's not is for INCR/DECR, and it still becomes
>>> immutable if two INCR/DECRs collide.
>>>
>>> What this means is that the new item is staged in a piece of free memory
>>> while the "upload" stage of the SET happens. When memcached has all of
>>> the data in memory to replace the item, it does an internal swap under a
>>> lock. The old item is removed from the hash table and LRU, and the new
>>> item gets put in its place (at the head of the LRU).
>>>
>>> Since items are refcounted, this means that if other users are
>>> downloading an item which just got replaced, their memory doesn't get
>>> corrupted by the item changing out from underneath them. They can
>>> continue to read the old item until they're done. When the refcount
>>> reaches zero, the old memory is reclaimed.
>>>
>>> Most of the time, the item replacement happens and then the old memory is
>>> immediately removed.
>>>
>>> However, this does mean that you need *one* piece of free memory to
>>> replace the old one. Then the old memory gets freed after that set.
>>>
>>> So if you take a memcached instance with 0 free chunks and do a rolling
>>> replacement of all items (within the same slab class as before), the
>>> first one would cause an eviction from the tail of the LRU to get a free
>>> chunk. Every SET after that would use the chunk freed by the replacement
>>> of the previous memory.
>>>
>>>> After that last sentence I realized I also may not have explained the
>>>> access pattern well enough. The keys are all overwritten every day, but
>>>> it takes some time to write them all (obviously). We see a huge increase
>>>> in the bytes metric, as if the new data for the old keys was being
>>>> written for the first time. Since the "old" slab for the same key
>>>> doesn't proactively release memory, it starts to fill up the cache and
>>>> then starts evicting data in the new slab. Once that happens, we see
>>>> evictions in the old slab because of the algorithm you mentioned (random
>>>> picking / freeing of memory). Typically we don't see any use for
>>>> "upgrading" an item, as the new data would be entirely new and should
>>>> wholesale replace the old data for that key. More specifically, the
>>>> operation is always set, with different data each day.
>>>
>>> Right. Most of your problems will come from two areas. One is that when
>>> you write data aggressively into the new slab class (unless you set the
>>> rebalancer to always-replace mode), the mover will make memory available
>>> more slowly than you can insert, so you'll cause extra evictions in the
>>> new slab class.
>>>
>>> The secondary problem is the random evictions in the previous slab class
>>> as stuff is chucked on the floor to make memory moveable.
>>>
>>>> As for testing, we'll be able to put it under real production workload.
>>>> I don't know what kind of data you mean you need for testing. The data
>>>> stored in the caches is highly confidential. I can give you all kinds of
>>>> metrics, since we collect most of the ones that are in the stats output
>>>> and some from the stats slabs output. If you have some specific ones
>>>> that need collecting, I'll double check and make sure we can get those.
>>>> Alternatively, it might be most beneficial to see the metrics in
>>>> person :)
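As an aside for readers of the archive: the replace-then-free semantics Dormando describes above can be modeled with a toy example. This Go sketch uses invented, simplified types (memcached itself is C); it shows why each replacement needs one free chunk and why in-flight readers keep the old item's memory alive:

// Toy model of replace-then-free with refcounting. The new item is staged
// first, swapped in under a lock, and the old item is only "freed" when
// its refcount drops to zero.
package main

import (
	"fmt"
	"sync"
)

type item struct {
	key      string
	value    []byte
	refcount int  // readers currently holding this item
	unlinked bool // removed from the hash table, awaiting last release
}

type store struct {
	mu    sync.Mutex
	items map[string]*item
	freed int // stand-in for chunks returned to the class's free list
}

// Set stages the new item in fresh memory (one free chunk is needed),
// then swaps it into the hash table under the lock.
func (s *store) Set(key string, value []byte) {
	newIt := &item{key: key, value: value} // staged before the swap
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.items[key]; ok {
		old.unlinked = true
		if old.refcount == 0 {
			s.freed++ // no readers: reclaim the old chunk immediately
		}
	}
	s.items[key] = newIt
}

// Get takes a reference so a concurrent Set can't free the item's memory
// out from underneath the reader.
func (s *store) Get(key string) *item {
	s.mu.Lock()
	defer s.mu.Unlock()
	if it, ok := s.items[key]; ok {
		it.refcount++
		return it
	}
	return nil
}

// Release drops the reference; the last reader of an unlinked item frees it.
func (s *store) Release(it *item) {
	s.mu.Lock()
	defer s.mu.Unlock()
	it.refcount--
	if it.refcount == 0 && it.unlinked {
		s.freed++
	}
}

func main() {
	s := &store{items: make(map[string]*item)}
	s.Set("k", []byte("v1"))
	it := s.Get("k")
	s.Set("k", []byte("v2"))               // old item survives: a reader holds it
	fmt.Println(string(it.value), s.freed) // "v1 0": reader still sees v1
	s.Release(it)
	fmt.Println(s.freed) // 1: old chunk reclaimed on last release
}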
>>> I just need stats snapshots here and there, and actually putting the
>>> thing under load. When I did the LRU work I had to beg for several
>>> months before anyone tested it with a production load. This slows things
>>> down and demotivates me from working on the project.
>>>
>>> Unfortunately my dayjob keeps me pretty busy, so ~internet~ would
>>> probably be best.
>>>
>>>> I can create a driver program to reproduce the behavior on a smaller
>>>> scale. It would write e.g. 10k keys of 10k size, then rewrite the same
>>>> keys with different size data. I'll work on that and post it to this
>>>> thread when I can reproduce the behavior locally.
>>>
>>> Ok. There are slab rebalance unit tests in the t/ directory which do
>>> things like this, and I've used mc-crusher to slam the rebalancer. It's
>>> pretty easy to run one config to load up 10k objects, then flip to the
>>> other using the same key namespace.
>>>
>>>> Thanks,
>>>> Scott
>>>>
>>>> On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
>>>> Hey,
>>>>
>>>> On Fri, 10 Jul 2015, Scott Mansfield wrote:
>>>>
>>>>> We've seen issues recently where we run a cluster that typically has
>>>>> the majority of items overwritten in the same slab every day, and a
>>>>> sudden change in data size evicts a ton of data, affecting downstream
>>>>> systems. To be clear, that is our problem, but I think there's a tweak
>>>>> in memcached that might be useful, and another possible feature that
>>>>> would be even better.
>>>>>
>>>>> The data that is written to this cache is overwritten every day, though
>>>>> the TTL is 7 days. One slab takes up the majority of the space in the
>>>>> cache. The application wrote e.g. 10KB (slab 21) every day for each key
>>>>> consistently. One day, a change occurred where it started writing 15KB
>>>>> (slab 23), causing a migration of data from one slab to another. We had
>>>>> -o slab_reassign,slab_automove=1 set on the server, causing large
>>>>> numbers of evictions on the initial slab. Let's say the cache could
>>>>> hold the data at 15KB per key, but the old data was not technically
>>>>> TTL'd out in its old slab. This means that memory was not being freed
>>>>> by the lru crawler thread (I think) because its expiry had not come
>>>>> around.
>>>>>
>>>>> Lines 1199 and 1200 in items.c:
>>>>>
>>>>>   if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
>>>>>
>>>>> If there was a check to see if this data was "orphaned," i.e. that the
>>>>> key, if accessed, would map to a different slab than the current one,
>>>>> then these orphans could be reclaimed as free memory. I am working on a
>>>>> patch to do this, though I have reservations about performing a hash on
>>>>> the key on the lru crawler thread (if the hash is not already
>>>>> available). I have very little experience in the memcached codebase, so
>>>>> I don't know the most efficient way to do this. Any help would be
>>>>> appreciated.
>>>>
>>>> There seems to be a misconception about how the slab classes work. A
>>>> key, if already existing in a slab, will always map to the slab class it
>>>> currently fits into. The slab classes always exist, but the amount of
>>>> memory reserved for each of them will shift with slab_reassign. ie: 10
>>>> pages in slab class 21, then memory pressure on 23 causes it to move
>>>> over.
>>>>
>>>> So if you examine a key that still exists in slab class 21, it has no
>>>> reason to move up or down the slab classes.
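For intuition on why ~10KB and ~15KB values land in different slab classes: chunk sizes grow geometrically (memcached's default growth factor is 1.25). The following toy Go sketch mimics the idea only; it is not memcached's exact slabs_clsid() arithmetic (real classes also account for item header overhead, so the class numbers won't line up with slab 21/23 exactly):

// Toy slab-class selection: geometric chunk sizes, first-fit lookup.
package main

import "fmt"

// buildClasses returns ascending chunk sizes up to maxSize.
func buildClasses(base, factor float64, maxSize int) []int {
	var sizes []int
	for s := base; int(s) <= maxSize; s *= factor {
		sizes = append(sizes, int(s))
	}
	return sizes
}

// clsid returns the 1-based index of the first class that fits the item.
func clsid(sizes []int, itemSize int) int {
	for i, s := range sizes {
		if itemSize <= s {
			return i + 1
		}
	}
	return 0 // too large for any class
}

func main() {
	sizes := buildClasses(96, 1.25, 1024*1024)
	fmt.Println(clsid(sizes, 10*1024)) // a ~10KB item
	fmt.Println(clsid(sizes, 15*1024)) // a ~15KB item lands in a higher class
}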
>>>>> Alternatively, and possibly more beneficial, is compaction of data in
>>>>> a slab using the same set of criteria as lru crawling. Understandably,
>>>>> compaction is a very difficult problem to solve, since moving the data
>>>>> would be a pain in the ass. I saw a couple of discussions about this on
>>>>> the mailing list, though I didn't see any firm thoughts about it. I
>>>>> think it can probably be done in O(1) like the lru crawler by limiting
>>>>> the number of items it touches each time. Writing and reading are
>>>>> doable in O(1), so moving should be as well. Has anyone given more
>>>>> thought to compaction?
>>>>
>>>> I'd be interested in hacking this up for you folks if you can provide me
>>>> testing and some data to work with. With all of the LRU work I did in
>>>> 1.4.24, the next thing I wanted to do is a big improvement to the slab
>>>> reassignment code.
>>>>
>>>> Currently it picks an essentially random slab page, empties it, and
>>>> moves the slab page into the class under pressure.
>>>>
>>>> One thing we can do is first examine for free memory in the existing
>>>> slab, ie:
>>>>
>>>> - Take a page from slab 21
>>>> - Scan the page for valid items which need to be moved
>>>> - Pull free memory from slab 21, migrate the item (moderately
>>>>   complicated)
>>>> - When the page is empty, move it (or give up if you run out of free
>>>>   chunks)
>>>>
>>>> The next step is to pull from the LRU on slab 21:
>>>>
>>>> - Take a page from slab 21
>>>> - Scan the page for valid items
>>>> - Pull free memory from slab 21, migrate the item
>>>> - If no memory is free, evict the tail of slab 21 and use that chunk
>>>> - When the page is empty, move it
>>>>
>>>> Then, when you hit this condition, your least-recently-used data gets
>>>> culled as new data migrates your page class. This should match a natural
>>>> occurrence, if you would already be evicting valid (but old) items to
>>>> make room for new items.
>>>>
>>>> A bonus to using the free memory trick is that I can use the amount of
>>>> free space in a slab class as a heuristic to more quickly move slab
>>>> pages around.
>>>>
>>>> If it's still necessary from there, we can explore "upgrading" items to
>>>> a new slab class, but that is much, much more complicated, since the
>>>> item has to shift LRUs. Do you put it at the head, the tail, the middle,
>>>> etc.? It might be impossible to make a good generic decision there.
>>>>
>>>> What version are you currently on? If 1.4.24, have you seen any
>>>> instability? I'm currently torn between fighting a few bugs and starting
>>>> on improving the slab rebalancer.
>>>>
>>>> -Dormando
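The two page-evacuation strategies listed above can be sketched as hypothetical Go pseudocode. Every type and helper here is invented for illustration; the real rebalancer lives in memcached's C code:

// Sketch of page evacuation: migrate live items into free chunks of the
// same class; optionally evict the LRU tail when no chunk is free.
package main

import "fmt"

type chunk struct{ live bool }

type page struct{ chunks []*chunk }

type slabClass struct {
	free []*chunk // free chunk list
	lru  []*chunk // LRU order; the last element is least recently used
}

func (c *slabClass) popFree() *chunk {
	if len(c.free) == 0 {
		return nil
	}
	ch := c.free[len(c.free)-1]
	c.free = c.free[:len(c.free)-1]
	return ch
}

// evictTail culls the least-recently-used item so its chunk can be reused.
func (c *slabClass) evictTail() *chunk {
	if len(c.lru) == 0 {
		return nil
	}
	ch := c.lru[len(c.lru)-1]
	c.lru = c.lru[:len(c.lru)-1]
	ch.live = false
	return ch
}

// evacuate drains one page. With evict=true (the second strategy) the LRU
// tail is culled when no chunk is free; with evict=false (the first
// strategy) we give up instead.
func evacuate(p *page, c *slabClass, evict bool) bool {
	for _, src := range p.chunks {
		if !src.live {
			continue
		}
		dst := c.popFree()
		if dst == nil && evict {
			dst = c.evictTail()
		}
		if dst == nil {
			return false // out of free chunks: give up on this page
		}
		// "Migrate" the item: real code would copy the item data and
		// relink the hash table and LRU to point at dst.
		dst.live = true
		src.live = false
	}
	return true // page is empty; it can move to the class under pressure
}

func main() {
	p := &page{chunks: []*chunk{{live: true}, {live: false}, {live: true}}}
	c := &slabClass{free: []*chunk{{}}, lru: []*chunk{{live: true}}}
	fmt.Println(evacuate(p, c, true)) // true: one free chunk plus one eviction
}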
--

---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
slab_pressure.go
Description: Binary data
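The attachment is binary in the archive. As a minimal sketch of what the thread describes such a driver doing (fill one slab class, then slowly rewrite the same keys at a larger size), assuming the github.com/bradfitz/gomemcache client; the address, key count, sizes, and delay are all illustrative:

// Sketch of a slab-pressure driver: fill one slab class quickly, then
// slowly overwrite the same keyspace with larger values.
package main

import (
	"bytes"
	"fmt"
	"time"

	"github.com/bradfitz/gomemcache/memcache"
)

// fill writes n values of the given size, pausing between writes.
func fill(mc *memcache.Client, n, size int, delay time.Duration) {
	val := bytes.Repeat([]byte("x"), size)
	for i := 0; i < n; i++ {
		it := &memcache.Item{
			Key:        fmt.Sprintf("key:%d", i),
			Value:      val,
			Expiration: 7 * 24 * 3600, // 7-day TTL, as in the thread
		}
		if err := mc.Set(it); err != nil {
			fmt.Println("set error:", err)
		}
		time.Sleep(delay)
	}
}

func main() {
	mc := memcache.New("127.0.0.1:11211")
	// Fill one slab class quickly with ~10KB items...
	fill(mc, 10000, 10*1024, 0)
	// ...then slowly rewrite the same keys at ~15KB, putting pressure on a
	// higher slab class while the old class still holds un-expired memory.
	fill(mc, 10000, 15*1024, 10*time.Millisecond)
}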