Another message for you:

[78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]

addr2line shows:

$ addr2line -e memcached 412b9d
/mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119

On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
> Ok, thanks!
>
> I'll noodle this a bit... unfortunately a backtrace might be more
> helpful. Will ask you to attempt to get one if I don't figure anything
> out in time.
>
> (allow it to core dump, or attach a gdb session and set an ignore
> handler for sigpipe/int/etc and run "continue")
>
> what were your full startup args, though?
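(For reference, that procedure looks roughly like the following. The binary path and pid are made-up examples; the gdb commands themselves are standard, and memcached's -r flag raises the core file limit for the core-dump route.)

    $ ulimit -c unlimited            # or start memcached with -r to allow core dumps
    $ gdb /usr/bin/memcached 2757    # attach to the running process by pid
    (gdb) handle SIGPIPE nostop noprint pass
    (gdb) handle SIGINT nostop noprint pass
    (gdb) continue
    ... wait for the crash ...
    (gdb) thread apply all bt        # grab a backtrace from every thread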
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The commit was the latest in slab_rebal_next at the time:
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>
> addr2line gave me this output:
>
> $ addr2line -e memcached 0x40e007
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>
> As well, this was running with production writes, but not reads. Even if
> we had reads on with the few servers crashing, we're ok architecturally.
> That's why I can get it out there without worrying too much. For now,
> I'm going to turn it off. I had a metrics issue anyway that needs to get
> fixed. Tomorrow I'm planning to test again with more metrics, but I can
> get any new code in pretty quick.

On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
> How many servers were you running it on? I hope it wasn't more than a
> handful. I'd recommend starting with one :P
>
> can you do an addr2line? what were your startup args, and what was the
> commit sha1 for the branch you pulled?
>
> sorry about that :/

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> A few different servers (5 / 205) experienced a segfault, all within an
> hour or so. Unfortunately at this point I'm a bit out of my depth. I
> have the dmesg output, which is identical for all 5 boxes:
>
> [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
>
> I can possibly supply the binary file if needed, though we didn't do
> anything besides the standard setup and compile.

On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
> If you look at the new branch there's a commit explaining the new stats.
>
> You can watch slab_reassign_evictions vs slab_reassign_saves. you can
> also test automove=1 vs automove=2 (please also turn on the
> lru_maintainer and lru_crawler).
>
> The initial branch you were running didn't add any new stats. It just
> restored an old feature.
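(Those counters come back in the plain "stats" output, so they can be polled with any client; nc is one convenient way:)

    $ echo stats | nc 127.0.0.1 11211 | grep slab_reassign

If slab_reassign_evictions climbs much faster than slab_reassign_saves, the mover is throwing away far more items than it rescues.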
On Tue, 29 Sep 2015, Scott Mansfield wrote:
> An unrelated prod problem meant I had to stop after about an hour. I'm
> turning it on again tomorrow morning. Are there any new metrics I should
> be looking at? Anything new in the stats output? I'm about to take a
> look at the diffs as well.

On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
> excellent. if automove=2 is too aggressive you'll see that come in as a
> hit ratio reduction.
>
> the new branch works with automove=2 as well, but it will attempt to
> rescue valid items in the old slab if possible. I'll still be working on
> it for another few hours today though. I'll mail again when I'm done.

On Tue, 29 Sep 2015, Scott Mansfield wrote:
> I have the first commit (slab_automove=2) running in prod right now.
> Later today will be a full-load production test of the latest code. I'll
> just let it run for a few days unless I spot any problems. We have good
> metrics for latency et al. from the client side, though network normally
> dwarfs memcached time.

On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> That's unfortunate.
>
> I've done some more work on the branch:
> https://github.com/memcached/memcached/pull/112
>
> It's not completely likely you would see enough of an improvement from
> the new default mode. However, if your item sizes change gradually,
> items are reclaimed during expiration, or get overwritten (and thus
> freed in the old class), it should work just fine. I have another patch
> coming which should help though.
>
> Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:
> I have it running internally, and it runs fine under normal load. It's
> difficult to put it into the line of fire for a production workload
> because of social reasons... As well, it's a degenerate case that we
> normally don't run into (and actively try to avoid). I'm going to run
> some heavier load tests on it today.

On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> I'm working on getting a test going internally. I'll let you know how it
> goes.

On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> Yo,
>
> https://github.com/dormando/memcached/commits/slab_rebal_next - would
> you mind playing around with the branch here? You can see the start
> options in the test.
>
> This is a dead simple modification (a restoration of a feature that was
> already there...). The test very aggressively writes and is able to
> shunt memory around appropriately.
>
> The work I'm exploring right now will allow saving of items being
> rebalanced from, and increasing the aggression of page moving without
> being so brain damaged about it.
>
> But while I'm poking around with that, I'd be interested in knowing if
> this simple branch is an improvement, and if so how much.
>
> I'll push more code to the branch, but the changes should be gated
> behind a feature flag.

On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> No worries man, you're doing us a favor. Let me know if there's anything
> you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> Hey,
>
> I'm still really interested in working on this. I'll be taking a careful
> look soon I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> I've tweaked the program slightly, so I'm adding a new version. It
> prints more stats as it goes and runs a bit faster.
On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> Total brain fart on my part. Apparently I had memcached 1.4.13 on my
> path (who knows how...). Using the actual one that I've built works.
> Sorry for the confusion... can't believe I didn't realize that before.
> I'm testing against the compiled one now to see how it behaves.

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> The command line I've used that will start is:
>
> memcached -m 64 -o slab_reassign,slab_automove
>
> the ones that fail are:
>
> memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
> memcached -o lru_crawler
>
> I'm sure I've missed something during compile, though I just used
> ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> I've attached a pretty simple program to connect, fill a slab with data,
> and then fill another slab slowly with data of a different size. I've
> been trying to get memcached to run with the lru_crawler and
> lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time
> I try to start with either in any configuration.
>
> I haven't seen it start to move slabs automatically with a freshly
> installed 1.4.24.

On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> I realize I've not given you the tests to reproduce the behavior. I
> should be able to soon. Sorry about the delay here.
>
> In the meantime, I wanted to bring up a possible secondary use of the
> same logic to move items on slab rebalancing. I think the system might
> benefit from using the same logic to crawl the pages in a slab and
> compact the data in the background. In the case where we have memory
> that is assigned to the slab but not being used, because of replaced or
> TTL'd out data, returning the memory to a pool of free memory will allow
> a slab to grow with that memory first, instead of waiting for an event
> where memory is needed at that instant.
>
> It's a change in approach, from reactive to proactive. What do you
> think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> > First, more detail for you:
> >
> > We are running 1.4.24 in production and haven't noticed any bugs as of
> > yet. The new LRUs seem to be working well, though we nearly always run
> > memcached scaled to hold all data without evictions. Those with
> > evictions are behaving well. Those without evictions haven't seen
> > crashing or any other noticeable bad behavior.
>
> Neat.
>
> > OK, I think I see an area where I was speculating on functionality. If
> > you have a key in slab 21 and then the same key is written again at a
> > larger size in slab 23, I assumed that the space in 21 was not freed
> > on the second write. With that assumption, the LRU crawler would not
> > free up that space. Also, just by observation in the macro, the space
> > is not freed fast enough to be effective, in our use case, to accept
> > the writes that are happening. Think in the hundreds of millions of
> > "overwrites" in a 6-10 hour period across a cluster.
>
> Internally, "items" (a key/value pair) are generally immutable. The only
> time when it's not is for INCR/DECR, and it still becomes immutable if
> two INCR/DECRs collide.
>
> What this means is that the new item is staged in a piece of free memory
> while the "upload" stage of the SET happens. When memcached has all of
> the data in memory to replace the item, it does an internal swap under a
> lock. The old item is removed from the hash table and LRU, and the new
> item gets put in its place (at the head of the LRU).
>
> Since items are refcounted, this means that if other users are
> downloading an item which just got replaced, their memory doesn't get
> corrupted by the item changing out from underneath them. They can
> continue to read the old item until they're done. When the refcount
> reaches zero the old memory is reclaimed.
>
> Most of the time, the item replacement happens and then the old memory
> is immediately removed.
>
> However, this does mean that you need *one* piece of free memory to
> replace the old one. Then the old memory gets freed after that set.
>
> So if you take a memcached instance with 0 free chunks, and do a rolling
> replacement of all items (within the same slab class as before), the
> first one would cause an eviction from the tail of the LRU to get a free
> chunk. Every SET after that would use the chunk freed from the
> replacement of the previous memory.
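(To make those mechanics concrete, here is a toy, single-threaded model of replace-by-swap with refcounts. All names are illustrative, not memcached's actual internals; the real swap happens under a lock, with the item linked into a hash table and LRU.)

    #include <stdlib.h>
    #include <string.h>

    /* Toy sketch: one key, refcounted values. */
    typedef struct item {
        int refcount;     /* the table's reference plus any active readers */
        size_t nbytes;
        char *data;
    } item;

    static item *slot;    /* stands in for this key's hash-table entry */

    static void item_release(item *it) {
        if (--it->refcount == 0) {   /* last holder lets go */
            free(it->data);
            free(it);
        }
    }

    /* A SET stages the new value in fresh memory (the "one free chunk"
     * requirement), then swaps it into place. Readers still holding the
     * old item keep reading it safely; its memory is freed at refcount 0. */
    static void set_value(const char *val, size_t len) {
        item *new_it = malloc(sizeof(*new_it));
        new_it->refcount = 1;
        new_it->nbytes = len;
        new_it->data = malloc(len);
        memcpy(new_it->data, val, len);

        item *old_it = slot;
        slot = new_it;               /* the swap; locked in the real thing */
        if (old_it)
            item_release(old_it);    /* old memory reclaimed only now */
    }

    int main(void) {
        set_value("day-one-data", 12);
        set_value("day-two-data-bigger", 19);  /* old copy freed in the swap */
        item_release(slot);                    /* drop the table's reference */
        return 0;
    }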
>
> > After that last sentence I realized I also may not have explained well
> > enough the access pattern. The keys are all overwritten every day, but
> > it takes some time to write them all (obviously). We see a huge
> > increase in the bytes metric, as if the new data for the old keys was
> > being written for the first time. Since the "old" slab for the same
> > key doesn't proactively release memory, it starts to fill up the cache
> > and then starts evicting data in the new slab. Once that happens, we
> > see evictions in the old slab because of the algorithm you mentioned
> > (random picking / freeing of memory).
> >
> > Typically we don't see any use for "upgrading" an item, as the new
> > data would be entirely new and should wholesale replace the old data
> > for that key. More specifically, the operation is always set, with
> > different data each day.
>
> Right. Most of your problems will come from two areas. One being that,
> writing data aggressively into the new slab class (unless you set the
> rebalancer to always-replace mode), the mover will make memory available
> more slowly than you can insert. So you'll cause extra evictions in the
> new slab class.
>
> The secondary problem is from the random evictions in the previous slab
> class as stuff is chucked on the floor to make memory moveable.
>
> > As for testing, we'll be able to put it under real production
> > workload. I don't know what kind of data you mean you need for
> > testing. The data stored in the caches are highly confidential. I can
> > give you all kinds of metrics, since we collect most of the ones that
> > are in the stats output and some from the stats slabs output. If you
> > have some specific ones that need collecting, I'll double check and
> > make sure we can get those. Alternatively, it might be most beneficial
> > to see the metrics in person :)
>
> I just need stats snapshots here and there, and actually putting the
> thing under load. When I did the LRU work I had to beg for several
> months before anyone tested it with a production load. This slows things
> down and demotivates me from working on the project.
>
> Unfortunately my dayjob keeps me pretty busy, so ~internet~ would
> probably be best.
>
> > I can create a driver program to reproduce the behavior on a smaller
> > scale. It would write e.g. 10k keys of 10k size, then rewrite the same
> > keys with different size data. I'll work on that and post it to this
> > thread when I can reproduce the behavior locally.
>
> Ok. There're slab rebalance unit tests in the t/ directory which do
> things like this, and I've used mc-crusher to slam the rebalancer. It's
> pretty easy to run one config to load up 10k objects, then flip to the
> other using the same key namespace.
>
> > Thanks,
> > Scott
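(Scott's attached program isn't preserved in the archive, but a driver of the kind he describes is small. Below is a sketch, not his actual code: it assumes a memcached on 127.0.0.1:11211, speaks the plain text protocol, and glosses over error handling and partial reads.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    /* Fill one slab class with 10k keys, then overwrite the same keys at
     * a larger size so they land in a different class. */
    static void set_keys(int fd, size_t vsize) {
        char *val = malloc(vsize);
        char hdr[64], reply[64];
        memset(val, 'x', vsize);
        for (int i = 0; i < 10000; i++) {
            /* set <key> <flags> <exptime> <bytes>; 604800s = the 7-day TTL */
            int n = snprintf(hdr, sizeof(hdr), "set key%d 0 604800 %zu\r\n",
                             i, vsize);
            write(fd, hdr, n);
            write(fd, val, vsize);
            write(fd, "\r\n", 2);
            read(fd, reply, sizeof(reply));  /* expect "STORED\r\n" */
        }
        free(val);
    }

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sa = {0};
        sa.sin_family = AF_INET;
        sa.sin_port = htons(11211);
        inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);
        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0)
            return 1;
        set_keys(fd, 10240);   /* ~10KB values: one slab class */
        set_keys(fd, 15360);   /* ~15KB values: a larger class */
        close(fd);
        return 0;
    }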
On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
> Hey,
>
> On Fri, 10 Jul 2015, Scott Mansfield wrote:
>
> > We've seen issues recently where we run a cluster that typically has
> > the majority of items overwritten in the same slab every day, and a
> > sudden change in data size evicts a ton of data, affecting downstream
> > systems. To be clear, that is our problem, but I think there's a tweak
> > in memcached that might be useful, and another possible feature that
> > would be even better.
> >
> > The data that is written to this cache is overwritten every day,
> > though the TTL is 7 days. One slab takes up the majority of the space
> > in the cache. The application wrote e.g. 10KB (slab 21) every day for
> > each key consistently. One day, a change occurred where it started
> > writing 15KB (slab 23), causing a migration of data from one slab to
> > another. We had -o slab_reassign,slab_automove=1 set on the server,
> > causing large numbers of evictions on the initial slab. Let's say the
> > cache could hold the data at 15KB per key, but the old data was not
> > technically TTL'd out in its old slab. This means that memory was not
> > being freed by the lru crawler thread (I think) because its expiry had
> > not come around.
> >
> > Lines 1199 and 1200 in items.c:
> >
> > if ((search->exptime != 0 && search->exptime < current_time) ||
> >     is_flushed(search)) {
> >
> > If there was a check to see if this data was "orphaned," i.e. that the
> > key, if accessed, would map to a different slab than the current one,
> > then these orphans could be reclaimed as free memory. I am working on
> > a patch to do this, though I have reservations about performing a hash
> > on the key on the lru crawler thread (if the hash is not already
> > available).
> >
> > I have very little experience in the memcached codebase, so I don't
> > know the most efficient way to do this. Any help would be appreciated.
>
> There seems to be a misconception about how the slab classes work. A
> key, if already existing in a slab, will always map to the slab class it
> currently fits into. The slab classes always exist, but the amount of
> memory reserved for each of them will shift with slab_reassign. ie: 10
> pages in slab class 21, then memory pressure on 23 causes it to move
> over.
>
> So if you examine a key that still exists in slab class 21, it has no
> reason to move up or down the slab classes.
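(The rule being described is simple: an item is placed in the smallest class whose chunk size fits it, and an existing item never moves; an overwrite just stages a new copy in whatever class the new size maps to, after which the old chunk returns to its own class's free list. A simplified sketch of the lookup, loosely modeled on slabs_clsid() in slabs.c; names and sizing here are illustrative:)

    #include <stddef.h>

    #define MAX_CLASSES 64

    static size_t chunk_size[MAX_CLASSES]; /* filled at startup from the
                                              chunk growth factor */
    static int num_classes;

    /* Return the smallest slab class whose chunks can hold the item. */
    static int clsid_for(size_t item_size) {
        for (int i = 0; i < num_classes; i++)
            if (item_size <= chunk_size[i])
                return i;
        return -1;   /* larger than the biggest chunk: not storable */
    }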
>
> > Alternatively, and possibly more beneficial, is compaction of data in
> > a slab using the same set of criteria as lru crawling. Understandably,
> > compaction is a very difficult problem to solve, since moving the data
> > would be a pain in the ass. I saw a couple of discussions about this
> > in the mailing list, though I didn't see any firm thoughts about it. I
> > think it can probably be done in O(1) like the lru crawler by limiting
> > the number of items it touches each time. Writing and reading are
> > doable in O(1) so moving should be as well. Has anyone given more
> > thought on compaction?
>
> I'd be interested in hacking this up for you folks if you can provide me
> testing and some data to work with. With all of the LRU work I did in
> 1.4.24, the next thing I wanted to do is a big improvement on the slab
> reassignment code.
>
> Currently it picks essentially a random slab page, empties it, and moves
> the slab page into the class under pressure.
>
> One thing we can do is first examine for free memory in the existing
> slab, IE:
>
> - Take a page from slab 21
> - Scan the page for valid items which need to be moved
> - Pull free memory from slab 21, migrate the item (moderately
>   complicated)
> - When the page is empty, move it (or give up if you run out of free
>   chunks).
>
> The next step is to pull from the LRU on slab 21:
>
> - Take page from slab 21
> - Scan page for valid items
> - Pull free memory from slab 21, migrate the item
> - If no memory free, evict tail of slab 21. Use that chunk.
> - When the page is empty, move it.
>
> Then, when you hit this condition, your least-recently-used data gets
> culled as new data migrates your page class. This should match a natural
> occurrence, if you would already be evicting valid (but old) items to
> make room for new items.
>
> A bonus to using the free memory trick is that I can use the amount of
> free space in a slab class as a heuristic to more quickly move slab
> pages around.
>
> If it's still necessary from there, we can explore "upgrading" items to
> a new slab class, but that is much, much more complicated, since the
> item has to shift LRUs. Do you put it at the head, the tail, the middle,
> etc? It might be impossible to make a good generic decision there.
>
> What version are you currently on? If 1.4.24, have you seen any
> instability? I'm currently torn between fighting a few bugs and starting
> on improving the slab rebalancer.
>
> -Dormando
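(A rough sketch of the two page-move variants listed above, as pseudocode-level C. Every helper name here is hypothetical; the real rebalancer in slabs.c also deals with locking, refcounts, and busy items, all omitted here.)

    typedef struct item item;

    /* Hypothetical helpers standing in for real slab/LRU internals. */
    extern void *pick_page(int clsid);
    extern item *first_item(void *page);
    extern item *next_item(void *page, item *it);
    extern int   item_is_valid(item *it);
    extern void *pop_free_chunk(int clsid);
    extern void *evict_lru_tail(int clsid);            /* frees and returns the tail's chunk */
    extern void  migrate_item(item *it, void *chunk);  /* copy, then relink hash/LRU */
    extern void  reassign_page(void *page, int dst_clsid);

    /* Move one page out of src: rescue live items into free chunks of the
     * same class first; optionally evict the LRU tail when none are free. */
    static int move_one_page(int src, int dst, int may_evict) {
        void *page = pick_page(src);                 /* take a page from slab 21 */
        for (item *it = first_item(page); it != NULL; it = next_item(page, it)) {
            if (!item_is_valid(it))
                continue;                            /* dead/expired: nothing to save */
            void *chunk = pop_free_chunk(src);       /* free memory from slab 21 */
            if (chunk == NULL) {
                if (!may_evict)
                    return -1;                       /* variant 1: give up */
                chunk = evict_lru_tail(src);         /* variant 2: cull the LRU tail */
            }
            migrate_item(it, chunk);
        }
        reassign_page(page, dst);                    /* page now empty: hand it over */
        return 0;
    }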