ok... slab class 12 claims to have 2 in "total_pages", yet 14g in mem_requested. is this stat wrong?
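(For context: with the -I 4m / slab_reassign options used in this thread, a slab page is item_size_max bytes, i.e. 4MB, so those two numbers can't both be right. A hypothetical excerpt of such a snapshot, values illustrative:

    stats slabs
    ...
    STAT 12:total_pages 2
    STAT 12:mem_requested 15032385536
    ...
    END

Two 4MB pages can hold at most 8MB, roughly 1800x less than the ~14GB "requested" — which points at an accounting bug, e.g. mem_requested not being decremented somewhere, rather than real memory use.)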
On Thu, 1 Oct 2015, Scott Mansfield wrote:

The ones that crashed (new code cluster) were set to only be written to from the client applications. The data is an index key and a series of data keys that are all written one after another. Each key might be hashed to a different server, though, so not all of them are written to the same server. I can give you a snapshot of one of the clusters that didn't crash (attached file). I can give more detail offline if you need it.

On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando wrote:

Any chance you could describe (perhaps privately?) in very broad strokes what the write load looks like? (They're getting only writes, too?) Otherwise I'll have to devise arbitrary torture tests. I'm sure the bug's in there, but it's not obvious yet.

On Thu, 1 Oct 2015, dormando wrote:

Perfect, thanks! I have $dayjob as well but will look into this as soon as I can. My torture test machines are in a box, but I'll try to borrow one.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

Yes. Exact args:

    -p 11211 -u <omitted> -l 0.0.0.0 -c 100000 -o slab_reassign -o lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253

On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:

Were lru_maintainer/lru_crawler/etc enabled, though? Even if the slab mover is off, those two were the big changes in .24.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

The same cluster has > 400 servers happily running 1.4.24. It's been our standard deployment for a while now, and we haven't seen any crashes. The servers in the same cluster running 1.4.24 (with the same write load the new build was taking) have been up for 29 days. The start options do not contain the slab_automove option because it wasn't effective for us before. The memory given is possibly slightly different per server, as we calculate on startup how much we give. It's in the same ballpark, though (~56 gigs).

On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:

Just before I sit in and try to narrow this down: have you run any host on 1.4.24 mainline with those same start options? Just in case the crash is older.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

Another message for you:

    [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]

addr2line shows:

    $ addr2line -e memcached 412b9d
    /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119

On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:

Ok, thanks! I'll noodle this a bit... unfortunately a backtrace might be more helpful. Will ask you to attempt to get one if I don't figure anything out in time. (Allow it to core dump, or attach a GDB session, set an ignore handler for sigpipe/int/etc, and run "continue".)

What were your full startup args, though?
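(That gdb recipe spelled out — a minimal sketch; the commands are standard gdb, the pid lookup is an assumption about the host setup:

    $ gdb -p $(pidof memcached)
    (gdb) handle SIGPIPE nostop noprint pass
    (gdb) handle SIGINT nostop noprint pass
    (gdb) continue
    ... wait for the crash ...
    (gdb) thread apply all bt

Alternatively, `ulimit -c unlimited` before starting memcached lets it write a core you can open later with `gdb memcached core`.)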
On Thu, 1 Oct 2015, Scott Mansfield wrote:

The commit was the latest in slab_rebal_next at the time: https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a

addr2line gave me this output:

    $ addr2line -e memcached 0x40e007
    /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264

As well, this was running with production writes, but not reads. Even if we had reads on with the few servers crashing, we're OK architecturally. That's why I can get it out there without worrying too much. For now, I'm going to turn it off. I had a metrics issue anyway that needs to get fixed. Tomorrow I'm planning to test again with more metrics, but I can get any new code in pretty quickly.

On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:

How many servers were you running it on? I hope it wasn't more than a handful. I'd recommend starting with one :P

Can you do an addr2line? What were your startup args, and what was the commit sha1 for the branch you pulled?

Sorry about that :/

On Thu, 1 Oct 2015, Scott Mansfield wrote:

A few different servers (5 / 205) experienced a segfault, all within an hour or so. Unfortunately, at this point I'm a bit out of my depth. I have the dmesg output, which is identical for all 5 boxes:

    [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]

I can possibly supply the binary file if needed, though we didn't do anything besides the standard setup and compile.

On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:

If you look at the new branch there's a commit explaining the new stats. You can watch slab_reassign_evictions vs slab_reassign_saves. You can also test automove=1 vs automove=2 (please also turn on the lru_maintainer and lru_crawler).

The initial branch you were running didn't add any new stats. It just restored an old feature.

On Tue, 29 Sep 2015, Scott Mansfield wrote:

An unrelated prod problem meant I had to stop after about an hour. I'm turning it on again tomorrow morning. Are there any new metrics I should be looking at? Anything new in the stats output? I'm about to take a look at the diffs as well.

On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:

Excellent. If automove=2 is too aggressive you'll see that show up as a hit ratio reduction.

The new branch works with automove=2 as well, but it will attempt to rescue valid items in the old slab if possible. I'll still be working on it for another few hours today, though. I'll mail again when I'm done.
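(One way to keep an eye on those two counters, assuming the branch exposes them under these names in the general `stats` output; some nc builds may need a -q/-w timeout flag:

    $ watch -n5 "printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 \
        | egrep 'slab_reassign_(evictions|saves)'"

A rising saves count with flat evictions would suggest the rescue path is doing its job.)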
On Tue, 29 Sep 2015, Scott Mansfield wrote:

I have the first commit (slab_automove=2) running in prod right now. Later today will be a full load production test of the latest code. I'll just let it run for a few days unless I spot any problems. We have good metrics for latency et al. from the client side, though network normally dwarfs memcached time.

On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:

That's unfortunate.

I've done some more work on the branch: https://github.com/memcached/memcached/pull/112

It's not completely likely you would see enough of an improvement from the new default mode. However, if your item sizes change gradually, items are reclaimed during expiration, or items get overwritten (and thus freed in the old class), it should work just fine. I have another patch coming which should help, though.

Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:

I have it running internally, and it runs fine under normal load. It's difficult to put it into the line of fire for a production workload because of social reasons... As well, it's a degenerate case that we normally don't run into (and actively try to avoid). I'm going to run some heavier load tests on it today.

On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:

I'm working on getting a test going internally. I'll let you know how it goes.

Scott Mansfield

On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:

Yo,

https://github.com/dormando/memcached/commits/slab_rebal_next - would you mind playing around with the branch here? You can see the start options in the test.

This is a dead simple modification (a restoration of a feature that was already there...). The test very aggressively writes and is able to shunt memory around appropriately.

The work I'm exploring right now will allow saving items out of pages being rebalanced, and increasing the aggression of page moving without being so brain damaged about it.

But while I'm poking around with that, I'd be interested in knowing if this simple branch is an improvement, and if so how much.

I'll push more code to the branch, but the changes should be gated behind a feature flag.
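(Since automove=1 vs automove=2 keeps coming up: the mode can also be flipped on a running instance, which makes comparing the two easier. The stock protocol documents `slabs automove 0|1`; whether a given build accepts level 2 here is an assumption:

    $ printf 'slabs automove 2\r\nquit\r\n' | nc 127.0.0.1 11211
    OK
)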
On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:

No worries man, you're doing us a favor. Let me know if there's anything you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:

Hey,

I'm still really interested in working on this. I'll be taking a careful look soon, I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:

I've tweaked the program slightly, so I'm adding a new version. It prints more stats as it goes and runs a bit faster.

On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:

Total brain fart on my part. Apparently I had memcached 1.4.13 on my path (who knows how...). Using the actual one that I've built works. Sorry for the confusion... can't believe I didn't realize that before. I'm testing against the compiled one now to see how it behaves.

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:

You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:

The command line I've used that will start is:

    memcached -m 64 -o slab_reassign,slab_automove

The ones that fail are:

    memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
    memcached -o lru_crawler

I'm sure I've missed something during compile, though I just used ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:

I've attached a pretty simple program to connect, fill a slab with data, and then fill another slab slowly with data of a different size. I've been trying to get memcached to run with the lru_crawler and lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time I try to start with either in any configuration.

I haven't seen it start to move slabs automatically with a freshly installed 1.4.24.
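(The stale-binary mixup above is easy to hit; a quick sanity check before testing — output shown is illustrative:

    $ which memcached
    /usr/bin/memcached
    $ memcached -h | head -n1
    memcached 1.4.13

An old 1.4.13 on $PATH predates the lru_crawler/lru_maintainer options, which is consistent with them being rejected as unknown suboptions.)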
On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:

I realize I've not given you the tests to reproduce the behavior. I should be able to soon. Sorry about the delay here.

In the meantime, I wanted to bring up a possible secondary use of the same logic that moves items on slab rebalancing. I think the system might benefit from using the same logic to crawl the pages in a slab and compact the data in the background. In the case where we have memory that is assigned to the slab but not being used, because of replaced or TTL'd-out data, returning that memory to a pool of free memory would allow a slab to grow with it first, instead of waiting for an event where memory is needed at that instant.

It's a change in approach, from reactive to proactive. What do you think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:

> First, more detail for you:
>
> We are running 1.4.24 in production and haven't noticed any bugs as of yet. The new LRUs seem to be working well, though we nearly always run memcached scaled to hold all data without evictions. Those with evictions are behaving well. Those without evictions haven't seen crashing or any other noticeable bad behavior.

Neat.

> OK, I think I see an area where I was speculating on functionality. If you have a key in slab 21 and then the same key is written again at a larger size in slab 23, I assumed that the space in 21 was not freed on the second write. With that assumption, the LRU crawler would not free up that space. Also, just by observation in the macro, the space is not freed fast enough to be effective, in our use case, to accept the writes that are happening. Think in the hundreds of millions of "overwrites" in a 6 - 10 hour period across a cluster.

Internally, "items" (a key/value pair) are generally immutable. The only time they're not is for INCR/DECR, and an item still becomes immutable if two INCR/DECRs collide.

What this means is that the new item is staged in a piece of free memory while the "upload" stage of the SET happens. When memcached has all of the data in memory to replace the item, it does an internal swap under a lock. The old item is removed from the hash table and LRU, and the new item gets put in its place (at the head of the LRU).

Since items are refcounted, this means that if other users are downloading an item which just got replaced, their memory doesn't get corrupted by the item changing out from underneath them. They can continue to read the old item until they're done. When the refcount reaches zero, the old memory is reclaimed.
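(A toy sketch of that swap-under-lock-plus-refcount scheme, for illustration only — a single table slot stands in for memcached's hash table; this is not the actual code:

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct item {
        int refcount;               /* one ref held by the table, plus readers */
        /* key/value bytes omitted */
    } item;

    static item *table_slot;        /* stand-in for the hash table entry */
    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Readers call this when they finish streaming an item out. */
    static void item_release(item *it) {
        pthread_mutex_lock(&cache_lock);
        int dead = (--it->refcount == 0);
        pthread_mutex_unlock(&cache_lock);
        if (dead) free(it);         /* memory reclaimed only at refcount 0 */
    }

    /* SET path: new_it was fully staged in a free chunk beforehand,
       so the swap itself is a brief critical section. */
    static void item_replace(item *old_it, item *new_it) {
        pthread_mutex_lock(&cache_lock);
        new_it->refcount = 1;       /* the table's reference */
        table_slot = new_it;        /* old item leaves the table (and LRU) */
        pthread_mutex_unlock(&cache_lock);
        item_release(old_it);       /* drop the table's ref; in-flight
                                       readers keep the old item alive */
    }
)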
Most of the time, the item replacement happens and then the old memory is immediately removed.

However, this does mean that you need *one* piece of free memory to replace the old one. Then the old memory gets freed after that set.

So if you take a memcached instance with 0 free chunks and do a rolling replacement of all items (within the same slab class as before), the first one would cause an eviction from the tail of the LRU to get a free chunk. Every SET after that would use the chunk freed by the replacement of the previous item.

> After that last sentence I realized I also may not have explained the access pattern well enough. The keys are all overwritten every day, but it takes some time to write them all (obviously). We see a huge increase in the bytes metric, as if the new data for the old keys was being written for the first time. Since the "old" slab for the same key doesn't proactively release memory, it starts to fill up the cache and then starts evicting data in the new slab. Once that happens, we see evictions in the old slab because of the algorithm you mentioned (random picking / freeing of memory). Typically we don't see any use for "upgrading" an item, as the new data would be entirely new and should wholesale replace the old data for that key. More specifically, the operation is always a set, with different data each day.

Right. Most of your problems will come from two areas. One is that when you write data aggressively into the new slab class (unless you set the rebalancer to always-replace mode), the mover makes memory available more slowly than you can insert, so you'll cause extra evictions in the new slab class.

The secondary problem is the random evictions in the previous slab class as stuff is chucked on the floor to make memory moveable.
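(To make the rolling-replacement arithmetic above concrete, a toy sequence for a full class with zero free chunks — illustrative:

    set k1' -> no free chunk: evict LRU tail -> store k1' -> free old k1
    set k2' -> reuse the chunk from old k1   -> store k2' -> free old k2
    set k3' -> reuse the chunk from old k2   -> ...

A same-class rolling rewrite therefore costs one eviction total, not one per key; the trouble described in this thread starts when the rewrite crosses into a different slab class.)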
> As for testing, we'll be able to put it under real production workload. I don't know what kind of data you mean you need for testing. The data stored in the caches is highly confidential. I can give you all kinds of metrics, since we collect most of the ones that are in the stats output and some from the stats slabs output. If you have some specific ones that need collecting, I'll double-check and make sure we can get those. Alternatively, it might be most beneficial to see the metrics in person :)

I just need stats snapshots here and there, and actually putting the thing under load. When I did the LRU work I had to beg for several months before anyone tested it with a production load. This slows things down and demotivates me from working on the project.

Unfortunately my dayjob keeps me pretty busy, so ~internet~ would probably be best.

> I can create a driver program to reproduce the behavior on a smaller scale. It would write e.g. 10k keys of 10k size, then rewrite the same keys with different-size data. I'll work on that and post it to this thread when I can reproduce the behavior locally.

Ok. There are slab rebalance unit tests in the t/ directory which do things like this, and I've used mc-crusher to slam the rebalancer. It's pretty easy to run one config to load up 10k objects, then flip to the other using the same key namespace.

> Thanks,
> Scott
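(A minimal sketch of the kind of driver Scott describes, speaking the ASCII protocol directly; host, port, key count, value sizes, and the 7-day TTL are taken from the thread, and the response handling is deliberately crude:

    /* Fill 10k keys at ~10KB, then rewrite the same keys at ~15KB so
       they land in a larger slab class. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void fill(int fd, size_t vlen) {
        static char buf[1 << 16];
        char head[64], resp[32];
        memset(buf, 'x', vlen);
        memcpy(buf + vlen, "\r\n", 2);
        for (int i = 0; i < 10000; i++) {
            int n = snprintf(head, sizeof(head),
                             "set key:%d 0 604800 %zu\r\n", i, vlen);
            write(fd, head, n);
            write(fd, buf, vlen + 2);
            read(fd, resp, sizeof(resp));  /* expect "STORED\r\n"; a real
                                              driver would parse this */
        }
    }

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a = { .sin_family = AF_INET,
                                 .sin_port = htons(11211) };
        inet_pton(AF_INET, "127.0.0.1", &a.sin_addr);
        if (connect(fd, (struct sockaddr *)&a, sizeof(a)) != 0) return 1;
        fill(fd, 10 * 1024);   /* first pass: e.g. slab class 21 */
        fill(fd, 15 * 1024);   /* rewrite:    e.g. slab class 23 */
        close(fd);
        return 0;
    }
)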
On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:

Hey,

On Fri, 10 Jul 2015, Scott Mansfield wrote:

> We've seen issues recently where we run a cluster that typically has the majority of items overwritten in the same slab every day, and a sudden change in data size evicts a ton of data, affecting downstream systems. To be clear, that is our problem, but I think there's a tweak in memcached that might be useful, and another possible feature that would be even better.
>
> The data that is written to this cache is overwritten every day, though the TTL is 7 days. One slab takes up the majority of the space in the cache. The application wrote e.g. 10KB (slab 21) every day for each key, consistently. One day, a change occurred where it started writing 15KB (slab 23), causing a migration of data from one slab to another. We had -o slab_reassign,slab_automove=1 set on the server, causing large numbers of evictions on the initial slab. Let's say the cache could hold the data at 15KB per key, but the old data was not technically TTL'd out in its old slab. This means that memory was not being freed by the LRU crawler thread (I think) because its expiry had not come around.
>
> Lines 1199 and 1200 in items.c:
>
>     if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
>
> If there was a check to see if this data was "orphaned," i.e. that the key, if accessed, would map to a different slab than the current one, then these orphans could be reclaimed as free memory. I am working on a patch to do this, though I have reservations about performing a hash on the key on the LRU crawler ...
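(For illustration, the extra condition Scott is sketching might look like this inside the crawler's reclaim test. Names such as assoc_find, hash, and ITEM_key follow memcached's internals, but this is an untested fragment built on his stated assumption, not a real patch:

    /* hypothetical "orphan" check: if the hash table no longer resolves
       this key to this item, a newer copy has replaced it elsewhere and
       this chunk could be reclaimed. */
    uint32_t hv = hash(ITEM_key(search), search->nkey);
    item *live = assoc_find(ITEM_key(search), search->nkey, hv);
    if ((search->exptime != 0 && search->exptime < current_time)
            || is_flushed(search)
            || live != search) {
        /* ... unlink and free, as the crawler already does ... */
    }

As dormando's July 13 reply above explains, a replaced item is actually unlinked from the hash table at SET time, so such an orphan should not normally remain reachable; the check only makes sense under the assumption Scott describes.)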