Well, I must be doomed, and I feel embarrassed now. TL;DR: it's an error on my side (part pylibmc, part our own engineering process), not memcached's.
Long story: It turns out that some of our computation instances have pylibmc 1.5.0 and some have pylibmc 1.5.1. I already knew pylibmc 1.5.1 had regressed in its serialization behavior <https://github.com/lericson/pylibmc/issues/216>, so I had instituted a policy of pinning pylibmc at 1.5.0. In hindsight, though, that policy was probably never enforced very thoroughly (by humans or by code), so I ended up with a troublesome mix. The number of pylibmc-1.5.1-equipped instances is relatively small, hence the intermittence and the transience (the "retry" was always carried out manually on a pylibmc-1.5.0-equipped instance). I also retrospectively confirmed that the objects we failed to store serialize to less than 1MB under pylibmc 1.5.0 and to more than 1MB under pylibmc 1.5.1. Since I always compute my object sizes using 1.5.0's serialization, when those objects are handled by a pylibmc-1.5.1-equipped instance I get a too-big error for an item that I think shouldn't trigger one.

Dormando, sorry for summoning the dinosaur and taking up your time on this. As for the under-utilization of the memory: yes, it probably needs attention, but I'll have to double-check our access patterns/traffic to see whether I've simply been provisioning more cache than our workloads need, or whether something else is going on. Again, thanks for your time on this false alarm.

Cheers,
- Mnjul
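P.S. For the archive: the client-side guard we're adding now looks roughly like the sketch below. It's only an approximation of pylibmc's behavior; the actual bytes on the wire depend on the serializer and compression settings (which is precisely what differs between 1.5.0 and 1.5.1), so the pickle-based estimate, the ITEM_SIZE_LIMIT constant, and the guarded_set helper here are assumptions about our own setup, not pylibmc internals.

    import pickle

    # Assumption: the servers run with memcached's default 1MB item size
    # limit (-I / item_size_max).
    ITEM_SIZE_LIMIT = 1024 * 1024

    def guarded_set(client, key, obj, expire=0):
        # pylibmc pickles arbitrary Python objects; this estimate ignores
        # compression and the small key/flags overhead, so treat it as a
        # conservative smoke test rather than an exact wire size.
        estimated = len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
        if estimated >= ITEM_SIZE_LIMIT:
            raise ValueError("%r serializes to ~%d bytes; too close to the "
                             "item size limit" % (key, estimated))
        return client.set(key, obj, time=expire)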
On Tuesday, April 25, 2017 at 6:18:58 PM UTC-4, Dormando wrote:
> Cool.
>
> Yeah, this agrees: zero outofmemory errors on all classes. I think I'm
> still missing a counter for chunked items, in cases of "late" allocation
> errors. Given the amount of memory free, I can't see why that would
> happen, though.
>
> Hopefully you're able to find the real error. Another thing I need to
> finish is adding more logging endpoints, so it's easier to gather data
> like that :(
>
> On Tue, 25 Apr 2017, Min-Zhong "John" Lu wrote:
>
> > Annnnnd, I guess forgetting to attach the files I promised is a sign of
> > dinosaurness. Here they are.
> >
> > On Tuesday, April 25, 2017 at 5:08:01 PM UTC-4, Min-Zhong "John" Lu wrote:
> >
> > Hello,
> >
> > Thanks for the response! So the slab automover is not the culprit.
> >
> > As for the exact server error: unfortunately I don't have it for now,
> > as I use libmemcached (plus pylibmc, for that matter). That said, I
> > have used the plain telnet protocol when doing "further get requests"
> > (as in my original mail) to verify the success of set requests (and
> > the item sizes shown there are exactly what I've calculated within my
> > Python code, FWIW).
> >
> > I think I can set up a nice little netcat script to imitate those set
> > requests directly through the telnet protocol, to capture the exact
> > error message. I'm not sure how the intermittent nature of the
> > failures will come into play here, but I'll try my best to reproduce
> > it.
> >
> > As for setting -o slab_chunk_size_max=1048576: I'll try that, but I
> > need to schedule a maintenance window with my peers. Let me do the
> > netcat script first; I'll probably have the instance relaunched (with
> > the new setting) within a couple of days, and a few days later I'll
> > ping back on whether I'm still seeing the failures.
> >
> > I'm attaching |stats items| here. Also attaching the |stats| and
> > |stats slabs| dumps taken at the same time, for consistency.
> >
> > Will come back with more info for the fun,
> > - Mnjul
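For concreteness, the "netcat script" mentioned above would boil down to something like this, done with a raw Python socket rather than literal netcat, since everything else of ours is Python anyway. The host, port, key, and payload size are placeholders:

    import socket

    HOST, PORT = "127.0.0.1", 11211   # placeholder address for the instance
    KEY = "repro_big_item"            # hypothetical key
    payload = b"x" * (930 * 1024)     # within the failing 760KiB-930KiB range

    with socket.create_connection((HOST, PORT)) as conn:
        # memcached text protocol:
        #   set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n
        header = "set %s 0 300 %d\r\n" % (KEY, len(payload))
        conn.sendall(header.encode("ascii") + payload + b"\r\n")
        # The reply is a single line: "STORED" on success, or an error
        # such as "SERVER_ERROR object too large for cache".
        print(conn.makefile("rb").readline().decode("ascii").strip())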
> > On Tuesday, April 25, 2017 at 4:40:52 PM UTC-4, Dormando wrote:
> > Hey!
> >
> > Unfortunately you've summoned a dinosaur, as I am old now :P
> >
> > My main question: do you have the exact server error returned by
> > memcached? If it is "SERVER_ERROR object too large for cache", that
> > error has nothing to do with memory allocation; it just reflects that
> > the item you attempted to store is too large (over 1MB). If it fails
> > for that reason, it should always fail.
> >
> > First off, unfortunately your assumption that the slab page mover is
> > synchronous isn't correct. It's a fully backgrounded process that
> > doesn't ever block anything. New memory allocations don't block on
> > anything.
> >
> > Also, can you include "stats items"? It has some possibly relevant
> > info.
> >
> > Especially in your instance, which isn't using all of the memory
> > you've assigned to it (about 1/3rd?). The slab page mover simply moves
> > memory back into a free pool when there is too much memory free in any
> > particular slab class, i.e.:
> >
> > STAT slab_global_page_pool 308
> >
> > When new memory is requested and none is readily available in a slab
> > class, first a new page is pulled from the global page pool, if
> > available. After that, a new page is malloced. After that, items are
> > pulled from the LRU and evicted. If nothing can be evicted for some
> > reason, you would get an allocation error.
> >
> > So you really shouldn't be seeing any. "stats items" would tell me the
> > nature of any allocation problems (hopefully) that you're seeing.
> > Also, getting the exact error being thrown at you is very helpful.
> > Most errors in the system are unique, so I can trace them back to
> > particular code.
> >
> > It is possible there is a bug or weirdness with chunked allocation,
> > which happens for items > 512k and has gone through a couple of
> > revisions. You can test this theory by adding "-o
> > slab_chunk_size_max=1048576" (the same as item size max). It would be
> > great to know if this makes the problem go away, since it would mean I
> > have some more stuff to tune there.
> >
> > have fun,
> > -Dormando
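Just to check my own understanding of the allocation order Dormando describes above, here it is as a runnable toy model. To be clear, this is not memcached's actual code; every name in it is made up, and it only mimics the decision order: free chunk first, then the global page pool, then malloc, then LRU eviction, and only then an error.

    from collections import deque

    PAGE_SIZE = 1024 * 1024  # memcached slab pages are 1MB

    class SlabClass:
        def __init__(self, chunk_size):
            self.chunk_size = chunk_size
            self.free_chunks = deque()
            self.lru = deque()  # stored items, least recently used first

        def add_page(self):
            for _ in range(PAGE_SIZE // self.chunk_size):
                self.free_chunks.append(bytearray(self.chunk_size))

    def allocate(slab, global_page_pool, under_memory_limit):
        if slab.free_chunks:                  # 1) memory readily available
            return slab.free_chunks.popleft()
        if global_page_pool:                  # 2) reuse a reclaimed page
            global_page_pool.pop()
            slab.add_page()
            return slab.free_chunks.popleft()
        if under_memory_limit:                # 3) malloc a brand-new page
            slab.add_page()
            return slab.free_chunks.popleft()
        if slab.lru:                          # 4) last resort: evict items
            return slab.lru.popleft()
        raise MemoryError("allocation error: nothing could be evicted")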
> > On Mon, 24 Apr 2017, Min-Zhong "John" Lu wrote:
> >
> > > Hi there,
> > >
> > > I've recently been investigating an intermittent and transient
> > > failure-to-set issue in a long-running memcached instance, and I
> > > believe I could use some insight from you all.
> > >
> > > Let me list my configuration first. I have |stats| and |stats slabs|
> > > dumps available as Google Groups attachments. If they fail to go
> > > through, just let me know and I'll re-post them on some pastebin
> > > service.
> > >
> > > Configuration:
> > > Command line args: -m 2900 -f 1.16 -c 10240 -k -o modern
> > > Using 1.4.36 (compiled by myself) on Ubuntu 14.04.4 x64.
> > > The -k flag has been verified to be effective (I've got limits
> > > configured correctly).
> > > The growth factor of 1.16 is just an empirical value for my item
> > > sizes.
> > >
> > > Symptom of the issue:
> > > After running memcached for around 10 days, there have been
> > > occasions where a set request for a large item (sized around 760KiB
> > > to 930KiB) fails, with memcached returning 37 (item too big).
> > > However, when this happens, if I wait for around one minute and then
> > > send the same set request again (with exactly the same
> > > key/item/expiration), memcached gladly stores it. Further get
> > > requests verify that the item is correctly stored.
> > >
> > > According to my logs, this happens intermittently, and I haven't
> > > been able to correlate those transient failures with my slab stats.
> > >
> > > Observation & Question 1:
> > > Q1: Does my issue arise because, when the initial set request
> > > arrives, memcached has to run the slab automover to produce a slab
> > > (maybe two slabs, since the item is larger than 512KiB) to
> > > accommodate it?
> > >
> > > This is my hunch; I have yet to do a quick |stats| dump at the exact
> > > moment of a set failure to confirm it. But I have seen
> > > [slab_reassign_busy_items = 10K] and [slabs_moved = 16.9K] in my
> > > |stats| dumps, which means the slab automover must have been
> > > triggered during memcached's lifetime. This leads to my next
> > > questions:
> > >
> > > Observation & Questions 2 & 3:
> > > Q2: When the slab automover is running, could it block the
> > > large-item set request, as in my case above?
> > > Q3: Why would memcached favor triggering the slab automover over
> > > allocating new memory when there is still host memory available?
> > >
> > > According to the stats dumps, my memcached instance has
> > > [total_malloced = 793MiB] and a footprint of [bytes = 392.33MiB];
> > > both fall far short of [limit_maxbytes = 2900MiB]. Furthermore,
> > > nothing has been evicted, as I've got [evictions = 0]. (And the host
> > > system has plenty of free physical memory, per |free -m|.)
> > >
> > > I would expect allocating memory to be faster (*way* faster,
> > > actually) than triggering the slab automover to reassign slabs to
> > > accommodate the incoming set request, and that allocating memory
> > > would allow the initial set request to be served immediately.
> > >
> > > In addition, if the slab automover just happens to be running when
> > > the large-item set request arrives, and the answer to Q2 is
> > > "yes"... can we make it not block when there's still host memory
> > > available?
> > >
> > > I'm kinda out of clues here... and I might actually be on the wrong
> > > route in my investigation.
> > >
> > > Any insight is appreciated, and it'd be great if I could get rid of
> > > those set failures without having to summon a dinosaur. For example,
> > > would disabling the slab automover be an acceptable band-aid fix?
> > > (I could then launch the manual mover (mc_slab_mover) when I know
> > > traffic is relatively light.)
> > >
> > > Thanks a lot.
> > >
> > > p.s. While "retry this set request at a later time" works
> > > (anecdotally), I don't want to implement a retry mechanism on the
> > > client side, since 1) the "later time" is probably
> > > non-deterministic, and 2) I don't have a readily available construct
> > > to decouple such a retry from the rest of my task, so having to
> > > retry would unnecessarily block the client side.