Well, I must be doomed, and I feel embarrassed now. TL;DR: it's an error on my side (part pylibmc, part our own engineering process), not memcached's.
Long story: It turns out that some of our computation instances have pylibmc 1.5.0 and some have pylibmc 1.5.1. I already knew pylibmc 1.5.1 had regressed in its serialization behavior <https://github.com/lericson/pylibmc/issues/216>, so I had instituted a policy of pinning pylibmc at 1.5.0. In hindsight, though, that policy was probably never enforced very thoroughly (by humans or by code), so I ended up with a troublesome mix. The number of pylibmc-1.5.1-equipped instances is relatively small, hence the intermittence and the transience (the "retry" was always carried out manually on a pylibmc-1.5.0-equipped instance). I also retrospectively confirmed that the objects we failed to store serialize to less than 1MB under pylibmc 1.5.0 and to more than 1MB under pylibmc 1.5.1. Since I always compute my object sizes using 1.5.0's serialization, when those objects are handled by a pylibmc-1.5.1-equipped instance I get a too-big error for an item that I think shouldn't trigger one.

Dormando, sorry for summoning the dinosaur and taking up your time on this. As for the under-utilization of the memory: yes, it probably needs attention, but I'll have to double-check our access patterns/traffic to see whether I've simply been provisioning more cache than our workloads need, or whether something else is going on. Again, thanks for your time on this false alarm.

Cheers,
- Mnjul
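P.S. For the archive: the client-side guard we're adding now looks roughly like the sketch below. It's only an approximation of pylibmc's behavior; the actual bytes on the wire depend on the serializer and compression settings (which is precisely what differs between 1.5.0 and 1.5.1), so the pickle-based estimate, the ITEM_SIZE_LIMIT constant, and the guarded_set helper here are assumptions about our own setup, not pylibmc internals.

    import pickle

    # Assumption: the servers run with memcached's default 1MB item size
    # limit (-I / item_size_max).
    ITEM_SIZE_LIMIT = 1024 * 1024

    def guarded_set(client, key, obj, expire=0):
        # pylibmc pickles arbitrary Python objects; this estimate ignores
        # compression and the small key/flags overhead, so treat it as a
        # conservative smoke test rather than an exact wire size.
        estimated = len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
        if estimated >= ITEM_SIZE_LIMIT:
            raise ValueError("%r serializes to ~%d bytes; too close to the "
                             "item size limit" % (key, estimated))
        return client.set(key, obj, time=expire)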
On Tuesday, April 25, 2017 at 6:18:58 PM UTC-4, Dormando wrote:
> Cool.
>
> Yeah, this agrees: zero outofmemory errors on all classes. I think I'm
> still missing a counter for chunked items, in cases of "late" allocation
> errors. Given the amount of memory free, I can't see why that would
> happen, though.
>
> Hopefully you're able to find the real error. Another thing I need to
> finish is adding more logging endpoints, so it's easier to gather data
> like that :(
>
> On Tue, 25 Apr 2017, Min-Zhong "John" Lu wrote:
>
> > Annnnnd, I guess forgetting to attach the files I promised is a sign of
> > dinosaurness. Here they are.
> >
> > On Tuesday, April 25, 2017 at 5:08:01 PM UTC-4, Min-Zhong "John" Lu wrote:
> >
> > Hello,
> >
> > Thanks for the response! So the slab automover is not the culprit.
> >
> > As for the exact server error: unfortunately I don't have it for now,
> > as I use libmemcached (plus pylibmc, for that matter). That said, I
> > have used the plain telnet protocol when doing "further get requests"
> > (as in my original mail) to verify the success of set requests (and
> > the item sizes shown there are exactly what I've calculated within my
> > Python code, FWIW).
> >
> > I think I can set up a nice little netcat script to imitate those set
> > requests directly through the telnet protocol, to capture the exact
> > error message. I'm not sure how the intermittent nature of the
> > failures will come into play here, but I'll try my best to reproduce
> > it.
> >
> > As for setting -o slab_chunk_size_max=1048576: I'll try that, but I
> > need to schedule a maintenance window with my peers. Let me do the
> > netcat script first; I'll probably have the instance relaunched (with
> > the new setting) within a couple of days, and a few days later I'll
> > ping back on whether I'm still seeing the failures.
> >
> > I'm attaching |stats items| here. Also attaching the |stats| and
> > |stats slabs| dumps taken at the same time, for consistency.
> >
> > Will come back with more info for the fun,
> > - Mnjul
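For concreteness, the "netcat script" mentioned above would boil down to something like this, done with a raw Python socket rather than literal netcat, since everything else of ours is Python anyway. The host, port, key, and payload size are placeholders:

    import socket

    HOST, PORT = "127.0.0.1", 11211   # placeholder address for the instance
    KEY = "repro_big_item"            # hypothetical key
    payload = b"x" * (930 * 1024)     # within the failing 760KiB-930KiB range

    with socket.create_connection((HOST, PORT)) as conn:
        # memcached text protocol:
        #   set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n
        header = "set %s 0 300 %d\r\n" % (KEY, len(payload))
        conn.sendall(header.encode("ascii") + payload + b"\r\n")
        # The reply is a single line: "STORED" on success, or an error
        # such as "SERVER_ERROR object too large for cache".
        print(conn.makefile("rb").readline().decode("ascii").strip())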
> > On Tuesday, April 25, 2017 at 4:40:52 PM UTC-4, Dormando wrote:
> > Hey!
> >
> > Unfortunately you've summoned a dinosaur, as I am old now :P
> >
> > My main question: do you have the exact server error returned by
> > memcached? If it is "SERVER_ERROR object too large for cache", that
> > error has nothing to do with memory allocation; it just reflects that
> > the item you attempted to store is too large (over 1MB). If it fails
> > for that reason, it should always fail.
> >
> > First off, unfortunately your assumption that the slab page mover is
> > synchronous isn't correct. It's a fully backgrounded process that
> > doesn't ever block anything. New memory allocations don't block on
> > anything.
> >
> > Also, can you include "stats items"? It has some possibly relevant
> > info.
> >
> > Especially in your instance, which isn't using all of the memory
> > you've assigned to it (about 1/3rd?). The slab page mover simply moves
> > memory back into a free pool when there is too much memory free in any
> > particular slab class, i.e.:
> >
> > STAT slab_global_page_pool 308
> >
> > When new memory is requested and none is readily available in a slab
> > class, first a new page is pulled from the global page pool, if
> > available. After that, a new page is malloced. After that, items are
> > pulled from the LRU and evicted. If nothing can be evicted for some
> > reason, you would get an allocation error.
> >
> > So you really shouldn't be seeing any. "stats items" would tell me the
> > nature of any allocation problems (hopefully) that you're seeing.
> > Also, getting the exact error being thrown at you is very helpful.
> > Most errors in the system are unique, so I can trace them back to
> > particular code.
> >
> > It is possible there is a bug or weirdness with chunked allocation,
> > which happens for items > 512k and has gone through a couple of
> > revisions. You can test this theory by adding "-o
> > slab_chunk_size_max=1048576" (the same as item size max). It would be
> > great to know if this makes the problem go away, since it would mean I
> > have some more stuff to tune there.
> >
> > have fun,
> > -Dormando
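Just to check my own understanding of the allocation order Dormando describes above, here it is as a runnable toy model. To be clear, this is not memcached's actual code; every name in it is made up, and it only mimics the decision order: free chunk first, then the global page pool, then malloc, then LRU eviction, and only then an error.

    from collections import deque

    PAGE_SIZE = 1024 * 1024  # memcached slab pages are 1MB

    class SlabClass:
        def __init__(self, chunk_size):
            self.chunk_size = chunk_size
            self.free_chunks = deque()
            self.lru = deque()  # stored items, least recently used first

        def add_page(self):
            for _ in range(PAGE_SIZE // self.chunk_size):
                self.free_chunks.append(bytearray(self.chunk_size))

    def allocate(slab, global_page_pool, under_memory_limit):
        if slab.free_chunks:                  # 1) memory readily available
            return slab.free_chunks.popleft()
        if global_page_pool:                  # 2) reuse a reclaimed page
            global_page_pool.pop()
            slab.add_page()
            return slab.free_chunks.popleft()
        if under_memory_limit:                # 3) malloc a brand-new page
            slab.add_page()
            return slab.free_chunks.popleft()
        if slab.lru:                          # 4) last resort: evict items
            return slab.lru.popleft()
        raise MemoryError("allocation error: nothing could be evicted")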
> > On Mon, 24 Apr 2017, Min-Zhong "John" Lu wrote:
> >
> > > Hi there,
> > >
> > > I've recently been investigating an intermittent and transient
> > > failure-to-set issue in a long-running memcached instance, and I
> > > believe I could use some insight from you all.
> > >
> > > Let me list my configuration first. I have |stats| and |stats slabs|
> > > dumps available as Google Groups attachments. If they fail to go
> > > through, just let me know and I'll re-post them on some pastebin
> > > service.
> > >
> > > Configuration:
> > > Command line args: -m 2900 -f 1.16 -c 10240 -k -o modern
> > > Using 1.4.36 (compiled by myself) on Ubuntu 14.04.4 x64.
> > > The -k flag has been verified to be effective (I've got limits
> > > configured correctly).
> > > The growth factor of 1.16 is just an empirical value for my item
> > > sizes.
> > >
> > > Symptom of the issue:
> > > After running memcached for around 10 days, there have been
> > > occasions where a set request for a large item (sized around 760KiB
> > > to 930KiB) fails, with memcached returning 37 (item too big).
> > > However, when this happens, if I wait for around one minute and then
> > > send the same set request again (with exactly the same
> > > key/item/expiration), memcached gladly stores it. Further get
> > > requests verify that the item is correctly stored.
> > >
> > > According to my logs, this happens intermittently, and I haven't
> > > been able to correlate those transient failures with my slab stats.
> > >
> > > Observation & Question 1:
> > > Q1: Does my issue arise because, when the initial set request
> > > arrives, memcached has to run the slab automover to produce a slab
> > > (maybe two slabs, since the item is larger than 512KiB) to
> > > accommodate it?
> > >
> > > This is my hunch; I have yet to do a quick |stats| dump at the exact
> > > moment of a set failure to confirm it. But I have seen
> > > [slab_reassign_busy_items = 10K] and [slabs_moved = 16.9K] in my
> > > |stats| dumps, which means the slab automover must have been
> > > triggered during memcached's lifetime. This leads to my next
> > > questions:
> > >
> > > Observation & Questions 2 & 3:
> > > Q2: When the slab automover is running, could it block the
> > > large-item set request, as in my case above?
> > > Q3: Why would memcached favor triggering the slab automover over
> > > allocating new memory when there is still host memory available?
> > >
> > > According to the stats dumps, my memcached instance has
> > > [total_malloced = 793MiB] and a footprint of [bytes = 392.33MiB];
> > > both fall far short of [limit_maxbytes = 2900MiB]. Furthermore,
> > > nothing has been evicted, as I've got [evictions = 0]. (And the host
> > > system has plenty of free physical memory, per |free -m|.)
> > >
> > > I would expect allocating memory to be faster (*way* faster,
> > > actually) than triggering the slab automover to reassign slabs to
> > > accommodate the incoming set request, and that allocating memory
> > > would allow the initial set request to be served immediately.
> > >
> > > In addition, if the slab automover just happens to be running when
> > > the large-item set request arrives, and the answer to Q2 is
> > > "yes"... can we make it not block when there's still host memory
> > > available?
> > >
> > > I'm kinda out of clues here... and I might actually be on the wrong
> > > route in my investigation.
> > >
> > > Any insight is appreciated, and it'd be great if I could get rid of
> > > those set failures without having to summon a dinosaur. For example,
> > > would disabling the slab automover be an acceptable band-aid fix?
> > > (I could then launch the manual mover (mc_slab_mover) when I know
> > > traffic is relatively light.)
> > >
> > > Thanks a lot.
> > >
> > > p.s. While "retry this set request at a later time" works
> > > (anecdotally), I don't want to implement a retry mechanism on the
> > > client side, since 1) the "later time" is probably
> > > non-deterministic, and 2) I don't have a readily available construct
> > > to decouple such a retry from the rest of my task, so having to
> > > retry would unnecessarily block the client side.