Nice.

Unfortunate there's a regression there :( Hopefully it's been looked at?

I'll try to prioritize getting more of the logging hooks done, since you
could have resolved this a lot more easily if the errors were more
accessible. Most clients hide errors or make them difficult to gather
anyway :(

Under-utilization of memory isn't a problem per se, unless you're not
getting as much benefit as you potentially could :)

-Dormando

On Wed, 26 Apr 2017, Min-Zhong "John" Lu wrote:

> Well I must be doomed and I feel embarrassed now.
> TL;DR: It's an error on my side (part pylibmc, part our engineering problem), 
> not memcached's.
>
> Long story:
> So, it turns out that some of our computation instances have pylibmc 1.5.0 
> and some pylibmc 1.5.1.
>
> I already knew pylibmc 1.5.1 had regressed in its serialization behavior, so I
> had already instituted a policy of holding the pylibmc version at 1.5.0.
> However, in hindsight this policy probably hasn't been enforced (by humans or
> by code) very thoroughly. So I ended up with a troublesome mix --- and the
> number of pylibmc-1.5.1-equipped instances is relatively small, hence the
> intermittency and the transience (the "retry" was always carried out manually
> on a pylibmc-1.5.0-equipped instance).
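>
> (A minimal sketch of how such a pin could be enforced in code rather than by
> policy --- assuming pylibmc exposes __version__, which I believe it does:)
>
>     import pylibmc
>
>     REQUIRED = "1.5.0"
>     if pylibmc.__version__ != REQUIRED:
>         raise RuntimeError("pylibmc %s found; %s required for consistent "
>                            "serialization" % (pylibmc.__version__, REQUIRED))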
>
> Also retrospectively confirmed: the objects we fail to store have a serialized
> size of less than 1MB under pylibmc 1.5.0 and larger than 1MB under pylibmc
> 1.5.1. I always compute the size of my objects using 1.5.0's serialization, so
> when they are processed by pylibmc-1.5.1-equipped instances, I get a too-big
> error for an item I think shouldn't trigger it.
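>
> (In case it helps anyone else: a minimal sketch of one way to pre-check
> serialized size before a set, using pickle as a rough stand-in for pylibmc's
> serialization --- the exact protocol and compression settings are precisely
> where 1.5.0 and 1.5.1 diverge, so the numbers are approximate:)
>
>     import pickle
>
>     ITEM_SIZE_MAX = 1024 * 1024  # memcached's default item size limit
>
>     def serialized_size(obj):
>         # Stand-in for the client's real serializer; pylibmc's exact
>         # pickle protocol / compression flags vary by version.
>         return len(pickle.dumps(obj, protocol=2))
>
>     def fits_in_cache(obj):
>         return serialized_size(obj) < ITEM_SIZE_MAX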
>
> Dormando, sorry for summoning the dinosaur and having you take time for this.
>
> As for the under-utilization of memory: yes, it's probably something that
> needs attention, but I'll have to double-check against our access
> pattern/traffic to see whether I've simply been overprovisioning more cache
> than our workloads need, or whether something else is going on.
>
> Again thanks for your time on this false alarm.
>
> Cheers,
> - Mnjul
>
> On Tuesday, April 25, 2017 at 6:18:58 PM UTC-4, Dormando wrote:
>       Cool.
>
>       Yeah, this agrees; zero outofmemory errors on all classes. I think I'm
>       still missing a counter for chunked items, in cases of "late" allocation
>       errors. Given the amount of memory free, I can't see why that would
>       happen though.
>
>       Hopefully you're able to find the real error. Another thing I need to
>       finish doing is to add more logging endpoints so it's easier to gather
>       data like that :(
>
>       On Tue, 25 Apr 2017, Min-Zhong "John" Lu wrote:
>
>       > Annnnnd, I guess forgetting to attach the files I promised is a sign
>       > of dinosaurness.
>       > Here they are.
>       >
>       > On Tuesday, April 25, 2017 at 5:08:01 PM UTC-4, Min-Zhong "John" Lu wrote:
>       > Hello,
>       > Thanks for the response! So the slab automover is not the culprit.
>       >
>       > As for the exact server error: unfortunately I don't have that for now, as I
>       > use libmemcached (plus pylibmc for that matter). That said, I have used the
>       > plain telnet protocol when doing "further get requests" (as in my original
>       > mail) to verify the success of set requests (and the item sizes shown there
>       > are exactly the same as I've calculated in my Python code, FWIW).
>       >
>       > I think I can set up a nice little netcat script that imitates those set
>       > requests directly through the telnet protocol, to capture the exact error
>       > message. I'm not sure how the intermittent nature of the failures will come
>       > into play here, but I'll try my best to reproduce it.
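>       >
>       > (Something like this rough sketch --- Python sockets instead of netcat, with
>       > placeholder host/port/key and a ~900KiB dummy payload --- should print the
>       > server's exact response line, e.g. STORED or a SERVER_ERROR message:)
>       >
>       >     import socket
>       >
>       >     HOST, PORT = "127.0.0.1", 11211   # placeholder address
>       >     KEY = b"probe-large-item"         # hypothetical key
>       >     data = b"x" * (900 * 1024)        # roughly the size of the failing items
>       >
>       >     with socket.create_connection((HOST, PORT)) as s:
>       >         s.sendall(b"set %s 0 60 %d\r\n" % (KEY, len(data)))
>       >         s.sendall(data + b"\r\n")
>       >         # First line of the reply is STORED or the exact error, e.g.
>       >         # "SERVER_ERROR object too large for cache".
>       >         print(s.recv(4096).split(b"\r\n")[0].decode())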
>       >
>       > As for setting -o slab_chunk_size_max=1048576 --- I'll try that, but I need
>       > to schedule a maintenance window with my peers. Let me do the netcat script
>       > first; I'll probably have the instance relaunched (with the new setting)
>       > within a couple of days, and a few days after that I'll ping back on whether
>       > I'm still seeing the failures.
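>       >
>       > (For reference, appending the suggested flag to the current args, the
>       > relaunch command line would presumably look something like:
>       > -m 2900 -f 1.16 -c 10240 -k -o modern -o slab_chunk_size_max=1048576)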
>       >
>       > I'm attaching |stats items| here. Also attaching those |stats| and
>       > |stats slabs| dumped at the same time for consistency.
>       >
>       > Will come back with more info for the fun,
>       > - Mnjul
>       >
>       > On Tuesday, April 25, 2017 at 4:40:52 PM UTC-4, Dormando wrote:
>       >       Hey!
>       >
>       >       Unfortunately you've summoned a dinosaur, as I am old now :P
>       >
>       >       My main question: do you have the exact server error returned by
>       >       memcached? If it is "SERVER_ERROR object too large for cache" - that
>       >       error has nothing to do with memory allocation, and just reflects that
>       >       the item you attempted to store is too large (over 1MB). If it fails
>       >       for that reason it should always fail.
>       >
>       >       First off, unfortunately your assumption that the slab page mover is
>       >       synchronous isn't correct. It's a fully backgrounded process that
>       >       doesn't ever block anything. New memory allocations don't block for
>       >       anything.
>       >
>       >       Also: can you include "stats items"? It has some possibly relevant
>       >       info.
>       >
>       >       Especially in your instance, which isn't using all of the memory
>       >       you've assigned to it (about 1/3rd?). The slab page mover is simply
>       >       moving memory back into a free pool when there is too much memory free
>       >       in any particular slab class.
>       >
>       >       i.e.:
>       >       STAT slab_global_page_pool 308
>       >
>       >       When new memory is requested and none is readily available in a slab
>       >       class, first a new page is pulled from the global page pool, if
>       >       available. After that, a new page is malloced. After that, items are
>       >       pulled from the LRU and evicted. If nothing can be evicted for some
>       >       reason, you would get an allocation error.
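>       >
>       >       (A toy illustration of that order in Python --- a paraphrase for
>       >       clarity, not memcached's actual code:)
>       >
>       >           def allocate(free_chunks, global_page_pool, can_malloc, lru):
>       >               # 1. a chunk is already free in this slab class
>       >               if free_chunks:
>       >                   return free_chunks.pop()
>       >               # 2. pull a page from the global page pool
>       >               if global_page_pool:
>       >                   free_chunks.extend(global_page_pool.pop())
>       >                   return free_chunks.pop()
>       >               # 3. malloc a brand-new page
>       >               if can_malloc:
>       >                   return "freshly malloced chunk"
>       >               # 4. evict from the LRU tail and reuse that memory
>       >               if lru:
>       >                   return lru.pop(0)
>       >               raise MemoryError("allocation error: nothing to evict")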
>       >
>       >       So you really shouldn't be seeing any. "stats items" would tell me the
>       >       nature of any allocation problems (hopefully) that you're seeing. Also,
>       >       getting the exact error being thrown to you is very helpful. Most
>       >       errors in the system are unique so I can trace them back to particular
>       >       code.
>       >
>       >       It is possible there is a bug or weirdness with chunked allocation,
>       >       which happens for items > 512k and has gone through a couple of
>       >       revisions. You can test this theory by adding
>       >       "-o slab_chunk_size_max=1048576" (the same as the item size max). It
>       >       would be great to know if this makes the problem go away, since it
>       >       would mean I have some more stuff to tune there.
>       >
>       >       have fun,
>       >       -Dormando
>       >
>       >       On Mon, 24 Apr 2017, Min-Zhong "John" Lu wrote:
>       >
>       >       > Hi there,
>       >       >
>       >       > I've recently been investigating an intermittent & transient
>       >       > failure-to-set issue in a long-running memcached instance, and I
>       >       > believe I could use some insight from you all.
>       >       >
>       >       > Let me list my configuration first. I have |stats| and |stats slabs|
>       >       > dumps available as Google Groups attachments. If they fail to go
>       >       > through, I'll just re-post them on some pastebin service.
>       >       >
>       >       > Configuration:
>       >       > Command line args: -m 2900 -f 1.16 -c 10240 -k -o modern
>       >       >
>       >       > Using 1.4.36 (compiled by myself) on Ubuntu 14.04.4 x64.
>       >       >
>       >       > The -k flag has been verified to be effective (I've got limits
>       >       > configured correctly).
>       >       >
>       >       > Growth factor of 1.16 is just an empirical value for my item sizes.
>       >       >
>       >       >
>       >       > Symptom of the issue:
>       >       > After running memcached for around 10 days, there have been occasions
>       >       > where a set request for a large item (sized around 760KiB to 930KiB)
>       >       > fails, with memcached returning 37 (item too big). However, when this
>       >       > happens, if I wait for around one minute and then send the same set
>       >       > request again (with exactly the same key/item/expiration to store),
>       >       > memcached gladly stores it. Further get requests verify that the item
>       >       > is correctly stored.
>       >       >
>       >       > According to my logs, this happens intermittently, and I haven't
>       >       > been able to correlate those transient failures with my slab stats.
>       >       >
>       >       >
>       >       > Observation & Question 1:
>       >       > Q1: Does my issue arise because, when the initial set request arrives
>       >       > at memcached, memcached has to run the slab automover to produce a
>       >       > slab (maybe two slabs, since the item is larger than 512KiB) to
>       >       > accommodate the set request?
>       >       >
>       >       > This is my hunch --- I have yet to do a quick |stats| dump at the
>       >       > exact moment of the set failure to confirm it. But I have seen
>       >       > [slab_reassign_busy_items = 10K] and [slabs_moved = 16.9K] in my
>       >       > |stats| dumps, which means the slab automover must have been
>       >       > triggered at some point during memcached's lifetime. This leads to my
>       >       > next question:
>       >       >
>       >       >
>       >       > Observation & Questions 2 & 3:
>       >       > Q2: When the slab automover is running, could it block the
>       >       > large-item set request, as in my case above?
>       >       >
>       >       > Q3: Why would memcached favor triggering the slab automover over
>       >       > allocating new memory when there is still host memory available?
>       >       >
>       >       > According to the stats dumps, my memcached instance has
>       >       > [total_malloced = 793MiB] and a footprint of [bytes = 392.33MiB] ---
>       >       > both fall far short of [limit_maxbytes = 2900MiB]. Furthermore,
>       >       > nothing has been evicted, as I have [evictions = 0].
>       >       >
>       >       > (And the host system has plenty of free physical memory, per
>       >       > |free -m|.)
>       >       >
>       >       > I would expect allocating memory to be faster (*way* faster,
>       >       > actually) than triggering the slab automover to reassign slabs to
>       >       > accommodate the incoming set request, and that allocating memory
>       >       > would allow the initial set request to be served immediately.
>       >       >
>       >       > In addition, if the slab automover just happens to be running when
>       >       > the large-item set request arrives, and the answer to Q2 is "yes"...
>       >       > can we make it not block if there's still host memory available?
>       >       >
>       >       >
>       >       >
>       >       > I'm kinda out of clues here... and I might actually be on the wrong
>       >       > track in my investigation.
>       >       >
>       >       > Any insight is appreciated, and it'd be great if I could get rid of
>       >       > those set failures without having to summon a dinosaur.
>       >       >
>       >       > For example, would disabling the slab automover be an acceptable
>       >       > band-aid fix (with me launching the manual mover (mc_slab_mover) only
>       >       > when I know traffic is relatively light)?
>       >       >
>       >       > Thanks a lot.
>       >       >
>       >       > p.s. While 'retry this set request at a later time' works
>       >       > (anecdotally), I don't want to implement a retry mechanism on the
>       >       > client side, since 1) the 'later time' is probably non-deterministic,
>       >       > and 2) I don't have a readily available construct to decouple such a
>       >       > retry from the rest of my task, so having to retry would
>       >       > unnecessarily block the client side.
>       >       >
>       >
>
