To add another datapoint here, we at Grafana Labs use memcached extensively
in our cloud and this fix made a massive impact on our cache effectiveness:
https://user-images.githubusercontent.com/373762/204228886-7c5a759a-927c-46fb-ae55-3e0b4056ebae.png
Thank you very much to you both for the investigation and bugfix!
On Saturday, August 27, 2022 at 8:53:47 AM UTC+2 Dormando wrote:
> Thanks for taking the time to evaluate! It helps my confidence level with
> the fix.
>
> You caught me at a good time :) Been really behind with fixes for quite a
> while and only catching up this week. I've looked at this a few times and
> didn't see the easy fix before...
>
> I think earlier versions of the item chunking code were more fragile and I
> didn't revisit it after the cleanup work. In this case each chunk
> remembers its original slab class, so having the final chunk be from an
> unintended class doesn't break anything. Otherwise freeing the chunks
> would be impossible if I had to recalculate their original slab class from
> the chunk size.
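
The point above, that each chunk remembers its original slab class, can be illustrated with a minimal sketch. This is not memcached's actual code (which is C); the names Chunk, SlabClass and free_chunk are hypothetical and exist only to show why freeing needs no size-to-class recalculation.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    size: int           # bytes this chunk can hold
    slab_class_id: int  # class the chunk came from, recorded at allocation time


@dataclass
class SlabClass:
    class_id: int
    chunk_size: int
    free_chunks: list = field(default_factory=list)

    def alloc(self) -> Chunk:
        # Reuse a freed chunk if one exists, otherwise carve a new one.
        if self.free_chunks:
            return self.free_chunks.pop()
        return Chunk(size=self.chunk_size, slab_class_id=self.class_id)


def free_chunk(chunk: Chunk, classes: dict) -> None:
    # No lookup by size: the chunk itself says which class it belongs to,
    # so a final chunk taken from an "unintended" class is still freed correctly.
    classes[chunk.slab_class_id].free_chunks.append(chunk)
```

With this layout, an item whose last chunk came from a different class than the rest can still be released chunk by chunk, each one returning to the class it records.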
>
> So now it'll use too much memory in some cases, and lowering slab chunk
> max would ease that a bit... so maybe soon will finally be a good time to
> lower the default chunk max a little to at least 128k or 256k.
>
> -Dormando
>
> On Fri, 26 Aug 2022, Hayden wrote:
>
> > I didn't see the docker files in the repo that could build the docker
> > image, and when I tried cloning the git repo and doing a docker build I
> > encountered errors that I think were related to the web proxy on my work
> > network. I was able to grab the release tarball and the bitnami docker
> > file, do a little surgery to work around my proxy issue, and build a
> > 1.6.17 docker image though.
> > I ran my application against the new version and it ran for ~2hr without
> > any errors (it previously wouldn't run more than 30s or so before
> > encountering blocks of the OOM during read errors). I also made a little
> > test loop that just hammered the instance with similar sized writes
> > (1-2MB) as fast as it could and let it run a few hours, and it didn't have
> > a single blip. That encompassed a couple million evictions. I'm pretty
> > comfortable saying the issue is fixed, at least for the kind of use I had
> > in mind.
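
A test loop of the kind described above might look roughly like the following sketch. It is not Hayden's actual script: the host, port, key prefix and iteration count are assumptions, and the function names are hypothetical. It speaks the memcached text protocol directly over a socket and counts any reply other than STORED. Values above 1MB also assume the server was started with a larger item size limit (the -I option), since the default limit is 1MB.

```python
import os
import random
import socket


def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # Text protocol: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def hammer(host: str = "127.0.0.1", port: int = 11211, iterations: int = 1000,
           min_size: int = 1 << 20, max_size: int = 2 << 20) -> int:
    """Send `iterations` writes of 1-2MB each; return how many were not STORED."""
    errors = 0
    with socket.create_connection((host, port)) as sock:
        reader = sock.makefile("rb")
        for i in range(iterations):
            value = os.urandom(random.randint(min_size, max_size))
            sock.sendall(build_set_command(f"stress:{i}", value))
            reply = reader.readline()
            if reply != b"STORED\r\n":
                errors += 1
    return errors
```

Running hammer() against an instance whose memory limit is smaller than the total data written forces evictions, which is the condition the original report depended on.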
> >
> > I added a comment to the issue on GitHub to the same effect.
> >
> > I'm impressed by the quick turnaround, BTW. ;-)
> >
> > H
> >
> > On Friday, August 26, 2022 at 5:54:26 PM UTC-7 Dormando wrote:
> > So I tested this a bit more and released it in 1.6.17; I think bitnami
> > should pick it up soonish. If not I'll try to figure out docker this
> > weekend if you still need it.
> >
> > I'm not 100% sure it'll fix your use case but it does fix some things I
> > can test and it didn't seem like a regression. Would be nice to validate
> > still.
> >
> > On Fri, 26 Aug 2022, dormando wrote:
> >
> > > You can't build docker images or compile binaries? There's a
> > > docker-compose.yml in the repo already if that helps.
> > >
> > > If not I can try but I don't spend a lot of time with docker directly.
> > >
> > > On Fri, 26 Aug 2022, Hayden wrote:
> > >
> > > > > I'd be happy to help validate the fix, but I can't do it until the
> > > > > weekend, and I don't have a ready way to build an updated image. Any
> > > > > chance you could create a docker image with the fix that I could grab
> > > > > from somewhere?
> > > >
> > > > On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:
> > > > > I have an opportunity to put this fix into a release today if anyone
> > > > > wants to help validate :)
> > > >
> > > > On Thu, 25 Aug 2022, dormando wrote:
> > > >
> > > > > Took another quick look...
> > > > >
> > > > > Think there's an easy patch that might work:
> > > > > https://github.com/memcached/memcached/pull/924
> > > > >
> > > > > > If you wouldn't mind helping validate? An external validator would
> > > > > > help me get it in time for the next release :)
> > > > >
> > > > > Thanks,
> > > > > -Dormando
> > > > >
> > > > > On Wed, 24 Aug 2022, dormando wrote:
> > > > >
> > > > > > Hey,
> > > > > >
> > > > > > > Thanks for the info. Yes; this generally confirms the issue. I
> > > > > > > see some of your higher slab classes with "free_chunks 0", so if
> > > > > > > you're setting data that requires these chunks it could error
> > > > > > > out. The "stats items" confirms this since there are no actual
> > > > > > > items in those lower slab classes.
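
The check described above, looking for slab classes that report "free_chunks 0", can be automated with a small parser over the text returned by "stats slabs". This is a sketch only: the function name is hypothetical, and it assumes the per-class lines have the form "STAT <class_id>:<name> <value>". A class with no free chunks is not an error by itself; it is only a hint about where an allocation might fail.

```python
def exhausted_classes(stats_slabs_text: str) -> list:
    """Return the sorted slab class ids whose free_chunks counter is 0."""
    result = []
    for line in stats_slabs_text.splitlines():
        parts = line.split()
        # Skip global lines (no "<id>:" prefix), the END marker, and blanks.
        if len(parts) != 3 or parts[0] != "STAT" or ":" not in parts[1]:
            continue
        class_id, name = parts[1].split(":", 1)
        if name == "free_chunks" and parts[2] == "0":
            result.append(int(class_id))
    return sorted(result)
```

Feeding it the raw reply from "stats slabs" gives the list of classes to cross-check against "stats items".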
> > > > > >
> > > > > > > You're certainly right a workaround of making your items < 512k
> > > > > > > would also work; but in general if I have features it'd be nice
> > > > > > > if they worked well :) Please open an issue so we can improve
> > > > > > > things!
> > > > > >
> > > > > > > I intended to lower the slab_chunk_max default from 512k to much
> > > > > > > lower, as that actually raises the memory efficiency by a bit
> > > > > > > (less gap at the higher classes). That may help here. The system
> > > > > > > should also try ejecting items from the