Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread dormando
So I tested this a bit more and released it in 1.6.17; I think bitnami
should pick it up soonish. If not, I'll try to figure out docker this
weekend if you still need it.

I'm not 100% sure it'll fix your use case, but it does fix some things I
can test and it didn't seem like a regression. It would still be nice to
validate.
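
If you want to check the release directly once an image shows up: assuming
the official memcached image (or bitnami's) publishes a 1.6.17 tag, something
like

    docker run --rm -p 11211:11211 memcached:1.6.17 memcached -m 5120 -I 2m

(with your real -m / -I values swapped in) plus a rerun of the pipeline that
was throwing "Out of memory during read" should be enough to tell either way.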

On Fri, 26 Aug 2022, dormando wrote:

> You can't build docker images or compile binaries? there's a
> docker-compose.yml in the repo already if that helps.
>
> If not I can try but I don't spend a lot of time with docker directly.
>
> On Fri, 26 Aug 2022, Hayden wrote:
>
> > I'd be happy to help validate the fix, but I can't do it until the weekend, 
> > and I don't have a ready way to build an updated image. Any chance you could
> > create a docker image with the fix that I could grab from somewhere?
> >
> > On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:
> >   I have an opportunity to put this fix into a release today if anyone
> >   wants to help validate :)
> >
> >   On Thu, 25 Aug 2022, dormando wrote:
> >
> >   > Took another quick look...
> >   >
> >   > Think there's an easy patch that might work:
> >   > https://github.com/memcached/memcached/pull/924
> >   >
> >   > If you wouldn't mind helping validate? An external validator would
> >   > help me get it in time for the next release :)
> >   >
> >   > Thanks,
> >   > -Dormando
> >   >
> >   > On Wed, 24 Aug 2022, dormando wrote:
> >   >
> >   > > Hey,
> >   > >
> >   > > Thanks for the info. Yes; this generally confirms the issue. I see
> >   > > some of your higher slab classes with "free_chunks 0", so if you're
> >   > > setting data that requires these chunks it could error out. The
> >   > > "stats items" confirms this since there are no actual items in those
> >   > > lower slab classes.
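
(For anyone following along, here's a quick way to scan for classes that are
out of free chunks; a rough sketch that assumes the stats are being pulled
with pymemcache against localhost:11211, which is only a guess based on the
b'...' values in the dumps below.)

    from pymemcache.client.base import Client

    client = Client(("localhost", 11211))
    slabs = client.stats("slabs")  # entries look like b"13:free_chunks" -> b"0"
    for key, value in slabs.items():
        name = key.decode() if isinstance(key, bytes) else str(key)
        if name.endswith(":free_chunks") and int(value) == 0:
            print("slab class", name.split(":")[0], "has no free chunks")
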
> >   > >
> >   > > You're certainly right a workaround of making your items < 512k
> >   > > would also work; but in general if I have features it'd be nice if
> >   > > they worked well :) Please open an issue so we can improve things!
> >   > >
> >   > > I intended to lower the slab_chunk_max default from 512k to much
> >   > > lower, as that actually raises the memory efficiency by a bit (less
> >   > > gap at the higher classes). That may help here. The system should
> >   > > also try ejecting items from the highest LRU... I need to double
> >   > > check that it wasn't already intending to do that and failing.
> >   > >
> >   > > Might also be able to adjust the page mover but not sure. The page
> >   > > mover can probably be adjusted to attempt to keep one page in
> >   > > reserve, but I think the algorithm isn't expecting slabs with no
> >   > > items in it so I'd have to audit that too.
> >   > >
> >   > > If you're up for experiments it'd be interesting to know if setting
> >   > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes
> >   > > things better or worse.
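
(On a stock install that experiment is just the startup flag, e.g.

    memcached -m 5120 -I 2m -o slab_chunk_max=32768

where the -m / -I values are placeholders for whatever is already in use; the
value here looks like it's taken in bytes rather than kilobytes, per the next
paragraph.)
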
> >   > >
> >   > > Also, crud.. it's documented as kilobytes but that's not working
> >   > > somehow? aaahahah. I guess the big EXPERIMENTAL tag scared people
> >   > > off since that never got reported.
> >   > >
> >   > > I'm guessing most people have a mix of small to large items, but
> >   > > you only have large items and a relatively low memory limit, so this
> >   > > is why you're seeing it so easily. I think most people setting large
> >   > > items have like 30G+ of memory so you end up with more spread
> >   > > around.
> >   > >
> >   > > Thanks,
> >   > > -Dormando
> >   > >
> >   > > On Wed, 24 Aug 2022, Hayden wrote:
> >   > >
> >   > > > What you're saying makes sense, and I'm pretty sure it won't be
> >   > > > too hard to add some functionality to my writing code to break my
> >   > > > large items up into smaller parts that can each fit into a single
> >   > > > chunk. That has the added benefit that I won't have to bother
> >   > > > increasing the max item size.
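
A minimal sketch of that client-side split, for reference; pymemcache, the
key naming, and the margin under the 512k chunk limit are all assumptions
rather than the actual writing code:

    from pymemcache.client.base import Client

    CHUNK = 512 * 1024 - 1024  # stay a little under the chunk size for overhead

    client = Client(("localhost", 11211))

    def set_large(key, value, expire=0):
        # Store the value as N sub-keys that each fit in one chunk, plus a
        # small manifest key holding the piece count.
        pieces = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [b""]
        for n, piece in enumerate(pieces):
            client.set(f"{key}:part:{n}", piece, expire=expire)
        client.set(key, str(len(pieces)).encode(), expire=expire)

    def get_large(key):
        count = client.get(key)
        if count is None:
            return None
        parts = [client.get(f"{key}:part:{n}") for n in range(int(count))]
        if any(p is None for p in parts):
            return None  # a piece was evicted on its own; treat it as a miss
        return b"".join(parts)

The one wrinkle is that pieces can be evicted independently of the manifest
key, so a missing piece has to count as a miss for the whole item.
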
> >   > > > In the meantime, though, I reran my pipeline and captured the
> >   > > > output of stats, stats slabs, and stats items both when evicting
> >   > > > normally and when getting spammed with the error.
> >   > > >
> >   > > > First, the output when I'm in the error state:
> >   > > >  Output of stats
> >   > > > STAT pid 1
> >   > > > STAT uptime 11727
> >   > > > STAT time 1661406229
> >   > > > STAT version b'1.6.14'
> >   > > > STAT libevent b'2.1.8-stable'
> >   > > > STAT pointer_size 64
> >   > > > STAT rusage_user 2.93837
> >   > > > STAT rusage_system 6.339015
> >   > > > STAT max_connections 1024
> >   > > > STAT curr_connections 2
> >   > > > STAT 

Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread dormando
You can't build docker images or compile binaries? There's a
docker-compose.yml in the repo already if that helps.

If not, I can try, but I don't spend a lot of time with docker directly.
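
For what it's worth, building straight from source is also quick if you have
a normal toolchain handy (a rough sketch; it assumes autotools and libevent
headers are installed, and the -m / -I values are just examples):

    git clone https://github.com/memcached/memcached
    cd memcached && ./autogen.sh && ./configure && make && make test
    ./memcached -m 5120 -I 2m -v

The docker-compose.yml above should get you an image built from that same
checkout if you'd rather stay inside docker.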

On Fri, 26 Aug 2022, Hayden wrote:

> I'd be happy to help validate the fix, but I can't do it until the weekend, 
> and I don't have a ready way to build an updated image. Any chance you could
> create a docker image with the fix that I could grab from somewhere?
>
> On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:
>   I have an opportunity to put this fix into a release today if anyone
>   wants to help validate :)
>
>   On Thu, 25 Aug 2022, dormando wrote:
>
>   > Took another quick look...
>   >
>   > Think there's an easy patch that might work:
>   > https://github.com/memcached/memcached/pull/924
>   >
>   > If you wouldn't mind helping validate? An external validator would
>   > help me get it in time for the next release :)
>   >
>   > Thanks,
>   > -Dormando
>   >
>   > On Wed, 24 Aug 2022, dormando wrote:
>   >
>   > > Hey,
>   > >
>   > > Thanks for the info. Yes; this generally confirms the issue. I see
>   > > some of your higher slab classes with "free_chunks 0", so if you're
>   > > setting data that requires these chunks it could error out. The
>   > > "stats items" confirms this since there are no actual items in those
>   > > lower slab classes.
>   > >
>   > > You're certainly right a workaround of making your items < 512k
>   > > would also work; but in general if I have features it'd be nice if
>   > > they worked well :) Please open an issue so we can improve things!
>   > >
>   > > I intended to lower the slab_chunk_max default from 512k to much
>   > > lower, as that actually raises the memory efficiency by a bit (less
>   > > gap at the higher classes). That may help here. The system should
>   > > also try ejecting items from the highest LRU... I need to double
>   > > check that it wasn't already intending to do that and failing.
>   > >
>   > > Might also be able to adjust the page mover but not sure. The page
>   > > mover can probably be adjusted to attempt to keep one page in
>   > > reserve, but I think the algorithm isn't expecting slabs with no
>   > > items in it so I'd have to audit that too.
>   > >
>   > > If you're up for experiments it'd be interesting to know if setting
>   > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes
>   > > things better or worse.
>   > >
>   > > Also, crud.. it's documented as kilobytes but that's not working
>   > > somehow? aaahahah. I guess the big EXPERIMENTAL tag scared people
>   > > off since that never got reported.
>   > >
>   > > I'm guessing most people have a mix of small to large items, but
>   > > you only have large items and a relatively low memory limit, so this
>   > > is why you're seeing it so easily. I think most people setting large
>   > > items have like 30G+ of memory so you end up with more spread
>   > > around.
>   > >
>   > > Thanks,
>   > > -Dormando
>   > >
>   > > On Wed, 24 Aug 2022, Hayden wrote:
>   > >
>   > > > What you're saying makes sense, and I'm pretty sure it won't be
>   > > > too hard to add some functionality to my writing code to break my
>   > > > large items up into smaller parts that can each fit into a single
>   > > > chunk. That has the added benefit that I won't have to bother
>   > > > increasing the max item size.
>   > > > In the meantime, though, I reran my pipeline and captured the
>   > > > output of stats, stats slabs, and stats items both when evicting
>   > > > normally and when getting spammed with the error.
>   > > >
>   > > > First, the output when I'm in the error state:
>   > > >  Output of stats
>   > > > STAT pid 1
>   > > > STAT uptime 11727
>   > > > STAT time 1661406229
>   > > > STAT version b'1.6.14'
>   > > > STAT libevent b'2.1.8-stable'
>   > > > STAT pointer_size 64
>   > > > STAT rusage_user 2.93837
>   > > > STAT rusage_system 6.339015
>   > > > STAT max_connections 1024
>   > > > STAT curr_connections 2
>   > > > STAT total_connections 8230
>   > > > STAT rejected_connections 0
>   > > > STAT connection_structures 6
>   > > > STAT response_obj_oom 0
>   > > > STAT response_obj_count 1
>   > > > STAT response_obj_bytes 65536
>   > > > STAT read_buf_count 8
>   > > > STAT read_buf_bytes 131072
>   > > > STAT read_buf_bytes_free 49152
>   > > > STAT read_buf_oom 0
>   > > > STAT reserved_fds 20
>   > > > STAT cmd_get 0
>   > > > STAT cmd_set 12640
>   > > > STAT cmd_flush 0
>   > > > STAT cmd_touch 0
>   > > > STAT cmd_meta 0
>   > > > STAT get_hits 0
>   

Re: "Out of memory during read" errors instead of key eviction

2022-08-26 Thread Hayden
I'd be happy to help validate the fix, but I can't do it until the weekend, 
and I don't have a ready way to build an updated image. Any chance you 
could create a docker image with the fix that I could grab from somewhere?

On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:

> I have an opportunity to put this fix into a release today if anyone wants
> to help validate :)
>
> On Thu, 25 Aug 2022, dormando wrote:
>
> > Took another quick look...
> >
> > Think there's an easy patch that might work:
> > https://github.com/memcached/memcached/pull/924
> >
> > If you wouldn't mind helping validate? An external validator would help
> > me get it in time for the next release :)
> >
> > Thanks,
> > -Dormando
> >
> > On Wed, 24 Aug 2022, dormando wrote:
> >
> > > Hey,
> > >
> > > Thanks for the info. Yes; this generally confirms the issue. I see
> > > some of your higher slab classes with "free_chunks 0", so if you're
> > > setting data that requires these chunks it could error out. The
> > > "stats items" confirms this since there are no actual items in those
> > > lower slab classes.
> > >
> > > You're certainly right a workaround of making your items < 512k would
> > > also work; but in general if I have features it'd be nice if they
> > > worked well :) Please open an issue so we can improve things!
> > >
> > > I intended to lower the slab_chunk_max default from 512k to much
> > > lower, as that actually raises the memory efficiency by a bit (less gap
> > > at the higher classes). That may help here. The system should also try
> > > ejecting items from the highest LRU... I need to double check that it
> > > wasn't already intending to do that and failing.
> > >
> > > Might also be able to adjust the page mover but not sure. The page
> > > mover can probably be adjusted to attempt to keep one page in reserve,
> > > but I think the algorithm isn't expecting slabs with no items in it so
> > > I'd have to audit that too.
> > >
> > > If you're up for experiments it'd be interesting to know if setting
> > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes
> > > things better or worse.
> > >
> > > Also, crud.. it's documented as kilobytes but that's not working
> > > somehow? aaahahah. I guess the big EXPERIMENTAL tag scared people off
> > > since that never got reported.
> > >
> > > I'm guessing most people have a mix of small to large items, but you
> > > only have large items and a relatively low memory limit, so this is why
> > > you're seeing it so easily. I think most people setting large items
> > > have like 30G+ of memory so you end up with more spread around.
> > >
> > > Thanks,
> > > -Dormando
> > >
> > > On Wed, 24 Aug 2022, Hayden wrote:
> > >
> > > > What you're saying makes sense, and I'm pretty sure it won't be too
> > > > hard to add some functionality to my writing code to break my large
> > > > items up into smaller parts that can each fit into a single chunk.
> > > > That has the added benefit that I won't have to bother increasing the
> > > > max item size.
> > > > In the meantime, though, I reran my pipeline and captured the output
> > > > of stats, stats slabs, and stats items both when evicting normally and
> > > > when getting spammed with the error.
> > > >
> > > > First, the output when I'm in the error state:
> > > >  Output of stats
> > > > STAT pid 1
> > > > STAT uptime 11727
> > > > STAT time 1661406229
> > > > STAT version b'1.6.14'
> > > > STAT libevent b'2.1.8-stable'
> > > > STAT pointer_size 64
> > > > STAT rusage_user 2.93837
> > > > STAT rusage_system 6.339015
> > > > STAT max_connections 1024
> > > > STAT curr_connections 2
> > > > STAT total_connections 8230
> > > > STAT rejected_connections 0
> > > > STAT connection_structures 6
> > > > STAT response_obj_oom 0
> > > > STAT response_obj_count 1
> > > > STAT response_obj_bytes 65536
> > > > STAT read_buf_count 8
> > > > STAT read_buf_bytes 131072
> > > > STAT read_buf_bytes_free 49152
> > > > STAT read_buf_oom 0
> > > > STAT reserved_fds 20
> > > > STAT cmd_get 0
> > > > STAT cmd_set 12640
> > > > STAT cmd_flush 0
> > > > STAT cmd_touch 0
> > > > STAT cmd_meta 0
> > > > STAT get_hits 0
> > > > STAT get_misses 0
> > > > STAT get_expired 0
> > > > STAT get_flushed 0
> > > > STAT delete_misses 0
> > > > STAT delete_hits 0
> > > > STAT incr_misses 0
> > > > STAT incr_hits 0
> > > > STAT decr_misses 0
> > > > STAT decr_hits 0
> > > > STAT cas_misses 0
> > > > STAT cas_hits 0
> > > > STAT cas_badval 0
> > > > STAT touch_hits 0
> > > > STAT touch_misses 0
> > > > STAT store_too_large 0
> > > > STAT store_no_memory 0
> > > > STAT auth_cmds 0
> > > > STAT auth_errors 0
> > > > STAT bytes_read 21755739959
> > > > STAT bytes_written 330909
> > > > STAT limit_maxbytes 5368709120
> > > > STAT accepting_conns 1
> > > > STAT listen_disabled_num 0
> > > > STAT time_in_listen_disabled_us 0
> > > > STAT threads 4
> > > > STAT conn_yields 0
> > > > STAT hash_power_level 16
> > > > STAT