To add another datapoint here: we at Grafana Labs use memcached extensively in our cloud, and this fix had a massive impact on our cache effectiveness: https://user-images.githubusercontent.com/373762/204228886-7c5a759a-927c-46fb-ae55-3e0b4056ebae.png
Thank you very much to you both for the investigation and bugfix!

On Saturday, August 27, 2022 at 8:53:47 AM UTC+2 Dormando wrote:

Thanks for taking the time to evaluate! It helps my confidence level with the fix.

You caught me at a good time :) Been really behind with fixes for quite a while and only catching up this week. I've looked at this a few times and didn't see the easy fix before...

I think earlier versions of the item chunking code were more fragile and I didn't revisit it after the cleanup work. In this case each chunk remembers its original slab class, so having the final chunk be from an unintended class doesn't break anything. Otherwise freeing the chunks would be impossible if I had to recalculate their original slab class from the chunk size.

So now it'll use too much memory in some cases, and lowering slab chunk max would ease that a bit... so maybe soon will finally be a good time to lower the default chunk max a little, to at least 128k or 256k.

-Dormando

On Fri, 26 Aug 2022, Hayden wrote:

I didn't see the docker files in the repo that could build the docker image, and when I tried cloning the git repo and doing a docker build I encountered errors that I think were related to the web proxy on my work network. I was able to grab the release tarball and the bitnami docker file, do a little surgery to work around my proxy issue, and build a 1.6.17 docker image though.

I ran my application against the new version and it ran for ~2hr without any errors (it previously wouldn't run more than 30s or so before encountering blocks of the "OOM during read" errors). I also made a little test loop that just hammered the instance with similar sized writes (1-2MB) as fast as it could and let it run a few hours, and it didn't have a single blip. That encompassed a couple million evictions.
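A test loop like the one described above can be sketched with pymemcache. The endpoint, key-rotation scheme, and exact size cycling below are assumptions for illustration, not details from the thread:

```python
import itertools
import os


def payload_size(i: int) -> int:
    """Cycle payload sizes between 1 MB and 2 MB, so every value is
    larger than the default 512 KB slab_chunk_max and gets chunked."""
    return 1_048_576 + (i % 17) * 65_536


def hammer(client, keys: int = 10_000) -> None:
    """Set 1-2 MB values as fast as possible; once memory fills,
    every set must evict older items, which is the path that could
    fail with "Out of memory during read" before the 1.6.17 fix."""
    for i in itertools.count():
        try:
            client.set(f"frame:{i % keys}", os.urandom(payload_size(i)))
        except Exception as exc:
            print(f"set #{i} failed: {exc}")


# Usage against a live server (assumed endpoint):
#   from pymemcache.client.base import Client
#   hammer(Client(("localhost", 11211)))
```

Rotating through a bounded key space keeps the item count roughly constant, so once memory fills, nearly every set forces an eviction, which is exactly the condition that triggered the bug.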
I'm pretty comfortable saying the issue is fixed, at least for the kind of use I had in mind.

I added a comment to the issue on GitHub to the same effect.

I'm impressed by the quick turnaround, BTW. ;-)

H

On Friday, August 26, 2022 at 5:54:26 PM UTC-7 Dormando wrote:

So I tested this a bit more and released it in 1.6.17; I think bitnami should pick it up soonish. If not, I'll try to figure out docker this weekend if you still need it.

I'm not 100% sure it'll fix your use case, but it does fix some things I can test and it didn't seem like a regression. Would be nice to validate still.

On Fri, 26 Aug 2022, dormando wrote:

You can't build docker images or compile binaries? There's a docker-compose.yml in the repo already if that helps.

If not I can try, but I don't spend a lot of time with docker directly.

On Fri, 26 Aug 2022, Hayden wrote:

I'd be happy to help validate the fix, but I can't do it until the weekend, and I don't have a ready way to build an updated image. Any chance you could create a docker image with the fix that I could grab from somewhere?

On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote:

I have an opportunity to put this fix into a release today if anyone wants to help validate :)

On Thu, 25 Aug 2022, dormando wrote:

Took another quick look...

Think there's an easy patch that might work:
https://github.com/memcached/memcached/pull/924

If you wouldn't mind helping validate? An external validator would help me get it in time for the next release :)

Thanks,
-Dormando

On Wed, 24 Aug 2022, dormando wrote:

Hey,

Thanks for the info. Yes; this generally confirms the issue.
I see some of your higher slab classes with "free_chunks 0", so if you're setting data that requires these chunks it could error out. The "stats items" output confirms this, since there are no actual items in those lower slab classes.

You're certainly right that a workaround of making your items < 512k would also work; but in general if I have features it'd be nice if they worked well :) Please open an issue so we can improve things!

I intended to lower the slab_chunk_max default from 512k to much lower, as that actually raises the memory efficiency by a bit (less gap at the higher classes). That may help here. The system should also try ejecting items from the highest LRU... I need to double check that it wasn't already intending to do that and failing.

Might also be able to adjust the page mover, but not sure. The page mover can probably be adjusted to attempt to keep one page in reserve, but I think the algorithm isn't expecting slabs with no items in them, so I'd have to audit that too.

If you're up for experiments, it'd be interesting to know if setting "-o slab_chunk_max=32768" or 16k (probably not more than 64k) makes things better or worse.

Also, crud.. it's documented as kilobytes but that's not working somehow? aaahahah. I guess the big EXPERIMENTAL tag scared people off, since that never got reported.

I'm guessing most people have a mix of small to large items, but you only have large items and a relatively low memory limit, so this is why you're seeing it so easily. I think most people setting large items have like 30G+ of memory, so you end up with more spread around.
Thanks,
-Dormando

On Wed, 24 Aug 2022, Hayden wrote:

What you're saying makes sense, and I'm pretty sure it won't be too hard to add some functionality to my writing code to break my large items up into smaller parts that can each fit into a single chunk. That has the added benefit that I won't have to bother increasing the max item size.

In the meantime, though, I reran my pipeline and captured the output of stats, stats slabs, and stats items, both when evicting normally and when getting spammed with the error.

First, the output when I'm in the error state:

**** Output of stats
STAT pid 1
STAT uptime 11727
STAT time 1661406229
STAT version b'1.6.14'
STAT libevent b'2.1.8-stable'
STAT pointer_size 64
STAT rusage_user 2.93837
STAT rusage_system 6.339015
STAT max_connections 1024
STAT curr_connections 2
STAT total_connections 8230
STAT rejected_connections 0
STAT connection_structures 6
STAT response_obj_oom 0
STAT response_obj_count 1
STAT response_obj_bytes 65536
STAT read_buf_count 8
STAT read_buf_bytes 131072
STAT read_buf_bytes_free 49152
STAT read_buf_oom 0
STAT reserved_fds 20
STAT cmd_get 0
STAT cmd_set 12640
STAT cmd_flush 0
STAT cmd_touch 0
STAT cmd_meta 0
STAT get_hits 0
STAT get_misses 0
STAT get_expired 0
STAT get_flushed 0
STAT delete_misses 0
STAT delete_hits 0
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT touch_hits 0
STAT touch_misses 0
STAT store_too_large 0
STAT store_no_memory 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 21755739959
STAT bytes_written 330909
STAT limit_maxbytes 5368709120
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT time_in_listen_disabled_us 0
STAT threads 4
STAT conn_yields 0
STAT hash_power_level 16
STAT hash_bytes 524288
STAT hash_is_expanding False
STAT slab_reassign_rescues 0
STAT slab_reassign_chunk_rescues 0
STAT slab_reassign_evictions_nomem 0
STAT slab_reassign_inline_reclaim 0
STAT slab_reassign_busy_items 0
STAT slab_reassign_busy_deletes 0
STAT slab_reassign_running False
STAT slabs_moved 0
STAT lru_crawler_running 0
STAT lru_crawler_starts 20
STAT lru_maintainer_juggles 71777
STAT malloc_fails 0
STAT log_worker_dropped 0
STAT log_worker_written 0
STAT log_watcher_skipped 0
STAT log_watcher_sent 0
STAT log_watchers 0
STAT unexpected_napi_ids 0
STAT round_robin_fallback 0
STAT bytes 5241499325
STAT curr_items 4211
STAT total_items 12640
STAT slab_global_page_pool 0
STAT expired_unfetched 0
STAT evicted_unfetched 8429
STAT evicted_active 0
STAT evictions 8429
STAT reclaimed 0
STAT crawler_reclaimed 0
STAT crawler_items_checked 4212
STAT lrutail_reflocked 0
STAT moves_to_cold 11872
STAT moves_to_warm 0
STAT moves_within_lru 0
STAT direct_reclaims 55559
STAT lru_bumps_dropped 0
END

**** Output of stats slabs
STAT 2:chunk_size 120
STAT 2:chunks_per_page 8738
STAT 2:total_pages 1
STAT 2:total_chunks 8738
STAT 2:used_chunks 4211
STAT 2:free_chunks 4527
STAT 2:free_chunks_end 0
STAT 2:get_hits 0
STAT 2:cmd_set 0
STAT 2:delete_hits 0
STAT 2:incr_hits 0
STAT 2:decr_hits 0
STAT 2:cas_hits 0
STAT 2:cas_badval 0
STAT 2:touch_hits 0
STAT 30:chunk_size 66232
STAT 30:chunks_per_page 15
STAT 30:total_pages 1
STAT 30:total_chunks 15
STAT 30:used_chunks 3
STAT 30:free_chunks 12
STAT 30:free_chunks_end 0
STAT 30:get_hits 0
STAT 30:cmd_set 0
STAT 30:delete_hits 0
STAT 30:incr_hits 0
STAT 30:decr_hits 0
STAT 30:cas_hits 0
STAT 30:cas_badval 0
STAT 30:touch_hits 0
STAT 31:chunk_size 82792
STAT 31:chunks_per_page 12
STAT 31:total_pages 1
STAT 31:total_chunks 12
STAT 31:used_chunks 6
STAT 31:free_chunks 6
STAT 31:free_chunks_end 0
STAT 31:get_hits 0
STAT 31:cmd_set 0
STAT 31:delete_hits 0
STAT 31:incr_hits 0
STAT 31:decr_hits 0
STAT 31:cas_hits 0
STAT 31:cas_badval 0
STAT 31:touch_hits 0
STAT 32:chunk_size 103496
STAT 32:chunks_per_page 10
STAT 32:total_pages 19
STAT 32:total_chunks 190
STAT 32:used_chunks 183
STAT 32:free_chunks 7
STAT 32:free_chunks_end 0
STAT 32:get_hits 0
STAT 32:cmd_set 0
STAT 32:delete_hits 0
STAT 32:incr_hits 0
STAT 32:decr_hits 0
STAT 32:cas_hits 0
STAT 32:cas_badval 0
STAT 32:touch_hits 0
STAT 33:chunk_size 129376
STAT 33:chunks_per_page 8
STAT 33:total_pages 50
STAT 33:total_chunks 400
STAT 33:used_chunks 393
STAT 33:free_chunks 7
STAT 33:free_chunks_end 0
STAT 33:get_hits 0
STAT 33:cmd_set 0
STAT 33:delete_hits 0
STAT 33:incr_hits 0
STAT 33:decr_hits 0
STAT 33:cas_hits 0
STAT 33:cas_badval 0
STAT 33:touch_hits 0
STAT 34:chunk_size 161720
STAT 34:chunks_per_page 6
STAT 34:total_pages 41
STAT 34:total_chunks 246
STAT 34:used_chunks 245
STAT 34:free_chunks 1
STAT 34:free_chunks_end 0
STAT 34:get_hits 0
STAT 34:cmd_set 0
STAT 34:delete_hits 0
STAT 34:incr_hits 0
STAT 34:decr_hits 0
STAT 34:cas_hits 0
STAT 34:cas_badval 0
STAT 34:touch_hits 0
STAT 35:chunk_size 202152
STAT 35:chunks_per_page 5
STAT 35:total_pages 231
STAT 35:total_chunks 1155
STAT 35:used_chunks 1155
STAT 35:free_chunks 0
STAT 35:free_chunks_end 0
STAT 35:get_hits 0
STAT 35:cmd_set 0
STAT 35:delete_hits 0
STAT 35:incr_hits 0
STAT 35:decr_hits 0
STAT 35:cas_hits 0
STAT 35:cas_badval 0
STAT 35:touch_hits 0
STAT 36:chunk_size 252696
STAT 36:chunks_per_page 4
STAT 36:total_pages 536
STAT 36:total_chunks 2144
STAT 36:used_chunks 2144
STAT 36:free_chunks 0
STAT 36:free_chunks_end 0
STAT 36:get_hits 0
STAT 36:cmd_set 0
STAT 36:delete_hits 0
STAT 36:incr_hits 0
STAT 36:decr_hits 0
STAT 36:cas_hits 0
STAT 36:cas_badval 0
STAT 36:touch_hits 0
STAT 37:chunk_size 315872
STAT 37:chunks_per_page 3
STAT 37:total_pages 28
STAT 37:total_chunks 84
STAT 37:used_chunks 82
STAT 37:free_chunks 2
STAT 37:free_chunks_end 0
STAT 37:get_hits 0
STAT 37:cmd_set 0
STAT 37:delete_hits 0
STAT 37:incr_hits 0
STAT 37:decr_hits 0
STAT 37:cas_hits 0
STAT 37:cas_badval 0
STAT 37:touch_hits 0
STAT 39:chunk_size 524288
STAT 39:chunks_per_page 2
STAT 39:total_pages 4212
STAT 39:total_chunks 8424
STAT 39:used_chunks 8422
STAT 39:free_chunks 2
STAT 39:free_chunks_end 0
STAT 39:get_hits 0
STAT 39:cmd_set 12640
STAT 39:delete_hits 0
STAT 39:incr_hits 0
STAT 39:decr_hits 0
STAT 39:cas_hits 0
STAT 39:cas_badval 0
STAT 39:touch_hits 0
STAT active_slabs 10
STAT total_malloced 5368709120
END

**** Output of stats items
STAT items:39:number 4211
STAT items:39:number_hot 768
STAT items:39:number_warm 0
STAT items:39:number_cold 3443
STAT items:39:age_hot 28
STAT items:39:age_warm 0
STAT items:39:age 143
STAT items:39:mem_requested 5241499325
STAT items:39:evicted 8429
STAT items:39:evicted_nonzero 0
STAT items:39:evicted_time 140
STAT items:39:outofmemory 0
STAT items:39:tailrepairs 0
STAT items:39:reclaimed 0
STAT items:39:expired_unfetched 0
STAT items:39:evicted_unfetched 8429
STAT items:39:evicted_active 0
STAT items:39:crawler_reclaimed 0
STAT items:39:crawler_items_checked 4212
STAT items:39:lrutail_reflocked 0
STAT items:39:moves_to_cold 11872
STAT items:39:moves_to_warm 0
STAT items:39:moves_within_lru 0
STAT items:39:direct_reclaims 8429
STAT items:39:hits_to_hot 0
STAT items:39:hits_to_warm 0
STAT items:39:hits_to_cold 0
STAT items:39:hits_to_temp 0
END

Then, the output when it's humming along happily again:

**** Output of stats
STAT pid 1
STAT uptime 11754
STAT time 1661406256
STAT version b'1.6.14'
STAT libevent b'2.1.8-stable'
STAT pointer_size 64
STAT rusage_user 3.056135
STAT rusage_system 7.074541
STAT max_connections 1024
STAT curr_connections 3
STAT total_connections 10150
STAT rejected_connections 0
STAT connection_structures 6
STAT response_obj_oom 0
STAT response_obj_count 1
STAT response_obj_bytes 65536
STAT read_buf_count 8
STAT read_buf_bytes 131072
STAT read_buf_bytes_free 49152
STAT read_buf_oom 0
STAT reserved_fds 20
STAT cmd_get 0
STAT cmd_set 12794
STAT cmd_flush 0
STAT cmd_touch 0
STAT cmd_meta 0
STAT get_hits 0
STAT get_misses 0
STAT get_expired 0
STAT get_flushed 0
STAT delete_misses 0
STAT delete_hits 0
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT touch_hits 0
STAT touch_misses 0
STAT store_too_large 0
STAT store_no_memory 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 24375641173
STAT bytes_written 415262
STAT limit_maxbytes 5368709120
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT time_in_listen_disabled_us 0
STAT threads 4
STAT conn_yields 0
STAT hash_power_level 16
STAT hash_bytes 524288
STAT hash_is_expanding False
STAT slab_reassign_rescues 0
STAT slab_reassign_chunk_rescues 0
STAT slab_reassign_evictions_nomem 0
STAT slab_reassign_inline_reclaim 0
STAT slab_reassign_busy_items 0
STAT slab_reassign_busy_deletes 0
STAT slab_reassign_running False
STAT slabs_moved 0
STAT lru_crawler_running 0
STAT lru_crawler_starts 20
STAT lru_maintainer_juggles 71952
STAT malloc_fails 0
STAT log_worker_dropped 0
STAT log_worker_written 0
STAT log_watcher_skipped 0
STAT log_watcher_sent 0
STAT log_watchers 0
STAT unexpected_napi_ids 0
STAT round_robin_fallback 0
STAT bytes 5242957328
STAT curr_items 4212
STAT total_items 12794
STAT slab_global_page_pool 0
STAT expired_unfetched 0
STAT evicted_unfetched 8582
STAT evicted_active 0
STAT evictions 8582
STAT reclaimed 0
STAT crawler_reclaimed 0
STAT crawler_items_checked 4212
STAT lrutail_reflocked 0
STAT moves_to_cold 12533
STAT moves_to_warm 0
STAT moves_within_lru 0
STAT direct_reclaims 74822
STAT lru_bumps_dropped 0
END

**** Output of stats slabs
STAT 2:chunk_size 120
STAT 2:chunks_per_page 8738
STAT 2:total_pages 1
STAT 2:total_chunks 8738
STAT 2:used_chunks 4212
STAT 2:free_chunks 4526
STAT 2:free_chunks_end 0
STAT 2:get_hits 0
STAT 2:cmd_set 0
STAT 2:delete_hits 0
STAT 2:incr_hits 0
STAT 2:decr_hits 0
STAT 2:cas_hits 0
STAT 2:cas_badval 0
STAT 2:touch_hits 0
STAT 30:chunk_size 66232
STAT 30:chunks_per_page 15
STAT 30:total_pages 1
STAT 30:total_chunks 15
STAT 30:used_chunks 3
STAT 30:free_chunks 12
STAT 30:free_chunks_end 0
STAT 30:get_hits 0
STAT 30:cmd_set 0
STAT 30:delete_hits 0
STAT 30:incr_hits 0
STAT 30:decr_hits 0
STAT 30:cas_hits 0
STAT 30:cas_badval 0
STAT 30:touch_hits 0
STAT 31:chunk_size 82792
STAT 31:chunks_per_page 12
STAT 31:total_pages 1
STAT 31:total_chunks 12
STAT 31:used_chunks 6
STAT 31:free_chunks 6
STAT 31:free_chunks_end 0
STAT 31:get_hits 0
STAT 31:cmd_set 0
STAT 31:delete_hits 0
STAT 31:incr_hits 0
STAT 31:decr_hits 0
STAT 31:cas_hits 0
STAT 31:cas_badval 0
STAT 31:touch_hits 0
STAT 32:chunk_size 103496
STAT 32:chunks_per_page 10
STAT 32:total_pages 19
STAT 32:total_chunks 190
STAT 32:used_chunks 183
STAT 32:free_chunks 7
STAT 32:free_chunks_end 0
STAT 32:get_hits 0
STAT 32:cmd_set 0
STAT 32:delete_hits 0
STAT 32:incr_hits 0
STAT 32:decr_hits 0
STAT 32:cas_hits 0
STAT 32:cas_badval 0
STAT 32:touch_hits 0
STAT 33:chunk_size 129376
STAT 33:chunks_per_page 8
STAT 33:total_pages 50
STAT 33:total_chunks 400
STAT 33:used_chunks 391
STAT 33:free_chunks 9
STAT 33:free_chunks_end 0
STAT 33:get_hits 0
STAT 33:cmd_set 0
STAT 33:delete_hits 0
STAT 33:incr_hits 0
STAT 33:decr_hits 0
STAT 33:cas_hits 0
STAT 33:cas_badval 0
STAT 33:touch_hits 0
STAT 34:chunk_size 161720
STAT 34:chunks_per_page 6
STAT 34:total_pages 41
STAT 34:total_chunks 246
STAT 34:used_chunks 246
STAT 34:free_chunks 0
STAT 34:free_chunks_end 0
STAT 34:get_hits 0
STAT 34:cmd_set 0
STAT 34:delete_hits 0
STAT 34:incr_hits 0
STAT 34:decr_hits 0
STAT 34:cas_hits 0
STAT 34:cas_badval 0
STAT 34:touch_hits 0
STAT 35:chunk_size 202152
STAT 35:chunks_per_page 5
STAT 35:total_pages 231
STAT 35:total_chunks 1155
STAT 35:used_chunks 1155
STAT 35:free_chunks 0
STAT 35:free_chunks_end 0
STAT 35:get_hits 0
STAT 35:cmd_set 0
STAT 35:delete_hits 0
STAT 35:incr_hits 0
STAT 35:decr_hits 0
STAT 35:cas_hits 0
STAT 35:cas_badval 0
STAT 35:touch_hits 0
STAT 36:chunk_size 252696
STAT 36:chunks_per_page 4
STAT 36:total_pages 536
STAT 36:total_chunks 2144
STAT 36:used_chunks 2144
STAT 36:free_chunks 0
STAT 36:free_chunks_end 0
STAT 36:get_hits 0
STAT 36:cmd_set 0
STAT 36:delete_hits 0
STAT 36:incr_hits 0
STAT 36:decr_hits 0
STAT 36:cas_hits 0
STAT 36:cas_badval 0
STAT 36:touch_hits 0
STAT 37:chunk_size 315872
STAT 37:chunks_per_page 3
STAT 37:total_pages 28
STAT 37:total_chunks 84
STAT 37:used_chunks 84
STAT 37:free_chunks 0
STAT 37:free_chunks_end 0
STAT 37:get_hits 0
STAT 37:cmd_set 0
STAT 37:delete_hits 0
STAT 37:incr_hits 0
STAT 37:decr_hits 0
STAT 37:cas_hits 0
STAT 37:cas_badval 0
STAT 37:touch_hits 0
STAT 39:chunk_size 524288
STAT 39:chunks_per_page 2
STAT 39:total_pages 4212
STAT 39:total_chunks 8424
STAT 39:used_chunks 8424
STAT 39:free_chunks 0
STAT 39:free_chunks_end 0
STAT 39:get_hits 0
STAT 39:cmd_set 12794
STAT 39:delete_hits 0
STAT 39:incr_hits 0
STAT 39:decr_hits 0
STAT 39:cas_hits 0
STAT 39:cas_badval 0
STAT 39:touch_hits 0
STAT active_slabs 10
STAT total_malloced 5368709120
END

**** Output of stats items
STAT items:39:number 4212
STAT items:39:number_hot 261
STAT items:39:number_warm 0
STAT items:39:number_cold 3951
STAT items:39:age_hot 33
STAT items:39:age_warm 0
STAT items:39:age 165
STAT items:39:mem_requested 5242957328
STAT items:39:evicted 8582
STAT items:39:evicted_nonzero 0
STAT items:39:evicted_time 165
STAT items:39:outofmemory 0
STAT items:39:tailrepairs 0
STAT items:39:reclaimed 0
STAT items:39:expired_unfetched 0
STAT items:39:evicted_unfetched 8582
STAT items:39:evicted_active 0
STAT items:39:crawler_reclaimed 0
STAT items:39:crawler_items_checked 4212
STAT items:39:lrutail_reflocked 0
STAT items:39:moves_to_cold 12533
STAT items:39:moves_to_warm 0
STAT items:39:moves_within_lru 0
STAT items:39:direct_reclaims 8582
STAT items:39:hits_to_hot 0
STAT items:39:hits_to_warm 0
STAT items:39:hits_to_cold 0
STAT items:39:hits_to_temp 0
END

I'm happy to open an issue on GitHub if the stats confirm there actually is something in the code that could be fixed. You can decide then how much effort it's worth to fix it.
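Snapshots like the ones above can also be pulled programmatically. pymemcache's `Client.stats()` passes its arguments through to the text protocol's `stats` command, so one way to grab all three domains at once (the endpoint below is a placeholder) would be:

```python
def capture_snapshots(client) -> dict:
    """Grab the three stat domains discussed in this thread: general
    "stats", "stats slabs", and "stats items". `client` is any object
    with a pymemcache-style stats(*args) method."""
    return {
        "stats": client.stats(),
        "slabs": client.stats("slabs"),
        "items": client.stats("items"),
    }


# Usage against a live server (assumed endpoint):
#   from pymemcache.client.base import Client
#   snaps = capture_snapshots(Client(("localhost", 11211)))
#   print(snaps["slabs"])
```

Capturing one snapshot during normal eviction and one during an error burst, as done above, makes it easy to diff the per-class free_chunks counters.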
If my workaround idea works, though, I'll just put it in place and move on to the next thing. ;-)

On Wednesday, August 24, 2022 at 7:01:33 PM UTC-7 Dormando wrote:

To put a little more internal detail on this:

- As a SET is being processed, item chunks must be made available.
- If it is chunked memory, it will be fetching these data chunks from across different slab classes (ie: 512k + 512k + sized enough for whatever's left over).
- That full chunked item gets put in the largest slab class.
- If another SET comes along and it needs 512k + 512k + an 8k, it has to look into the 8k slab class for an item to evict.
- Except there's no memory in the 8k class: it's all actually in the largest class.
- So there's nothing to evict to free up memory.
- So you get an error.
- The slab page mover can make this worse by not leaving enough reserved memory in the lower slab classes.

I wasn't sure how often this would happen in practice and fixed a few edge cases in the past. Though I always figured I would've revisited it years ago, so sorry about the trouble.

There are a few tuning options:
1) More memory, lol.
2) You can override slab_chunk_max to be much lower (like 8k or 16k), which will make a lot more chunks but you won't realistically notice a performance difference. This can reduce the number of total slab classes, making it easier for more "end cap" memory to be found.
3) Delete items as you use them so it doesn't have to evict. Not the best option.
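The client-side workaround floated above (split each value so every stored item fits in a single chunk) could look roughly like this; the `:part:` key convention and the safety margin under 512 KB are assumptions for illustration, not a pymemcache API:

```python
# Client-side splitting of large values so every stored item fits in a
# single slab chunk (below the default 512 KB slab_chunk_max).
# The "<key>:part:<n>" / "<key>:parts" naming is an invented convention.

CHUNK = 512 * 1024 - 1024  # stay safely under the 512 KB chunk limit


def set_large(client, key: str, value: bytes, expire: int = 0) -> None:
    parts = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [b""]
    for n, part in enumerate(parts):
        client.set(f"{key}:part:{n}", part, expire=expire)
    # Manifest written last, so readers never see a half-written value.
    client.set(f"{key}:parts", str(len(parts)).encode(), expire=expire)


def get_large(client, key: str):
    count = client.get(f"{key}:parts")
    if count is None:
        return None
    parts = [client.get(f"{key}:part:{n}") for n in range(int(count))]
    if any(p is None for p in parts):
        return None  # a piece was evicted out from under us
    return b"".join(parts)
```

One trade-off worth noting: the parts of one logical value can now be evicted independently, so readers must treat a missing part as a whole-value miss, as `get_large` does here.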
There're code fixes I can try, but I need to see what the exact symptom is first, which is why I ask for the stats stuff.

On Wed, 24 Aug 2022, dormando wrote:

Hey,

You're probably hitting an edge case in the "large item support".

Basically, to store values > 512k memcached internally splits them up into chunks. When storing items memcached first allocates the item storage, then reads data from the client socket directly into the data storage.

For chunked items it will be allocating chunks of memory as it reads from the socket, which can lead to that (thankfully very specific) "during read" error. I've long suspected some edge cases but haven't revisited that code in ... a very long time.

If you can grab snapshots of "stats items" and "stats slabs" when it's both evicting normally and when it's giving you errors, I might be able to figure out what's causing it to bottom out and see if there's some tuning to do. Normal "stats" output is also helpful.

It kind of smells like some slab classes are running low on memory sometimes, and the items in them are being read for a long time... but we have to see the data to be sure.

If you're feeling brave you can try building the current "next" branch from github and try it out, as some fixes to the page mover went in there. Those fixes may have caused too much memory to be moved away from a slab class sometimes.
Feel free to open an issue on github to track this if you'd like.

have fun,
-Dormando

On Wed, 24 Aug 2022, Hayden wrote:

Hello,

I'm trying to use memcached for a use case I don't think is outlandish, but it's not behaving the way I expect. I wanted to sanity check what I'm doing to see if it should be working but there's maybe something I've done wrong with my configuration, or if my idea of how it's supposed to work is wrong, or if there's a problem with memcached itself.

I'm using memcached as a temporary shared image store in a distributed video processing application. At the front of the pipeline is a process (actually all these processes are pods in a kubernetes cluster, if it matters, and memcached is running in the cluster as well) that consumes a video stream over RTSP, saves each frame to memcached, and outputs events to a message bus (kafka) with metadata about each frame. At the end of the pipeline is another process that consumes these metadata events, and when it sees events it thinks are interesting it retrieves the corresponding frame from memcached and adds the frame to a web UI. The video is typically 30fps, so there are about 30 set() operations each second, and since each value is effectively an image the values are a bit big (around 1MB... I upped the maximum value size in memcached to 2MB to make sure they'd fit, and I haven't had any problems with my writes being rejected because of size).
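Under the default 512 KB slab_chunk_max, each of those ~1 MB-plus values is stored as full 512 KB chunks plus a smaller remainder chunk drawn from a lower slab class, which is where the starvation described in this thread bites. A rough sketch of that layout (the helper is illustrative, not memcached code, and ignores item header overhead):

```python
# Rough sketch of how a chunked item's data is split under
# slab_chunk_max (default 512 KB). Illustrative only: real chunk
# sizes also include item header overhead, which is ignored here.

SLAB_CHUNK_MAX = 512 * 1024


def chunk_layout(value_len: int, chunk_max: int = SLAB_CHUNK_MAX) -> list:
    """Return the approximate data-chunk sizes for a value of value_len."""
    full, rest = divmod(value_len, chunk_max)
    return [chunk_max] * full + ([rest] if rest else [])


# A ~1.2 MB frame needs two 512 KB chunks plus a ~148 KB remainder.
# The remainder comes from a *smaller* slab class; if all memory has
# gravitated to the largest class, that class shows free_chunks 0 and
# the set fails with "out of memory during read".
print(chunk_layout(1_200_000))  # -> [524288, 524288, 151424]
```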
> > > The video stream is processed in real-time, and effectively infinite, but the memory available to memcached obviously isn't (I've configured it to use 5GB, FWIW). That's OK, because the cache is only supposed to be temporary storage. My expectation is that once the available memory is filled up (which takes a few minutes), then roughly speaking for every new frame added to memcached another entry (ostensibly the oldest one) will be evicted. If the consuming process at the end of the pipeline doesn't get to a frame it wants before it gets evicted, that's OK.
> > >
> > > That's not what I'm seeing, though, or at least that's not all that I'm seeing. There are lots of evictions happening, but the process that's writing to memcached also goes through periods where every set() operation is rejected with an "Out of memory during read" error. It seems to happen in bursts where for several seconds every write encounters the error, then for several seconds the set() calls work just fine (and presumably other keys are being evicted), then the cycle repeats. It goes on this way for as long as I let the process run.
> > >
> > > I'm using memcached v1.6.14, installed into my k8s cluster using the bitnami helm chart v6.0.5. My reading and writing applications are both using pymemcache v3.5.2 for their access.
> > >
> > > Can anyone tell me if it seems like what I'm doing should work the way I described, and where I should try investigating to see what's going wrong? Or alternatively, why what I'm trying to do shouldn't work the way I expected it to, so I can figure out how to make my applications behave differently?
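For reference, the two limits Hayden describes (5GB of item memory and a 2MB maximum item size) map to memcached's `-m` and `-I` startup flags. A hypothetical invocation (flag values inferred from the message; the actual deployment goes through the bitnami helm chart, whose value names are not shown here):

```shell
# 5GB of item memory (-m is in megabytes), 2MB max item size, verbose logging.
memcached -m 5120 -I 2m -v
```

Note that with `-I 2m`, any item larger than the default 512KB slab chunk max is stored internally as chained chunks — exactly the code path dormando describes above for the "during read" error.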
> > > Thanks,
> > > Hayden
> > >
> > > --
> > > ---
> > > You received this message because you are subscribed to the Google Groups "memcached" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to memcached+...@googlegroups.com.
> > > To view this discussion on the web visit https://groups.google.com/d/msgid/memcached/702cae66-3108-46de-bb48-38eb3e17a5b7n%40googlegroups.com.
--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/memcached/d85d625d-0e59-4e4a-89e8-7e5f8c03c93fn%40googlegroups.com.