Re: "Out of memory during read" errors instead of key eviction

2022-08-24 Thread dormando
To add a little more internal detail on this:

- As a SET is being processed, item chunks must be made available.
- If the item is chunked, memcached fetches its data chunks from
different slab classes (i.e. 512k + 512k + one sized for whatever's left
over; see the sketch after this list).
- That full chunked item gets put into the largest slab class.
- If another SET comes along and needs 512k + 512k + an 8k chunk, it has
to look into the 8k slab class for an item to evict.
- Except there's no memory in the 8k class: it's all actually sitting in
the largest class.
- So there's nothing to evict to free up memory.
- So you get an error.
- The slab page mover can make this worse by not leaving enough reserved
memory in the lower slab classes.
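
To make the arithmetic concrete, here's a rough sketch in plain Python
(not memcached's actual allocator; the slab class sizes below are made
up for the example) of how a large value splits into chunks and why the
final "end cap" has to come from a smaller class:

# Toy model only: assumes the default slab_chunk_max of 512 KiB and an
# invented set of slab class sizes, not memcached's real class table.
SLAB_CHUNK_MAX = 512 * 1024

def chunk_breakdown(value_size, slab_classes):
    """Return the chunk sizes a value of value_size would be split into."""
    chunks = []
    remaining = value_size
    while remaining > SLAB_CHUNK_MAX:
        chunks.append(SLAB_CHUNK_MAX)   # full chunks come from the largest class
        remaining -= SLAB_CHUNK_MAX
    # The leftover "end cap" comes from whichever smaller class it fits in.
    chunks.append(next(c for c in sorted(slab_classes) if c >= remaining))
    return chunks

classes = [8 * 1024, 16 * 1024, 64 * 1024, 192 * 1024, 512 * 1024]
print(chunk_breakdown(1_200_000, classes))
# -> [524288, 524288, 196608]: two 512k chunks plus a 192k "end cap".
# If that smaller class has no free or evictable memory, the SET fails
# with "out of memory", even though plenty of memory sits in the largest class.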

I wasn't sure how often this would happen in practice, and I fixed a few
edge cases in the past. I always figured I would've revisited it by now,
though, so sorry about the trouble.

There are a few tuning options:
1) more memory, lol.
2) you can override slab_chunk_max to be much lower (like 8k or 16k),
which creates a lot more chunks per item, but you won't realistically
notice a performance difference. It can also reduce the total number of
slab classes, making it easier for "end cap" memory to be found (see the
snippet after this list).
3) delete items as you use them so memcached doesn't have to evict. Not
the best option.
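
If you try option 2, the override is set at server startup
(-o slab_chunk_max=...; check your version's docs for whether the value
is bytes or kilobytes), and you can confirm what the running server
actually picked up with "stats settings". A quick pymemcache sketch,
with a placeholder address:

from pymemcache.client.base import Client

# Placeholder address; point it at your memcached service.
client = Client(("127.0.0.1", 11211))

settings = client.stats("settings")
for key, value in settings.items():
    # pymemcache may return stat names as bytes depending on version.
    name = key.decode() if isinstance(key, bytes) else key
    if name in ("slab_chunk_max", "item_size_max", "maxbytes"):
        print(name, "=", value)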

There are code fixes I can try, but I need to see what the exact symptom
is first, which is why I asked for the stats output.
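
For reference, one way to capture those snapshots with pymemcache (just
a sketch; the address is a placeholder). Grab one while evictions look
normal and another during an error burst:

import json
import time
from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))  # placeholder address

def _clean(stats):
    # pymemcache can return bytes for stat names/values; make them JSON-friendly.
    return {
        (k.decode() if isinstance(k, bytes) else str(k)):
        (v.decode() if isinstance(v, bytes) else v)
        for k, v in stats.items()
    }

def snapshot(label):
    """Write 'stats', 'stats items' and 'stats slabs' to a timestamped file."""
    data = {
        "stats": _clean(client.stats()),
        "items": _clean(client.stats("items")),
        "slabs": _clean(client.stats("slabs")),
    }
    fname = "memcached-%s-%d.json" % (label, int(time.time()))
    with open(fname, "w") as f:
        json.dump(data, f, indent=2)
    return fname

snapshot("normal")     # while evictions look healthy
snapshot("oom-burst")  # again while the "Out of memory during read" errors happen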

On Wed, 24 Aug 2022, dormando wrote:

> Hey,
>
> You're probably hitting an edge case in the "large item support".
>
> Basically to store values > 512k memcached internally splits them up into
> chunks. When storing items memcached first allocates the item storage,
> then reads data from the client socket directly into the data storage.
>
> For chunked items it will be allocating chunks of memory as it reads from
> the socket, which can lead to that (thankfully very specific) "during
> read" error. I've long suspected some edge cases but haven't revisited
> that code in ... a very long time.
>
> If you can grab snapshots of "stats items" and "stats slabs" when it's
> both evicting normally and when it's giving you errors, I might be able to
> figure out what's causing it to bottom out and see if there's some tuning
> to do. Normal "stats" output is also helpful.
>
> It kind of smells like some slab classes are running low on memory
> sometimes, and the items in them are being read for a long time... but we
> have to see the data to be sure.
>
> If you're feeling brave you can try building the current "next" branch
> from github and try it out, as some fixes to the page mover went in there.
> Those fixes may have caused too much memory to be moved away from a slab
> class sometimes.
>
> Feel free to open an issue on github to track this if you'd like.
>
> have fun,
> -Dormando
>
> On Wed, 24 Aug 2022, Hayden wrote:
>
> > Hello,
> > I'm trying to use memcached for a use case I don't think is outlandish, but 
> > it's not behaving the way I expect. I
> > wanted to sanity check what I'm doing to see if it should be working but 
> > there's maybe something I've done wrong
> > with my configuration, or if my idea of how it's supposed to work is wrong, 
> > or if there's a problem with
> > memcached itself.
> >
> > I'm using memcached as a temporary shared image store in a distributed 
> > video processing application. At the front
> > of the pipeline is a process (actually all these processes are pods in a 
> > kubernetes cluster, if it matters, and
> > memcached is running in the cluster as well) that consumes a video stream 
> > over RTSP, saves each frame to
> > memcached, and outputs events to a message bus (kafka) with metadata about 
> > each frame. At the end of the pipeline
> > is another process that consumes these metadata events, and when it sees 
> > events it thinks are interesting it
> > retrieves the corresponding frame from memcached and adds the frame to a 
> > web UI. The video is typically 30fps, so
> > there are about 30 set() operations each second, and since each value is 
> > effectively an image the values are a
> > bit big (around 1MB... I upped the maximum value size in memcached to 2MB 
> > to make sure they'd fit, and I haven't
> > had any problems with my writes being rejected because of size).
> >
> > The video stream is processed in real-time, and effectively infinite, but 
> > the memory available to memcached
> > obviously isn't (I've configured it to use 5GB, FWIW). That's OK, because 
> > the cache is only supposed to be
> > temporary storage. My expectation is that once the available memory is 
> > filled up (which takes a few minutes),
> > then roughly speaking for every new frame added to memcached another entry 
> > (ostensibly the oldest one) will be
> > evicted. If the consuming process at the end of the pipeline doesn't get to 
> > a frame it wants before it gets
> > evicted that's OK.
> >
> > That's not what I'm seeing, though, or at least that's not all that I'm 
> > 

Re: "Out of memory during read" errors instead of key eviction

2022-08-24 Thread dormando
Hey,

You're probably hitting an edge case in the "large item support".

Basically, to store values > 512k, memcached internally splits them into
chunks. When storing an item, memcached first allocates the item storage,
then reads data from the client socket directly into that storage.

For chunked items it will be allocating chunks of memory as it reads from
the socket, which can lead to that (thankfully very specific) "during
read" error. I've long suspected some edge cases but haven't revisited
that code in ... a very long time.

If you can grab snapshots of "stats items" and "stats slabs" both when it's
evicting normally and when it's giving you errors, I might be able to
figure out what's causing it to bottom out and see if there's some tuning
to do. Normal "stats" output is also helpful.

It kind of smells like some slab classes are running low on memory
sometimes, and the items in them are being read for a long time... but we
have to see the data to be sure.

If you're feeling brave, you can try building the current "next" branch
from github and trying it out, as some fixes to the page mover went in
there. The behavior those fixes address may have caused too much memory
to be moved away from a slab class at times.

Feel free to open an issue on github to track this if you'd like.

have fun,
-Dormando

On Wed, 24 Aug 2022, Hayden wrote:

> Hello,
> I'm trying to use memcached for a use case I don't think is outlandish, but 
> it's not behaving the way I expect. I
> wanted to sanity check what I'm doing to see if it should be working but 
> there's maybe something I've done wrong
> with my configuration, or if my idea of how it's supposed to work is wrong, 
> or if there's a problem with
> memcached itself.
>
> I'm using memcached as a temporary shared image store in a distributed video 
> processing application. At the front
> of the pipeline is a process (actually all these processes are pods in a 
> kubernetes cluster, if it matters, and
> memcached is running in the cluster as well) that consumes a video stream 
> over RTSP, saves each frame to
> memcached, and outputs events to a message bus (kafka) with metadata about 
> each frame. At the end of the pipeline
> is another process that consumes these metadata events, and when it sees 
> events it thinks are interesting it
> retrieves the corresponding frame from memcached and adds the frame to a web 
> UI. The video is typically 30fps, so
> there are about 30 set() operations each second, and since each value is 
> effectively an image the values are a
> bit big (around 1MB... I upped the maximum value size in memcached to 2MB to 
> make sure they'd fit, and I haven't
> had any problems with my writes being rejected because of size).
>
> The video stream is processed in real-time, and effectively infinite, but the 
> memory available to memcached
> obviously isn't (I've configured it to use 5GB, FWIW). That's OK, because the 
> cache is only supposed to be
> temporary storage. My expectation is that once the available memory is filled 
> up (which takes a few minutes),
> then roughly speaking for every new frame added to memcached another entry 
> (ostensibly the oldest one) will be
> evicted. If the consuming process at the end of the pipeline doesn't get to a 
> frame it wants before it gets
> evicted that's OK.
>
> That's not what I'm seeing, though, or at least that's not all that I'm 
> seeing. There are lots of evictions
> happening, but the process that's writing to memcached also goes through 
> periods where every set() operation is
> rejected with an "Out of memory during read" error. It seems to happen in 
> bursts where for several seconds every
> write encounters the error, then for several seconds the set() calls work 
> just fine (and presumably other keys
> are being evicted), then the cycle repeats. It goes on this way for as long 
> as I let the process run.
>
> I'm using memcached v1.6.14, installed into my k8s cluster using the bitnami 
> helm chart v6.0.5. My reading and
> writing applications are both using pymemcache v3.5.2 for their access.
>
> Can anyone tell me if it seems like what I'm doing should work the way I 
> described, and where I should try
> investigating to see what's going wrong? Or alternatively, why what I'm 
> trying to do shouldn't work the way I
> expected it to, so I can figure out how to make my applications behave 
> differently?
>
> Thanks,
> Hayden
>


"Out of memory during read" errors instead of key eviction

2022-08-24 Thread Hayden
Hello,

I'm trying to use memcached for a use case I don't *think* is outlandish,
but it's not behaving the way I expect. I wanted to sanity check what I'm
doing, to work out whether I've done something wrong in my configuration,
whether my idea of how it's supposed to work is wrong, or whether there's
a problem with memcached itself.

I'm using memcached as a temporary shared image store in a distributed 
video processing application. At the front of the pipeline is a process 
(actually all these processes are pods in a kubernetes cluster, if it 
matters, and memcached is running in the cluster as well) that consumes a 
video stream over RTSP, saves each frame to memcached, and outputs events 
to a message bus (kafka) with metadata about each frame. At the end of the 
pipeline is another process that consumes these metadata events, and when 
it sees events it thinks are interesting it retrieves the corresponding 
frame from memcached and adds the frame to a web UI. The video is typically 
30fps, so there are about 30 set() operations each second, and since each 
value is effectively an image the values are a bit big (around 1MB... I 
upped the maximum value size in memcached to 2MB to make sure they'd fit, 
and I haven't had any problems with my writes being rejected because of 
size).
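
For concreteness, the access pattern on both ends is roughly this (a
simplified sketch, not my real code; the client setup and key scheme are
just illustrative):

from pymemcache.client.base import Client

client = Client(("memcached", 11211))  # in-cluster service name, illustrative

# Producer: one set() per decoded frame, roughly 30 per second, ~1MB each.
# (The server's max item size was raised to 2MB so these fit.)
def store_frame(stream_id, frame_number, jpeg_bytes):
    key = "%s:%d" % (stream_id, frame_number)  # key scheme is illustrative
    client.set(key, jpeg_bytes)

# Consumer: fetch the frame referenced by an interesting kafka event.
def fetch_frame(stream_id, frame_number):
    key = "%s:%d" % (stream_id, frame_number)
    return client.get(key)  # returns None if the frame was already evicted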

The video stream is processed in real-time, and effectively infinite, but 
the memory available to memcached obviously isn't (I've configured it to 
use 5GB, FWIW). That's OK, because the cache is only supposed to be 
temporary storage. My expectation is that once the available memory is
filled up (which takes a few minutes), then roughly speaking, for every new
frame added to memcached another entry (presumably the oldest one) will be
evicted. If the consuming process at the end of the pipeline doesn't get to
a frame it wants before it gets evicted, that's OK.

That's not what I'm seeing, though, or at least that's not all that I'm 
seeing. There are lots of evictions happening, but the process that's 
writing to memcached also goes through periods where every set() operation 
is rejected with an "Out of memory during read" error. It seems to happen 
in bursts where for several seconds every write encounters the error, then 
for several seconds the set() calls work just fine (and presumably other 
keys are being evicted), then the cycle repeats. It goes on this way for as 
long as I let the process run.

I'm using memcached v1.6.14, installed into my k8s cluster using the 
bitnami helm chart v6.0.5. My reading and writing applications are both 
using pymemcache v3.5.2 for their access.

Can anyone tell me if it seems like what I'm doing should work the way I 
described, and where I should try investigating to see what's going wrong? 
Or alternatively, why what I'm trying to do shouldn't work the way I 
expected it to, so I can figure out how to make my applications behave 
differently?

Thanks,
Hayden
