> It would be really bad for both of us if you created a mission critical backup solution based off of an undocumented, unsupported dataformat which potentially changes with version updates.
Oh absolutely haha! This is more of a POC to prove feasibility, and I was also just curious about what data was actually in the file.

> One of many questions; is this due to cost? (ie; don't want to double the
> cache storage) or some other reason?

Mostly about cost, yeah. I'll hit you up on Discord

On Mon, Apr 24, 2023 at 11:50 PM dormando <dorma...@rydia.net> wrote:

> Hey,
>
> > Aside:
> > I'm actually busy trying to parse the datafile with a small Go program
> > to try and replay all the data. Solving this warming will give us a lot
> > of confidence to roll this out in a big way across our infra.
> > What're your thoughts on this and the above?
>
> It would be really bad for both of us if you created a mission critical
> backup solution based off of an undocumented, unsupported dataformat which
> potentially changes with version updates. I think you may have also
> misunderstood me; the data is actually partially in RAM.
>
> Is there any chance I could get you into the MC discord to chat a bit
> further about your use case? (linked from https://memcached.org/) -
> easier to play 20 questions there. If that's not possible I'll list a
> bunch of questions in the mailing list here instead :)
>
> @Javier, thanks for your thoughts here too. Replication is not an option
> for us at this scale; that said, your solution is pretty cool!
>
> One of many questions; is this due to cost? (ie; don't want to double the
> cache storage) or some other reason?
>
> On Monday, April 24, 2023 at 1:05:23 PM UTC+2 Javier Arias Losada wrote:
>
>> Hi there,
>>
>> One thing we've done to mitigate this kind of risk is having two copies
>> of every shard in different availability zones in our cloud provider.
>> Also, we run in Kubernetes, so for us nodes leaving the cluster is a
>> relatively frequent issue... we are playing with a small process that
>> does the warmup of new nodes quicker.
>>
>> Since we have more than one copy of the data, we do a warmup process.
>> Our cache nodes are MUCH MUCH smaller... so this approach might not be
>> reasonable for your use-case.
>>
>> This is how our process works. When a new node is restarted, or in any
>> other situation that involves an empty memcached process starting, our
>> warmup process:
>> - locates the warmer node for the shard
>> - gets all the keys and TTLs from the warmer node: lru_crawler metadump all
>> - traverses the list of keys in reverse (lru_crawler goes from the least
>>   recently used; for this it's better to go from most recent)
>> - for each key: gets the value from the warmer node and adds (not sets)
>>   it to the cold node, including the TTL.
>>
>> This process might lead to some small data inconsistencies; it will
>> depend on your use case how important that is.
>>
>> Since our access patterns are very skewed (a small % of keys gets the
>> bigger % of traffic, at least during some time), going in reverse
>> through the LRU dump makes the warmup much more effective.
>>
>> Best
>> Javier Arias
>>
>> On Sunday, April 23, 2023 at 7:24:28 PM UTC+2 dormando wrote:
>>
>>> Hey,
>>>
>>> Thanks for reaching out!
>>>
>>> There is no crash safety in memcached or extstore; it does look like
>>> the data is on disk, but it is actually spread across memory and disk,
>>> with recent or heavily accessed data staying in RAM. Best case you only
>>> recover your cold data. Further, keys can appear multiple times in the
>>> extstore datafile, and we rely on the RAM index to know which one is
>>> current.
>>>
>>> I've never heard of anyone losing an entire cluster, but people do try
>>> to mitigate this by replicating cache across availability zones/regions.
>>> This can be done with a few methods, like our new proxy code. I'd be
>>> happy to go over a few scenarios if you'd like.
>>>
>>> -Dormando
>>>
>>> On Sun, 23 Apr 2023, 'Danny Kopping' via memcached wrote:
>>>
>>> > First off, thanks for the amazing work @dormando & others!
>>> > Context:
>>> > I work at Grafana Labs, and we are very interested in trying out
>>> > extstore for some very large (>50TB) caches. We plan to split this
>>> > 50TB cache into about 35 different nodes, each with 1.5TB of NVMe & a
>>> > small memcached instance. Losing any given node will result in losing
>>> > ~3% of the overall cache, which is acceptable; however, if we lose all
>>> > nodes at once somehow, losing all of our cache will be pretty bad and
>>> > will put severe pressure on our backend.
>>> >
>>> > Ask:
>>> > Having looked at the file that extstore writes on disk, it looks like
>>> > it has both keys & values contained in it. Would it be possible to
>>> > "re-warm" the cache on startup by scanning this data and resubmitting
>>> > it to itself? We could then add some condition to our readiness check
>>> > in k8s to wait until the data is all re-warmed and then allow traffic
>>> > to flow to those instances. Is this feature planned for anytime soon?
>>> >
>>> > Thanks!
>>> >
>>> > --
>>> >
>>> > ---
>>> > You received this message because you are subscribed to the Google
>>> > Groups "memcached" group.
>>> > To unsubscribe from this group and stop receiving emails from it,
>>> > send an email to memcached+...@googlegroups.com.
>>> > To view this discussion on the web visit
>>> > https://groups.google.com/d/msgid/memcached/cc45382b-eee7-4e37-a841-d210bf18ff4bn%40googlegroups.com.