We have a medium-sized dataset (~50M entries) with small values (a few hundred bytes) where we need "persistence" with a very high read throughput and occasional updates. To solve this, we built a cluster of memcached servers with enough RAM on each machine to store the entire dataset and wrote our own memcached client with the following characteristics:
- each write operation writes to every machine in the cluster
- each read operation reads from any one machine in the cluster
- if a machine becomes non-responsive, it is marked as dirty and removed from the cluster list

Every night a "full populate" script is run, and any new machines or machines that were removed throughout the day are re-added to the cluster. With this setup, we achieve hundreds of thousands of reads per second and virtual "persistence."

On Tue, Aug 11, 2009 at 4:56 PM, smolix <alex.sm...@gmail.com> wrote:
> Hi Adam,
>
> Thanks for the tokyocabinet pointer. Unfortunately that would be too
> slow (we need as high IOPS as we can get, and no, an SSD would not be
> an answer unless it gets into FusionIO performance range). What was
> the hack you did? We don't need persistent storage for many days. The
> total computation will run in 1, maybe 2 days total.
>
> Take care,
>
> Alex
>
> On Aug 11, 12:37 pm, Adam Lee <a...@fotolog.biz> wrote:
> > We do a hack that enables something similar to this, but I wouldn't
> > recommend it. If you want something memcached-like but persistent,
> > you should look into, for example, tokyocabinet. It even speaks the
> > memcached protocol, so you can use it as a drop-in replacement and
> > achieve the desired effect. It's not _as_ fast as memcached, but
> > it's still very fast.
> >
> > On Tue, Aug 11, 2009 at 1:59 PM, smolix <alex.sm...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is there a way to use memcached as a _guaranteed_ distributed
> > > (key, value) storage? That is, I want to have a distributed
> > > storage of (key, value) pairs which can be accessed from many
> > > clients efficiently. The RAM is sufficient that it all should
> > > easily fit into memory, but I probably can't afford an overhead of
> > > more than 2x the amount of data it takes to store the pairs. Is
> > > there a way to turn off the discard option in memcached?
> > > I can tune the keys such that they are sequential or do similar
> > > preprocessing if needed.
> > >
> > > This is about 100-500GB of data that I need to store, with values
> > > less than 4k per item (in some cases much smaller).
> > >
> > > Any help and suggestions would be greatly appreciated.
> > >
> > > Thanks,
> > >
> > > Alex
> >
> > --
> > awl

--
awl
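For anyone curious how the write-to-all / read-from-any client described at the top of the thread behaves, here is a minimal sketch. This is not Fotolog's actual code; `FakeNode` is a hypothetical dict-backed stand-in for a real memcached connection, used only to illustrate the replication, random-read, and dirty-node-eviction logic.

```python
import random


class FakeNode:
    """Hypothetical stand-in for one memcached server connection.

    A real client would wrap a socket speaking the memcached protocol;
    here a dict and an `alive` flag are enough to show the behavior.
    """

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.alive = True

    def set(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.store.get(key)


class ReplicatingClient:
    """Client with the semantics described in the post:

    - every write goes to every machine in the cluster
    - every read goes to any one machine, chosen at random
    - a non-responsive machine is marked dirty and dropped from the
      cluster list (the nightly "full populate" would re-add it)
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.dirty = []

    def set(self, key, value):
        # Write to every node; evict any node that fails.
        for node in list(self.nodes):
            try:
                node.set(key, value)
            except ConnectionError:
                self._mark_dirty(node)

    def get(self, key):
        # Read from a random node; retry on others if it fails.
        while self.nodes:
            node = random.choice(self.nodes)
            try:
                return node.get(key)
            except ConnectionError:
                self._mark_dirty(node)
        raise RuntimeError("no live nodes in cluster")

    def _mark_dirty(self, node):
        self.nodes.remove(node)
        self.dirty.append(node)
```

Because every live node holds the full dataset, reads scale almost linearly with cluster size, and any single node can serve any key, which is where the "hundreds of thousands of reads per second" comes from. The trade-off, as the post notes, is that persistence is only "virtual": losing all nodes before the nightly full populate loses the data.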