We have a medium-sized dataset (~50M entries) with small values (a few
hundred bytes) where we need "persistence" with a very high read throughput
and occasional updates.
To solve this, we built a cluster of memcached servers with enough RAM on
each machine to store the entire dataset and wrote our own memcached client
with the following characteristics:

- each write operation writes to every machine in the cluster
- each read operation reads from any one machine in the cluster
- if a machine becomes non-responsive, it is marked as dirty and removed
from the cluster list

Every night, a "full populate" script is run, and any new machines, along with
machines that were removed during the day, are (re-)added to the cluster.
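The client logic above can be sketched roughly like this. This is a minimal
illustration, not our actual code: `ReplicatedClient` is a hypothetical name,
and the "servers" here can be any objects exposing get/set (real memcached
client instances in practice; simple in-memory stubs work for trying it out):

```python
import random


class ReplicatedClient:
    """Sketch of a write-to-all / read-from-any replicated cache client.

    Each entry in `clients` must expose set(key, value) and get(key),
    e.g. one memcached client instance per machine in the cluster.
    """

    def __init__(self, clients):
        self.clients = list(clients)  # healthy machines
        self.dirty = []               # non-responsive, awaiting repopulate

    def set(self, key, value):
        # Each write goes to every healthy machine in the cluster;
        # any machine that fails is marked dirty and dropped.
        for c in list(self.clients):
            try:
                c.set(key, value)
            except Exception:
                self._mark_dirty(c)

    def get(self, key):
        # Each read hits any one machine; fail over to another
        # if the chosen one is non-responsive.
        while self.clients:
            c = random.choice(self.clients)
            try:
                return c.get(key)
            except Exception:
                self._mark_dirty(c)
        raise RuntimeError("no healthy cache servers")

    def _mark_dirty(self, c):
        self.clients.remove(c)
        self.dirty.append(c)
```

Machines accumulated in `dirty` stay out of the rotation until the nightly
full-populate script rewrites the dataset and adds them back.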

With this setup, we achieve hundreds of thousands of reads per second and
virtual "persistence."

On Tue, Aug 11, 2009 at 4:56 PM, smolix <alex.sm...@gmail.com> wrote:

> Hi Adam,
>
> Thanks for the tokyocabinet pointer. Unfortunately that would be too
> slow (we need as high IOPS as we can get, and no, SSD would not be an
> answer unless it gets into FusionIO performance range). What was the
> hack you did? We don't need persistent storage for many days. The
> total computation will run in 1 maybe 2 days total.
>
> Take care,
>
> Alex
>
> On Aug 11, 12:37 pm, Adam Lee <a...@fotolog.biz> wrote:
> > We do a hack that enables something similar to this, but I wouldn't
> > recommend it.  If you want something memcached-like but persistent, you
> > should look into, for example, tokyocabinet.  It even speaks memcached
> > protocol, so you can use it as a drop-in replacement and achieve the
> desired
> > effect.  It's not _as_ fast as memcached, but it's still very fast.
> >
> >
> >
> > On Tue, Aug 11, 2009 at 1:59 PM, smolix <alex.sm...@gmail.com> wrote:
> >
> > > Hi,
> >
> > > Is there a way to use memcached as a _guaranteed_ distributed
> > > (key,value) storage? That is, I want to have a distributed storage of
> > > (key, value) pairs which can be accessed from many clients
> > > efficiently. The RAM is sufficient that all should easily fit into
> > > memory but I probably can't have an overhead of more than 2x the
> > > amount of data it takes to store the pairs. Is there a way to turn off
> > > the discard option in memcached? I can tune the keys such that they
> > > are sequential or do similar preprocessing if needed.
> >
> > > This is about 100-500GB of data that I need to store with values less
> > > than 4k per item (in some cases much smaller).
> >
> > > Any help and suggestions would be greatly appreciated.
> >
> > > Thanks,
> >
> > > Alex
> >
> > --
> > awl
>



-- 
awl
