As luck would have it, I'm dealt a problem that finally calls for an
honest-to-god distributed file system.  Basically, my company is in
talks with a very security-conscious vendor that can't allow any of
their data to touch Amazon S3, or indeed any other CDN.  This means
that we have the fun task of building out a private CDN to store a
whole bunch of ~1MB files, starting slow and quickly ramping up into
the many-dozens-of-terabytes range.  (So, big enough to require a bit
of work, but not so big as to be silly to attempt.)  Based on the
current (and ever-changing) state of the art, what do you advise?

To give a bit more detail:
- Files will be uploaded to us from another datacenter, so write speed
needs to be ok, but it's not like we're generating files locally at
some insane rate.
- This isn't a super latency-sensitive problem (just hosting image
thumbnails), but read performance needs to support interactive
browsing.
- Availability needs to be in the 99.9% range inside the rack, though
the real availability problem is always at the network level.
- Each file will generally be accessed at most once (they're images
and thumbnails, and we'll set client-side caching very high).
- The access pattern will be almost uniformly random (e.g., there won't
be some power-law curve where a few files dominate).
- We have 5 clusters (physical racks) available in 3 different
datacenters, though we aren't obligated to use them all for this
project.
- The racks will be connected via IPsec LAN-to-LAN tunnels, likely with all
storage boxes on the same VLAN.
- Files will be named according to their SHA-1 hashes (see the layout
sketch below).
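
To make that concrete, here's roughly the on-disk layout I have in
mind for the hash-named files: shard by hash prefix so no single
directory ever holds millions of entries.  (The mount point and
fan-out below are just placeholders, not a real design.)

    import hashlib
    import os
    import shutil

    STORE_ROOT = "/srv/cdn"  # placeholder mount point for one big volume

    def store_path(sha1_hex):
        """Map a SHA-1 hex digest to a two-level sharded path, e.g.
        /srv/cdn/ab/cd/abcd... (256 * 256 = 65536 leaf directories)."""
        return os.path.join(STORE_ROOT, sha1_hex[:2], sha1_hex[2:4], sha1_hex)

    def ingest(tmp_file):
        """Hash an uploaded temp file, move it into place, return its name."""
        h = hashlib.sha1()
        with open(tmp_file, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()
        dest = store_path(digest)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(tmp_file, dest)
        return digest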

Thoughts?  Off the cuff I'd think an easy place to start is to buy a
hot-swappable enclosure of consumer drives running software RAID,
make a few huge ext4 volumes, and then just
fully-replicate all content between two racks in different datacenters
using rsync or even unison.  But my fear is this'll break down once
we're into the millions-of-files territory, which will happen quickly.
Accordingly, I assume that distributed filesystems are
designed to handle this much better.  I've read a bit on Tahoe, but
it's daunting and I'm not sure where to start.  Zooko, any tips?
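
And if we did go the rsync route, I imagine we'd want to run one
rsync per shard rather than pointing a single giant rsync at the
whole tree, so no one run has to walk millions of files.  A rough
sketch, assuming the sharded layout above (hostname and paths are
made up):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    SRC_ROOT = "/srv/cdn"               # local store (placeholder path)
    DST = "rack2.example.com:/srv/cdn"  # peer rack (placeholder host)
    SHARDS = [f"{i:02x}" for i in range(256)]  # top-level dirs 00..ff

    def sync_shard(shard):
        # The /./ marks where the relative path begins, so rsync
        # recreates "<shard>/..." under the destination root.
        cmd = ["rsync", "-a", "--relative",
               f"{SRC_ROOT}/./{shard}/", DST]
        return subprocess.call(cmd)

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=8) as pool:
            codes = list(pool.map(sync_shard, SHARDS))
        failed = [s for s, rc in zip(SHARDS, codes) if rc != 0]
        print("failed shards:", failed or "none")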

Furthermore, while full replication between two racks is certainly
doable at an acceptable cost, it ignores the performance and
availability opportunities of that third datacenter.  Ideally I'd host
something like 2/3 of the data in each datacenter, such that we get
full 2x replication, without the extraneous cost of 3x replication (or
5x if we use all racks).  But my fear here is that there will be a
"read penalty" for 1/3 of all reads, as each datacenter won't be able
to serve all of its requests from its own local storage.  (Unless we
start to get really tricky with DNS, perhaps having subdomains for
"0-9" and "a-f", configuring each datacenter to host precisely 2/3 of
the data, but a different 2/3 for each, and then using DNS such that
requests always go to a datacenter that has the content...)
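
To make the 2/3 idea concrete: the hash could deterministically
exclude exactly one of the three datacenters, so every file lives on
the other two, each datacenter holds roughly 2/3 of a uniform
keyspace, and exactly 1/3 of a datacenter's requests would have to be
served remotely.  A toy sketch (the datacenter names are
placeholders):

    import hashlib

    DATACENTERS = ["dc1", "dc2", "dc3"]  # placeholder names

    def replica_set(sha1_hex):
        """The two datacenters that store a given file: the hash picks
        one datacenter to exclude, so each one ends up holding ~2/3 of
        a uniformly distributed keyspace."""
        excluded = int(sha1_hex[:8], 16) % len(DATACENTERS)
        return [dc for i, dc in enumerate(DATACENTERS) if i != excluded]

    def pick_host(sha1_hex, local_dc):
        """Serve locally when possible; the ~1/3 of requests whose hash
        excludes local_dc must go to a remote replica."""
        replicas = replica_set(sha1_hex)
        return local_dc if local_dc in replicas else replicas[0]

    digest = hashlib.sha1(b"some thumbnail bytes").hexdigest()
    print(replica_set(digest), pick_host(digest, "dc2"))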

Anyway, I'm not the sysadmin on this project, so somebody else with a
lot more knowledge than me will actually implement this.  But I'd
welcome some pointers.  Thanks!

-david