As luck would have it, I'm dealt a problem that finally calls for an honest-to-god distributed file system. Basically, my company is in talks with a very security-conscious vendor that can't allow any of its data to touch Amazon S3, or indeed any other CDN. This means we have the fun task of building out a private CDN to store a whole bunch of ~1MB files, starting slow and quickly ramping up into the many-dozens-of-terabytes range. (So, big enough to require a bit of work, but not so big as to be silly to attempt.) Based on the current (and ever-changing) state of the art, what do you advise?
To give a bit more detail:

- Files will be uploaded to us from another datacenter, so write speed needs to be OK, but it's not like we're generating files locally at some insane rate.
- This isn't a super latency-sensitive problem (we're just hosting image thumbnails), but read performance needs to support interactive browsing.
- Availability needs to be in the 99.9% range inside the rack, though the real availability problem is always at the network level.
- Each file will generally be accessed at most once (they're images and thumbnails, and we'll set client-side cache lifetimes very high).
- The access pattern will be almost uniformly random (i.e., there won't be some power-law curve where a few files dominate).
- We have 5 clusters (physical racks) available in 3 different datacenters, though we aren't obligated to use them all for this project.
- The racks will be connected by IPsec LAN-to-LAN tunnels, likely with all storage boxes on the same VLAN.
- Files will be named by their SHA-1 hashes.

Thoughts? Off the cuff, I'd think an easy place to start is to buy a hot-swappable enclosure of consumer drives with a software RAID controller, make a few huge ext4 volumes, and then just fully replicate all content between two racks in different datacenters using rsync or even unison. But my fear is this'll break down after some number of files (we'll quickly get into millions-of-files territory). Accordingly, I assume distributed filesystems are designed to handle this much better. I've read a bit on Tahoe, but it's daunting and I'm not sure where to start. Zooko, any tips?

Furthermore, while full replication between two racks is certainly doable at an acceptable cost, it ignores the performance and availability opportunities of that third datacenter. Ideally I'd host something like 2/3 of the data in each datacenter, so that we get full 2x replication without the extraneous cost of 3x replication (or 5x if we use all racks). But my fear here is that there will be a "read penalty" for 1/3 of all reads, as each datacenter won't be able to serve every request from its local storage. (Unless we start to get really tricky with DNS, perhaps having subdomains for "0-9" and "a-f", configuring each datacenter to host precisely 2/3 of the data, but a different 2/3 for each, and then using DNS so that requests always go to a datacenter that has the content...) I've tried to sketch both the directory layout and the 2/3-placement idea in the postscripts below.

Anyway, I'm not the sysadmin on this project, so somebody else with a lot more knowledge than me will actually implement this. But I'd welcome some pointers. Thanks!

-david
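P.S. To be concrete about the millions-of-files worry, here's the kind of on-disk layout I have in mind, sketched in Python. This is purely illustrative and full of assumptions (the /srv/cdn root, the two-level/two-character split, and the function names are all made up for the example); the point is just that sharding paths by the leading hex characters of the SHA-1 name keeps any single ext4 directory from ever holding millions of entries.

    import hashlib
    import os

    STORE_ROOT = "/srv/cdn"  # hypothetical mount point for one of the big ext4 volumes

    def shard_path(sha1_hex):
        # Two 2-character levels give 256 * 256 = 65536 leaf directories,
        # so even ~50M files averages well under 1000 entries per directory.
        return os.path.join(STORE_ROOT, sha1_hex[0:2], sha1_hex[2:4], sha1_hex)

    def store(data):
        # Write a blob under its own SHA-1 name; rename is atomic within one filesystem.
        name = hashlib.sha1(data).hexdigest()
        path = shard_path(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.rename(tmp, path)
        return name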
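P.P.S. And to make the 2/3-placement idea concrete: since the names are uniformly distributed SHA-1 hashes, each frontend can compute which two of the three datacenters hold a given file straight from the name, and the DNS trick reduces to pointing each prefix bucket's hostname at one of those two sites. Again, just a sketch under made-up assumptions (the datacenter labels and the "primary plus the next site around the ring" rule are inventions for the example):

    import collections
    import hashlib

    DATACENTERS = ["dc-a", "dc-b", "dc-c"]  # hypothetical labels for our 3 sites

    def replica_sites(sha1_hex, sites=DATACENTERS):
        # Deterministically pick 2 of the 3 sites from the name itself.
        primary = int(sha1_hex[:8], 16) % len(sites)
        secondary = (primary + 1) % len(sites)
        return [sites[primary], sites[secondary]]

    # Each site ends up holding every file whose primary bucket is itself or its
    # predecessor -- 2 of 3 equally likely buckets, i.e. ~2/3 of the data -- while
    # every file lives in exactly 2 sites, i.e. 2x replication overall.

    if __name__ == "__main__":
        counts = collections.Counter()
        for i in range(30000):
            name = hashlib.sha1(str(i).encode()).hexdigest()
            for site in replica_sites(name):
                counts[site] += 1
        print(counts)  # each site should come out near 20000, i.e. 2/3 of 30000

A nice side effect, if I have this right: when one site goes down, the surviving replicas of its files are already spread across the other two sites, which seems like the availability win over plain pairwise mirroring.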