Dear all,

I think the topic raised by this mail is very interesting and I'd like to contribute to the discussion (actually, we are a group working on this subject, and what I'm about to write is the result of our discussions).
As a first observation, imho, we should draw a line separating wide-area file systems, online (p2p) storage, and online (p2p) backup, as the working assumptions are different for each of these applications. So let me focus for now on *backup*.

Before getting into technical details, I'd like to pick up on the observations made by Alex: why a p2p approach to online backup? Why can't we just use a USB drive or a NAS box to back up / archive our data? One argument is that such an approach is not really seamless for the user: you need to plug in your USB drive, you need to make sure your NAS box is up and running, etc. (things are easier with Time Machine / Time Capsule -like products). Another argument is that USB disks / NAS boxes can break down, can be stolen (co-location of your PC/laptop and such devices does not help here), or burn... Well, at least these are some of the arguments in favor of "cloud" backup/storage solutions. Think of Dropbox, Symantec/Norton backup and similar.

So the natural follow-up question is: why are existing "cloud" storage systems not enough? Why do we need a p2p approach? There may be many arguments to discuss here, but I'll focus on just a few of them.

Think of price. At first sight, price seems negligible: 100 bucks for 50GB a year is not much. Right: but here we're talking about backup, i.e., *long-term* storage. Now, let's do a simple back-of-the-envelope calculation. I buy a PC today (no fancy peripherals) and I pay roughly $300. The average life span of this PC is 3-5 years, say 3. Assume I generate 50GB today and I want to back it up for 3 years. Let me also be imprecise and assume that the rate at which I generate new data exactly compensates the rate at which online storage prices go down (note: S3 storage prices have been stable in the past few years). So 50GB in total, at a constant price of $100 a year for 3 years, is $300. => Backing up your 50GB costs you as much as your PC!
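The back-of-the-envelope calculation above can be written out in a few lines; all the figures (PC price, life span, storage price) are the assumptions stated in the text, not measured data:

```python
# Assumptions taken from the text above (rough 2011 figures).
pc_cost = 300          # USD, price of a modest PC today
lifetime_years = 3     # assumed PC life span
price_per_year = 100   # USD per year for 50GB of cloud backup

# Data growth is assumed to exactly cancel out price decline,
# so the yearly price stays constant over the PC's lifetime.
backup_cost = price_per_year * lifetime_years

print(backup_cost)             # 300
print(backup_cost >= pc_cost)  # True: backup costs as much as the PC
```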
Think about your data in the hands of a company that can go out of business (hopefully this doesn't apply to Amazon), or think about a government that tells this company to erase your data (WikiLeaks, Amazon). What if I can offer you the same backup service, at essentially no or very small fees, with no fear of putting all your data in one basket? You could save your 300 bucks and maybe even go on a trip and take pictures to actually back up!

I think this could be a good motivating example to work on a p2p approach (or hybrid, as we do) to online backup. If you are interested in what we have been working on, here are some links to our work:

http://bit.ly/p2pbackup
http://arxiv.org/pdf/1009.1344v1
http://www.eurecom.fr/util/popuppubli.en.htm?page=copyright&id=3140

In a few words, for those who don't like papers:

* p2p backup requires redundancy: apply your favorite coding and place fragments on remote peers (trivial, state of the art)

* p2p backup => you keep a local copy of your data. So what? There is no need to achieve high data availability for low-latency access to individual files. Moreover, no need to go bonkers with complex repair techniques: one remote peer goes dead and you need to replace the missing fragment? Without a local copy (i.e., storage) you need to download enough redundant blocks to generate a new one and place it; with a local copy, you just output a new encoded block.

* p2p backup => durability is key, and before that, making sure you actually complete a backup operation as fast as possible is even more important (do that before you crash!). You can trade off backup performance against restore performance: it's not like storage, where you have to access your files quickly. Restores happen (hopefully) rarely, so it's better to have a low time to backup at the price of a (slightly) larger time to restore. How to achieve that? Reduce redundancy, which implies less data to upload and better storage efficiency. Just be careful not to reduce it too much, otherwise you may lose data.
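The repair argument in the second bullet can be sketched with a toy single-parity erasure code (my own minimal example, not the coding scheme from our papers): k data fragments plus one XOR parity fragment, which tolerates the loss of any one fragment. With a local copy, repairing a lost fragment is a purely local re-encode; without one, you must first download the k surviving fragments and decode:

```python
# Toy (k+1, k) single-parity code illustrating repair with and without
# a local copy of the data. This is a hypothetical sketch, not the
# coding scheme used by any real p2p backup system.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = len(data) // k
    frags = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = frags[0]
    for f in frags[1:]:
        parity = xor(parity, f)
    return frags + [parity]  # any k of these k+1 fragments recover the data

def repair_with_local_copy(data: bytes, k: int, lost: int) -> bytes:
    # Local copy available: re-encode and output the missing fragment directly.
    return encode(data, k)[lost]

def repair_without_local_copy(surviving: list) -> bytes:
    # No local copy: download the k surviving fragments and XOR them
    # together (for single parity, this reconstructs the missing one).
    new = surviving[0]
    for f in surviving[1:]:
        new = xor(new, f)
    return new

data = b"16 bytes of data"
k = 4
frags = encode(data, k)
lost = 2  # pretend the peer holding fragment 2 went dead
survivors = [f for i, f in enumerate(frags) if i != lost]
```

Both repairs produce the same fragment, but the first needs zero downloads while the second needs k of them, which is the bandwidth cost the bullet point refers to.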
We have done some more nerdy work on optimality vs. random scheduling, incentives, security and so on, but I realize this mail is long and I don't want to push it too much. If you're interested I'll be happy to give more details.

Let me conclude with this. Besides the intellectual and technical challenges that such an application poses, we've also been considering some business cases. We work on a project (http://www.nanodatacenters.eu/) in which a telco could use edge resources (set-top boxes) to deploy services, and p2p backup is one such service. You spread data on set-top boxes (which, by the way, are up most of the time, which further reduces redundancy requirements), the telco can help you out with additional storage and coordination, and bandwidth remains within the telco's network. We're building an (open source) application out of all this, and will post it here asap for you to have a look.

Ciao,
Pietro, Matteo, Laszlo and Mario

On Fri, Jan 28, 2011 at 4:44 PM, Alen Peacock <alenlpeac...@gmail.com> wrote:
> I dabbled in this area a number of years ago, and still maintain the
> flŭd backup website (http://flud.org). flŭd had almost identical
> design goals to the ones you describe. Unfortunately, other pursuits
> caused me to largely abandon flud. Not much has been done on it in the
> past few years. Still, you might be interested in some of the
> discussion and designs revolving around durability, privacy, localized
> trust in an untrusted environment, attack resistance, etc. explained
> in the wiki and blog.
>
> I'll warn you upfront, though, having worked in this industry for the
> past 4.5 years: automatic offsite backup is a problem that appears
> very simple at first but is deceptively complex, with all sorts of
> really high-effort-to-get-right features required (many of them
> client-side), even if you eliminate the p2p complexities.
>
> Alen
>
>
> On Wed, Jan 26, 2011 at 12:04 PM, Michael Militzer <mich...@xvid.org> wrote:
>> Hi all,
>>
>> I am new to the list and also have no background with P2P. However,
>> I'd like to realize a P2P-related project and have therefore read a
>> bit on the topic during the past year.
>>
>> Basically, what I have in mind is a peer-to-peer, wide-area persistent
>> file storage system for a backup use case. That may not sound very
>> exciting because several other programs promising the same already
>> exist. However, when taking a closer look it seems to me all existing
>> solutions have serious shortcomings in one or another area.
>>
>> That's also the main topic of my post here. I'd appreciate your
>> feedback on whether my analysis of related software in the following
>> is correct or whether I'm missing something important.
>>
>> I envision a storage network that is open to everyone, so it is comprised
>> of untrusted nodes deployed on a global scale. Like with any
>> external backup service, I think the most important design goal is to
>> ensure "trust". So such a system must be robust and secure to a very
>> high degree - after all, users are supposed to entrust the system with their
>> irreplaceable data.
>>
>> Therefore, I think robustness and security here cannot mean only a
>> promise of data integrity by some system operator or software vendor.
>> Data availability, privacy and also censorship resistance must be
>> verifiable. In addition, a secure storage system must withstand
>> adversarial attacks. A direct consequence of this is that the peer
>> software and protocol must be open source. A storage system built
>> around a secret protocol and proprietary software cannot be trusted.
>>
>> And with these requirements in mind, what is currently available seems
>> somewhat disappointing (but maybe it's also just my inability to conduct
>> proper research - so if you know more, please give me some input):
>>
>>
>> Allmydata/Tahoe:
>>
>> The only true open-source contender I know of. Unfortunately, not
>> really targeted towards a global-scale network of untrusted nodes. Also,
>> no particular measures to withstand adversarial attacks (but these are
>> also not needed when deployed in a trusted environment).
>>
>> Cleversafe:
>>
>> Apparently not open source anymore. Also not P2P in the sense of a
>> wide-area network of untrusted nodes.
>>
>> Crashplan:
>>
>> Proprietary. P2P only to set up a "friends network", so no untrusted
>> nodes.
>>
>> Freenet:
>>
>> Open source. Not really a persistent file store, and has other design
>> goals that don't quite fit a backup storage system.
>>
>> Maidsafe/PerpetualData:
>>
>> Some support libraries are open-sourced, but not the actual protocol and
>> client software. The software is not yet publicly available. From what is
>> known about the protocol, it looks complex. Not sure how it
>> will scale or what robustness it can provide.
>>
>> Powerfolder:
>>
>> Some source code seems to be available. However, it is based on manual
>> peer selection, so it is also a "friends network".
>>
>> Wuala:
>>
>> Proprietary software. Not much is known about how it works internally,
>> in particular the relation between Wuala's central servers
>> and the storage provided by peers (So: what dominates? Is Wuala
>> actually a normal cloud storage service with some P2P buzz, or is the
>> storage really mainly P2P-organized?). Nothing is known about how
>> Wuala can withstand adversarial attacks (security by secrecy because
>> of the secret protocol?). To my knowledge, it seems to be the only
>> global-scale P2P storage system in "production use" today.
>>
>>
>> So it seems one is a bit at a loss when looking for an open-source P2P
>> storage system that is built on a network of untrusted nodes. There
>> are some more open-source programs derived from research, e.g.
>> OceanStore, but these seem unmaintained and not actually deployed.
>> I haven't found a P2P backup solution that has:
>>
>> - Deployability on a global scale with untrusted nodes
>> - Secure, private and persistent data storage
>> - Open-source protocol and software
>> - Censorship resistance
>> - Resilience to adversarial attacks
>> - A reasonably simple and manageable design
>>
>> This, however, is the kind of project I'd like to explore further if
>> it is not already available. If anyone is interested, I could briefly
>> describe the design I have in mind in a later post. I'd like to connect
>> with people who have practical experience with P2P networking (other
>> than me) to discuss and further refine design ideas...
>>
>> Thanks for any input you can provide!
>>
>> Best regards,
>> Michael

--
Pietro Michiardi, PhD
Faculty, Networking Dept. Eurecom
http://www.eurecom.fr/~michiard

_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers