Alen,

thanks a lot for the pointer! I did not know about flud. Indeed, the design goals of your system are not so different from mine. BTW: have you created a complete implementation of the described system? I ask because it gets quite complex once you go down into all the details...

Fairness is really a crucial point for the performance of such a storage system. For a filestore that is not solely backup-centric, a fair distribution of the available network bandwidth is of course important too. I had looked into Samsara as well, but its overhead is obviously too large to use it for exchanging network bandwidth. Also, the probabilistic punishment may affect others too (refer to the very last paragraph below).

As to robustness, the admission control you do in flud is quite good (at least you can't do much better without a CA). Maybe one could add a larger cost to the entry of a new node with a proof-of-work system (cryptographic puzzle, Bitcoin, etc.). But I don't quite like such solutions. When deployed large-scale, millions of computers would be running at 100% CPU just producing heat and consuming electric power - this is not really green technology...

So a vulnerability to Sybil attacks remains. With reasonable effort, an attacker can place a large number of nodes into the system. You have measures in flud to counteract malicious nodes - but what about nodes that don't act maliciously towards others? This is a problem I have thought about a lot, because information leaks from the system if one party controls a certain number of nodes in the network.

Data de-duplication has the advantage that it saves storage and that data already present in the system does not have to be uploaded twice. The disadvantage, however, is that the same data encrypts to the same storage block. This allows a Sybil attacker to upload a number of "interesting" files and log who else in the network accesses the same files. This is easily possible even though all stored and transmitted data is encrypted. So encryption is not enough to ensure privacy. Has this been discussed during the design of flud?

Another problem that follows from de-duplication is the deletion of data. Data can only be deleted when it is no longer referenced by any user in the system.
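
To make that constraint concrete, here is a minimal Python sketch of what I mean. It is only an illustration under my own assumptions and not how flud works: the DedupStore class, its put/delete methods and the user names are all made up, and block ids are simply SHA-256 hashes of the content.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store with per-block reference tracking."""

    def __init__(self):
        self.blocks = {}  # block_id -> stored bytes
        self.refs = {}    # block_id -> set of user ids referencing the block

    def put(self, user, data):
        # Identical data always maps to the same block id, so it is stored once.
        block_id = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(block_id, data)
        self.refs.setdefault(block_id, set()).add(user)
        return block_id

    def delete(self, user, block_id):
        # Drop only this user's reference; return True if the block was freed.
        holders = self.refs.get(block_id)
        if holders is None or user not in holders:
            return False
        holders.discard(user)
        if holders:
            return False  # somebody else still references the block
        del self.refs[block_id]
        del self.blocks[block_id]
        return True


store = DedupStore()
bid = store.put("alice", b"holiday photos")
assert store.put("bob", b"holiday photos") == bid  # de-duplicated
print(store.delete("alice", bid))  # False: bob still references the block
print(store.delete("bob", bid))    # True: last reference gone, block freed
```

Keeping a plain set of user ids per block is just the simplest thing that works here. In a network of untrusted nodes such a list would itself show who holds which block, so a real design would probably need something less revealing.
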
This means that even the original uploader may not be allowed to actually delete the file. Something like a ref counter or delete token is needed. How does flud solve this problem?

Thanks,
Michael


Quoting Alen Peacock <alenlpeac...@gmail.com>:

> I dabbled in this area a number of years ago, and still maintain the
> flud backup website (http://flud.org). flud had almost identical
> design goals to the ones you describe. Unfortunately, other pursuits
> caused me to largely abandon flud. Not much has been done on it in the
> past few years. Still, you might be interested in some of the
> discussion and designs revolving around durability, privacy, localized
> trust in an untrusted environment, attack resistance, etc. explained
> in the wiki and blog.
>
> I'll warn you upfront, though, having worked in this industry for the
> past 4.5 years: automatic offsite backup is a problem that appears
> very simple at first but is deceptively complex with all sorts of
> really high-effort-to-get-right features required (many of them
> client-side), even if you eliminate the p2p complexities.
>
> Alen
>
>
> On Wed, Jan 26, 2011 at 12:04 PM, Michael Militzer <mich...@xvid.org> wrote:
>> Hi all,
>>
>> I am new to the list and also have no background with P2P. However,
>> I'd like to realize a P2P related project and have therefore read a
>> bit on the topic during the past year.
>>
>> Basically, what I have in mind is a peer-to-peer, wide-area persistent
>> file storage system for a backup use-case. That may not sound very
>> exciting because several other programs promising the same already
>> exist. However, when taking a closer look it seems to me all existing
>> solutions have serious shortcomings in one or the other area.
>>
>> That's also the main topic of my post here. I'd appreciate your
>> feedback on whether my analysis about related software in the following
>> is correct or if I'm rather missing something important.
>>
>> I envision a storage network that is open to everyone, so is comprised
>> of untrusted nodes that are deployed on a global scale. Like with any
>> external backup service, I think the most important design goal is to
>> ensure "trust". So such a system must be robust and secure to a very
>> high degree - after all, users are supposed to entrust the system with
>> their irreplaceable data.
>>
>> Therefore, I think robustness and security here cannot mean only the
>> promise for data integrity by some system operator or software vendor.
>> Data availability, privacy and also censorship resistance must be
>> verifiable. In addition, a secure storage system must withstand
>> adversarial attacks. A direct consequence of this is that the peer
>> software and protocol must be open-source. A storage system built
>> around a secret protocol and proprietary software cannot be trusted.
>>
>> And with these requirements in mind, what is currently available seems
>> somewhat disappointing (but maybe it's also just my inability to conduct
>> proper research - so if you know more please give me some input):
>>
>>
>> Allmydata/Tahoe:
>>
>> The only true open-source contender I know of. Unfortunately, not
>> really targeted towards a global-scale network of untrusted nodes. Also,
>> no particular measures to withstand adversarial attacks (but these are
>> also not needed when deployed in a trusted environment).
>>
>> Cleversafe:
>>
>> Apparently not open-source anymore. Also not P2P in the sense of a
>> wide-area network of untrusted nodes.
>>
>> Crashplan:
>>
>> Proprietary. P2P only to set up a "friends network", so no untrusted
>> nodes.
>>
>> Freenet:
>>
>> Open-source. Is not really a persistent file store and has other design
>> goals that don't quite fit a backup storage system.
>>
>> Maidsafe/PerpetualData:
>>
>> Some support libraries open-sourced but not the actual protocol and
>> client software. Software not yet publicly available.
>> From what is known about the protocol, it looks complex. Not sure
>> about how it will scale or the robustness it can provide.
>>
>> Powerfolder:
>>
>> Some source code seems available. However based on manual peer
>> selection, so also a "friends network".
>>
>> Wuala:
>>
>> Proprietary software. Not much is known about how it internally
>> works, in particular the relation between Wuala's central servers
>> and the storage provided by peers (So: What is dominating? Is Wuala
>> actually a normal cloud storage service with some P2P buzz or is the
>> storage really P2P organized mainly?). Nothing is known about how
>> Wuala can withstand adversarial attacks (Security by secrecy because
>> of secret protocol?). Seems to be the only global-scale P2P storage
>> system in "production use" today according to my knowledge.
>>
>>
>> So it seems one is a bit at a loss when looking for an open-source P2P
>> storage system that is built on a network of untrusted nodes. There
>> are some more open-source programs derived from research, e.g.
>> OceanStore. But these seem unmaintained and not actually deployed.
>> I haven't found a P2P backup solution that has:
>>
>> - Deployability on a global scale with untrusted nodes
>> - Secure, private and persistent data storage
>> - Open-source protocol and software
>> - Censorship-resistance
>> - Resiliency to adversarial attacks
>> - Reasonably simple and manageable design
>>
>> This however would be the kind of project I'd like to explore further if
>> not already available. If anyone is interested I could briefly describe
>> the design I have in mind in a later post. I'd like to connect to people
>> who (unlike me) have practical experience with P2P networking to
>> discuss and further refine design ideas...
>>
>> Thanks for any input you can provide!
>>
>> Best regards,
>> Michael
>>
>> _______________________________________________
>> p2p-hackers mailing list
>> p2p-hackers@lists.zooko.com
>> http://lists.zooko.com/mailman/listinfo/p2p-hackers