Dear Chris,

thanks for your thoughts. I also had a look at your Octavia filesystem
some time ago. While I don't agree that we should drop erasure coding
entirely, I do like your approach of keeping things simple. And today
bandwidth is indeed the more precious resource in a P2P system compared
to storage, which the home user has in abundance. So a simple
replication strategy might not be so bad after all...
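Just to make that trade-off concrete for myself, here is a quick
back-of-the-envelope sketch. The node availability of 0.5 and the
20-of-60 code parameters are assumptions I picked purely for
illustration, not numbers from any real deployment:

    # Back-of-the-envelope comparison of r-way replication vs. (k, n)
    # erasure coding.  All numbers are illustrative assumptions, not
    # measurements from any real system.
    from math import comb

    def availability(k, n, p):
        # P(object retrievable) when any k of n fragments suffice and
        # each node is independently online with probability p.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    p = 0.5  # assumed availability of a typical home node

    # 3-way replication is just "1-of-3" coding: 3x storage expansion,
    # and repairing a lost copy transfers one full object.
    print("3-way replication   :", round(availability(1, 3, p), 4))

    # Same 3x expansion, spread over many more fragments.  Much better
    # availability, but regenerating a single lost fragment (size 1/k)
    # still requires fetching k fragments, i.e. about one full object.
    print("20-of-60 erasure code:", round(availability(20, 60, p), 4))

At the same 3x expansion the 20-of-60 code is far more available, but
regenerating one lost fragment (1/k of an object) costs roughly a whole
object's worth of transfer, while re-replicating a lost copy costs the
same one object and restores a full copy. Per byte of redundancy
restored, plain Reed-Solomon repair is therefore roughly k times more
expensive in bandwidth, which is why simple replication starts to look
attractive again when bandwidth rather than disk is the scarce
resource.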
Quoting Chris Palmer <ch...@noncombatant.org>:

> Michael Militzer writes:
>
>> Data availability, privacy and also censorship resistance must be
>> verifiable. In addition, a secure storage system must withstand
>> adversarial attacks. A direct consequence of this is that the peer
>> software and protocol must be open-source. A storage system built
>> around a secret protocol and proprietary software cannot be trusted.
>
> I'm not convinced that OSD-approved licensing (which is what I assume
> you mean by "open-source") necessarily or exclusively correlates with
> trustworthiness. Plenty of open source software is untrustworthy, and
> at least some proprietary software is at least as trustworthy as the
> most trustworthy open source software.

Hm, maybe "trustworthiness" is not the right term and "verifiability"
is better. If you have all the source code needed (no matter whether
OSI-approved or not) to build the corresponding binary yourself, you
are able to verify what the application will do at runtime. This mere
possibility does not, of course, automatically imply trustworthiness.

> Nor is closed-source software really as "closed" to security scrutiny
> as people believe, and nor is open source as open to security
> scrutiny as people believe.

That's true.

> That said, of course, I want open source software too. :)
>
>> Allmydata/Tahoe:
>>
>> The only true open-source contender I know of. Unfortunately, not
>> really targeted towards a global-scale network of untrusted nodes.
>> Also, no particular measures to withstand adversarial attacks (but
>> these are also not needed when deployed in a trusted environment).
>
> I think Tahoe-LAFS has pretty good defenses against a range of
> attacks on confidentiality, integrity, and availability. What do you
> find insufficient about its defense measures?

Well, I am no expert on cryptography, and what I know about Tahoe is
derived solely from the descriptions and documentation I have read.
That said, I think data confidentiality and integrity in Tahoe are
sound, as they are based on well-understood cryptographic primitives.

I am more concerned about availability: not when Tahoe is used to set
up a small grid (which is its targeted use-case), but when trying to
build a large-scale network made of untrusted nodes. If I understood
it right, Tahoe clients simply keep a connection to each storage node
in a storage cluster. Obviously, this doesn't scale. So for a global,
large-scale deployment, peer selection and lookups should be performed
based on node ids and a DHT.

Data availability then ultimately depends on the robustness of the
DHT. If adversarial nodes can compromise the DHT, data still present
on active storage nodes might no longer be found by clients and hence
becomes unavailable. So if the DHT is deployed on untrusted nodes, we
need to care about things like admission control, sybil attacks,
routing and index poisoning, eclipse attacks and so on. Any kind of
denial-of-service attack against the DHT could mean data becoming
inaccessible, and hence unavailable in the system, even though the
data itself may physically still be present. I am unaware of any
counter-measures against these kinds of attacks in Tahoe (but there is
also no need for them within Tahoe's current use-case).

> You might also want to look at David Mazières' SFS. It was a bit
> ahead of its time, and so is sometimes forgotten. But it deserves a
> good look, and maybe resuscitation.

Thanks for the pointer.
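Coming back to the DHT point for a moment, here is a toy illustration
of what I mean by nodeid-based peer selection (a Kademlia/Chord-style
placement rule). This is purely a sketch of the idea, not Tahoe's or
any real DHT's actual code; the function names and parameters are made
up for illustration:

    # Toy nodeid-based placement: shares for a storage index are held
    # by the nodes whose ids are XOR-closest to it.  Purely
    # illustrative; not Tahoe's actual peer-selection algorithm.
    import hashlib

    def make_id(name: str) -> int:
        return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

    def responsible_nodes(storage_index: int, node_ids, count: int = 3):
        # In a real DHT a client would not know all node ids; it would
        # route towards the closest ones in O(log N) hops instead of
        # holding an open connection to every storage node.
        return sorted(node_ids, key=lambda nid: nid ^ storage_index)[:count]

    nodes = [make_id("node-%d" % i) for i in range(10000)]
    index = make_id("some-storage-index")
    print(responsible_nodes(index, nodes))

This is also where the attack surface lies: whoever manages to place
enough (sybil) ids close to a given index effectively decides what the
rest of the network sees for it.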
As for SFS, I have found some papers, but the former website seems to
be down unfortunately: http://www.fs.net/sfswww/

>> I haven't found a P2P backup solution that has:
>>
>> - Deployability on a global scale with untrusted nodes
>> - Secure, private and persistent data storage
>> - Open-source protocol and software
>> - Censorship-resistance
>> - Resiliency to adversarial attacks
>> - Reasonably simple and manageable design
>
> Tahoe has all but the last item (and maybe that is fixable). SFS has
> all of them, but lacks (as far as I can tell) a maintained, recent
> implementation.

Interestingly, my thoughts were almost the opposite (for Tahoe). I
think the basic design is still reasonably simple, but it is not ready
for use in a large-scale, untrusted network:

- As outlined above, it doesn't seem to scale to thousands or hundreds
  of thousands of nodes.

- It may need further modification to be safely usable in a network
  comprised of untrusted nodes (sybils, DHT robustness against
  denial-of-service attacks, ...).

- To guarantee persistence in a P2P network of untrusted and
  unreliable nodes, Tahoe's information dispersal strategy needs to be
  adapted. The degree of redundancy (n/k) must be increased, and for
  storage efficiency the number of erasure-coded fragments (k) must
  grow with it. I don't know whether this is practically doable within
  Tahoe's current structure (Galois-field-based Reed-Solomon coding is
  slow for large k and n) or what other side effects it may have (size
  of the Merkle trees?).

- Further, an automatic repair mechanism is required to retain data
  availability in the long term. The client-controlled repair strategy
  Tahoe currently implements seems insufficient in a network where
  individual nodes have low availability.

- Censorship-resistance obviously also depends on availability and
  data-persistence guarantees. If directed (or undirected)
  denial-of-service attacks on the DHT are possible, the system cannot
  be said to be censorship-resistant.

And there are other, less obvious censorship risks too: if a third
party can force specific node owners (e.g. by court order) to shut
down their storage nodes, then certain data can become unavailable in
the system. In Tahoe, data is encrypted and erasure-coded before being
dispersed to different storage nodes. However, the dispersal is a 1:1
mapping in an information-theoretic (and legal) sense. Therefore, it
is easy to determine which storage nodes are responsible for serving
parts of the original data.

One may argue that most of us live in societies with the rule of law,
so that censorship ordered by independent courts would be acceptable
and there is no need to feel sorry about outlawed data. But I see it
more from a practical point of view: if, by joining the storage
network, people risk being exposed to legal hassle or punishment due
to the actions of others, no one (apart from the usual geeks) will use
such a service. I think similar risks and fears already hinder the
wide-spread adoption of other P2P systems (-> Freenet, Tor)...

Regards,
Michael