On Fri, Jan 28, 2011 at 11:21 AM, Michael Militzer <mich...@xvid.org> wrote:
> Alen,
>
> thanks a lot for the pointer! I did not know about flud! Indeed, the design
> goals of your system are not so different. BTW: Have you created a complete
> implementation of the described system? Because it has quite some complexity
> if you go down into all details...
Major components of the system were completed and functional, at least at a
working prototype level. Other major components were not: most of the
client-side features, for example. I also delayed doing any NAT traversal
work at all, because I assumed early adopters were likely savvy enough to
manually open ports if needed, and because I didn't see much advantage to
getting that working early on, given that there were several attempts to
create general-purpose libraries that I hoped would mature enough to use by
the time I needed that feature. But you can absolutely grab the code and use
the test scripts to spawn a bunch of nodes (multiple instances on multiple
machines if desired) and store/retrieve data from them. There are even some
visualization tools to let you see the data getting sprayed out on multiple
instances, kill instances via a button click to test survivability, etc.

> Fairness is really a crucial point for the performance of such a storage
> system. For a filestore that is not solely backup-centric, fair distribution
> of available network bandwidth is of course also important. I had looked
> into Samsara too, but the overhead is obviously too large to use for
> exchanging network bandwidth. Also, the probabilistic punishment may affect
> others too (refer to the very last paragraph below).

The unforgeable claims from Samsara are clever and necessary, but I think in
the majority of trading relationships you'd avoid them -- you can likely
always find a legitimate trading partner with real data to back up when
enough nodes are in the network, and should relatively rarely need to resort
to storing a generated claim.

> As to robustness, the admission control you do in flud is quite good (at
> least you can't do much better without a CA). Maybe one could add a larger
> cost to the entry of a new node via a proof-of-work system (cryptographic
> puzzle, bitcoin, etc). But I don't quite like such solutions. When deployed
> large-scale, millions of computers will be running at 100% CPU just
> producing heat and consuming electric power - this is not really green
> technology...

Implemented (at least partially) in flud is a hashcash-inspired
collision-finding algorithm for admission and continued operation, but it
should only rarely be needed (a rough sketch of the general idea appears a
few paragraphs below).

> So a vulnerability to sybil attack remains. It's possible with reasonable
> effort for an attacker to place a large number of nodes into the system.
> You have measures in flud to counteract malicious nodes - but what about
> nodes that don't act maliciously to others?

There is some discussion of safeguards against sybil attacks implemented by
flud here: http://flud.org/wiki/Architecture#Metadata_Layer

I am not aware of any purely decentralized approach that protects 100%
against sybil attacks mounted by an adversary with virtually unlimited
resources (I think the "Defending Against Sybil Attacks in P2P Networks"
paper goes into the theoretical obstacles to that, iirc). Of course, the
theoretical centralized solutions are likely not able to protect 100% against
these in practice either.

At some point, we all have to get comfortable with probabilistic security
even though it sounds terrible -- because honestly, all security /is/
probabilistic: e.g., the probability that AES has a yet-undiscovered -- or,
egads, undisclosed -- vulnerability, the probability that sha256 will remain
uncompromised through its NIST-predicted usable lifetime, the probability
that compute power will not experience a steep spike before it is expected,
the probability that your key-escrow service (if present) is neither
compromised nor colluding, etc.
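For concreteness, the hashcash-style admission cost mentioned above might
look roughly like the following. This is only a sketch of the general idea,
not flud's actual admission code; it assumes SHA-256, a fixed difficulty,
and invented names:

    import hashlib
    import os

    DIFFICULTY_BITS = 20  # admission cost; tune so joining burns a few seconds of CPU

    def solve_puzzle(node_id: bytes, challenge: bytes) -> int:
        """Find a nonce so that sha256(node_id || challenge || nonce) has
        DIFFICULTY_BITS leading zero bits: cheap to verify, costly to produce."""
        target = 1 << (256 - DIFFICULTY_BITS)
        nonce = 0
        while True:
            digest = hashlib.sha256(node_id + challenge + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    def verify_puzzle(node_id: bytes, challenge: bytes, nonce: int) -> bool:
        digest = hashlib.sha256(node_id + challenge + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))

    # An existing node hands a joining node a random challenge; the joiner pays
    # in CPU before it is admitted (or before it is allowed to keep operating).
    challenge = os.urandom(16)
    nonce = solve_puzzle(b"new-node-id", challenge)
    assert verify_puzzle(b"new-node-id", challenge, nonce)

The same mechanism extends to continued operation: peers can demand fresh
solutions from nodes whose behavior looks suspect, so well-behaved nodes pay
the cost only rarely.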
> This is a problem I thought about a lot. That's because there is an
> information leakage in the system if one party controls a certain number of
> nodes in the network. Data de-duplication has the advantage that it saves
> storage and that data already present in the system does not have to be
> uploaded twice. The disadvantage, however, is that the same data encrypts to
> the same storage block. This allows a sybil attacker to upload a number of
> "interesting" files and log who else in the network accesses the same files.
> This is easily possible even though all stored and transmitted data is
> encrypted. So encryption is not enough to ensure privacy.
>
> Had this been discussed during the design of flud?

I am more convinced than ever that global convergent storage (or
"single-instance storage," or "deduplication") is of dubious benefit. Part of
the reason for that stems from sources and studies that are not publicly
available, so I'm not at liberty to discuss them ;). But even more compelling
than those results regarding the practical advantages are the simple attacks
outlined by Zooko and the Tahoe team here:
http://www.mail-archive.com/cryptography@metzdowd.com/msg08949.html

There were some rudimentary protections in flud against an entity storing
"interesting" files and then fishing for other users who also stored them.
One of these was that storing nodes would not reveal the identities of nodes
storing blocks except to provably owning nodes, and even then, only the
single identity of that node itself (using the self-certifying IDs and
challenge/response pairs). This, of course, is insufficient if a storing node
is compromised or colludes with the originator of the fishing expedition --
another good reason not to do global convergent encryption.

I believe that, in its current form, flud still does global convergent
encryption. That was implemented before the Tahoe-discovered vulnerabilities
were known, and before I understood the costs/benefits of what convergent
encryption/storage does and does not get you. This is very simple to change
in flud, and it would likely be close to the top of my todo list if I were to
pick up development again. There are several other minor cryptographic
choices made in flud that I would change today as well. For example, even
though all bulk encryption in flud is done via AES-256 (which is good), I
feel now that there is still too much encryption done with a node's
asymmetric keypair, and I'd fix that too.
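To make the deduplication leak concrete: here is a minimal sketch (not flud's
code; it uses the pyca/cryptography package purely for illustration) of why
plain convergent encryption lets anyone confirm that a given file is stored,
and how mixing a per-entity secret into the key derivation -- roughly the
"added convergence secret" approach the Tahoe folks settled on, iirc --
confines deduplication, and the leak, to a single user or organization:

    import hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def convergent_encrypt(plaintext: bytes, secret: bytes = b"") -> bytes:
        # The key is derived from the content itself, optionally salted with a
        # per-entity secret.  With an empty secret, identical plaintexts encrypt
        # to identical ciphertexts everywhere in the network -- which is exactly
        # what lets an attacker confirm who else stored an "interesting" file.
        key = hashlib.sha256(secret + plaintext).digest()
        iv = hashlib.sha256(key).digest()[:16]   # deterministic, content-derived IV
        enc = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
        return enc.update(plaintext) + enc.finalize()

    doc = b"some widely shared 'interesting' file"
    # Global convergence: global dedup, but also a global confirmation oracle.
    assert convergent_encrypt(doc) == convergent_encrypt(doc)
    # Per-entity secrets: dedup (and the leak) stay within one entity.
    assert convergent_encrypt(doc, b"alice-secret") != convergent_encrypt(doc, b"bob-secret")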
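And for the "provably owning nodes" check mentioned above, the flavor of
self-certifying IDs plus challenge/response is roughly this -- again just a
sketch with invented names, using RSA via pyca/cryptography; flud's real
protocol and key handling differ in the details:

    import hashlib, os
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    # A node's ID is the hash of its public key, so the ID itself certifies
    # which key is allowed to answer challenges for it.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pub_der = key.public_key().public_bytes(
        serialization.Encoding.DER, serialization.PublicFormat.SubjectPublicKeyInfo)
    node_id = hashlib.sha256(pub_der).hexdigest()

    def prove(challenge: bytes) -> bytes:
        # The claimant signs the storer's fresh random challenge.
        return key.sign(challenge, padding.PKCS1v15(), hashes.SHA256())

    def verify_claim(claimed_id: str, pub_der: bytes, challenge: bytes, sig: bytes) -> bool:
        # The storer checks (1) that the offered key really hashes to the claimed
        # ID, and (2) that the signature over the challenge verifies under that
        # key.  Only then does it reveal anything about blocks tied to that ID.
        if hashlib.sha256(pub_der).hexdigest() != claimed_id:
            return False
        pub = serialization.load_der_public_key(pub_der)
        try:
            pub.verify(sig, challenge, padding.PKCS1v15(), hashes.SHA256())
            return True
        except Exception:
            return False

    challenge = os.urandom(32)
    assert verify_claim(node_id, pub_der, challenge, prove(challenge))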
> Another problem derived from de-duplication is deletion of data. Data
> can only be deleted when it is not referenced anymore by any user in the
> system. This means that even the original uploader may not be allowed to
> actually delete the file. Something like a ref-counter or delete token
> is needed. How does flud solve this problem?

There is still likely a lot of benefit from convergent encryption within a
single entity (think of a consumer backing up two machines that each have an
entire music collection duplicated, or of a small business with 10 users
sharing many of the same files). In these cases, if the original uploader
deletes a file, that should not delete it for the other users who have also
"uploaded" that file, even though their upload was accelerated by the fact
that the bits didn't need to be retransmitted.

Reference counting is notoriously hard because subtle bugs here, including
timing bugs that are very difficult to spot, can either delete data
prematurely -- and permanently -- or render data forever undeletable. flud
keeps reference lists instead. Reference lists have many of the same problems
as ref-counting, but you can catch and fix most mistakes through programmatic
auditing (a toy sketch is appended below).

Alen
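The toy sketch promised above -- invented names, not flud's actual metadata
structures -- of why a reference list is easier to keep honest than a bare
counter: because the list records who holds each reference, a periodic audit
can reconcile it against what owners themselves claim and repair any drift,
which a corrupted integer simply can't support:

    from typing import Dict, Set

    class ToyBlockStore:
        """Toy illustration (not flud's data structures) of reference lists."""
        def __init__(self) -> None:
            self.blocks: Dict[str, bytes] = {}
            self.refs: Dict[str, Set[str]] = {}   # block_id -> IDs of owning nodes

        def store(self, block_id: str, data: bytes, owner: str) -> None:
            self.blocks[block_id] = data
            self.refs.setdefault(block_id, set()).add(owner)

        def delete(self, block_id: str, owner: str) -> None:
            # Deletion only drops this owner's reference; the block survives
            # until no owner references it any longer.
            owners = self.refs.get(block_id, set())
            owners.discard(owner)
            if not owners:
                self.blocks.pop(block_id, None)
                self.refs.pop(block_id, None)

        def audit(self, owners_claims: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
            # Compare our lists against what owners themselves claim to hold and
            # report discrepancies so they can be repaired -- the kind of
            # after-the-fact reconciliation a bare ref-counter can't offer.
            diffs = {}
            for block_id in self.blocks:
                claimed = owners_claims.get(block_id, set())
                if claimed != self.refs.get(block_id, set()):
                    diffs[block_id] = claimed ^ self.refs.get(block_id, set())
            return diffs

    store = ToyBlockStore()
    store.store("blk1", b"shared bits", owner="machine-A")
    store.store("blk1", b"shared bits", owner="machine-B")   # dedup within one entity
    store.delete("blk1", owner="machine-A")
    assert "blk1" in store.blocks                             # machine-B still references it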