On Fri, Jan 28, 2011 at 11:21 AM, Michael Militzer <mich...@xvid.org> wrote:
> Alen,
>
> thanks a lot for the pointer! I did not know about flud! Indeed, the design
> goals of your system are not so different. BTW: Have you created a complete
> implementation of the described system? Because it has quite some complexity
> if you go down into all details...

Major components of the system were completed and functional, at least
at a working-prototype level. Other major components are not: most of
the client-side features, for example. I also put off doing any NAT
traversal work, partly because I assumed early adopters were likely
savvy enough to open ports manually if needed, and partly because
several attempts at general-purpose traversal libraries were underway
that I hoped would mature enough to use by the time I needed the
feature, so I saw little advantage in getting it working early on. But
you
can absolutely grab the code and use the test scripts to spawn a bunch
of nodes (multiple instances on multiple machines if desired) and
store/retrieve data from them. There are even some visualization tools
to let you see the data getting sprayed out on multiple instances,
kill instances via a button click to test survivability, etc.


> Fairness is really a crucial point for the performance of such a storage
> system. For a not solely backup-centric filestore, of course also fair
> distribution of available network bandwidth is important. I had looked
> into Samsara too but the overhead is too large obviously to use for
> exchanging network bandwidth. Also, the probabilistic punishment may affect
> others too (refer to the very last paragraph below).

The unforgeable claims from Samsara are clever and necessary, but I
think in the majority of trading relationships you'd avoid them -- you
can likely always find a legitimate trading partner with real data to
back up when enough nodes are in the network, and should only rarely
need to resort to storing a generated claim.
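
To make that concrete: a generated claim is just filler data derived
deterministically from a secret seed, so the owner can regenerate it at
audit time and catch a partner who quietly discards it. Here's a toy
sketch of the idea in Python -- not Samsara's or flud's actual code,
and the names are made up:

    import hashlib
    import os

    BLOCK_SIZE = 4096

    def claim_block(seed: bytes, index: int, size: int = BLOCK_SIZE) -> bytes:
        """Expand (seed, index) into a filler block.  The block is a pure
        function of the secret seed, so the owner can regenerate it at
        audit time; the remote node can't forge or compress it without
        the seed."""
        out = b""
        counter = 0
        while len(out) < size:
            out += hashlib.sha256(seed + index.to_bytes(8, "big") +
                                  counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:size]

    def verify_claim(seed: bytes, index: int, returned: bytes) -> bool:
        """Audit: is the claim block still held intact by the partner?"""
        return returned == claim_block(seed, index, len(returned))

    seed = os.urandom(32)          # kept secret by the claiming node
    block = claim_block(seed, 0)   # shipped to the trading partner
    assert verify_claim(seed, 0, block)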


> As to robustness, the admission control you do in flud is quite good (at
> least you can't do much better without a CA). Maybe one could add a larger
> cost to the entry of a new node by a proof of work system (cryptographic
> puzzle, bitcoin, etc). But I don't quite like such solutions. When deployed
> large-scale, millions of computers will be running at 100% CPU just
> producing heat and consuming electric power - this is not really green
> technology...

flud implements (at least partially) a hashcash-inspired
collision-finding algorithm for admission and continued operation, but
it should be needed only rarely.
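
For the curious, a hashcash-style puzzle typically boils down to
finding a nonce whose hash over (challenge || nonce) has enough leading
zero bits: exponentially expensive to solve, a single hash to check.
This is just an illustrative sketch of that shape, not the actual flud
code:

    import hashlib
    import os

    def leading_zero_bits(digest: bytes) -> int:
        n = int.from_bytes(digest, "big")
        return len(digest) * 8 - n.bit_length()

    def solve_puzzle(challenge: bytes, difficulty: int) -> int:
        """Brute-force a nonce such that sha256(challenge || nonce) has at
        least `difficulty` leading zero bits; expected cost ~2**difficulty."""
        nonce = 0
        while True:
            d = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if leading_zero_bits(d) >= difficulty:
                return nonce
            nonce += 1

    def check_puzzle(challenge: bytes, nonce: int, difficulty: int) -> bool:
        """Verification is a single hash, so honest nodes pay almost nothing."""
        d = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        return leading_zero_bits(d) >= difficulty

    challenge = os.urandom(16)                 # issued by the admitting node
    nonce = solve_puzzle(challenge, 16)        # paid by the joining node
    assert check_puzzle(challenge, nonce, 16)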


> So a vulnerability to sybil attack remains. It's possible with reasonable
> effort for an attacker to place a large number of nodes into the system.
> You have measures in flud to counteract malicious nodes - but what about
> nodes that don't act maliciously to others?

There is some discussion of safeguards against sybil attacks
implemented by flud here:
http://flud.org/wiki/Architecture#Metadata_Layer
I am not aware of any purely decentralized approach that protects 100%
against sybil attacks mounted by an adversary with virtually unlimited
resources (I think the "Defending Against Sybil Attacks in P2P
Networks" paper goes into the theoretical obstacles to that, iirc). Of
course, the theoretical centralized solutions are likely not able to
protect 100% against these in practice either. At some point, we all
have to get comfortable with probabilistic security even though it
sounds terrible -- because honestly, all security /is/ probabilistic,
e.g., the probability that AES has a yet undiscovered -- or, egads,
undisclosed -- vulnerability, the probability that sha256 will remain
uncompromised through its NIST-predicted usable lifetime, the
probability that compute power will not experience a steep spike
before it is expected, the probability that your key-escrow service
(if present) is not compromised and/or colludes, etc.


> This is a problem I thought about a lot. That's because there is an
> information leakage in the system if one party controls a certain number of
> nodes in the network. Data de-duplication has the advantage that it saves
> storage and that data already present in the system does not have to be
> uploaded twice. The disadvantage however is that same data encrypts to the
> same storage block. This allows a sybil attacker to upload a number of
> "interesting" files and log who else in the network accesses the same files.
> This is easily possible even though all stored and transmitted data is
> encrypted. So encryption is not enough to ensure privacy.
>
> Had this been discussed throughout the design of flud?

I am more convinced than ever that global convergent storage (or
"single-instance storage," or "deduplication") is of dubious benefit.
Part of the reason for that stems from sources and studies that are
not publicly available, so I'm not at liberty to discuss them ;) .
But even more compelling than those results regarding the practical
advantages are the simple attacks outlined by Zooko and the Tahoe team
here: http://www.mail-archive.com/cryptography@metzdowd.com/msg08949.html
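
The core of the problem fits in a few lines: with convergent encryption
the key (and therefore the ciphertext and the storage ID) is a pure
function of the plaintext, so anyone who can guess a file can confirm
who else stored it. A rough sketch -- not flud's implementation, and
assuming the third-party 'cryptography' package for AES-GCM:

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def convergent_encrypt(plaintext: bytes):
        """Key and nonce are derived from the plaintext itself, so identical
        plaintexts always yield identical ciphertexts and storage IDs."""
        key = hashlib.sha256(plaintext).digest()                  # AES-256 key
        nonce = hashlib.sha256(b"nonce" + plaintext).digest()[:12]
        ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
        storage_id = hashlib.sha256(ciphertext).hexdigest()
        return ciphertext, storage_id

    # Two different users encrypting the same "interesting" file end up with
    # the same storage ID, so an attacker who uploads the file first can
    # simply watch for anyone else storing that ID.
    _, id_a = convergent_encrypt(b"a well-known leaked document")
    _, id_b = convergent_encrypt(b"a well-known leaked document")
    assert id_a == id_b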

There were some rudimentary protections in flud against an entity
storing "interesting" files and then fishing for other users who also
stored them. One of these was that a storing node would not reveal the
identities of nodes storing a block except to a node that could prove
it owned that block -- and even then, only that requesting node's own
identity (using the self-certifying IDs and challenge/response pairs).
This of course is
insufficient if a storing node is compromised or colludes with the
originator of the fishing expedition -- another good reason to not do
global convergent encryption.
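
Roughly, the policy looked like this (a toy model, not the actual flud
code; the names here are invented):

    import hashlib
    import os

    class StoringNode:
        def __init__(self):
            self.blocks = {}   # block_id -> block data
            self.owners = {}   # block_id -> set of owner node IDs

        def store(self, owner_id: str, block: bytes) -> str:
            block_id = hashlib.sha256(block).hexdigest()
            self.blocks[block_id] = block
            self.owners.setdefault(block_id, set()).add(owner_id)
            return block_id

        def challenge(self, block_id: str) -> bytes:
            # Fresh nonce per challenge (one outstanding challenge at a
            # time, to keep the toy simple).
            self._nonce = os.urandom(16)
            return self._nonce

        def prove_and_query(self, requester_id: str, block_id: str,
                            response: bytes) -> bool:
            """The requester must hash the full block with the fresh nonce,
            i.e. actually possess the data.  Even then, the only thing
            revealed is whether the requester itself is listed as an owner
            -- never the identities of the other owners."""
            expected = hashlib.sha256(self.blocks[block_id] + self._nonce).digest()
            if response != expected:
                return False
            return requester_id in self.owners.get(block_id, set())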

I believe that in its current form, flud still does global convergent
encryption. That was implemented before the Tahoe-discovered
vulnerabilities were known, and before I understood what convergent
encryption/storage does and does not get you in terms of costs and
benefits. This is very simple to change in flud, and would likely be
close to the top of my todo list if I were to pick up development
again. There are several other minor cryptographic choices made in
flud that I would change today as well. For example, even though all
bulk encryption in flud is done via AES256 (which is good), I now feel
there is still too much encryption done with a node's asymmetric
keypair, and I'd fix that too.
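
The usual fix for the latter is plain hybrid encryption: use the node's
asymmetric keypair only to wrap a fresh AES-256 key per object, and do
all the bulk work symmetrically. A sketch of that shape, assuming the
third-party 'cryptography' package (this is not what flud does today):

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    def hybrid_encrypt(public_key, plaintext: bytes):
        """Bulk data goes through AES-256-GCM; the asymmetric key is used
        exactly once per object, to wrap the random data key."""
        data_key = AESGCM.generate_key(bit_length=256)
        nonce = os.urandom(12)
        ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
        return public_key.encrypt(data_key, OAEP), nonce, ciphertext

    def hybrid_decrypt(private_key, wrapped_key, nonce, ciphertext) -> bytes:
        data_key = private_key.decrypt(wrapped_key, OAEP)
        return AESGCM(data_key).decrypt(nonce, ciphertext, None)

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    blob = hybrid_encrypt(private_key.public_key(), b"bulk backup data" * 1000)
    assert hybrid_decrypt(private_key, *blob) == b"bulk backup data" * 1000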


> Another problem derived from de-duplication is deletion of data. Data
> can only be deleted when it is not referenced anymore by any user in the
> system. This means that also the original uploader may not be allowed to
> actually delete the file. Something like a ref-counter or delete token
> is needed. How does flud solve this problem?

There is still likely a lot of benefit from convergent encryption
within a single entity (think of a consumer backing up two machines
that each have an entire music collection duplicated, or of a small
business with 10 users sharing many of the same files). In these
cases, if the original uploader deletes a file, that should not delete
it for the other users who have also "uploaded" that file, even though
their upload was accelerated by the fact that the bits didn't need to
be retransmitted. Reference counting is notoriously hard because
subtle bugs here, including very-difficult-to-spot timing bugs, can
either delete data prematurely -- and permanently -- or render data
forever undeletable. flud uses reference lists instead. Reference
lists have many of the same problems as ref-counting, but you can
catch and fix most mistakes through programmatic auditing.
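
To illustrate what I mean by auditing (again a toy sketch, not flud's
actual data structures): a reference list records /which/ nodes hold
each reference rather than a bare count, so a repair pass can re-ask
each listed node whether it still references a block and drop stale
entries; a counter that has drifted can never be reconciled that way.

    class BlockStore:
        def __init__(self):
            self.blocks = {}   # block_id -> data
            self.refs = {}     # block_id -> set of node IDs referencing it

        def put(self, node_id: str, block_id: str, data: bytes) -> None:
            self.blocks.setdefault(block_id, data)
            self.refs.setdefault(block_id, set()).add(node_id)

        def delete(self, node_id: str, block_id: str) -> None:
            """Remove one node's reference; the data itself goes away only
            when the list is empty, so the original uploader deleting a
            file never deletes it out from under the other referencers."""
            holders = self.refs.get(block_id, set())
            holders.discard(node_id)
            if not holders:
                self.blocks.pop(block_id, None)
                self.refs.pop(block_id, None)

        def audit(self, still_references) -> None:
            """Repair pass: keep only entries whose node still claims the
            reference (still_references is a callback that asks the node)."""
            for block_id in list(self.refs):
                self.refs[block_id] = {n for n in self.refs[block_id]
                                       if still_references(n, block_id)}
                if not self.refs[block_id]:
                    self.blocks.pop(block_id, None)
                    self.refs.pop(block_id, None)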


Alen
_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers
