Quoting David Barrett <dbarr...@quinthar.com> from ml.p2p.hackers:
:How does Gnutella choose supernodes, and would it be reasonable to
:somehow use a similar approach with the DHT where not everyone is
:eligible to manage buckets, but only those who have built up a
:reputation for reliability as measured over many transfers with many
:users?

In Gnutella, DHT nodes are either "active" (they get inserted in the
routing table and become part of the structure) or "passive" (they never
get inserted in routing tables but can query the structure).  Typically,
firewalled nodes or busy nodes (ultra peers) are "passive" nodes.

This status has nothing to do with the Gnutella role of the node (ultra
peer or leaf node): it is completely orthogonal to it.

:The key to the Pirate Pay attack is that you can selectively target
:buckets by injecting new nodes into the DHT at will.  If we slow down
:the rate at which new nodes are added to the DHT, ideally identifying
:nodes with something like a public key (to avoid impersonation and
:ensure uniformly distributed random identifiers) and then having the
:*whole* network gather reliability/trust data (as opposed to just the
:participants of a single torrent) then this might provide a more
:stable core.

There are several protections that can severely mitigate this node ID
pollution:

- The node distribution checks I already mentioned.
- The number of nodes from a given IPv4 "class C" (/24) network that you
  allow in DHT buckets or in a search path (e.g. 3).
- The number of nodes from a given IPv4 "class C" (/24) network that you
  allow in the whole routing table (10).
- The number of nodes you allow in a search path per distinct IP address (1).
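These limits are simple counting checks. Here is a rough sketch of the
search-path rule (the function names are mine, and the constants mirror the
values quoted above, not gtk-gnutella's actual code):

```python
from collections import Counter

# Hypothetical limits, matching the values quoted in the post.
MAX_PER_C_NET_IN_PATH = 3  # nodes per /24 network in a search path
MAX_PER_IP_IN_PATH = 1     # nodes per distinct IP in a search path

def c_network(ip: str) -> str:
    """Return the /24 ("class C") network prefix of a dotted-quad IPv4."""
    return '.'.join(ip.split('.')[:3])

def accept_in_path(path_ips: list[str], candidate: str) -> bool:
    """Decide whether a candidate node's IP may join a search path."""
    if path_ips.count(candidate) >= MAX_PER_IP_IN_PATH:
        return False
    nets = Counter(c_network(ip) for ip in path_ips)
    return nets[c_network(candidate)] < MAX_PER_C_NET_IN_PATH
```

A candidate that fails the check is simply skipped during the lookup; nothing
needs to be signaled to the rest of the network.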

Adding complex node ID verification logic is not worth the pain.  What you
want instead is to make Sybil attacks increasingly costly, up to the point
where the amount of resources required is so vast that either the attack is
impossible, or it cannot intercept more than an insignificant fraction of
requests and therefore remains ineffective.

The beauty of the scheme invented by Thibault Cholez and his colleagues is
that the statistical defense attacks (hah!) right where an attacker wants to
position itself: the more eagerly a node tries to attract / capture traffic
for a key, the easier the detection becomes.  Hence attackers cannot sit too
close to a key, and therefore they cannot attract the whole traffic.

Recall that for a store or a lookup, 20 nodes are contacted in Kademlia.
Being able to put your Sybils in the 20 closest nodes that remain selected
once you have applied the Kullback-Leibler divergence checks is going to
require an extraordinary amount of luck!  And the more you attempt to put
there, the more you'll be detected and countered.
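To give the flavor of the test, here is a toy reconstruction (not the exact
metric from the paper; the binning, threshold and function names are all
illustrative assumptions on my part).  Under uniformly random node IDs, the
distribution of ID-prefix lengths shared with a target key is known in
advance, so a lookup result set can be compared against it with the
Kullback-Leibler divergence and flagged when it diverges too much: Sybils
crowded around a key produce a telltale excess of long shared prefixes.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits, for two distributions over the same bins."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def common_prefix_len(a: int, b: int, id_bits: int = 160) -> int:
    """Number of leading bits shared by two Kademlia node IDs."""
    x = a ^ b
    return id_bits if x == 0 else id_bits - x.bit_length()

def looks_like_sybil_cluster(node_ids, key, expected, threshold=0.5,
                             id_bits=160, bins=8):
    """Flag a lookup result set whose prefix-length distribution
    diverges too far from the expected one (last bin is open-ended)."""
    counts = [0] * bins
    for nid in node_ids:
        counts[min(common_prefix_len(nid, key, id_bits), bins - 1)] += 1
    observed = [c / len(node_ids) for c in counts]
    return kl_divergence(observed, expected) > threshold
```

A real deployment would derive the expected distribution from the measured
network density rather than hard-coding it, and would then evict the
flagged nodes from the search path.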

When I discovered Thibault's paper in November 2010, I immediately felt
that this was a great approach: easy to implement, easy to deploy, totally
backward compatible with legacy clients and immediately effective for your
node. Plus I found it an effective strategy -- I've never been convinced
by other approaches, which require special node IDs or mandate external
validation / signatures.

Read the paper at:

        http://www.springer.com/alert/urltracking.do?id=Lc52ad2Ma0c369Sb02d6b3

The implementation in gtk-gnutella uses a slightly more sophisticated
eviction algorithm in the path than the one described there, but that
does not matter much -- it is more fine tuning on my part than a radical
change in the overall philosophy.

Raphael
_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers