Quoting David Barrett <dbarr...@quinthar.com> from ml.p2p.hackers:

:How does Gnutella choose supernodes, and would it be reasonable to
:somehow use a similar approach with the DHT where not everyone is
:eligible to manage buckets, but only those who have built up a
:reputation for reliability as measured over many transfers with many
:users?
In Gnutella, DHT nodes are either "active" (they get inserted in the routing table and become part of the structure) or "passive" (they never get inserted in routing tables but can still query the structure). Typically, firewalled nodes or busy nodes (ultra peers) are "passive" nodes. This status has nothing to do with the node's Gnutella role (ultra peer or leaf node): it is completely orthogonal to it.

:The key to the Pirate Pay attack is that you can selectively target
:buckets by injecting new nodes into the DHT at will. If we slow down
:the rate at which new nodes are added to the DHT, ideally identifying
:nodes with something like a public key (to avoid impersonation and
:ensure uniformly distributed random identifiers) and then having the
:*whole* network gather reliability/trust data (as opposed to just the
:participants of a single torrent) then this might provide a more
:stable core.

There are many protections that can severely mitigate this node ID pollution:

- The node distribution checks I already mentioned.
- The number of nodes from a given IPv4 "class C" network that you allow in DHT buckets or in a search path (e.g. 3).
- The number of nodes from a given IPv4 "class C" network that you allow in the whole routing table (e.g. 10).
- The number of nodes per distinct IP address that you allow in a search path (1).

Adding complex node ID verification logic is not worth the pain. What you rather want is to make Sybil attacks increasingly more costly, up to the point where the amount of resources required is so vast that either the attack is impossible, or it cannot intercept more than an insignificant fraction of requests and therefore remains ineffective.

The beauty of the scheme invented by Thibault Cholez and his colleagues is that the statistical defense attacks (hah!) right where an attacker wants to position itself: the more willing an attacker is to attract / capture traffic for a key, the easier the detection becomes.
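The per-network counting limits listed above can be sketched roughly as follows. This is an illustrative example only, not gtk-gnutella's actual code: the function and constant names are mine, and the limits are the example values from the list.

```python
# Sketch of per-"class C" and per-IP counting limits used to make
# Sybil attacks more costly. Names and structure are hypothetical.
import ipaddress
from collections import Counter

MAX_PER_C_CLASS_IN_PATH = 3    # nodes from one /24 allowed in a search path
MAX_PER_C_CLASS_IN_TABLE = 10  # nodes from one /24 allowed in the routing table
MAX_PER_IP_IN_PATH = 1         # nodes per distinct IP address in a search path

def c_class(ip: str) -> str:
    """Return the IPv4 /24 ("class C") network an address belongs to."""
    return str(ipaddress.ip_network(ip + "/24", strict=False))

def may_add_to_path(path: list[str], candidate: str) -> bool:
    """Accept a candidate node into a search path only if it does not
    concentrate too many nodes in one /24 network or on one IP."""
    nets = Counter(c_class(ip) for ip in path)
    ips = Counter(path)
    return (nets[c_class(candidate)] < MAX_PER_C_CLASS_IN_PATH
            and ips[candidate] < MAX_PER_IP_IN_PATH)
```

An attacker renting a single /24 can then place at most three Sybils in any one search path, regardless of how many node IDs it generates.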
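The statistical defense described above can be sketched in miniature: compare the observed distribution of common-prefix lengths between returned node IDs and the target key against a reference distribution measured on a healthy DHT, using the Kullback-Leibler divergence. Everything here is an assumption for illustration -- the names, the reference distribution and the threshold are mine, not values from Thibault Cholez's paper or from gtk-gnutella.

```python
# Hypothetical sketch of a Kullback-Leibler check on node ID placement.
# Sybils clustered unnaturally close to a key distort the distribution
# of common-prefix lengths and drive the divergence up.
import math
from collections import Counter

def prefix_len(a: int, b: int, bits: int = 160) -> int:
    """Length of the common bit prefix of two Kademlia node IDs."""
    x = a ^ b
    return bits if x == 0 else bits - x.bit_length()

def kl_divergence(observed: dict[int, float], reference: dict[int, float]) -> float:
    """D(observed || reference) over prefix-length buckets."""
    eps = 1e-9  # avoid log(0) for buckets absent from the reference
    return sum(p * math.log(p / max(reference.get(k, 0.0), eps))
               for k, p in observed.items() if p > 0)

def looks_sybil_polluted(node_ids, target, reference, threshold=1.0) -> bool:
    """Flag a set of lookup results whose prefix-length distribution
    diverges too far from the reference (threshold is illustrative)."""
    counts = Counter(prefix_len(n, target) for n in node_ids)
    total = sum(counts.values())
    observed = {k: v / total for k, v in counts.items()}
    return kl_divergence(observed, reference) > threshold
```

In practice the suspicious nodes would then be evicted from the search path rather than the whole result set being discarded.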
Hence attackers cannot be too close to a key, and therefore they cannot attract the whole traffic. Recall that for a store or a lookup, 20 nodes are contacted in Kademlia. Being able to put your Sybils among the 20 closest nodes that remain selected once you have applied the Kullback-Leibler divergence checks is going to require an extraordinary amount of luck! And the more you attempt to put there, the more you will be detected and countered.

When I discovered Thibault's paper in November 2010, I immediately felt that this was a great approach: easy to implement, easy to deploy, totally backward compatible with legacy clients and immediately effective for your node. I also found it an effective strategy -- I have never been convinced by other approaches, which require special node IDs or mandate external validation / signatures.

Read the paper at:

http://www.springer.com/alert/urltracking.do?id=Lc52ad2Ma0c369Sb02d6b3

The implementation in gtk-gnutella uses a slightly more sophisticated eviction algorithm in the path than the one described there, but that does not matter much -- this is more fine tuning on my part than a radical change in the overall philosophy.

Raphael

_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers