Daniel Stutzbach wrote: > I spent a lot of time staring at the Kad source-code and poking and > prodding it with various home-brewed measurement tools. The > conclusion I came to was this: > > The Kad implementation had many bugs and the bugs were hard to > find, so they made the system so robust that it continues to > operate even though it is full of bugs. > > However, it is not very efficient: every node has around 700 neighbors.
I think this underlines the point that a simple, less structured solution has value. If the people who added DHT lookups to EMule had asked me for advice, I would have said: 1) Have each node pick an identity at random when they join, and have them connect to any k other nodes. 2) Every once in a while have nodes replace one of their current neighbors with one that is the destination (ie the closest matching identity to the search that can be found) of a query they received. 3) Do greedy routing. If a node quits, it just quits, and its neighbors just fill their slots like in step 2. Store data by routing to the key value and having nodes cache it along the route (perhaps being more likely to cache if their identity is close to the key) and cache data from queries when they return. It is that simple. Anybody can implement it in like a hundred lines of code, making obscure bugs unlikely. Yes, in theory it would only scale log^2, but with 700 neighbors I can promise it will scale far beyond a million nodes with only a few steps needed for the average lookup. > You should read the original Kademlia paper. I'd be interested to > know if you'd think it'd be as vulnerable as you describe. I have read it of course, but it was a while ago. The kind of attack I mean is wide scale sybil type attack with a user spawning millions of fake identities for his node, giving nodes faulty neighbors and misinforming them about the size of the network etc. There are other more devious attacks such as those attempting to upset just routes for one particular key value as well. I tend to think the reason these networks haven't been brought down is mostly security by obscurity. The intersection of people who understand the theory, know the code, and are sufficiently motivated to try to hurt them is currently empty. > Kademlia isn't hypercube-based, that's CAN. Kademlia does > prefix-matching like Pastry, though that's not obvious from the > Kademlia paper. The "XOR metric" is just a clever way of saying "find > me the highest order non-matching bit". I use "hyperbuse-based" in a very general sense. I mean networks constructed sort of like a L^d mesh, in which one walks by prefix matching (as you say), having a level of neighbors who match only in the first b bits, then those that match in 2b bits, etc. The difference between the Plaxton derivatives (I shall call these P-nets for lack of a better term) and CAN in this regard is simply that as the network grows CAN scales L but keeps d constant, while P-nets keep L constant and scales d. As far as the P-nets are concerned, one can see the development as: Hypercube -> Plaxton -> [Pastry|Tapestry|Kademlia] where in each step, the networks have become less structured (in a Hypercube, every edge is canonical, but Plaxton realized that since you only need to match starting from the first bit, you can choose neighbors which don't match in that bit freely, etc.) Losing unnecessary structure is good, but I'm saying you can go even further by recognizing the essential dynamics (which is simply this: if there are N nodes which are as close to x as y, then the density of edges from x to y should be 1/N) and creating random algorithms where these dynamics hold. // oskar _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences