Daniel Stutzbach wrote:
> I spent a lot of time staring at the Kad source-code and poking and
> prodding it with various home-brewed measurement tools.  The
> conclusion I came to was this:
> 
>     The Kad implementation had many bugs and the bugs were hard to
>     find, so they made the system so robust that it continues to
>     operate even though it is full of bugs.
> 
>     However, it is not very efficient: every node has around 700 neighbors.

I think this underlines the point that a simple, less structured
solution has value. If the people who added DHT lookups to EMule had
asked me for advice, I would have said:

1) Have each node pick an identity at random when they join, and have
them connect to any k other nodes.
2) Every once in a while have nodes replace one of their current
neighbors with one that is the destination (ie the closest matching
identity to the search that can be found) of a query they received.
3) Do greedy routing.

If a node quits, it just quits, and its neighbors just fill their slots
like in step 2. Store data by routing to the key value and having nodes
cache it along the route (perhaps being more likely to cache if their
identity is close to the key) and cache data from queries when they return.

It is that simple. Anybody can implement it in like a hundred lines of
code, making obscure bugs unlikely. Yes, in theory it would only scale
log^2, but with 700 neighbors I can promise it will scale far beyond a
million nodes with only a few steps needed for the average lookup.

> You should read the original Kademlia paper.  I'd be interested to
> know if you'd think it'd be as vulnerable as you describe.

I have read it of course, but it was a while ago. The kind of attack I
mean is wide scale sybil type attack with a user spawning millions of
fake identities for his node, giving nodes faulty neighbors and
misinforming them about the size of the network etc. There are other
more devious attacks such as those attempting to upset just routes for
one particular key value as well.

I tend to think the reason these networks haven't been brought down is
mostly security by obscurity. The intersection of people who understand
the theory, know the code, and are sufficiently motivated to try to hurt
them is currently empty.

> Kademlia isn't hypercube-based, that's CAN.  Kademlia does
> prefix-matching like Pastry, though that's not obvious from the
> Kademlia paper.  The "XOR metric" is just a clever way of saying "find
> me the highest order non-matching bit".

I use "hyperbuse-based" in a very general sense. I mean networks
constructed sort of like a L^d mesh, in which one walks by prefix
matching (as you say), having a level of neighbors who match only in the
first b bits, then those that match in 2b bits, etc. The difference
between the Plaxton derivatives (I shall call these P-nets for lack of a
better term) and CAN in this regard is simply that as the network grows
CAN scales L but keeps d constant, while P-nets keep L constant and
scales d.

As far as the P-nets are concerned, one can see the development as:

Hypercube -> Plaxton -> [Pastry|Tapestry|Kademlia]

where in each step, the networks have become less structured (in a
Hypercube, every edge is canonical, but Plaxton realized that since you
only need to match starting from the first bit, you can choose neighbors
which don't match in that bit freely, etc.) Losing unnecessary structure
is good, but I'm saying you can go even further by recognizing the
essential dynamics (which is simply this: if there are N nodes which are
as close to x as y, then the density of edges from x to y should be 1/N)
and creating random algorithms where these dynamics hold.

// oskar
_______________________________________________
p2p-hackers mailing list
p2p-hackers@zgp.org
http://zgp.org/mailman/listinfo/p2p-hackers
_______________________________________________
Here is a web page listing P2P Conferences:
http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

Reply via email to