Serguei Osokine wrote:
        Ian, Oskar, that was wonderful. I rarely see anything that has
that much aesthetic value, but the analytical proof that the graph
rewiring caused by the random search actions actually makes it a
small-world graph with log^2(N) path length was certainly worth the
one-year wait since last April, which is when Ian preannounced this
paper.

It should be noted that the analytic proof in fact does not establish this. I did come up with the proof that appears in the thesis last April, and I have since (with little success) attempted to generalize it so that it holds for the rewiring algorithm. The problem is that the proof, like the Kleinberg proofs, assumes that edges are chosen independently at each point, but in fact under our algorithm they will depend on one another (in a good way, but one that is hard to prove).

In fact, I haven't even managed to show that the class of distributions for which I have a bound - those which are solutions to the system implied by equations (2.3) and (2.4) in the text - is nonempty. Until this is done the theorem as it stands is interesting, but from a strict mathematical viewpoint quite meaningless. (Which is why I have held off on publication).

<>
1. What is the difference between Freenet[1] and Sandberg/Dijjer[2]
   models that makes Dijjer easier to analyze? This is mentioned a few
   times in [2] and in other Freenet- and Dijjer-related places, but
   I did not notice an explicit explanation of the differences
   anywhere. Was it the directed links in the graph in [2]? Something
   else? I honestly tried to figure this out by myself, but couldn't;
   sorry... Maybe if I'd spent more time looking at both [1] and [2],
   I'd figure this out, but since you guys are already here, I thought
   that maybe you could answer that?

What makes Freenet (by which I mean the old Freenet routing, not the one they are using now) much more difficult to analyze is that it is the data, rather than the nodes, that have addresses. Data bounces around the network dynamically, and so the implied Markov chain becomes a lot more complicated. For one thing, Freenet suffers from load balancing issues, while point-to-point routing cannot have such issues.

2. Whatever this difference is, how relevant do you think are the
   Sandberg/Dijjer results[2] for the original Freenet paper[1]?

   For example:

        a. section 5.2 of [1] says about Fig. 3: "We can see that the
           pathlength scales approximately logarithmically, with a change
           of slope near 40,000 nodes."

           Given what you guys know now, do you think that Fig. 3 shows
           log^2(N) instead, or Freenet simply has different scaling
           properties due to its different algorithms?

Paths in Freenet are affected by things other than just point-to-point routing. Caching, for instance, will play a big role. I also don't remember exactly how we were scaling the routing table size in those simulations (the log^2 bound is for a constant routing table size).

I'm guessing that a lot of what was seen in those simulations was artifacts of other factors.

        b. Fig. 5 shows a link number distribution that awfully
           resembles the power-law one, whereas Oskar mentioned in
           another mail that Kleinberg is constant out-degree and
           Poisson in-degree. Is my assumption that the Sandberg/Dijjer
           model is supposed to mimic the Kleinberg distribution
           incorrect, and in fact both Sandberg/Dijjer and Freenet
           are power-law? Or is just one of them power law due to
           the different algorithms used? Or maybe Fig. 5 in [1] is
           actually showing a part of the Poisson distribution that can
           be easily mistaken for the power law?

The model presented in the paper has fixed out-degree and (roughly) Poisson in-degree, like Kleinberg's. Freenet has a fixed out-degree but possibly a power-law in-degree, which is the cause of the aforementioned load balancing issues (I don't have any good evidence for this, just the feeling that there is some sort of preferential attachment going on).
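As a toy illustration of why a fixed out-degree with independently chosen targets gives a roughly Poisson in-degree (the network size and out-degree below are arbitrary, and the targets are drawn uniformly just to keep the sketch short; the picture is similar with harmonically chosen targets):

import random
from collections import Counter

# Toy sketch: every node picks a fixed number of outgoing links to
# independently chosen targets; the resulting in-degrees come out
# approximately Poisson(out_degree), i.e. with a thin tail rather than
# the heavy tail of a power law. Parameters are arbitrary.
n = 10_000
out_degree = 3

in_degree = Counter()
for node in range(n):
    for _ in range(out_degree):
        in_degree[random.randrange(n)] += 1

# Empirical in-degree distribution.
hist = Counter(in_degree[v] for v in range(n))
for k in sorted(hist):
    print(k, hist[k] / n)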

        c. Section 5.4 of [1] says: "In a small-world network, the
           majority of nodes have only relatively few, local, connections
           to other nodes, while a small number of nodes have large,
           wide-ranging sets of connections. Small-world networks permit
           efficient short paths between arbitrary points because of the
           shortcuts provided by the well-connected nodes..."

           This does not sound like anything I've been able to see in
           Oskar's thesis[2] - it has left me with a distinct impression
           that the connectivity in the small world (at least in the one
           analyzed there) was provided by exactly the right (Kleinberg's)
           proportion of long-distance links, which the Sandberg/Dijjer
           rewiring model was designed to achieve, and not by means of any
           special nodes with "wide-ranging sets of connections".
           Is it because Freenet and Dijjer used different algorithms, or
           was the original assumption about the reasons for the Freenet
           connectivity wrong?

The original paper is based on less sophisticated small-world models than the later stuff. I think most of the small-world references in there are hand-waving really.

If you pick an arbitrary cutoff and say that connections shorter than it are local and the others are long distance, then Kleinberg's model (and ours) will reflect the somewhat fuzzy statement you quote above.
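A quick sketch of that (the ring size, sample count and cutoff are arbitrary choices of mine): sample long-range link lengths with probability proportional to 1/d and split them at the cutoff.

import random

# Sketch: draw long-range link lengths on an n-node ring with
# Pr(length = d) proportional to 1/d (Kleinberg's harmonic
# distribution), then split them by an arbitrary local/long-distance
# cutoff. All parameters are illustrative.
n = 100_000
samples = 10_000
cutoff = 100  # links shorter than this we call "local"

distances = list(range(1, n // 2 + 1))
weights = [1.0 / d for d in distances]

lengths = random.choices(distances, weights=weights, k=samples)
local = sum(1 for d in lengths if d < cutoff)
print("local:", local, "long distance:", samples - local)
# Most sampled links are short and a minority span large distances,
# which is the picture the quoted passage paints, with no need for
# special highly-connected nodes.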

3. Why would one want to use the system with O(log^2(N)) path length
   when pretty much every DHT gives a path length of O(log(N))? At
   today's P2P nets scale (millions of nodes) the difference between
   log and log^2 is non-trivial. I understand that creating the small-
   world network by a simple and natural evolutionary process is very
   elegant, simple, and attractive, but is this elegance worth the
   increased path length? Why not simply use a DHT?

In real life, the order is much less interesting than the actual route length. One shouldn't stare blindly at the order when the constants could be dominating in real-world situations. Also, the log^2 is with a constant routing table size; if you scale up the routing table with the size of the network, you can roughly divide the order by the routing table size (most log n DHTs scale the routing table with log n as well).
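To put rough numbers on that (a back-of-the-envelope calculation, not a measurement of any real system):

import math

# Back-of-the-envelope comparison: O(log^2 n) with a constant routing
# table, roughly log^2(n)/k with a table of size k, and the O(log n) of
# a typical DHT (whose routing table itself usually grows like log n).
# The table size k is an illustrative choice.
k = 20
for n in (10**4, 10**6, 10**8):
    log_n = math.log2(n)
    print(f"n = {n:>9}: log^2 = {log_n**2:6.1f}, "
          f"log^2 / {k} = {log_n**2 / k:5.1f}, log = {log_n:5.1f}")
# At a million nodes log^2(n) is about 400 hops with a single long link
# per node, but dividing by a modest routing table size brings it into
# the same range as log(n); the constants matter more than the order.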

4. Speaking of DHTs: Oskar recently said something curious here that I did not understand: "Actually, while the frequency of Chord links
   falls exponentially with the "level" (not sure what the Chord term
   is) the length of such links increases exponentially as well, so in
   fact the frequency of links with certain lengths do fall
   harmonically. One could see Chord as some sort of "mean field"
   version of the same dynamics as Kleinberg's model."

   What exactly does this mean? DHTs (including Chord) have log(N)
   diameter, and Kleinberg has log^2(N), right? This looks fairly
   different to me, and I totally missed the meaning of the "mean
   field", and why Chord and Kleinberg would have the same dynamics;
   I did not even understand what is this dynamics that you're talking
   about. Oskar, could you please elaborate on this?

Again, the routing tables grow in Chord. If each node had only one "chord" link (every (log n)-th node having a chord of the same length), then Chord would be roughly log^2 as well.
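One way to see the "mean field" remark (my own illustration, not anything from the Chord paper): a finger table has one link at each distance of roughly 2^i, and each level covers a distance band twice as wide as the previous one, so the number of links per unit of distance falls off like 1/d, the same harmonic decay as Kleinberg's d^(-1) distribution.

import math

# Illustration: on an n-node ring, level i of a Chord-style finger
# table contributes one link covering the distance band [2^i, 2^(i+1)).
# The band doubles in width at every level, so links per unit of
# distance fall off like 1/d, matching Kleinberg's harmonic density.
n = 2**20
for i in range(int(math.log2(n))):
    band_width = 2**i
    print(f"level {i:2d}: one link in band [{2**i}, {2**(i + 1)}), "
          f"links per unit distance ~ {1 / band_width:.6f}")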

<>
6. Both the simulation in the Freenet paper[1] and the Sandberg/Dijjer
   paper[2] use random requests to rewire the networks. In reality,
   there will be all kinds of hotspots and uneven distributions of
   requests due to the different popularity of content. Do you guys
   think it would affect the results of the analysis performed in [2],
   and if yes, how? For example, if one particular file is extremely
   popular, one could imagine that its storage node[s] will have many
   more links than the average, and the search for the other (rarely
   requested) data items might become ineffective, because these path
   lengths will become very high.

I would imagine that the large number of files stored per node, and the random distribution implied by using a secure hash, would help even out the distribution. I'm not sure what an uneven distribution does under our algorithm, but intuitively it should handle it just fine (it would mean more links to the destinations of popular queries, but that is a good thing, since such links are popular...)
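A rough sketch of the "many files per node plus a secure hash" intuition (node and document counts are arbitrary; this only looks at where documents land, not at how often they are requested):

import hashlib
from collections import Counter

# Sketch: assign documents to nodes by hashing their names and look at
# how uneven the per-node load gets. With many documents per node the
# loads concentrate around the mean (the standard concentration
# argument), regardless of how the names themselves are distributed.
n_nodes = 1_000
n_docs = 100_000

load = Counter()
for i in range(n_docs):
    digest = hashlib.sha1(f"document-{i}".encode()).hexdigest()
    load[int(digest, 16) % n_nodes] += 1

mean = n_docs / n_nodes
worst = max(load.values())
print(f"mean load {mean:.0f}, max load {worst}, ratio {worst / mean:.2f}")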


   Do you think it is a valid concern? Are there any results that would
   show that this is not an issue?

Yes, and no (besides standard concentration results showing that it should even out if there are sufficiently many documents per node).

7. While Sandberg [2] assumes that the requests are passing between
   random points, both Freenet and Dijjer seem to allow the length
   of the path to be shortened when the requested content is already
   cached on an intermediate node. If this is really so, what effect
   will that have on Oskar's analysis results? Fewer links will
   be rewired, so this should have some effect on the graph dynamics,
   right? Any idea what it would be?

This is part of the answer to your first question about why these situations are harder to analyze.

8. The answer to '7' above might also depend on the cache size and
   replacement policies on the nodes, and on the content popularity
   distribution (which I already mentioned in '6' above). All these
   factors are hard to predict for the real system in advance, and I'm
   wondering what is the feeling that you guys have about the overall
   stability of the rewiring algorithm?

No idea.

   I mean, the original Kleinberg d(x, y)^(-1) approach is extremely
   sensitive to the value of this "-1". Any other value, and you either
   do not have enough long distance links, or you do not have enough
   short-range links once you arrive into the general vicinity of your
   destination. In any case, your path length becomes polynomial instead
   of polylogarithmic once you have the smallest deviation from this
   "-1" number.

It isn't actually that sensitive. It is sensitive asymptotically, but for any given network size varying the exponent a little will make rather little difference. In fact, one can calculate numerically (I know of no proof) that there are better exponents than -1 for any given n (though the best exponent approaches -1 as the network grows towards infinity).
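For anyone who wants to poke at this, a hedged simulation sketch (the ring size, trial count and exponents tested are all my own choices, not results from the thesis): greedy routing on a ring where each node has its two ring neighbours plus one long-range link drawn with probability proportional to d^(-alpha).

import random

# Sketch: greedy routing on an n-node ring. Each node has its two ring
# neighbours plus one long-range link whose length d is drawn with
# Pr(d) proportional to d**(-alpha). At a fixed, finite n, varying
# alpha a little around 1 changes the mean route length only modestly,
# even though asymptotically only alpha = 1 is polylogarithmic.
n = 20_000
trials = 2_000

def make_links(alpha):
    dists = list(range(1, n // 2 + 1))
    weights = [d ** (-alpha) for d in dists]
    lengths = random.choices(dists, weights=weights, k=n)
    # each node x gets one long link to x + length (mod n)
    return [(x + lengths[x]) % n for x in range(n)]

def ring_dist(a, b):
    d = abs(a - b)
    return min(d, n - d)

def greedy_route(links, src, dst):
    hops, cur = 0, src
    while cur != dst:
        # move to whichever neighbour (ring or long link) is closest to dst
        cur = min(((cur + 1) % n, (cur - 1) % n, links[cur]),
                  key=lambda c: ring_dist(c, dst))
        hops += 1
    return hops

for alpha in (0.8, 0.9, 1.0, 1.1, 1.2):
    links = make_links(alpha)
    total = sum(greedy_route(links, random.randrange(n), random.randrange(n))
                for _ in range(trials))
    print(f"alpha = {alpha}: mean hops ~ {total / trials:.1f}")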

   And this makes me a bit uneasy; I do understand that the catastrophic
   failure is unlikely, and the detrimental effect of all these changes
   (if any) is likely to be tolerable, but still... Look at it this
   way: Chapter 1 of Oskar's thesis is dedicated to proving that
   "...family given by (1.1) allows for polylogarithmic routing at, and
   only at, one value of the [alpha]...", then Chapter 2 proves
   that this is exactly what happens with random-request rewiring, and
   then Dijjer's practical implementation and deployment arrive and
   throw these random requests out of the window with their caching,
   different content popularity, and whatnot. So given the strict
   requirements that were proven to be vital in Chapter 1, I cannot
   help wondering: is Dijjer still polylogarithmic, or not?

It is certainly not proved.

   I'd feel much better about all this if there were some results
   showing that the system is stable in the presence of such changes
   (in a sense that reasonably small algorithm input changes should not
   cause any catastrophically large changes in the algorithm results).
   Do you have any results that would point in that direction? Would
   you say that the data displayed in Figures 2.4-2.6 already shows
   that your results are not exactly Kleinberg, but this does not
   cause any catastrophic consequences, or do you also have some other
   sources of optimism?

I would feel better about it too if more were proven. I am optimistic that things actually do work, but I am not very optimistic about how much I will ever be able to show (and it isn't just me: though a better mathematician may of course have gotten further, I have put these problems in front of many good researchers who seem to agree they are difficult problems).

9. And finally, if I'm not mistaken, the Dijjer description seems to
   imply that the links are rewired after every search, whereas Oskar
   suggests that this should happen only with probability p, and in
   section 2.3 says: "...there are simple heuristic arguments for why p
   should reasonably be on the order of one over the expected length of
   the greedy walks". Any particular reason for that Dijjer behaviour,
   or have I missed something and Dijjer does, in fact, rewire with
   this probability p? (Would be interesting to hear these simple
   heuristic arguments too, but that's another story...)

It won't matter so much if the routing table is large (remember the mathematical model has one long link), though I would recommend that Dijjer update less often.
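For reference, a sketch of the rewiring step being discussed (my own framing of the thesis description, not Dijjer code): after a successful greedy walk, each node on the path re-points its single long link at the walk's destination with probability p.

import random

# Sketch of probabilistic rewiring: after a greedy walk from src to dst
# succeeds, every node on the path independently replaces its long-range
# link with a link to dst with probability p. Section 2.3's heuristic
# suggests p on the order of one over the expected walk length; the
# numbers below are placeholders for illustration.
def rewire_path(long_link, path, dst, p):
    """long_link: dict mapping node -> its single long-range link."""
    for node in path:
        if node != dst and random.random() < p:
            long_link[node] = dst
    return long_link

expected_walk_length = 10           # assumed, for illustration only
p = 1.0 / expected_walk_length
links = {0: 7, 3: 9, 5: 2, 8: 1}    # toy routing state
rewire_path(links, path=[0, 3, 5, 8], dst=42, p=p)
print(links)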

// oskar