On Tuesday, March 07, 2006 Ian Clarke wrote: > I think Chapters 1 and 2 of Oskar Sandberg's recent thesis may be > extremely relevant: > > http://www.math.chalmers.se/~ossa/lic.pdf > > It outlines an algorithm to choose which nodes in a network should > have edges between them such that "greedy" routing (routing to the > peer closest to what you are looking for) has small world O( log^2 > (N) ) path lengths.
Ian, Oskar, that was wonderful. I rarely see anything that has that much aesthetic value, but the analyitical proof that the graph rewiring caused by the random search actions actually makes it a small-world graph with log^2(N) path length, was certainly worth the one-year wait since last April, which is when Ian has preannounced this paper. I have to apologize in advance for my questions - some of them might be downright stupid (I won't even pretend that I followed the logic of all your proofs), and in any case there are many of them. Perhaps, too many. Feel free to answer any subset that does not seem stupid to you (and as a special case, of course, just ignore all of them :-) But I waited for this paper for a year, and as you might imagine, I had time to ponder quite a few issues... So here we go: 1. What is the difference between Freenet[1] and Sandberg/Dijjer[2] models that makes Dijjer easier to analyze? This is mentioned a few times in [2] and in other Freenet- and Dijjer- related places, but I did not notice the explicit explanation of the differences anywhere. Was it the directed links in the graph in [2]? Something else? I honestly tried to figure this out by myself, but couldn't; sorry... Maybe if I'd spend more time looking at both [1] and [2], I'd figure this out, but since you guys are already here, I thought that maybe you could answer that? 2. Whatever this difference is, how relevant do you think are the Sandberg/Dijjer results[2] for the original Freenet paper[1]? For example: a. section 5.2 of [1] says about Fig. 3: "We can see that the pathlength scales approximately logarithmically, with a change of slope near 40,000 nodes." Given what you guys know now, do you think that Fig. 3 shows log^2(N) instead, or Freenet simply has different scaling properties due to its different algorithms? b. Fig. 5 shows the link number distribution that awfully resembles the power-law one, whereas Oskar mentioned in another mail that Kleinberg is constant out-degree and Poisson in-degree. Is my assumption that Sandberg/Dijjer model is supposed to mimic the Kleinberg distibution incorrect and in fact both Sandberg/Dijjer and Freenet are power-law? Or is just one of them power law due to the different algorithms used? Or maybe Fig. 5 in [1] is actually showing a part of the Poisson distribution that can be easily mistaken for the power law? c. Section 5.4 of [1] says: "In a small-world network, the majority of nodes have only relatively few, local, connections to other nodes, while a small number of nodes have large, wide-ranging sets of connections. Small-world networks permit efficient short paths between arbitrary points because of the shortcuts provided by the well-connected nodes..." This does not sound like anything I've been able to see in the Oskar's thesis[2] - it has left me with a distinct impression that the connectivity in the small world (at least in the one analyzed there) was provided by the exactly right (Kleinberg's) proportion of long-distance links, which Sandberg/Dijjer rewiring model was designed to achieve, and not by the means of any special nodes with "wide-ranging sets of connections". Is it because Freenet and Dijjer used different algorithms, or the original assumption about the reasons for the Freenet connectivity was wrong? Or is it simply that I'm misreading the Freenet paper[1] and it did not mean to imply that small world properties of Freenet are due to the small subset of well connected nodes? Maybe my understanding of Sandberg/Dijjer[2] is wrong? 3. Why would one want to use the system with O(log^2(N)) path length when pretty much every DHT gives a path lenght of O(log(N))? At today's P2P nets scale (millions of nodes) the difference between log and log^2 is non-trivial. I understand that creating the small- world network by a simple and natural evolutionary process is very elegant, simple, and attractive, but is this elegance worth the increased path length? Why not simply use a DHT? 4. Speaking of DHTs: Oskar recently said something curious here that I did not understand: "Actually, while the frequency of Chord links falls exponentially with the "level" (not sure what the Chord term is) the length of such links increases exponentially as well, so in fact the frequency of links with certain lengths do fall harmonically. One could see Chord as some sort of "mean field" version of the same dynamics as Kleinberg's model." What exactly does this mean? DHTs (including Chord) have log(N) diameter, and Kleinberg has log^2(N), right? This looks fairly different to me, and I totally missed the meaning of the "mean field", and why Chord and Kleinberg would have the same dynamics; I did not even understand what is this dynamics that you're talking about. Oskar, could you please elaborate on this? 5. Throughout this mail I've been using the expresson Sandberg/Dijjer to denote both what is described in [2] and Dijjer, as if it is the same system; is this a right thing to do, or there are significant differences between these two? What about these directed links, for example? Does Dijjer use them? Does this matter? 6. Both the simulation in Freenet paper[1] and Sandberg/Dijjer paper[2] use the random requests to rewire the networks. In reality, there will be all kinds of hotspots and uneven distributions of requests due to the different popularity of content. Do you guys think it would affect the results of analysis performed in [2], and if yes, how? For example, if one particular file is extremely popular, one could imagine that its storage node[s] will have many more links than the average, and the search for the other (rarely requested) data items might become ineffective, because these path lengths will become very high. Do you think it is a valid concern? Are there any results that would show that this is not an issue? 7. While Sandberg [2] assumes that the requests are passing between the random points, both Freenet and Dijjer seem to allow the length of the path to be shortened when the requested content is already cached on the intermediate node. If this is really so, what is the effect that will have on Oskar's analysis results? Fewer links will be rewired, so this should have some effect on the graph dynamics, right? Any idea what it would be? 8. The answer to '7' above might also depend on the cache size and replacement policies on the nodes, and on the content popularity distribution (which I already mentioned in '6' above). All these factors are hard to predict for the real system in advance, and I'm wondering what is the feeling that you guys have about the overall stability of the rewiring algorithm? I mean, the original Kleinberg d(x, y)^(-1) approach is extremely sensitive to the value of this "-1". Any other value, and you either do not have enough long distance links, or you do not have enough short-range links once you arrive into the general vicinity of your destination. In any case, your path length becomes polynomial instead of polylogarithmic once you have the smallest deviation from this "-1" number. And this makes me a bit uneasy; I do understand that the catastrophic failure is unlikely, and the detrimental effect of all these changes (if any) is likely to be tolerable, but still... Look at it this way: Chapter 1 of Oskar's thesis is dedicated to proving that "...family given by (1.1) allows for polylogarithmic routing at, and only at, one value of the [alpha]...", then Chapter 2 proves that this is exactly what happens with random-request rewiring, and then Dijjer's practical implementation and deployment arrive and throw these random requests out of the window with their caching, different content popularity, and whatnot. So given the strict requirements that were proven to be vital in Chapter 1, I cannot help wondering: is Dijjer still polylogarithmic, or not? I'd feel much better about all this if there would be some results showing that the system is stable in the presence of such changes (in a sense that reasonably small algorithm input changes should not cause any catastrophically large changes in the algorithm results). Do you have any results that would point in that direction? Would you say that the data displayed on Figures 2.4-2.6 already shows that your results are not exactly Kleinberg, but this does not cause any catastrophic consequences, or you also have some other sources of optimism? 9. And finally, if I'm not mistaken, Dijjer description seems to imply that the links are rewired after every search, whereas Oskar suggests that this should be happen only with proability p, and in section 2.3 says: "...there are simple heuristic arguments for why p should reasonably be on the order of one over the expected length of the greedy walks". Any particular reason for that Dijjer behaviour, or I've missed something and Dijjer does, in fact, rewire with this probability p? (Would be interesting to hear these simple heuristic arguments too, but that's another story...) [1] Freenet: A Distributed Anonymous Information Storage and Retrieval System. Ian Clarke1, Oskar Sandberg2, Brandon Wiley3, and Theodore W. Hong. http://www.cl.cam.ac.uk/~twh25/academic/papers/icsi-revised.pdf [2] Searching in a Small World. Oskar Sandberg. http://www.math.chalmers.se/~ossa/lic.pdf Best wishes - S.Osokine. 11 Mar 2006. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Ian Clarke Sent: Tuesday, March 07, 2006 5:03 PM To: Peer-to-peer development. Subject: Re: [p2p-hackers] clustering I think Chapters 1 and 2 of Oskar Sandberg's recent thesis may be extremely relevant: http://www.math.chalmers.se/~ossa/lic.pdf It outlines an algorithm to choose which nodes in a network should have edges between them such that "greedy" routing (routing to the peer closest to what you are looking for) has small world O( log^2 (N) ) path lengths. The algorithm is very simple, and pleasingly "natural". Basically you use greedy routing to find a path to your intended destination node, then you add a link from each node along this path to the destination. The number of outbound edges from each node can't obviously increase indefinitely, so they are deleted according to a least recently used scheme to make room for new edges. This is based on Freenet's approach to edge selection. The paper describes both experimental results that confirm that this algorithm does lead to a small world topology, and also delves into the mathematics behind why this might be the case. It is interesting not just because of the simplicity of the algorithm, but because it is extremely amenable to the construction and maintainence of decentralized P2P networks, and because it could tell us something about why human relationships tend to form small world networks. This algorithm is used by the Dijjer (http://dijjer.org/) P2P network. Ian. On 7 Mar 2006, at 02:46, Jeff Rose wrote: > Anyone out there doing work in clustering? I'd be interested in > hearing what people are up to, or if you have pointers to good > papers that would be great too. At this point I'm interested in > any angle, geographic distance based, semantic distance based, > hierarchical, non-hierarchical etc... My goal is to work towards a > very amorphous clustering scheme that allows any kind of object to > be located close to relatives in the network as long as they share > a common distance function. _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences