On Sat, May 4, 2013 at 4:07 PM, Matthew Toseland <t...@amphibian.dyndns.org> wrote:
> On Saturday 04 May 2013 07:09:17 Oskar Sandberg wrote:
> > I looked over both papers.
>
> Thanks! I'm going to reply separately re each paper.
>
> > Second paper:
> >
> > The reason I never let the nodes independently draw and change IDs is
> > that it seems to me that, and this was borne out in experiments, this
> > would cause all the IDs to converge to a single point.
>
> This happens with churn anyway, though hopefully randomisation resolves
> this. Randomisation and churn mean that we already don't have trivially
> provable uniformity though?
>
> > The math that motivates the formula c(u) in their paper actually only
> > works when the IDs are in a fixed grid - if you let the space be
> > continuous, the distribution degenerates as the distance between points
> > approaches zero, so it isn't necessary that the Markov chain should
> > converge to any meaningful distribution at all.
> >
> > If it works, then it works of course. But I would approach with some
> > caution.
>
> Their simulations say it works. I guess you can prove anything by tweaking
> your simulations... The results they give only apply to 10k nodes, though
> a later presentation suggests they've tried bigger networks. We're a long
> way away from having 10k darknet nodes right now (we have about that on
> opennet). But the published attacks are hideous, and they add some more,
> and your/Michael's Pitch Black fix is rather tentative and may not fix
> their other attacks...
>
> > (Note that if a node is running the Markov chain all on its own, it
> > should be possible to analytically calculate, or at least estimate, its
> > stationary distribution from a particular set of neighbors, making
> > actually running the chain unnecessary.)
>
> You still need some randomness to escape false minima though, right?

This doesn't mean calculating a single value; it means that you can draw
your position randomly from a known distribution (a distribution is
something saying e.g.
"you have 1/2 chance of getting value 1, 1/4 of getting value 2, 1/8 of value 3, etc." for the set of possible values) rather than having to simulate a random process to choose your position.

Taking a step back: basically, the type of random processes we are dealing with here are called "Markov chains", which means that what happens in the next step depends only on the current state (the current position), not on the history. Under certain conditions, Markov chains have the property that when run for a long time they gradually forget what state they started in, and there is a fixed probability of finding the process in each possible state (regardless of the starting state). This is what I referred to as the "stationary distribution" above.

The idea behind the algorithms used here is that we want the network to have a certain distribution of positions, so one comes up with a Markov chain that has that stationary distribution and runs it for a long time. This works for the swapping algorithm, but as we know it has issues.

For the algorithm the paper proposes, the problem is the "under certain conditions" I mentioned above. One of the conditions needed is that the chain cannot get stuck in any particular state, and I don't believe this holds for the distribution they propose.

To illustrate this, consider the simplest case: a vertex with a single neighbor. For simplicity, assume the single neighbor has position 0. The way the update works in the proposed algorithm in this case is that the vertex continually draws a position y randomly from [0,1]. If y is smaller (closer to 0) than the current position x, then you move to y; otherwise you move with probability x/y (called the acceptance probability). If you run this, at some point the vertex will end up at a position very close to 0 (that is, very close to its neighbor).
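To make the update rule concrete, here is a minimal simulation of that single-neighbour chain (a sketch of my own in Python, not Freenet code; the function name `step` is made up):

```python
import random
import statistics

def step(x):
    """One update of the single-neighbour chain (neighbour at position 0).

    Draw a proposal y uniformly from (0, 1]. If y is closer to the
    neighbour than the current position x, always move; otherwise move
    with acceptance probability x / y."""
    y = 1.0 - random.random()  # uniform on (0, 1], so we never divide by 0
    if y < x or random.random() < x / y:
        return y
    return x

# Run a few independent chains starting at x = 0.5: most of them end up
# much closer to the neighbour at 0 than where they started.
finals = []
for _ in range(30):
    x = 0.5
    for _ in range(50_000):
        x = step(x)
    finals.append(x)
print(statistics.median(finals))
```

Exactly where each chain ends up is random, but the pull towards the neighbour's position is already visible in runs of this length.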
The question is what happens after this: does it jump back out a bit further from 0, or does it get "sucked in"?

Suppose you end up at position 0.001. After this you keep drawing points, and one of two things happens: you eventually draw a position closer than 0.001 and step towards 0, or you keep drawing values bigger than 0.001 but at some point accept one of them anyway.

Consider now if the acceptance probability were (x/y)^2 instead of x/y. The chance of drawing a point closer to 0 and moving in is 0.001, so it will happen after 1,000 steps on average. But when you draw a bigger point, it will on average be at position y=0.5, so the acceptance probability is (0.001/0.5)^2 = 0.000004, and it will only happen after 250,000 steps on average. Obviously this isn't a rigorous argument, but it can be formalized, and the end effect is that 0 (the position of the neighbor) acts like a black hole (almost literally): once you get close to it, you get sucked in and never come out.

With a similar argument, one can see that if the acceptance probability is sqrt(x/y), then the chance of accepting is always much bigger than the chance of drawing a point closer to 0, so there is no black hole and you always get out again.

So what about x/y (what they are using)? Because this is exactly at the boundary between having a black hole and not having one, it is harder to make an intuitive argument (this is what mathematicians call a "critical value"). But it does indeed have a black hole (mathematically: the detailed balance distribution is 1/x on (0,1], which is improper). And you don't have to trust me on this, because the authors of the paper illustrate it very clearly in Figure 3: the green spike is the black hole.

The thing is that they only run the chain for a fixed number of steps, and then they stopped. They don't talk about what happens to the distribution if you keep running the chain, nor about how or why they chose the number of steps they did. What would happen if you keep running it?
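The effect of the exponent can also be checked numerically. The sketch below (my own experiment, not the paper's) runs the single-neighbour chain described above with acceptance probability (x/y)^p for a few exponents p: with p = 0.5 the chains keep escaping and the final positions stay spread out, while with p = 2 they pile up near the neighbour at 0.

```python
import random
import statistics

def run_chain(p, steps, x=0.5):
    """Single-neighbour chain (neighbour at 0), acceptance (x/y)**p."""
    for _ in range(steps):
        y = 1.0 - random.random()  # proposal, uniform on (0, 1]
        if y < x or random.random() < (x / y) ** p:
            x = y
    return x

def median_final(p, chains=60, steps=20_000):
    """Median final position over independent chains."""
    return statistics.median(run_chain(p, steps) for _ in range(chains))

# p = 0.5 (sqrt(x/y)): proper stationary distribution, no black hole.
# p = 2.0 ((x/y)^2):   black hole at 0, chains get sucked in.
medians = {p: median_final(p) for p in (0.5, 1.0, 2.0)}
print(medians)
```

With the critical exponent p = 1 the collapse is slower, which is presumably why it shows up in the paper only as the spike in their Figure 3 rather than as an obvious failure.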
More and more nodes would get sucked into the spike, and the spike itself would get thinner and thinner, until eventually you run out of floating point precision, all the nodes have the same position, and your routing is a random walk. So if you do implement this, you would have to figure out: when does a vertex stop updating its position?

How can this be fixed? The simple way is to not draw the proposed new position from [0,1] but rather from [1/n,1], where n is the number of vertices in the network (in the more general case this means: don't pick a position closer than 1/n to any of your neighbors). By staying at least 1/n away from 0, you keep out of the "event horizon" of the black hole, and don't risk falling in.

However, this only works if you know n, and of course you don't. The swapping algorithm is, in a sense, an attempt to get exactly this chain without knowing n. And the "convergence" attack is exactly meant to fool you about the value of n, which breaks the algorithm. The probing requests that I proposed to counter this attack (and its accidental cousin, churn-based convergence) are a second way to get an estimate of the size of the network, and thus make it harder to trick you about the value of n. But of course, if these do provide a way of estimating n, then you could use that value and do away with the swapping (this might work).

(The estimate of n can probably be pretty coarse: it would be very bad if it was too small (asking 2*n nodes to each stay at distance 1/n from their neighbors is obviously bad), but it can probably be a few orders of magnitude too big without much damage.)

Another, much more hypothetical, way around this would be to shift the acceptance probability just a smidgen away from the black hole case.
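For what it's worth, the 1/n cut-off is a one-line change to the proposal step of the single-neighbour update described above (again my own illustration, with n assumed known):

```python
import random

def step_truncated(x, n):
    """Single-neighbour update (neighbour at 0), but with the proposal
    drawn from [1/n, 1) instead of (0, 1]: the chain can never get
    closer than 1/n to the neighbour, i.e. it stays outside the black
    hole's "event horizon"."""
    lo = 1.0 / n
    y = lo + (1.0 - lo) * random.random()  # uniform on [1/n, 1)
    if y < x or random.random() < x / y:
        return y
    return x

# The position may wander close to the neighbour, but never below 1/n.
n, x = 1000, 0.5
for _ in range(100_000):
    x = step_truncated(x, n)
print(x)
```

In the full algorithm the same cut-off would apply per neighbour: never propose a position closer than 1/n to any of your neighbours, as described above.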
Theoretically the best such acceptance ratio, as the network size approaches infinity, would be something like:

  x (1 - log(x))^(1+epsilon) / (y (1 - log(y))^(1+epsilon))

but in practice, if you just took x^0.9 / y^0.9 (which generalizes to: take the same product as currently, just raise it to the power 0.9 at the end), it would probably work better for networks of any imaginable size. Whether or not this would work would have to be simulated (more rigorously than the authors of that paper did).

Sorry for being long-winded; hopefully I made some sense to somebody.

// oskar

> I don't suppose you have any ideas on how often we should "swap" etc?
>
> I'm inclined to deploy this as soon as we can validate our implementation.
> Even if it does eventually collapse, our current system is also a "toy",
> because the attacks are so devastating. What do we need to do to be sure?
>
_______________________________________________
Devl mailing list
Devl@freenetproject.org
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl