Okay, this email is very confusing up to the point where I realise what you 
mean by "depth first". Normally "depth first" refers to a depth first search, 
which is exactly what we do in AnnounceSender, RequestSender etc. We go as deep 
as we can before backtracking.

However what you mean appears to be that, currently, we send, and process, the 
node references from the originator's end (usually a seednode) first. Hence if 
we get lots of noderefs, we end up accepting refs at the beginning (which 
initially are far from the target location and would give us long links, but 
those nodes are usually overloaded so won't give much), instead of refs at the 
end (which generally should be highly specialised although not necessarily as 
close to the target as the nodes in the middle).

IMHO any given node is unlikely to accept us, so we probably don't have so many 
nodes from one announcement that this becomes a problem - but on the other hand 
we have several announcements running at once so it might be. But even if it 
is, I'd expect it to create a bias in favour of the early nodes, close to the 
seednode and far away from the target location, which might give us long links. 
However as has been pointed out these nodes are unlikely to accept us, which 
may result in us not getting any long links on average. So AFAICS the solution 
is simply to route randomly, or route to a random location, for a while before 
starting the announcement.

I will now deal with your comments in detail...

On Thursday 04 November 2010 19:09:59 Robert Hailey wrote:
> 
> On 2010/11/04 (Nov), at 11:06 AM, Matthew Toseland wrote:
> 
> > On Wednesday 03 November 2010 20:00:48 Robert Hailey wrote:
> >>
> >> Example patch attached. Small, but untested!
> >
> > Could you please explain what it is that you think is going on, and  
> > what the patch will do to fix it?
> 
> For all this, I presume the correctness principals concerning  
> destination sampling. Specifically the "small worldness" of making a  
> connection at the end point of a route builds a more routable network.

IMHO this is the product of several factors, including requesting keys that 
aren't all exactly at the node's specialisation - because of our local requests 
or because of requests originating nearby. Whereas with announcement we route 
to a single location, and we accept many nodes along the way which may or may 
not include the "ideal" node.
> 
> As best I can see, then, there may be major problems with the current  
> announcement and path folding implementations:
> (1) announcements gather at non-destination points (in fact, shallow  
> first),

Which should result in us getting some long links, although they are unlikely 
due to the nodes near the seednodes being overloaded.

> (2) the RequestSender logic will always eat an incoming path folding  
> request if we want a peer (often for very short paths),

Maybe he needs the path? Maybe the originator doesn't? Also see the next item.

> (3) the RequestHandler logic will probabilistically drop an incoming  
> path folding request (about once every HTL)

This is a security feature. It's been present since well before 0.5, and came 
from our theoreticians - Oskar or Ian; I think it may even have been in the 
original paper. The objective is to ensure that the data source doesn't end up 
knowing reliably the request originator's identity. Of course this is a fairly 
weak protection because there is a significant chance that it *is* the 
originator... So if you are interested in who requests specific relatively 
unpopular content, and you control a bunch of nodes which are near the "ideal" 
locations for a bunch of keys in the file, you can record who requests them and 
correlate them and you should be able to identify some of the nodes requesting 
that file. This costs a lot less than connecting to every opennet node, 
although a lot more than mobile attacker source tracing.

> (4) if path folding request is dropped (or not wanted), an  
> intermediate node might forward it's reference instead

This is an integral part of #3.
> 
> I think the behavior of 2, 3, & 4 are more likely to be making the  
> peer-clumps, which (as you said) might not be a horrible thing. But I  
> think all of these points go against the original intent of  
> destination sampling.

IMHO the bias is to short links. This should probably result in less clumping, 
not more, because all the nodes later on are reasonably close to the target. 
However we get to the target fairly quickly. In announcement the first few 
nodes that might give us long links are overloaded, but in path folding there 
is no such problem; so a bias to short links would tend to weaken path folding 
and give us more longish links and fewer short links, which may be bad. But it 
would result in less specialisation, not more.
> 
> I think the behavior of 1 (this patch) is forming a specialized  
> "backbone" of routability around the seed nodes.

The behaviour the the patch will be to accept only one node from the entire 
announcement AFAICS.
> 
> > In theory, announcements are routed to a specific random keyspace  
> > location (expressed as a number), starting from the seednode.
> 
> In code, announcements are routed to the node's current location only  
> (one thing modified in my experiments), 

Which is the right thing to do. We choose a random location when we create the 
node, then we announce to that location to get a set of peers mostly close to 
it so that we can immediately become a productive part of the network. If we 
are then shut down for a while and start back up, we re-announce to the same 
location because, amongst other things, that's where our datastore is 
specialised for.

> and only to the seed nodes   
> (even if we have connected peers).

That is a fair point, we should announce through our darknet peers etc. In fact 
I thought we did... Maybe we don't?
> 
> I would expect them to run (approximately) the same paths each time,  
> therefore.

There are many seednodes and the paths will be different depending on which you 
start at.
> 
> > They rapidly approach the target location, and thus having the  
> > opportunity to connect on each node should roughly give us the 1/d  
> > distribution that theory tells us is necessary: A few long links and  
> > a lot of short links.
> ...
> > This is plausible given that I'd expect the nodes close to the  
> > seednodes, where you might get your few long links from, to be  
> > overloaded.
> 
> Yea, by announcing shallow-first, all the nodes around the seed nodes  
> will (nearly always) not want a peer. If we go depth first we could  
> even count the number of forwarded refs, and only send our ref if that  
> count is too low.

I don't follow. An announcement yields many peers, and we only want to send our 
ref once, to limit bandwidth usage.
> 
> I'm not sure we can realistically expect to get any specific  
> proportion of short/long links with announcement only. And even if we  
> could, they would all fall  along the common announcement-path (not to  
> a more deserving node at the same location).

When the announcement reaches the target location it should give us a bunch of 
peers which are close to the target. Which is the main thing we need. And it is 
reasonable to assume that most of the nodes close to that target location are 
close together network-wise - if not, routing is broken.

And we announce through multiple seednodes.
> 
> What is d? or what does 1/d come out to? Should I expect one long link  
> among 12 peer connections?

Probability ( connect to peer ) ~= 1 / ( distance from ideal location to peer's 
location )

In other words, we need a lot more short links than long links. Which is what 
I've been saying all along.
> 
> > As far as I can see, your theory is that 1) announcements are making  
> > up a large proportion of a node's connections,
> 
> Judging by one of my nodes, announcements make up for about 1/3 of  
> connections. 

Meaning path folding makes up 2/3rds. Which is okay, although not ideal.

> And because a node currently announces to it's location,   
> these will almost surely be all short links (because I am not a seed  
> node), 

That's the whole point! Lots of short links and a *much smaller* number of long 
links. In order to be useful to the network the main thing we need is 
connections to nodes near our location.

> and reachable from the perspective of the seed node. 

True, since the only way we can get to them initially is through the seednode.
> 
> > and 2) announcements are tending to get a bunch of noderefs that are  
> > *all* very close to the target, rather than getting the 1/d  
> > distribution which we need, which would include lots of short links  
> > but also some long links. Which results in all your peer locations  
> > being clustered together, with apparently no long links, as you've  
> > shown.
> 
> In theory, shouldn't it be just as correct to announce to one of our  
> peers (instead of a seednode)? 

We should yes, to reduce load on the seednodes. However this won't make a huge 
impact in practice because once we have some connections we can get more 
through path folding.

> Depth-first/shallow-first, 

I'm not convinced this makes much of a difference at present.

> our-location/   
> random-location... 

Indeed this does. Announcing to our location is ESSENTIAL.

> I think it makes a big difference! 
> 
> I think it needs to be depth-first to our-location to be generally  
> correct, and I think it would be much improved to use our peers over  
> seed nodes [[seed nodes breath a sigh of relief across the network]].
> 
> > What can we do to further investigate this theory? It would be good  
> > to gather some more data before applying potentially major changes.
> 
> Compare routability statistics of seednodes to "normal" nodes (or  
> "leaf" / far from seed nodes).
> 
> > What exactly does your patch do,
> 
> Instead of sending our ref once we accept an announcement, it sends  
> the ref just-before sending the completion message (that is, after we  
> have routed the deeper replies).

And it only keeps one noderef, so we only accept one node, afaics.
> 
> > and what is it intended to do (not necessarily the same thing)?
> 
> It is intended to deliver the announcement replies in a depth-first  
> order to the originator, in practice out-of-order messages might make  
> them only "roughly" depth first.

I don't see how it does that. Even if it did I'm not sure what difference it 
would make.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part.
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20101105/12851add/attachment.pgp>

Reply via email to