Okay, there is much confusion here too.
"Short links" is being used for two different, and opposite, things:
- Links gained via announcement (or path folding) after only a few hops. Let's
call this "few hops".
- Connections to peers whose locations are numerically close to our own. Let's
call this "small location distance".
Announcements start at a seednode, which is likely to be in a random location.
They are routed to our location, which is likely to be a long way away from it.
The first few hops are therefore a long way away location-wise from the target,
and the later hops are reasonably close to it. The effect of connecting to
nodes only a few hops along the announcement is to create links with large
location distance (far from our specialisation, needed in small quantity for
routing), and the effect of connecting to nodes many hops along the
announcement is to create links with small location distance (close to our
specialisation, needed in large quantity for routing).
We want links in this proportion, for routing to work:

    p(link between two nodes) = constant / (distance between the two nodes' locations)
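As a toy illustration of that shape (the helper names and the flat 1e-6 cutoff
are my own, just to avoid division by zero): sampling peers on a circular
keyspace with weight 1/d gives mostly short links plus a few long ones.

```python
import random

random.seed(0)

def keyspace_distance(a, b):
    """Circular distance between two locations in [0, 1)."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def sample_peer_location(our_loc, n_candidates=2000):
    """Pick one candidate location with probability proportional to 1/d,
    i.e. the link distribution routing needs."""
    cands = [random.random() for _ in range(n_candidates)]
    weights = [1.0 / max(keyspace_distance(our_loc, c), 1e-6) for c in cands]
    return random.choices(cands, weights=weights, k=1)[0]

links = [keyspace_distance(0.5, sample_peer_location(0.5)) for _ in range(500)]
short = sum(1 for d in links if d < 0.05)  # the bulk: small location distance
long_ = sum(1 for d in links if d > 0.2)   # a handful: large location distance
```

The constants here don't matter; the point is only the shape - lots of short
links, a few long ones.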
Because the nodes at the beginning are close to the seednodes, IMHO they will
tend to accept us very infrequently. This might result in us not having enough
links with large location distance. If this is a problem we should walk a few
random hops from the seednode before announcing.
Now, for path folding, the issues are similar. We are starting from our
location, and moving towards the target location. Often this will be close to
our location, because our incoming requests are specialised, but this is not
always the case because any request we generate locally or which was generated
on a nearby node may not be close to our specialisation. So if we only sample
the destination, we get mostly links which are close to our location, but some
that are further away. If we reset along the way, we introduce a small bias
towards our location, and away from the target, which might be distant; so
there is the possibility of it increasing clumping to some degree.
However, random reset has always been a part of Freenet, is supported by our
theoreticians (particularly Oskar), and is a critical security feature on
opennet.
On Friday 05 November 2010 18:37:48 Robert Hailey wrote:
>
> On 2010/11/05 (Nov), at 10:37 AM, Matthew Toseland wrote:
>
> > Okay, this email is very confusing up to the point where I realise
> > what you mean by "depth first". Normally "depth first" refers to a
> > depth first search, which is exactly what we do in AnnounceSender,
> > RequestSender etc. We go as deep as we can before backtracking.
> >
> > However what you mean appears to be that, currently, we send, and
> > process, the node references from the originator's end (usually a
> > seednode) first. Hence if we get lots of noderefs, we end up
> > accepting refs at the beginning (which initially are far from the
> > target location
>
> Yes! Technical thought communicated!
Cool, another one is above.
>
> > and would give us long links, but those nodes are usually overloaded
> > so won't give much), instead of refs at the end (which generally
> > should be highly specialised although not necessarily as close to
> > the target as the nodes in the middle).
>
> We have often said "overloaded" nodes (near the seed nodes), but I
> have changed my perception of this (more later).
>
> >> As best I can see, then, there may be major problems with the current
> >> announcement and path folding implementations:
> >> (1) announcements gather at non-destination points (in fact, shallow
> >> first),
> >
> > Which should result in us getting some long links, although they are
> > unlikely due to the nodes near the seednodes being overloaded.
> >
> >> (2) the RequestSender logic will always eat an incoming path folding
> >> request if we want a peer (often for very short paths),
> >
> > Maybe he needs the path? Maybe the originator doesn't? Also see the
> > next item.
>
> I think that is very far from the common case. If we presume that the
> "wantPeer" state of all nodes in the chain are about the same (because
> it is mostly based on timings), I would expect an originating node
> (even 2-3 hops from a long link) to be starved from needed and wanted
> long links.
Okay, we are talking about path folding here. Your argument is that path
folding will provide connections to peers which are only a few hops away, and
therefore relatively close to the originator's location and relatively distant
from the target location, and will therefore tend to enhance specialisation.
And that this is caused by resetting connections.
>
> Couldn't this middleman-node get destination samples of his own?
> Surely he is also making requests into the network.
He may just be routing. I object in the strongest possible terms to path
folding only applying to nodes which are actively requesting CHKs. Nodes which
don't keep up are effectively expelled from the network. We do not want to have
to spider the network merely to stay on it. Of course, nodes can participate in
path folding both as requestor and responder - but to only participate as
responder would likely have serious consequences.
>
> At the very least, it should be probabilistic (unless we are the
> originator). Otherwise what we are optimizing for is very far from
> destination sampling, we will be folding the path at every
> opportunity, including in the direction of the initial and post
> routing-miss (is there a paper on that?).
On 0.5, we had probabilistic path folding, but the probability depended on the
node's load. Hence we tried to divert load away from ourselves when we were
overloaded - by letting the path folding attempt go through, which gave a new
peer to the requestor whose requests were coming through us.
IMHO even with resets, we will almost always be connecting to a node that is
significantly closer to the target than we are.
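That claim is easy to sanity-check with a toy model (the halving-per-hop step
and all names here are my assumptions, not the real routing code): greedy
routing closes the gap to the target on every hop, so even when an
intermediate node resets the fold to itself, the ref we receive is still
closer to the target than we are.

```python
import random

random.seed(1)

def route_and_fold(origin, target, hops=10, reset_prob=0.1):
    """Toy greedy route on a linear keyspace: each hop halves the
    remaining distance to the target. Path folding normally returns the
    final node, but on the way back each intermediate node may reset the
    folded ref to itself with probability reset_prob."""
    loc = origin
    path = []
    for _ in range(hops):
        loc = target + (loc - target) * 0.5  # toy greedy step
        path.append(loc)
    folded = path[-1]                 # by default, the data source itself
    for node in reversed(path[:-1]):  # message travels back towards origin
        if random.random() < reset_prob:
            folded = node             # reset: this node claims to be the source
            break
    return folded

# Every folded ref ends up closer to the target than the originator was.
results = [route_and_fold(0.9, 0.1) for _ in range(200)]
```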
>
> >> (3) the RequestHandler logic will probabilistically drop an incoming
> >> path folding request (about once every HTL)
> >
> > This is a security feature. It's been present since well before 0.5,
> > and came from our theoreticians - Oskar or Ian; I think it may even
> > have been in the original paper. The objective is to ensure that the
> > data source doesn't end up knowing reliably the request originator's
> > identity. Of course this is a fairly weak protection because there
> > is a significant chance that it *is* the originator... So if you are
> > interested in who requests specific relatively unpopular content,
> > and you control a bunch of nodes which are near the "ideal"
> > locations for a bunch of keys in the file, you can record who
> > requests them and correlate them and you should be able to identify
> > some of the nodes requesting that file. This costs a lot less than
> > connecting to every opennet node, although a lot more than mobile
> > attacker source tracing.
> >
> >> (4) if path folding request is dropped (or not wanted), an
> >> intermediate node might forward its reference instead
> >
> > This is an integral part of #3.
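To spell out the drop in #3 as code (a sketch only; MAX_HTL and the function
name are illustrative, not the actual RequestHandler logic): each hop
substitutes its own ref with probability about 1/MAX_HTL, so over a full-HTL
path the ref is replaced roughly once on average.

```python
import random

random.seed(2)

MAX_HTL = 10  # illustrative constant, not Freenet's actual value

def substitutes_own_ref():
    """An intermediate node drops the source's folded ref and offers its
    own reference instead, with probability ~1/MAX_HTL."""
    return random.random() < 1.0 / MAX_HTL

# Expected substitutions over a MAX_HTL-hop path:
# MAX_HTL * (1/MAX_HTL) = 1, i.e. "about once every HTL" - enough that
# the requestor cannot be sure the ref really belongs to the data source.
subs_per_path = [sum(substitutes_own_ref() for _ in range(MAX_HTL))
                 for _ in range(2000)]
```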
> >>
> >> I think the behavior of 2, 3, & 4 are more likely to be making the
> >> peer-clumps, which (as you said) might not be a horrible thing. But I
> >> think all of these points go against the original intent of
> >> destination sampling.
> >
> > IMHO the bias is to short links.
>
> Agreed. I think that these points might be working together to reduce
> the number of long links to unhealthy levels; but mostly #2.
To clarify, IMHO in path folding the bias is towards few hops, which means
towards links which are relatively close to our location, and relatively far
from the target location. Mostly the target location will be close to us
anyway, but when it's not, we can expect to get connections which are only a
few hops away - so not all the way across the keyspace but only partly across
it. However I am not convinced this is a serious problem: in a few hops you
can get quite a distance.
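"Quite a distance in a few hops" is easy to illustrate with a toy simulation
(the 20-links-per-node figure and all helper names are my assumptions): with
1/d-distributed links, a greedy walk that starts half the keyspace away from
the target closes most of that gap within a handful of hops.

```python
import random

random.seed(3)

def dist(a, b):
    """Circular keyspace distance."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def links_for(loc, k=20, n_candidates=2000):
    """k peers sampled with probability ~ 1/d from loc."""
    cands = [random.random() for _ in range(n_candidates)]
    w = [1.0 / max(dist(loc, c), 1e-6) for c in cands]
    return random.choices(cands, weights=w, k=k)

def distance_after(origin, target, hops):
    """Greedy routing: each hop steps to the link closest to the target."""
    loc = origin
    for _ in range(hops):
        loc = min(links_for(loc), key=lambda p: dist(p, target))
    return dist(loc, target)

# Starting half the keyspace away (the worst case)...
two_hops = distance_after(0.0, 0.5, 2)    # already well inside the gap
eight_hops = distance_after(0.0, 0.5, 8)  # essentially at the target
```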
>
> > This should probably result in less clumping, not more, because all
> > the nodes later on are reasonably close to the target. However we
> > get to the target fairly quickly. In announcement the first few
> > nodes that might give us long links are overloaded, but in path
> > folding there is no such problem;
> > so a bias to short links would tend to weaken path folding and give
> > us more longish links and fewer short links, which may be bad.
>
> I don't follow. More short links gives us less short links? #1 is
> about announcements; 2,3,4 are path folding. I agree they interact.
In announcement, more few-hop links gives us more links of long keyspace
distance.
In path folding, however, more few-hop links gives us more links of short
keyspace distance (relative to our own location).
>
> >> In code, announcements are routed to the node's current location only
> >> (one thing modified in my experiments),
> >
> > Which is the right thing to do. [...snip...]
> >
> >> and only to the seed nodes
> >> (even if we have connected peers).
> >
> > That is a fair point, we should announce through our darknet peers
> > etc. In fact I thought we did... Maybe we don't?
>
> I think anything to take load off the seed nodes would be a good
> thing. Is there a reason to not also announce through opennet peers?
No, we should announce through all our peers. Are you sure we don't?
>
> >> I would expect them to run (approximately) the same paths each time,
> >> therefore.
> >
> > There are many seednodes and the paths will be different depending
> > on which you start at.
>
> After pondering this a while, I have changed my idea on the busy-nodes-
> near-seed-nodes theory. What I would expect to see is much "network
> churn" near seed nodes. Because for any particular location,
> announcements traveling through the same path will tend to replace
> that path.
>
Network churn is deliberately throttled...
> >>> They rapidly approach the target location, and thus having the
> >>> opportunity to connect on each node should roughly give us the 1/d
> >>> distribution that theory tells us is necessary: A few long links and
> >>> a lot of short links.
> >> ...
> >>> This is plausible given that I'd expect the nodes close to the
> >>> seednodes, where you might get your few long links from, to be
> >>> overloaded.
> >>
> >> Yea, by announcing shallow-first, all the nodes around the seed nodes
> >> will (nearly always) not want a peer. If we go depth first we could
> >> even count the number of forwarded refs, and only send our ref if
> >> that
> >> count is too low.
> >
> > I don't follow. An announcement yields many peers, and we only want
> > to send our ref once, to limit bandwidth usage.
>
> None of that has changed or is in question.
>
> Here is a crude interaction diagram (monospaced font will help)
>
> [Present impl. Shallow-first]
>
> Newbie Node | Seed | A | B | C | D |
> ------------+--------+-----+-----+-----+-----+
> | | | | | |
> 1=ref(new) ---> * ---->* | | | |
> <---ref(A)-/ \ | | | |
> got-A | | \-->* | | |
> | *<---ref(B)-/ \ | | |
> <--ref(B)-/ | | \-->* | |
> got-B | | *<-ref(C)-/ \--->* |
> <------ref(C)-/ | *<-ref(D)--/ |
> got-C | *<--ref(D)--/ | | |
> *<-ref(D)-/ | | | | |
> got-D | | | | | |
>
>
> [Depth-first, cap=2 (for example)]
>
> Newbie Node | Seed | A | B | C | D |
> ------------+--------+-----+-----+-----+-----+
> | | | | | |
> 1=ref(new) ---> * ---->* | | | |
> | | \-->* | | |
> | | | \--->* | |
> | | | | \--->* |(e.g. htl expire)
> | | | |*<-ref(D)-*|
> | | |*<-ref(D) | |
> | |*<-ref(D) | | |
> |*<-ref(D) | |*<-done()- |
> got-D <--/ | |*<-ref(C)-*| |(C: done, need more)
> | |*<-ref(C) | | |
> |*<-ref(C) | | | |
> got-C <--/ | |*<-done()- | |
> | | |(*!*)| | |(B: numForwarded>=cap)
> | |*<-done()- | | |(B: doesn't forward ref)
> |*<-done()- | | | |(A: numForwarded>=cap)
> (done) <-done()- | | | | |
I have yet to be convinced that shallow first vs depth first is relevant. You
would have to establish that we are rejecting peers from later in the
announcement because we have already accepted nodes from early on. Even if
this were true, it would cause a bias towards links which are few in hops and
therefore large in keyspace distance.
>
> >> Instead of sending our ref once we accept an announcement, it sends
> >> the ref just-before sending the completion message (that is, after we
> >> have routed the deeper replies).
> >
> > And it only keeps one noderef, so we only accept one node, afaics.
>
> I don't know what you mean by "keeps one noderef".
> It is intended to forward downstream requests, but only forward ours
> when we are "done".
> It is quite possible that it is written wrong, I'll take another look
> at it.
It only accepts one peer into the routing table, as far as I can see.
>
> >> It is intended to deliver the announcement replies in a depth-first
> >> order to the originator, in practice out-of-order messages might make
> >> them only "roughly" depth first.
> >
> > I don't see how it does that.
>
> it is meant to change:
>
> start() {
> sendOurRef();
> route();
> complete();
> }
>
> complete() {
> sendDonePacket();
> }
>
> to become:
>
> start() {
> route();
> complete();
> }
>
> complete() {
> sendOurRef();
> sendDonePacket();
> }
>
> but again, I would add an extra conditional...
>
> complete() {
> if (numForwardedRefs<MAX_ANNOUNCE_REPLIES)
> sendOurRef();
> sendDonePacket();
> }
How does this prevent the earlier matches from being accepted? If we wait for
all the announcement replies, and then send our ref, they will all accept us,
won't they?
>
> > Even if it did I'm not sure what difference it would make.
>
> (1) announcing would get nearly-all short links (rely on path folding
> for long)
Please clarify short vs long here. If you mean announcement would get nearly
all links close to the target in keyspace terms, well, that is happening now
isn't it? Isn't that exactly what your graphs show?
What we want is a few high keyspace distance links and a lot of low keyspace
distance links. And as far as I can see the current code should give that,
although it might give too few high keyspace distance links because of always
starting on a seednode.
> (2) we could cap the number of results announces will return to
> eliminate near-seednode churn effect
Only if we are rejecting peers returned from later in the announcement because
we have already accepted earlier ones. You have not established this.
> (3) links "close to the seed nodes" would not constantly be replaced
> with bootstrapping nodes
Only if they know not to accept us when we send our reference. Which they
don't. You'd have to introduce some sort of counter, requiring changes to
messages.
> (4) seed nodes would let us connect at a given location, rather than
> being the authority on what is at a given location
> (4a) in other words, be "window" into the network, and not an "anchor"
I don't get it.
> (5) it follows the destination sampling paradigm, for which we have
> very-nearly-hard-math
Which doesn't directly apply.
>
>
> On 2010/11/05 (Nov), at 12:20 PM, Matthew Toseland wrote:
>
> > ...there are so few accepts that we will always have to announce to
> > multiple nodes.
>
> This might actually be proof of the "churn" effect I mentioned.
> Because all the paths to remote locations (from a seed node), are
> filled with
> recently-announced nodes which are in their connection grace period.
Nonetheless it undermines your key claim which is that we are getting too many
answers and not being able to accept them all.