Okay, there is much confusion here too.
"Short links" is being used for two different, and opposite, things:
- Links gained via announcement (or path folding) after only a few hops. Let's
call this "few hops".
- Connections to peers whose locations are numerically close to our own. Let's
call this "small location distance".
Announcements start at a seednode, which is likely to be in a random location.
They are routed to our location, which is likely to be a long way away from it.
The first few hops are therefore a long way away location-wise from the target,
and the later hops are reasonably close to it. The effect of connecting to
nodes only a few hops along the announcement is to create links with large
location distance (far from our specialisation, needed in small quantity for
routing), and the effect of connecting to nodes many hops along the
announcement is to create links with small location distance (close to our
specialisation, needed in large quantity for routing).
We want links in this proportion, for routing to work:

    p(link between two nodes) = constant / (distance between the two nodes' locations)
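As a toy illustration of that shape (the helper names and the flat 1e-6 cutoff
are my own, just to avoid division by zero): sampling peers on a circular
keyspace with weight 1/d gives mostly short links plus a few long ones.

```python
import random

random.seed(0)

def keyspace_distance(a, b):
    """Circular distance between two locations in [0, 1)."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def sample_peer_location(our_loc, n_candidates=2000):
    """Pick one candidate location with probability proportional to 1/d,
    i.e. the link distribution routing needs."""
    cands = [random.random() for _ in range(n_candidates)]
    weights = [1.0 / max(keyspace_distance(our_loc, c), 1e-6) for c in cands]
    return random.choices(cands, weights=weights, k=1)[0]

links = [keyspace_distance(0.5, sample_peer_location(0.5)) for _ in range(500)]
short = sum(1 for d in links if d < 0.05)  # the bulk: small location distance
long_ = sum(1 for d in links if d > 0.2)   # a handful: large location distance
```

The constants here don't matter; the point is only the shape - lots of short
links, a few long ones.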
Because the nodes at the beginning are close to the seednodes, IMHO they will
tend to accept us very infrequently. This might result in us not having enough
links with large location distance. If this is a problem we should walk a few
random hops from the seednode before announcing.
Now, for path folding, the issues are similar. We are starting from our
location, and moving towards the target location. Often this will be close to
our location, because our incoming requests are specialised, but this is not
always the case because any request we generate locally or which was generated
on a nearby node may not be close to our specialisation. So if we only sample
the destination, we get mostly links which are close to our location, but some
that are further away. If we reset along the way, we introduce a small bias
towards our location, and away from the target, which might be distant; so
there is the possibility of it increasing clumping to some degree.
However, random reset has always been a part of Freenet, is supported by our
theoreticians (particularly Oskar), and is a critical security feature on
opennet.
On Friday 05 November 2010 18:37:48 Robert Hailey wrote:
>
> On 2010/11/05 (Nov), at 10:37 AM, Matthew Toseland wrote:
>
> > Okay, this email is very confusing up to the point where I realise
> > what you mean by "depth first". Normally "depth first" refers to a
> > depth first search, which is exactly what we do in AnnounceSender,
> > RequestSender etc. We go as deep as we can before backtracking.
> >
> > However what you mean appears to be that, currently, we send, and
> > process, the node references from the originator's end (usually a
> > seednode) first. Hence if we get lots of noderefs, we end up
> > accepting refs at the beginning (which initially are far from the
> > target location
>
> Yes! Technical thought communicated!
Cool, another one is above.
>
> > and would give us long links, but those nodes are usually overloaded
> > so won't give much), instead of refs at the end (which generally
> > should be highly specialised although not necessarily as close to
> > the target as the nodes in the middle).
>
> We have often said "overloaded" nodes (near the seed nodes), but I
> have changed my perception of this (more later).
>
> >> As best I can see, then, there may be major problems with the current
> >> announcement and path folding implementations:
> >> (1) announcements gather at non-destination points (in fact, shallow
> >> first),
> >
> > Which should result in us getting some long links, although they are
> > unlikely due to the nodes near the seednodes being overloaded.
> >
> >> (2) the RequestSender logic will always eat an incoming path folding
> >> request if we want a peer (often for very short paths),
> >
> > Maybe he needs the path? Maybe the originator doesn't? Also see the
> > next item.
>
> I think that is very far from the common case. If we presume that the
> "wantPeer" state of all nodes in the chain are about the same (because
> it is mostly based on timings), I would expect an originating node
> (even 2-3 hops from a long link) to be starved from needed and wanted
> long links.
Okay, we are talking about path folding here. Your argument is that path
folding will provide connections to peers which are only a few hops away, and
therefore relatively close to the originator's location and relatively distant
from the target location, and will therefore tend to enhance specialisation.
And that this is caused by resetting connections.
>
> Couldn't this middleman-node get destination samples of his own?
> Surely he is also making requests into the network.
He may just be routing. I object in the strongest possible terms to path
folding only applying to nodes which are actively requesting CHKs. Nodes which
don't keep up are effectively expelled from the network. We do not want to have
to spider the network merely to stay on it. Of course, nodes can participate in
path folding both as requestor and responder - but to only participate as
responder would likely have serious consequences.
>
> At the very least, it should be probabilistic (unless we are the
> originator). Otherwise what we are optimizing for is very far from
> destination sampling, we will be folding the path at every
> opportunity, including in the direction of the initial and post
> routing-miss (is there a paper on that?).
On 0.5, we had probabilistic path folding, but the probability depended on the
node's load. Hence we tried to divert load away from ourselves when we were
overloaded - by letting the path folding attempt go through, which gave a new
peer to the requestor whose requests were coming through us.
IMHO even with resets, we will almost always be connecting to a node that is
significantly closer to the target than we are.
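That claim is easy to sanity-check with a toy model (the halving-per-hop step
and all names here are my assumptions, not the real routing code): greedy
routing closes the gap to the target on every hop, so even when an
intermediate node resets the fold to itself, the ref we receive is still
closer to the target than we are.

```python
import random

random.seed(1)

def route_and_fold(origin, target, hops=10, reset_prob=0.1):
    """Toy greedy route on a linear keyspace: each hop halves the
    remaining distance to the target. Path folding normally returns the
    final node, but on the way back each intermediate node may reset the
    folded ref to itself with probability reset_prob."""
    loc = origin
    path = []
    for _ in range(hops):
        loc = target + (loc - target) * 0.5  # toy greedy step
        path.append(loc)
    folded = path[-1]                 # by default, the data source itself
    for node in reversed(path[:-1]):  # message travels back towards origin
        if random.random() < reset_prob:
            folded = node             # reset: this node claims to be the source
            break
    return folded

# Every folded ref ends up closer to the target than the originator was.
results = [route_and_fold(0.9, 0.1) for _ in range(200)]
```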
>
> >> (3) the RequestHandler logic will probabilistically drop an incoming
> >> path folding request (about once every HTL)
> >
> > This is a security feature. It's been present since well before 0.5,
> > and came from our theoreticians - Oskar or Ian; I think it may even
> > have been in the original paper. The objective is to ensure that the
> > data source doesn't end up knowing reliably the request originator's
> > identity. Of course this is a fairly weak protection because there
> > is a significant chance that it *is* the originator... So if you are
> > interested in who requests specific relatively unpopular content,
> > and you control a bunch of nodes which are near the "ideal"
> > locations for a bunch of keys in the file, you can record who
> > requests them and correlate them and you should be able to identify
> > some of the nodes requesting that file. This costs a lot less than
> > connecting to every opennet node, although a lot more than mobile
> > attacker source tracing.
> >
> >> (4) if path folding request is dropped (or not wanted), an
> >> intermediate node might forward its reference instead
> >
> > This is an integral part of #3.
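To spell out the drop in #3 as code (a sketch only; MAX_HTL and the function
name are illustrative, not the actual RequestHandler logic): each hop
substitutes its own ref with probability about 1/MAX_HTL, so over a full-HTL
path the ref is replaced roughly once on average.

```python
import random

random.seed(2)

MAX_HTL = 10  # illustrative constant, not Freenet's actual value

def substitutes_own_ref():
    """An intermediate node drops the source's folded ref and offers its
    own reference instead, with probability ~1/MAX_HTL."""
    return random.random() < 1.0 / MAX_HTL

# Expected substitutions over a MAX_HTL-hop path:
# MAX_HTL * (1/MAX_HTL) = 1, i.e. "about once every HTL" - enough that
# the requestor cannot be sure the ref really belongs to the data source.
subs_per_path = [sum(substitutes_own_ref() for _ in range(MAX_HTL))
                 for _ in range(2000)]
```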
> >>
> >> I think the behavior of 2, 3, & 4 are more likely to be making the
> >> peer-clumps, which (as you said) might not be a horrible thing. But I
> >> think all of these points go against the original intent of
> >> destination sampling.
> >
> > IMHO the bias is to short links.
>
> Agreed. I think that these points might be working together to reduce
> the number of long links to unhealthy levels; but mostly #2.
To clarify, IMHO in path folding the bias is towards few hops, which means
towards links which are relatively close to our location, and relatively far
from the target location. Mostly the target location will be close to us
anyway, but when it's not, we can expect to get connections which are only a
few hops away - so not all the way across the keyspace but only partly across
it. However I am not convinced this is a serious problem: in a few hops you
can get quite a distance.
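"Quite a distance in a few hops" is easy to illustrate with a toy simulation
(the 20-links-per-node figure and all helper names are my assumptions): with
1/d-distributed links, a greedy walk that starts half the keyspace away from
the target closes most of that gap within a handful of hops.

```python
import random

random.seed(3)

def dist(a, b):
    """Circular keyspace distance."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def links_for(loc, k=20, n_candidates=2000):
    """k peers sampled with probability ~ 1/d from loc."""
    cands = [random.random() for _ in range(n_candidates)]
    w = [1.0 / max(dist(loc, c), 1e-6) for c in cands]
    return random.choices(cands, weights=w, k=k)

def distance_after(origin, target, hops):
    """Greedy routing: each hop steps to the link closest to the target."""
    loc = origin
    for _ in range(hops):
        loc = min(links_for(loc), key=lambda p: dist(p, target))
    return dist(loc, target)

# Starting half the keyspace away (the worst case)...
two_hops = distance_after(0.0, 0.5, 2)    # already well inside the gap
eight_hops = distance_after(0.0, 0.5, 8)  # essentially at the target
```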
>
> > This should probably result in less clumping, not more, because all
> > the nodes later on are reasonably close to the target. However we
> > get to the target fairly quickly. In announcement the first few
> > nodes that might give us long links are overloaded, but in path
> > folding there is no such problem;
> > so a bias to short links would tend to weaken path folding and give
> > us more longish links and fewer short links, which may be bad.
>
> I don't follow. More short links gives us less short links? #1 is
> about announcements; 2,3,4 are path folding. I agree they interact.
In announcement, more few-hop links gives us more links of long keyspace
distance.
In path folding, however, more few-hop links gives us more links of short
keyspace distance (relative to our own location).
>
> >> In code, announcements are routed to the node's current location only
> >> (one thing modified in my experiments),
> >
> > Which is the right thing to do. [...snip...]
> >
> >> and only to the seed nodes
> >> (even if we have connected peers).
> >
> > That is a fair point, we should announce through our darknet peers
> > etc. In fact I thought we did... Maybe we don't?
>
> I think anything to take load off the seed nodes would be a good
> thing. Is there a reason to not also announce through opennet peers?
No, we should announce through all our peers. Are you sure we don't?
>
> >> I would expect them to run (approximately) the same paths each time,
> >> therefore.
> >
> > There are many seednodes and the paths will be different depending
> > on which you start at.
>
> After pondering this a while, I have changed my idea on the busy-nodes-
> near-seed-nodes theory. What I would expect to see is much "network
> churn" near seed nodes. Because for any particular location,
> announcements traveling through the same path will tend to replace
> that path.
>
Network churn is deliberately throttled...
> >>> They rapidly approach the target location, and thus having the
> >>> opportunity to connect on each node should roughly give us the 1/d
> >>> distribution that theory tells us is necessary: A few long links and
> >>> a lot of short links.
> >> ...
> >>> This is plausible given that I'd expect the nodes close to the
> >>> seednodes, where you might get your few long links from, to be
> >>> overloaded.
> >>
> >> Yea, by announcing shallow-first, all the nodes around the seed nodes
> >> will (nearly always) not want a peer. If we go depth first we could
> >> even count the number of forwarded refs, and only send our ref if
> >> that
> >> count is too low.
> >
> > I don't follow. An announcement yields many peers, and we only want
> > to send our ref once, to limit bandwidth usage.
>
> None of that has changed or is in question.
>
> Here is a crude interaction diagram (monospaced font will help)
>
> [Present impl. Shallow-first]
>
> Newbie Node | Seed | A | B | C | D |
> ------------+--------+-----+-----+-----+-----+
> | | | | | |
> 1=ref(new) ---> * ---->* | | | |
> <---ref(A)-/ \ | | | |
> got-A | | \-->* | | |
> | *<---ref(B)-/ \ | | |
> <--ref(B)-/ | | \-->* | |
> got-B | | *<-ref(C)-/ \--->* |
> <------ref(C)-/ | *<-ref(D)--/ |
> got-C | *<--ref(D)--/ | | |
> *<-ref(D)-/ | | | | |
> got-D | | | | | |
>
>
> [Depth-first, cap=2 (for example)]
>
> Newbie Node | Seed | A | B | C | D |
> ------------+--------+-----+-----+-----+-----+
> | | | | | |
> 1=ref(new) ---> * ---->* | | | |
> | | \-->* | | |
> | | | \--->* | |
> | | | | \--->* |(e.g. htl expire)
> | | | |*<-ref(D)-*|
> | | |*<-ref(D) | |
> | |*<-ref(D) | | |
> |*<-ref(D) | |*<-done()- |
> got-D <--/ | |*<-ref(C)-*| |(C: done, need more)
> | |*<-ref(C) | | |
> |*<-ref(C) | | | |
> got-C <--/ | |*<-done()- | |
> | | |(*!*)| | |(B: numForwarded>=cap)
> | |*<-done()- | | |(B: doesn't forward ref)
> |*<-done()- | | | |(A: numForwarded>=cap)
> (done) <-done()- | | | | |
I have yet to be convinced that shallow first vs depth first is relevant. You
would have to establish that we are rejecting peers from later in the
announcement because we have already accepted nodes from early on. Even if
this were true, it would cause a bias towards links which are few in hops and
therefore large in keyspace distance.
>
> >> Instead of sending our ref once we accept an announcement, it sends
> >> the ref just-before sending the completion message (that is, after we
> >> have routed the deeper replies).
> >
> > And it only keeps one noderef, so we only accept one node, afaics.
>
> I don't know what you mean by "keeps one noderef".
> It is intended to forward downstream requests, but only forward ours
> when we are "done".
> It is quite possible that it is written wrong, I'll take another look
> at it.
It only accepts one peer into the routing table, as far as I can see.
>
> >> It is intended to deliver the announcement replies in a depth-first
> >> order to the originator, in practice out-of-order messages might make
> >> them only "roughly" depth first.
> >
> > I don't see how it does that.
>
> it is meant to change:
>
> start() {
> sendOurRef();
> route();
> complete();
> }
>
> complete() {
> sendDonePacket();
> }
>
> to become:
>
> start() {
> route();
> complete();
> }
>
> complete() {
> sendOurRef();
> sendDonePacket();
> }
>
> but again, I would add an extra conditional...
>
> complete() {
> if (numForwardedRefs<MAX_ANNOUNCE_REPLIES)
> sendOurRef();
> sendDonePacket();
> }
How does this prevent the earlier matches from being accepted? If we wait for
all the announcement replies, and then send our ref, they will all accept us,
won't they?
>
> > Even if it did I'm not sure what difference it would make.
>
> (1) announcing would get nearly-all short links (rely on path folding
> for long)
Please clarify short vs long here. If you mean announcement would get nearly
all links close to the target in keyspace terms, well, that is happening now
isn't it? Isn't that exactly what your graphs show?
What we want is a few high keyspace distance links and a lot of low keyspace
distance links. And as far as I can see the current code should give that,
although it might give too few high keyspace distance links because of always
starting on a seednode.
> (2) we could cap the number of results announces will return to
> eliminate near-seednode churn effect
Only if we are rejecting peers returned from later in the announcement because
we have already accepted earlier ones. You have not established this.
> (3) links "close to the seed nodes" would not constantly be replaced
> with bootstrapping nodes
Only if they know not to accept us when we send our reference. Which they
don't. You'd have to introduce some sort of counter, requiring changes to
messages.
> (4) seed nodes would let us connect at a given location, rather than
> being the authority on what is at a given location
> (4a) in other words, be "window" into the network, and not an "anchor"
I don't get it.
> (5) it follows the destination sampling paradigm, for which we have
> very-nearly-hard-math
Which doesn't directly apply.
>
>
> On 2010/11/05 (Nov), at 12:20 PM, Matthew Toseland wrote:
>
> > ...there are so few accepts that we will always have to announce to
> > multiple nodes.
>
> This might actually be proof of the "churn" effect I mentioned.
> Because all the paths to remote locations (from a seed node), are
> filled with
> recently-announced nodes which are in their connection grace period.
Nonetheless it undermines your key claim which is that we are getting too many
answers and not being able to accept them all.