On 2010/11/05 (Nov), at 10:37 AM, Matthew Toseland wrote:

> Okay, this email is very confusing up to the point where I realise  
> what you mean by "depth first". Normally "depth first" refers to a  
> depth first search, which is exactly what we do in AnnounceSender,  
> RequestSender etc. We go as deep as we can before backtracking.
>
> However what you mean appears to be that, currently, we send, and  
> process, the node references from the originator's end (usually a  
> seednode) first. Hence if we get lots of noderefs, we end up  
> accepting refs at the beginning (which initially are far from the  
> target location

Yes! Technical thought communicated!

> and would give us long links, but those nodes are usually overloaded  
> so won't give much), instead of refs at the end (which generally  
> should be highly specialised although not necessarily as close to  
> the target as the nodes in the middle).

We have often said "overloaded" nodes (near the seed nodes), but I  
have changed my perception of this (more later).

>> As best I can see, then, there may be major problems with the current
>> announcement and path folding implementations:
>> (1) announcements gather at non-destination points (in fact, shallow
>> first),
>
> Which should result in us getting some long links, although they are  
> unlikely due to the nodes near the seednodes being overloaded.
>
>> (2) the RequestSender logic will always eat an incoming path folding
>> request if we want a peer (often for very short paths),
>
> Maybe he needs the path? Maybe the originator doesn't? Also see the  
> next item.

I think that is very far from the common case. If we presume that the  
"wantPeer" state of all nodes in the chain are about the same (because  
it is mostly based on timings), I would expect an originating node  
(even 2-3 hops from a long link) to be starved from needed and wanted  
long links.

Couldn't this middleman-node get destination samples of his own?  
Surely he is also making requests into the network.

At the very least, it should be probabalistic (unless we are the  
originator). Otherwise what we are optimizing for is very far from  
destination sampling, we will be folding the path at every  
opportunity, including in the direction of the initial and post  
routing-miss (is there a paper on that?).

>> (3) the RequestHandler logic will probabilistically drop an incoming
>> path folding request (about once every HTL)
>
> This is a security feature. It's been present since well before 0.5,  
> and came from our theoreticians - Oskar or Ian; I think it may even  
> have been in the original paper. The objective is to ensure that the  
> data source doesn't end up knowing reliably the request originator's  
> identity. Of course this is a fairly weak protection because there  
> is a significant chance that it *is* the originator... So if you are  
> interested in who requests specific relatively unpopular content,  
> and you control a bunch of nodes which are near the "ideal"  
> locations for a bunch of keys in the file, you can record who  
> requests them and correlate them and you should be able to identify  
> some of the nodes requesting that file. This costs a lot less than  
> connecting to every opennet node, although a lot more than mobile  
> attacker source tracing.
>
>> (4) if path folding request is dropped (or not wanted), an
>> intermediate node might forward it's reference instead
>
> This is an integral part of #3.
>>
>> I think the behavior of 2, 3, & 4 are more likely to be making the
>> peer-clumps, which (as you said) might not be a horrible thing. But I
>> think all of these points go against the original intent of
>> destination sampling.
>
> IMHO the bias is to short links.

Agreed. I think that these points might be working together to reduce  
the number of long links to unhealthy levels; but mostly #2.

> This should probably result in less clumping, not more, because all  
> the nodes later on are reasonably close to the target. However we  
> get to the target fairly quickly. In announcement the first few  
> nodes that might give us long links are overloaded, but in path  
> folding there is no such problem;
> so a bias to short links would tend to weaken path folding and give  
> us more longish links and fewer short links, which may be bad.

I don't follow. More short links gives us less short links? #1 is  
about announcements; 2,3,4 are path folding. I agree they interact.

>> In code, announcements are routed to the node's current location only
>> (one thing modified in my experiments),
>
> Which is the right thing to do. [...snip...]
>
>> and only to the seed nodes
>> (even if we have connected peers).
>
> That is a fair point, we should announce through our darknet peers  
> etc. In fact I thought we did... Maybe we don't?

I think anything to take load off the seed nodes would be a good  
thing. Is there a reason to not also announce through opennet peers?

>> I would expect them to run (approximately) the same paths each time,
>> therefore.
>
> There are many seednodes and the paths will be different depending  
> on which you start at.

After pondering this a while, I have changed my idea on the busy-nodes- 
near-seed-nodes theory. What I would expect to see is much "network  
churn" near seed nodes. Because for any particular location,  
announcements traveling through the same path will tend to replace  
that path.

>>
>>> They rapidly approach the target location, and thus having the
>>> opportunity to connect on each node should roughly give us the 1/d
>>> distribution that theory tells us is necessary: A few long links and
>>> a lot of short links.
>> ...
>>> This is plausible given that I'd expect the nodes close to the
>>> seednodes, where you might get your few long links from, to be
>>> overloaded.
>>
>> Yea, by announcing shallow-first, all the nodes around the seed nodes
>> will (nearly always) not want a peer. If we go depth first we could
>> even count the number of forwarded refs, and only send our ref if  
>> that
>> count is too low.
>
> I don't follow. An announcement yields many peers, and we only want  
> to send our ref once, to limit bandwidth usage.

None of that has changed or is in question.

Here is a crude interaction diagram (monospaced font will help)

[Present impl. Shallow-first]

Newbie Node |  Seed  |  A  |  B  |  C  |  D  |
------------+--------+-----+-----+-----+-----+
             |        |     |     |     |     |
1=ref(new) --->  * ---->*  |     |     |     |
             <---ref(A)-/ \ |     |     |     |
got-A       |        |    \-->*  |     |     |
             |    *<---ref(B)-/ \ |     |     |
       <--ref(B)-/    |     |    \-->*  |     |
got-B       |        |   *<-ref(C)-/ \--->*  |
           <------ref(C)-/  |  *<-ref(D)--/   |
got-C       |    *<--ref(D)--/   |     |     |
       *<-ref(D)-/    |     |     |     |     |
got-D       |        |     |     |     |     |


[Depth-first, cap=2 (for example)]

Newbie Node |  Seed  |  A  |  B  |  C  |  D  |
------------+--------+-----+-----+-----+-----+
             |        |     |     |     |     |
1=ref(new) --->  * ---->*  |     |     |     |
             |        |    \-->*  |     |     |
             |        |     |   \--->*  |     |
             |        |     |     |   \--->*  |(e.g. htl expire)
             |        |     |     |*<-ref(D)-*|
             |        |     |*<-ref(D)  |     |
             |        |*<-ref(D)  |     |     |
             |*<-ref(D)     |     |*<-done()- |
got-D    <--/        |     |*<-ref(C)-*|     |(C: done, need more)
             |        |*<-ref(C)  |     |     |
             |*<-ref(C)     |     |     |     |
got-C    <--/        |     |*<-done()- |     |
             |        |     |(*!*)|     |     |(B: numForwarded>=cap)
             |        |*<-done()- |     |     |(B: doesn't forward ref)
             |*<-done()-    |     |     |     |(A: numForwarded>=cap)
(done)  <-done()-    |     |     |     |     |


>> Instead of sending our ref once we accept an announcement, it sends
>> the ref just-before sending the completion message (that is, after we
>> have routed the deeper replies).
>
> And it only keeps one noderef, so we only accept one node, afaics.

I don't know what you mean by "keeps one noderef".
It is intended to forward downstream requests, but only forward ours  
when we are "done".
It is quite possible that it is written wrong, I'll take another look  
at it.

>> It is intended to deliver the announcement replies in a depth-first
>> order to the originator, in practice out-of-order messages might make
>> them only "roughly" depth first.
>
> I don't see how it does that.

it is meant to change:

start() {
        sendOurRef();
        route();
        complete();
}

complete() {
        sendDonePacket();
}

to become:

start() {
        route();
        complete();
}

complete() {
        sendOurRef();
        sendDonePacket();
}

but again, I would add an extra conditional...

complete() {
        if (numForwardedRefs<MAX_ANNOUNCE_REPLIES)
                sendOurRef();
        sendDonePacket();
}

> Even if it did I'm not sure what difference it would make.


(1) announcing would get nearly-all short links (rely on path folding  
for long)
(2) we could cap the number of results announces will return to  
eliminate near-seednode churn effect
(3) links "close to the seed nodes" would not constantly be replaced  
with bootstrapping nodes
(4) seed nodes would let us connect at a given location, rather than  
being the authority on what is at a given location
        (4a) in other words, be "window" into the network, and not an "anchor"
(5) it follows the destination sampling paradigm, for which we have  
very-nearly-hard-math


On 2010/11/05 (Nov), at 12:20 PM, Matthew Toseland wrote:

> ...there are so few accepts that we will always have to announce to  
> multiple nodes.

This might actually be proof of the "churn" effect I mentioned.  
Because all the paths to remote locations (from a seed node), are  
filled with
recently-announced nodes which are in there connection grace period.

--
Robert Hailey


-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20101105/e404f552/attachment.html>

Reply via email to