Colin Davis wrote:
>
>
> On Jun 2, 2006, at 8:18 AM, Matthew Toseland wrote:
>
>> On Fri, Jun 02, 2006 at 08:13:55AM -0400, Colin Davis wrote:
>>>
>>> I think the first thing to do is to add a warning when the node is
>>> spending more than XXX% of its bandwidth/time checking node status.
>>
>> Possibly.
>
> Great. This is fairly trivial, but lets people know that they are 
> being stupid.
>
>
>>>
>>> The second is to change the delay between checks to see if a node is
>>> back up.
>>
>> No. Some NATs have tunnel timeouts of under 30 seconds. We cannot
>> therefore increase the delay over that amount. That is the unfortunate
>> reality; everyone is NATted, and it's a PITA, but that's life.
>
>
>
> Forgive my ignorance, but it sounds like we're confusing two distinct 
> issues.
>
> The first issue is maintaining a connection to a node. For this, I 
> understand that a heartbeat needs to be sent out every ~30 secs to 
> keep the connection alive through the NAT hole punch...
>
> But the second is determining if the node is up at all. I don't see 
> any reason this has to be the same heartbeat, with the same frequency.
> If I ask every one of my disconnected nodes if it is up (and tell it 
> that I am up) only once / hr, but do it immediately on startup, that 
> doesn't affect the above heartbeat...
>
>
>
> Example- Nodes A, B, C and D.
>
>
> Nodes A and B are running 24/7
> They exchange heartbeats every 30 secs, and punch through NATs.
>
>
>
> Nodes C and D are transient- They run when the user has free 
> bandwidth, etc.
> There is no reason that A and B should be trying to connect to these 
> machines every 30 secs....
>
>
>
> If C and D send a connection request to all the nodes on their list on 
> startup, telling them that they are up and trying to establish the 
> connection, then they are OK.
> The reply gets back; since it arrives within 30 secs, it makes it 
> through the punched hole.
>
>
> A and B can gradually increase the interval at which they check for C 
> and D. If they don't get a reply after 30 secs, they increase it to a 
> minute. If they don't get a reply after a minute, they increase it to 
> 10 minutes, etc.
> Eventually, they hit a max in freenet.ini, and don't check any less 
> frequently than that. (I.e., always check at least once an hour.)
>
> But if nodes C or D come back online, they are sending a connection 
> request to A and B /anyway/!
> The fact that we aren't checking for another hour doesn't hurt them.
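
(To make the quoted scheme concrete, here is a minimal sketch of the
check backoff Colin describes; the class name and the freenet.ini value
it takes are hypothetical, not existing Freenet code.)

    // Doubles the check interval on each failed check, up to a maximum
    // that would come from freenet.ini, and resets when the peer answers.
    class PeerCheckBackoff {
        private static final long INITIAL_INTERVAL_MS = 30 * 1000; // 30 secs
        private final long maxIntervalMs; // "always check at least once an hour"
        private long currentIntervalMs = INITIAL_INTERVAL_MS;

        PeerCheckBackoff(long maxIntervalMs) {
            this.maxIntervalMs = maxIntervalMs;
        }

        // No reply: back off, but never past the configured max.
        void onCheckFailed() {
            currentIntervalMs = Math.min(currentIntervalMs * 2, maxIntervalMs);
        }

        // Heard from the peer: go back to the fast schedule.
        void onPeerHeard() {
            currentIntervalMs = INITIAL_INTERVAL_MS;
        }

        long nextCheckDelayMs() {
            return currentIntervalMs;
        }
    }
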
If A or B requires a UDP hole to be punched, they may hear from neither C 
nor D, because under that scheme they haven't been sending packets often 
enough to keep the UDP hole open.

What might work, though, is if C and D use handshake retry backoff in a 
different sense:

timeX is the interval between handshake attempts; it increases the longer 
a connection has been unsuccessful
timeY is how long each end keeps sending handshake requests per interval, 
where timeY > (timeX / 2) + timeZ
timeZ is enough time for both ends to get a UDP hole punched and make a 
successful connection

Each time timeX is increased, it probably needs to at least double 
(new timeX >= old timeX * 2).

For example, if timeX has backed off to 1 hour, timeY might be 32 
minutes and timeZ 2 minutes.  This means that during an hour, both ends 
should be trying a connection for at least two overlapping minutes.

The whole point of this algorithm is that both peers need to be 
guaranteed to be trying to talk to each other at the same time; this 
means each peer must spend less than half of each interval not sending 
handshake requests.
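
(A minimal sketch of that relation, with hypothetical names, not actual
Freenet code:)

    // timeX: backed-off interval between handshake attempts.
    // timeZ: time needed to punch the UDP hole and complete a handshake.
    // timeY: how long each end keeps sending handshake requests per
    //        interval; it must exceed timeX / 2 + timeZ so the two ends'
    //        windows are guaranteed to overlap for at least timeZ.
    class HandshakeWindow {
        static long nextTimeX(long timeX, long maxTimeX) {
            // Back off by at least doubling, up to some maximum.
            return Math.min(timeX * 2, maxTimeX);
        }

        static long minTimeY(long timeX, long timeZ) {
            return timeX / 2 + timeZ; // timeY must be greater than this
        }
    }

    // E.g. timeX = 60 min and timeZ = 2 min gives minTimeY = 32 min, so
    // during any hour both ends should be handshaking for at least ~2
    // overlapping minutes.
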


Another idea for resource management might be for a node pair to agree 
to be connected but not routed through unless bandwidth is available, at 
which point they negotiate with each other to start routing.  Kind of an 
"IDLE-CONNECTED" status.  A crude hack of this would be to randomly put 
a peer into maximum routing backoff when bwlimitDelayTime gets too bad.
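
(A very rough sketch of that crude hack; the class, the threshold, and
the peer interface are all hypothetical, with bwlimitDelayTime standing
in for the existing load measurement:)

    import java.util.List;
    import java.util.Random;

    // When bwlimitDelayTime gets too bad, pick a random connected peer
    // and put it into maximum routing backoff: it stays connected (and
    // keeps its keepalives) but is not routed through until load drops.
    class IdleConnectedHack {
        private static final long BWLIMIT_DELAY_BAD_MS = 2000; // hypothetical threshold
        private final Random random = new Random();

        interface Peer {
            void setRoutingBackoffToMax();
        }

        void maybeIdleAPeer(long bwlimitDelayTimeMs, List<Peer> connectedPeers) {
            if (bwlimitDelayTimeMs <= BWLIMIT_DELAY_BAD_MS || connectedPeers.isEmpty())
                return;
            Peer victim = connectedPeers.get(random.nextInt(connectedPeers.size()));
            victim.setRoutingBackoffToMax();
        }
    }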
