[freenet-dev] Why does announcement take so long?

Matthew Toseland Wed, 20 Jan 2010 16:00:36 +0000

On Thursday 03 December 2009 23:01:59 you wrote:
> On Wed, Dec 02, 2009 at 05:02:49PM +0000, Matthew Toseland wrote:
> > On Wednesday 02 December 2009 14:13:14 vive wrote:
> > > It sounds like matching newly arrived nodes to free opennet slots works 
> > > bad. If I understand the source correctly, this also seems natural in the 
> > > current setting. Since after some time of network operation, most nodes 
> > > will have announced and gotten their desired number of opennet peers. 
> > > There will still be a steady inflow of announces and requests, which 
> > > makes arriving or existing nodes compete and grab the slots whenever they 
> > > are open. Therefore, increasing /both/ the number of desired and 
> > > available opennet connections is not going to fix the problem: they will 
> > > quickly be consumed by dynamics in the existing network. We should 
> > > consider how there could be more available slots than the established 
> > > nodes themselves require.
> > > 
> > > I propose to discuss the following solution, based on separating normal 
> > > ("persistent") opennet connections, and announcements:
> > > Nodes keep announcing as before. But let every opennet node offer a small 
> > > number of temporary slots, say 3, that it does /not/ use itself for path 
> > > folding but it only offers to announcing opennet nodes. The announcing 
> > > nodes would then get a better chance to establish themselves during a 
> > > short time window. These temporary slots could be FIFO/LRU-dropped after 
> > > 1 minute upon new announcements from other opennet nodes, or 
> > > automatically after 10 minutes to reduce the burden on nodes. When there 
> > > are free persistent slots, nodes should offer these instead than 
> > > temporary ones. But when there are no persistent slots to offer, tempoary 
> > > ones are offered for a short while. Newly arrived nodes only use these up 
> > > to, say, maximally 10 such temporary connections simultaneously - and 
> > > drop them as they operate and get persistent connections.
> > > 
> > > I think this scheme has the following benefits:
> > > 
> > > 1. Give newly announcing nodes a chance to get onto the network (to get 
> > > opennet connections that are persistent, rather than temporary ones, by 
> > > routing as well as announcing).
> > > 
> > > 2. Prevent established nodes to consume the available slots using 
> > > requests, since they do not desire more than their persistent ones.
> > > 
> > > 3. Allow newly added opennet nodes to receive requests and path fold with 
> > > each other.
> > > 
> > > (Perhaps nodes also need to become more greedy about getting these 
> > > temporary connections close to its identity key.)
> > > 
> > > Does this sound interesting? If interesting enough, I could try to 
> > > simulate this.
> > > 
> > > Cheers,
> > > /vive
> > > 
> > > 
> > > On Fri, Nov 27, 2009 at 05:36:09PM +0000, Matthew Toseland wrote:
> > > > The question boils down to why do the vast majority of nodes on an 
> > > > announcement chain reject the announcement. The answer appears to be:
> > > > 
> > > > Most of the time, we reject a new node (from path folding or an 
> > > > announcement) because we only drop a connected peer every 10 minutes, 
> > > > even if it is isDroppable(). Of course, we drop disconnected 
> > > > isDroppable() peers (= peers that disconnected more than a few minutes 
> > > > ago) very quickly, but most peers are usually connected.
> > > > 
> > > > Generally there are peers we could drop according to isDroppable() 
> > > > (more heuristics, e.g. don't drop a node if we added it less than 5 
> > > > minutes ago), but we don't drop them because of the 10 minute throttle.
> > > > 
> > > > Also, we only drop a connected peer after every 10 successful requests. 
> > > > This appears not to be very significant on my node however.
> > > > 
> > > > Thoughts? evanbd suggested a token bucket for the intervals in 
> > > > OpennetManager - we have two, this one and one for adding 
> > > > old-opennet-peers. This would allow more burstiness.
> > > > 
> > > > Or we could scrap the 10 minute interval altogether (or cut it 
> > > > drastically) and rely on the 10 successful requests criterion?
> > > > 
> > > > Anything we do to opennet may disrupt the network if we get it wrong, 
> > > > and even if it's right it may take time to settle...
> > > 
> > There are several problems here:
> > - Announcements are slow and inefficient: We send announcements to several 
> > seednodes, these pass through hundreds of nodes, and the vast majority 
> > reject them. Eventually we get some connections, and through those 
> > connections we get more connections.
> 
> Well, that is what the proposal is meant to improve.


Right.

> By reserving special slots for announcing nodes, it gets quicker for them to 
> start to operate - the natural process by which nodes get connected.  
> 
> > - Most announcements, in normal circumstances (i.e. not during a slashdot), 
> > are from existing nodes that have been offline for a while. Reconnection to 
> > existing peers can serve the same function, however on many nodes this 
> > won't work due to NATs (or will be slow and only work after we have some 
> > connections e.g. ARKs), and on those where it could work, it doesn't 
> > because reconnections are treated the same as announcements and path 
> > folding - the vast majority are rejected.
> 
> But is that surprising, since other nodes can consume them quickly? When 
> leaving the network, 20 slots get free but also potentially consumed (from 
> other requests) after a short while. 

Or from announcements.
> 
> > - So there are worries both about announcements being crowded out by path 
> > folding, reconnections being crowded out by announcements and path folding, 
> > and in general the algorithm for deciding whether to connect not taking 
> > into account what sort of connection it is. This can be resolved by 
> > reserving slots somehow. IMHO these slots should not be additional to the 
> > peers limit, they should be included in it; and when nodes exit their grace 
> > periods they should be treated as any other node i.e. dropped if we need 
> > the slot and they are at the bottom of the LRU list. Hence, the proposal is 
> > to have limits on the number of nodes that may be in their grace periods 
> > for each category - path folding, announcements and reconnects. In a given 
> > category, if we reach the limit, we summarily reject new connections of 
> > that type. We would need to pick arbitrary limits for each category e.g. 2 
> > announcements 2 reconnects (max_peers - 4) path folding. The numbers must 
> > add up to no more than the peers limit, but whether they should be less 
> > than it or equal to it I am not sure of. This mechanism could slow down 
> > announcement during slashdottings, and in fact at other times, but at least 
> > it would keep path folding working; it would likely make reconnections much 
> > more successful. Overall I hope it would be an improvement in performance 
> > even in terms of time to connect newbies, but it does involve alchemy in 
> > the parameter setting.
> 
> I disagree with treating persistent and temporary announce-related peers the 
> same, unless there are free persistent slots at the end of the grace period. 
> They should in no way be eaten up by being treated as persistent after a 
> while, unless there are free persistent slots available by then. If announce 
> peers leave the grace period they should be kicked out quickly upon new 
> announcements - otherwise we are back to the same problem. 

My proposal (with much input from evanbd) is that they become permanent peers 
which are no longer in their grace period. What this means is that they are 
subject to the LRU, so if they still have not answered a request, they will be 
kicked out very quickly.

It does solve the problem: A peer ceases to be in the announcement grace 
period, and then another announcement comes in:
- First we check whether we can accept another announcement peer. We can, 
because there is now only 1 peer in the announcement grace period out of a 
limit of 2.
- Second we find the peer at the bottom of the LRU which is not in any grace 
period. In this case this happens to be the announced node that just exited its 
grace period, because it didn't answer any requests during the grace period. So 
we can kick that peer out quite happily. In another case, maybe it did answer a 
request, and a different node is at the bottom of the LRU. It doesn't matter: 
If the node is not at the bottom of the LRU, it has earned its right to stay 
connected for the time being.
- Third we check the overall anti-churn limit (currently we can only drop a 
connected peer every 10 minutes). This limit may be per category, e.g. we 
dropped a connected peer because of an announcement 10 minutes ago so we can do 
so again now.

Whereas if an announced peer cannot become a permanent peer, it will be 
announcing forever. It is critical that an announced peer can become a 
permanent peer.
> 
> Importantly, path folding during normal network operation should not crowd 
> out more essential things: like announcing or reconnecting. After all, path 
> folding is important both for allowing nodes into the network - as well as 
> slowly optimizing the topology for routing. The standard folding from 
> requests only has a very marginal effect after a short while a of folding. 
> And almost certainly, it will crowd out announces and reconnects in the 
> current setting. This is a tradeoff between node performance and how it 
> allows other nodes to temporary use its resources - I think it clearly would 
> open more ways into the network.

It depends on the circumstances, doesn't it? If the network expands rapidly, do 
we want all the announcements to completely crowd out path folding, or do we 
want some level of path folding - which is very valuable in such an early days, 
stressful situation - to go on? IMHO we need to protect each of the 3 
categories against the others: Announcements should not crowd out path folding 
any more than path folding crowds out announcements.
> 
> > - The code is overly complex with regards to dropping disconnected, 
> > droppable peers. We should count the number of peers which are either 
> > connected or are not droppable, and if this is less than the peers limit, 
> > accept the incoming peer unconditionally. If it isn't, we need to apply the 
> > above logic.
> 
> Agreed on complexity. Lots of nice microoptimisations :-)
> With respect to counting: yes, if the slots for available persistent peers 
> are not filled - a normal slot should be given. Otherwise, a node gets a 
> temporary slot (based on some scheme) within a minute or two, which is has 
> for max 10 minutes (but potentially only a minute or two if others ask for 
> it!) if no other node asks for it. The key here is to not treat it like the 
> other peers with the 10-minute-before-drop limit. But in case a persistent 
> slot becomes available during that time, a temporary peer can become 
> persistent and free a temporary slot. Why not also allow a slot or two for 
> reconnecting nodes, since the point is to let the nodes *operate* and read 
> network traffic for available requests and announcements. After all, 
> reconnects dont happen very frequently to individual nodes: that dynamics is 
> much slower.

IMHO it is exactly the same for reconnects as for announcements: We don't want 
them to be crowded out by path folding (or by announcements for that matter).

As far as I can see your proposal is that announced or reconnected peer slots 
become so precarious that announcement or reconnection will become a permanent 
state, massively increasing load, only terminated by nodes managing to path 
fold through their temporary connections - which they won't if they only have a 
matter of seconds to do it in because they are immediately trumped by any other 
announcement. This is not helpful. We have to give a new announced node a 
reasonable period in which to prove itself by answering a request, and if it 
does, it should be upgraded to a permanent connection - which is in fact a very 
precarious situation.
> 
> > - Currently the dominant reason for us to reject a connection is the fact 
> > that we only drop a connected opennet peer (whether it is in its grace 
> > period or not) every 10 minutes. This limit should probably be 
> > significantly reduced, and should apply separately to each category e.g. a 
> > new announcement connection can only kick out an existing connection every 
> > N minutes. We have a similar limit based on the number of successful 
> > requests since we last dropped a connected peer. We might want this to 
> > apply just to path folding, or to apply for each category, maybe with 
> > different limits.
> 
> I dont think one should make several changes at once - better to implement 
> and evaluate a new method defensively. But maybe the idea to limit on the 
> number of requests is also good: when an announcement connection has seen a 
> number of successful request - it should have used those to fold with the 
> rest of the network, and can be immediately dropped.

I agree we should deploy one change at a time, or a small number of changes 
that we are fairly confident of and can test simultaneously. This is the 
current plan.
> 
> > It looks like improving announcement is probably going to have to include 
> > alchemy, but well, evanbd and vivee started that, and they're 
> > theoreticians, they can tell us what to set the parameters to :)
> 
> I still only see limited way for defensive experimentation, alchemy or 
> improved simulations here - rather than theory. The result can only be 
> observed if the implementation is global. But maybe I am wrong :-)

Both our proposals involve alchemy - the number of reserved slots or the size 
of the announced-nodes-in-grace-period limit.
> 
> > Thoughts?
> 
> I would just like to re-iterate one point: to have additional slots for 
> announcing (and reconnecting) for a node X, is IMHO /not/ meant for 
> connecting nodes to sit on some "waiting-list" to get upgraded to persistent 
> connections to node X. Instead, it is meant to allow connecting nodes to 
> start operate and observe traffic in the network as soon as possible! This is 
> what gives nodes connections. The hypothesis is that by operating in the 
> network as soon as possible, even though only for minutes - and protecting 
> such slots from path folding, this allows several things. This includes nodes 
> that are connecting in parallell to e.g. see each others announcements, start 
> normal requests themselves, observe other requests and so on - but I wrote 
> more about it in the original mail. The key is to not have the same kind of 
> 10-minute limit to these extra slots - I think that is offering them too much.

If they only have a few seconds they will not have enough time to perform path 
folding. If they can't perform path folding they will be second class citizens 
forever. It can easily take 30 seconds for a successful request (the average is 
18 iirc), and most requests are not successful.

Why is it bad for announced nodes to be able to connect to the node they 
announced to if they prove useful to it? They will be dropped quickly if they 
are not useful. Is your fundamental objection that we are dropping nodes that 
are known to work in favour of nodes that are not? We do that in path folding 
already - yes in theory the node has answered a request successfully, but that 
doesn't prove it will answer future requests too.

> What about trying simply with a period of 2 or 3 minutes before such 
> connections can be eaten up by other announcements or reconnects - and shift 
> it upwards if it doesnt work? This will quickly give an announcing nodes at 
> least a number of peers from those that had connections for more than 2 or 3 
> minutes - a good chance for nodes unless the pressure is very strong on the 
> network. 

Currently the grace period is 5 minutes. IMHO this is reasonable given that 
many requests are not CHK requests, that successful CHK requests can take a 
while, and that many requests do not succeed. And once a node gets out of the 
grace period, if it is weak it will be dropped - often to make space for a new 
announcee.
> 
> > The conversation between me and evanbd (and a little with vivee) on this 
> > topic is on this bug:
> > https://bugs.freenetproject.org/view.php?id=3753
> 
> Two comments after having read this conversation
> 
> 1. I think it is wrong to let nodes be moved from being temporary peers to 
> normal persistent peers /unless/ there are suddently free slots in the 
> persistent queue when the grace period is over (or if it happens during the 
> grace period). The temporary slots should only be to allow nodes quickly 
> start operating in the network - path folding should be made to where there 
> are available persistent slots since that is the goal. The arguments are 
> above (and in the first mail).

"Normal persistent peers" are much LESS secure than newly announced nodes, it 
is not a privelidged status. And even peers resulting from path folding *must 
have a grace period* because without it they would be dropped as soon as a new 
peer from path folding comes in - so why not have a grace period for announcees 
too, and limit the number of such privelidged nodes to prevent one class from 
overwhelming another?
> 
> 2. One point to make regarding to interpreting the nice data that evanbd has 
> collected: even though a large fraction of nodes are still probably going to 
> be online Y hours afterwards - it is a question of how relevant that is to 
> reconnecting. And how many nodes we really want to offer the reconnection 
> possibility. During the course of operating in the network nodes are likely 
> to see many different peers over the hours. There are perhaps implementation 
> reasons to limit the number of peers that can reconnect using some heuristic. 
> Lets say that the nodes implement the previously discussed sharing of 
> store/cache states. Then, keeping track of previous opennet peers and their 
> previous states may become quite costy. Such things could either limit the 
> fraction of nodes we really prefer Y hours later, or that it is simply costy 
> implementation-wise. I have not thought much about it, but it does not seem 
> clear to allow anyone we previously saw to reconnect..

Well, in terms of bloom filter sharing, we will never want to transfer such 
data unless we have seen a node regularly. It will take more than a few hours 
to send the data, especially if we are using spare bytes not used up by 
requests to do it. It is very costly and a long term commitment. It works fine 
on darknet, but on opennet it is pointless unless we are fairly sure that we 
will see the node again soon. Which means it must have been connected for many 
hours over the last month or so. And yes it is possible that bloom filter 
sharing on opennet is too costly; there are lots of other things we want to do 
to optimise data retrievability, and in particular to improve opennet 
announcement and reconnection and generally support for low uptime nodes, 
before tackling bloom filter sharing.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 835 bytes
Desc: This is a digitally signed message part.
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20100120/155a294d/attachment.pgp>

[freenet-dev] Why does announcement take so long?

Reply via email to