On Tuesday 28 October 2003 05:15 pm, Tracy R Reed wrote:
> On Tue, Oct 28, 2003 at 05:44:43PM -0500, Nick Tarleton spake thusly:
> > > Right now with
> > > the totally random routing due to no specialization freenet can only
> > > store as much retrievable data as 25*n where n is the average size of
> > > the datastores on freenet and 25 is the current max htl. No bueno.
> >
> > I would think it'd be greater because although the amount of retrievable
> > data on one path is 25*n, each node has a choice of many other nodes to
> > go to.
>
> Ideally it should have to try only one path and proper routing should
> approach this ideal. If it has to try multiple paths we are greatly
> increasing cpu and bandwidth load on the network. Having just one failure
> and trying another path doubles the load.
>
> With proper routing we have "a place for everything and everything in its
> place". Sort of like keeping a file cabinet in alphabetical order or a
> library arranged by the Dewey Decimal System. This makes us much more
> efficient at finding and storing data.  We can find data more quickly and
> we can store more of it.
>
> I think once we have good routing and high degrees of specialization we
> will see overall network load decrease significantly while finding more
> information more quickly. Right now the network is doing a huge amount of
> work for relatively little result, even with the better performing builds
> of late. In fact, if routing issues had been tackled first perhaps we
> would not have needed nio and the many other performance improvements that
> have been implemented so soon although we certainly would have needed them
> eventually.

One point you bring up is that NGrouting does not take network traffic into 
account. I'm willing to bet that this is why we are not seeing nodes 
specialize. Or rather, they are not specializing in an obvious way; clearly 
they are to some extent, because we have not yet degenerated into RSR.

We don't see this because NGrouting is using a combination of breadth-first 
and depth-first searching. It does this because, thanks to probabilistic 
caching, the data is more likely to be on any given node as the request gets 
closer to the end of the chain. So towards the end of the line it becomes 
more and more profitable to go with a fast node that is likely to fail than 
with the closest match for the data if that node is slow. This pattern is 
also self-reinforcing: if a request gets routed in a more breadth-first 
manner towards the end of the request line, the data gets smeared out around 
the nodes close to the original one that had it, rather than being drawn 
closer to the requester as the old algorithm tended to do.
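
To make that tradeoff concrete, here is a toy expected-time comparison. This 
is not the real NGrouting estimator; the method, names, and numbers are all 
made up purely for illustration:

class RoutingCostToy {
    // Expected time of sending a request to one node, given a guessed
    // probability that it has the data and the time wasted if it does not.
    static double expectedTime(double pSuccess, double timeIfSuccess,
                               double timeIfFailure) {
        return pSuccess * timeIfSuccess + (1 - pSuccess) * timeIfFailure;
    }

    public static void main(String[] args) {
        // Near the end of the chain, probabilistic caching raises pSuccess
        // on every candidate, so a fast node with a worse key match can win:
        System.out.println(expectedTime(0.5, 8.0, 20.0)); // close but slow: 14.0
        System.out.println(expectedTime(0.3, 2.0, 10.0)); // fast, poor match: 7.6
    }
}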

What all this means is that NGrouting is behaving the way it is because it is 
tuning itself to our caching algorithm. That is not necessarily a bad thing, 
but rather than storing things as effectively as possible, it is doing the 
minimum specialization needed for data to still be found at the average 
request HTL, with the highest level of redundancy possible, in order to 
reduce the average request time. (The faster nodes will be contacted more 
even when they are not very specialized, and so their stores will contain all 
the most popular content.) This means that it is, in effect, minimizing the 
storage capacity of the network.

As a result, I would predict that if the default HTL were lowered or the 
network grew larger, nodes would specialize more so that requests can still 
reach the data in the same number of hops, and the capacity of the network 
would increase.

So what should we do about this? I propose two things. First, make sure that 
NGrouting starts taking network resources into account. That is, when the 
NGrouting formula calculates the predicted time, it needs to consider that, 
given finite bandwidth (as listed in the config file), initiating any new 
request will slow down all the others currently being processed. That 
additional time should be added to the predicted time of the request we are 
making. The formula currently assumes that new requests never slow anything 
down, so making more requests, which are likely to fail, looks more favorable 
than it really is.
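
Something along these lines, purely as a sketch of the idea (the names and 
the bandwidth model are my own invention, not existing code):

class LoadAwareEstimate {
    // Rough model: the new transfer adds expectedBytes of work to the shared
    // pipe, so every transfer already in flight finishes later by roughly
    // expectedBytes / outputBytesPerSecond seconds. Bill that aggregate delay
    // to the request we are about to make, on top of NGrouting's own estimate.
    static double loadAdjustedEstimate(double ngrEstimateSeconds,
                                       double expectedBytes,
                                       int activeTransfers,
                                       double outputBytesPerSecond) {
        double delayPerExistingTransfer = expectedBytes / outputBytesPerSecond;
        double penalty = activeTransfers * delayPerExistingTransfer;
        return ngrEstimateSeconds + penalty;
    }
}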

Second, deciding what gets placed in the datastore should also be done in a 
non-alchemistic way. We need to look at all the incoming requests and build a 
model of what we are likely to receive requests for in the future. This model 
would have to take into account both the key values themselves and the 
relative popularity of each key.
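
A very rough sketch of what such a model could look like (hypothetical class, 
method names, and weights; not a proposal for the actual code structure):

import java.util.HashMap;
import java.util.Map;

class RequestModel {
    private final Map<String, Integer> requestCounts = new HashMap<String, Integer>();
    private long totalRequests = 0;

    // Record every incoming request so popularity can be estimated.
    void recordRequest(String key) {
        Integer c = requestCounts.get(key);
        requestCounts.put(key, c == null ? 1 : c + 1);
        totalRequests++;
    }

    // Estimated probability that the next request is for this key.
    double popularity(String key) {
        Integer c = requestCounts.get(key);
        if (c == null || totalRequests == 0) return 0.0;
        return (double) c / totalRequests;
    }

    // Combined score: popular keys and keys close to our specialization
    // (closeness in [0,1], supplied by the routing side) are worth keeping
    // in the datastore. The 0.5/0.5 weighting is arbitrary.
    double cacheScore(String key, double closeness) {
        return 0.5 * popularity(key) + 0.5 * closeness;
    }
}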


