On Tuesday 05 May 2009 05:19:35 Evan Daniel wrote:
> On Mon, May 4, 2009 at 6:15 PM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> > On Monday 04 May 2009 17:29:51 Evan Daniel wrote:
> >> On Mon, May 4, 2009 at 11:33 AM, Matthew Toseland
> >> <toad at amphibian.dyndns.org> wrote:
> >> > 1. Release the 20 nodes barrier (206 votes)
> >> >
> >> > As I have mentioned IMHO this is a straightforward plea for more
> >> > performance.
> >>
> >> I'll reiterate a point I've made before.
> >>
> >> While this represents a simple plea for performance, I don't think
> >> it's an irrational one -- that is, I think the overall network
> >> performance is hampered by having all nodes have the same number of
> >> connections.
> >>
> >> Because all connections use similar amounts of bandwidth, the network
> >> speed is limited by the slower nodes.  This is true regardless of the
> >> absolute number of connections; raising the maximum for fast nodes
> >> should have a very similar effect to lowering it for slow nodes.  What
> >> matters is that slow nodes have fewer connections than fast nodes.
> >>
> >> For example, the max allowed connections (and default setting) could
> >> be 1 connection per 2KiB/s output bandwidth, but never more than 20 or
> >> less than 15.
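A rough sketch of the default proposed above (1 connection per 2 KiB/s of output bandwidth, never fewer than 15 or more than 20). It assumes the per-bandwidth count is rounded down. The function and constant names are hypothetical and do not come from the Freenet codebase.

```python
# Hypothetical sketch of the proposed default peer limit:
# 1 connection per 2 KiB/s of output bandwidth, clamped to [15, 20].
# Rounding down is an assumption; the proposal does not specify it.

KIB_PER_CONNECTION = 2   # KiB/s of output bandwidth per allowed connection
MIN_CONNECTIONS = 15     # floor from the proposal
MAX_CONNECTIONS = 20     # current fixed limit, kept as the ceiling


def max_connections(output_limit_kib_s: float) -> int:
    """Return the proposed maximum peer count for an output limit in KiB/s."""
    raw = int(output_limit_kib_s // KIB_PER_CONNECTION)
    # Clamp into the allowed range.
    return max(MIN_CONNECTIONS, min(MAX_CONNECTIONS, raw))


if __name__ == "__main__":
    for limit in (16, 32, 36, 40, 100):
        print(f"{limit:>4} KiB/s -> {max_connections(limit)} connections")
```

Under these assumptions a 16 KiB/s node is held at the 15-connection floor, a 36 KiB/s node gets 18, and anything at or above 40 KiB/s keeps the current 20.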
> >
> > What would the point be? Don't we need a significant range for it to make
> > much difference?
> 
> If the network is in fact limited by the per-connection speed of the
> slower nodes, and they are in fact a minority of the network,
> increasing the per-connection bandwidth of the slower nodes by 33%
> should result in a throughput increase for most of the rest of the
> network of a similar magnitude.  A performance improvement of 10-30%
> should be easily measurable, and (at the high end of that) noticeable
> enough to be appreciated by most users.
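The 33% figure follows from splitting a fixed output limit evenly across connections. This is an idealisation, since real traffic is not spread evenly. Going from 20 to 15 connections multiplies each connection's share by 20/15, whatever the node's total limit. The sketch below is minimal and uses hypothetical names.

```python
# Hypothetical sketch: per-connection bandwidth under an even split
# (an idealisation; real per-peer traffic is not evenly divided).


def per_connection_bandwidth(output_limit_kib_s: float, connections: int) -> float:
    """Output bandwidth available to each connection if split evenly."""
    return output_limit_kib_s / connections


def relative_gain(old_connections: int, new_connections: int) -> float:
    """Fractional increase in per-connection bandwidth when the peer count drops.

    The node's total output limit cancels out, so only the counts matter.
    """
    return old_connections / new_connections - 1.0


if __name__ == "__main__":
    limit = 30.0  # hypothetical slow node with 30 KiB/s output
    before = per_connection_bandwidth(limit, 20)
    after = per_connection_bandwidth(limit, 15)
    print(f"{before:.2f} -> {after:.2f} KiB/s per connection")
    print(f"gain: {relative_gain(20, 15):.0%}")
```

For the hypothetical 30 KiB/s node this gives 1.5 KiB/s per connection at 20 peers and 2.0 KiB/s at 15, a gain of one third.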

Well, it *shouldn't* be limited by the slowest nodes, because they should get 
backed off. And I dunno if 30% is measurable - the signal is incredibly 
noisy. I'd hope it would be.
> 
> Really, though, the idea would be to use it as a network-wide test.
> Small tests by a few users are helpful, but not nearly as informative
> as a network-wide test.  Assuming the change produced measurable
> improvement, it would make sense to explore further changes.  For
> example, changing the range to 15-30, or increasing the per-connection
> bandwidth requirement, or making the per-connection requirement
> nonlinear, or some other option.  However, security concerns
> (especially ubernodes) are bigger with more dramatic changes.

Yes, but manageable imho. The interaction with FOAF throws up additional 
security challenges, but the per-node request proportion limit should deal 
with this adequately, no?
> 
> >> Those numbers are based on some (very limited) testing
> >> I've done -- if I reduce the allowed bw, that is the approximate
> >> number of connections required to make full use of it.
> >>
> >> Reducing the number of connections for slow nodes has some additional
> >> benefits.  First, my limited testing shows a slight increase in
> >> payload % at low bw limits as a result of reducing the connection
> >> count (there is some per-connection network overhead).
> >
> > True.
> 
> To be specific, my anecdotal evidence is that it improves the payload
> fraction by roughly 3-8%.

Yes, but a 10KB/sec opennet node with 10 peers is noticeably slower than a
10KB/sec opennet node with 20, no? We should find out.
> 
> >
> >> Second, bloom
> >> filter sharing represents a per-connection overhead (mostly in the
> >> initial transfer -- updates are low bw, as discussed).  If (when?)
> >> implemented, it will represent a smaller total overhead with fewer
> >> connections than with more.  Presumably, the greatest impact is on
> >> slower nodes.
> >
> > Really it's determined by churn, isn't it? Or by any heuristic artificial
> > limits we impose...
> 
> My assumption is that connection duration is well modeled by a
> per-connection half-life that is largely independent of the number of
> connections.  The bandwidth used on such filters is proportional to
> the total churn, so fewer connections means less churn in an absolute
> sense but the same connection half-life.  (That is, bloom filter
> bandwidth usage is proportional to # of connections * per-connection
> churn rate.)  I don't have any evidence for that assumption, though.
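Evan's churn model can be written as a back-of-the-envelope calculation. The sketch follows his stated, unverified assumption: connection lifetimes are exponentially distributed with a fixed half-life that does not depend on peer count. It also assumes that each new connection costs one full filter transfer and that updates are negligible. The half-life and filter size are hypothetical inputs, not measurements. Matthew's objection, that churn may instead track the number of requests handled, is not modelled.

```python
import math

# Hypothetical sketch of the churn model: exponential connection lifetimes
# with a fixed half-life, independent of how many peers the node has.


def replacements_per_hour(connections: int, half_life_hours: float) -> float:
    """Expected connection replacements per hour in steady state.

    With exponential lifetimes the per-connection hazard rate is
    ln(2) / half-life, and each dropped slot is assumed to be refilled.
    """
    rate_per_connection = math.log(2) / half_life_hours
    return connections * rate_per_connection


def bloom_filter_kib_per_hour(connections: int, half_life_hours: float,
                              filter_kib: float) -> float:
    """Bandwidth spent on initial filter transfers (updates ignored)."""
    return replacements_per_hour(connections, half_life_hours) * filter_kib


if __name__ == "__main__":
    half_life = 2.0     # hypothetical: hours
    filter_kib = 512.0  # hypothetical: KiB per initial filter transfer
    for peers in (20, 15):
        kib = bloom_filter_kib_per_hour(peers, half_life, filter_kib)
        print(f"{peers} peers: {kib:.0f} KiB/hour of filter transfers")
```

Under this model the filter overhead scales linearly with peer count, so a 15-peer node spends 15/20 = 75% of what a 20-peer node spends. That holds only if the half-life really is independent of peer count.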

Isn't the churn rate proportional to the number of requests handled more than 
the number of connections? Fewer connections might even mean more churn?
> 
> >> On the other hand, too few connections may make various attacks
> >> easier.  I have no idea how strong an effect this is.  However, a node
> >> that has too many connections (i.e. insufficient bw to use them all
> >> fully) may show burstier behavior and thus be more susceptible to
> >> traffic analysis.
> >
> > Yes, definitely true with our current padding algorithms.
> >
> >> In addition, fewer connections means a larger
> >> network diameter on average, which may have an impact on routing.
> >> Lower degree also means that the node has fewer neighbor bloom filters
> >> to check, which means that a request is compared against fewer stores
> >> during its traversal of the network.
> >
> > True.
> 
> Do you know how big a problem this would cause?  My assumption is that
> it would be a fairly small effect even on the nodes with fewer
> connections, and that they would be in the minority.

Well, higher average degree should make routing more effective, although it 
will make it less easy to see whether it works. But at this point I think 
there is every reason to think that routing works, and performance should 
therefore take precedence, within reason.
> 
> >> I'm intentionally suggesting a small change -- it's less likely to
> >> cause major problems.  By keeping the ratio between slow nodes (15
> >> connections) and fast nodes (20 connections) modest, the potential for
> >> reliance on ubernodes is kept minimal.  (Similarly, if you want to
> >> raise the 20 connections limit instead of lowering it, I think it should
> >> only be increased slightly.)
> >
> > Why? I don't see the point unless the upper bound is significantly higher
> > than the lower bound: any improvement won't be measurable.
> 
> As above, I would hope that the improvement *would* be measurable,
> even though it wouldn't be huge.

Maybe. I'd be more inclined to try 15-30. Or even 10-30, but past experience 
suggests maybe 10 is too few, although it doesn't really explain why.
> 
> >> And finally: I have done some testing on this proposed change.  At
> >> first glance, it looks like it doesn't hurt and may help.  However, I
> >> have not done enough testing to be able to say anything with
> >> confidence.  I'm not suggesting implementing this change immediately;
> >> rather, I'm saying that *any* change like this should see some
> >> real-world testing before implementation, and that reducing the
> >> defaults for slow nodes is as worthy of consideration and testing as
> >> raising it for fast nodes.
> >
> > We did try this (with a minimum of 10 connections), and it seemed that
> > slow nodes with only 10 connections were significantly slower. However,
> > this was
> > not based on widespread testing. My worry is that slow nodes with few
> > connections will be *too* slow, and the network will marginalise them. But
> > it's a tradeoff between slightly more efficiency, fewer routes to choose
> > from, and fewer nodes sending requests...
> 
> Would they be any more marginalized than they already are?  If they
> have fewer connections, then their routes aren't as good, but they
> should reject incoming requests from the connections they do have less
> often, right?

Yes, that's why the results from testers were perplexing. But maybe they were 
flukes, or were measuring the wrong thing; it certainly wasn't a rigorous 
test. How to measure the difference systematically?
> 
> >> Also: do we have any idea what the distribution of available node
> >> bandwidth looks like?
> >
> > It would be great, wouldn't it? Maybe a survey? What questions should we
> > ask?
> 
> Hmm.  Depends a bit on how general a survey you want to make it, I
> suppose.  Would this be done as a new survey toadlet, or by some other
> means?  I get the impression not many people answer email surveys or
> feedback solicitations.

True.
> 
> Here are a few thoughts:
> 
> -- configured output limit
> -- user-reported nominal connection speed
> -- whether the node is limiting on the configured limit or wire speed
> (or, more generally, the rejection reasons counts)
> 
> -- number of opennet connections
> -- number of darknet connections
> -- number of backed off peers
> -- recent uptime %
> -- datastore size
> -- anything else of interest from the stats page
> 
> -- feature priorities
> -- general user comments
> 
> Evan Daniel