A big part of the performance of 0.7 relates to our pre-emptive
rejection strategy. 80% of backoffs are caused by pre-emptive rejects,
and 90% of pre-emptive rejects (in a random sample hour) are due to the
bwlimitDelayTime being too high (eventually this corrects itself through
rejections, but it looks like it cycles back up soon after). Furthermore,
there is a good deal of alchemy in the current strategy.

I had hoped that high level bandwidth limiting would prevent high
bwlimitDelayTimes. I now see that I was mistaken. I propose that we
have two separate mechanisms:

1. High level bandwidth limiting. This we have now; we may want to make
it more aggressive. The purpose of this is to smooth out overall
bandwidth usage at a high level, by only accepting as many requests as
we can handle in the near future.

Implementation: A token bucket (one for input and one for output).
Tokens are added at a rate of 4/5ths of the bwlimit. The bucket capacity
is max(640kB, bwlimit * 1 minute). When a request comes in, we ask the
buckets to allocate N bytes, where N is the average number of bytes used
by a request of the given type. If we can allocate from both the input
and output buckets we accept the request.

Note that the "average number of bytes used by a request" here
includes a large number of failed requests: On my node the average
input/output for a remote CHK request is 2400 bytes for example (the
keys are 32kB!). So we can have hundreds of requests in flight. If we
get a cluster of requests of which an unusually large proportion
succeed, as happens occasionally, the replies may be sent very slowly
because there are way too many transfers happening at once.
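The token bucket admission scheme described above might be sketched as
follows. The 4/5ths refill rate, the max(640kB, bwlimit * 1 minute)
capacity, the dual input/output buckets, and the per-type average cost
come from the proposal; the class and function names, the bwlimit value,
and the refund-on-partial-failure detail are illustrative assumptions:

```python
import time

class TokenBucket:
    """Tokens accrue at `rate` bytes/sec up to `capacity` bytes."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_allocate(self, n):
        """Take n tokens if available; return True on success."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

def accept_request(input_bucket, output_bucket, avg_in, avg_out):
    # Accept only if BOTH buckets can cover the average byte cost for
    # this request type; refund the input side if the output side fails.
    if input_bucket.try_allocate(avg_in):
        if output_bucket.try_allocate(avg_out):
            return True
        input_bucket.tokens += avg_in  # refund
    return False

# Proposal parameters: refill at 4/5 of bwlimit, capacity
# max(640kB, bwlimit * 60s). The 16kB/s bwlimit is just an example.
bwlimit = 16 * 1024
capacity = max(640 * 1024, bwlimit * 60)
inp = TokenBucket(0.8 * bwlimit, capacity)
out = TokenBucket(0.8 * bwlimit, capacity)
accept_request(inp, out, 2400, 2400)  # avg cost of a remote CHK request
```

With the average cost per CHK request at only 2400 bytes, a full bucket
admits hundreds of requests at once, which is exactly the failure mode
the notes above describe.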

2. Bandwidth liability limitation. Track the total number of bytes worth
of requests we have in flight at any one time, and impose an upper
limit. The purpose of the upper limit is to prevent timeouts in the
event that all the requests are successful, so it counts the expected
bytes transferred if the request succeeds in each case. We must be able
to transfer all this data within a reasonable time - say 90 seconds.
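A minimal sketch of this liability limit, under the assumptions stated
above (count the success-case bytes of every in-flight request, and cap
the total at what we can transfer within the 90-second window); the
class and method names are hypothetical:

```python
class BandwidthLiabilityLimiter:
    """Cap the total bytes that in-flight requests would transfer if
    they ALL succeed, so the replies fit in the timeout window."""

    def __init__(self, bwlimit_bytes_per_sec, window_secs=90):
        # Everything outstanding must be transferable within the window.
        self.limit = bwlimit_bytes_per_sec * window_secs
        self.in_flight = 0

    def try_start(self, expected_bytes_on_success):
        # Reject if accepting would push success-case liability over
        # what we can move in window_secs.
        if self.in_flight + expected_bytes_on_success > self.limit:
            return False
        self.in_flight += expected_bytes_on_success
        return True

    def finish(self, expected_bytes_on_success):
        # Release the liability when the request completes or fails.
        self.in_flight -= expected_bytes_on_success
```

For example, at a 16kB/s bwlimit and a 90-second window, the cap is
1440kB, i.e. 45 in-flight 32kB CHK requests, regardless of how cheap
the average (mostly failing) request is.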

Related commit: 12250 (read explanation).

Some related work on bwlimitDelayTime:
- Request time = search time + bwlimitDelayTime * ( # hops + # blocks )
  => bwlimitDelayTime < (request timeout - search time) / ( # hops + # blocks )
Let's assume # hops = 15 (maybe this is a bit high), search time = 10
seconds, and # blocks = 32 (for a 32kB key).
With a timeout of 120 seconds:
  bwlimitDelayTime < (120 - 10) / (32 + 15)
  => bwlimitDelayTime < 110 / 47
  => bwlimitDelayTime ~< 2s
With a timeout of 60 seconds:
  bwlimitDelayTime < (60 - 10) / (32 + 15)
  => bwlimitDelayTime < 50 / 47
  => bwlimitDelayTime ~< 1s

(Hence I increased the request timeout to 120 seconds)
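The bound above is simple enough to check directly; this helper just
restates the derivation's formula (the function name is mine):

```python
def max_bwlimit_delay(timeout, search_time, hops, blocks):
    """Upper bound on bwlimitDelayTime (seconds) so that
    request time = search time + bwlimitDelayTime * (hops + blocks)
    stays under the request timeout."""
    return (timeout - search_time) / (hops + blocks)

# Figures from the derivation: 15 hops, 10s search time, 32 blocks.
max_bwlimit_delay(120, 10, 15, 32)  # 110/47, just under 2.4s
max_bwlimit_delay(60, 10, 15, 32)   # 50/47, just over 1s
```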

Once this is implemented it would make sense to increase the
bwlimitDelayTime thresholds to 2000ms/3000ms (from the current
1000ms/2000ms).