Tests showed ages ago that triple-inserting the same block gives 90%+
persistence after a week, instead of 70%+ for a single insert. There is even a
(probably statistically insignificant) improvement relative to inserting 3
separate blocks.
I had thought it was due to not forking on cacheable, but we fork on cacheable 
now and the numbers are the same.

Unfortunately I don't have any data more recent than 2010-05-03, due mostly to 
build problems and other bugs resulting from local changes. However, the fork 
on cacheable change went in way before that. Here is the output for all the 
tests up to that day:

Lines where insert failed or no fetch:
  too short: 10
  broken: 0
  no number: 1
  no url: 0
  no fetch: 48
Total attempts where insert succeeded and fetch executed: 110
Single keys succeeded: 108
MHKs succeeded: 104
Single key individual fetches: 330
Single key individual fetches succeeded: 258
Success rate for individual keys (from MHK inserts): 0.7818181818181819
Success rate for the single key triple inserted: 0.9818181818181818
Success rate for the MHK (success = any of the 3 different keys worked): 
0.9454545454545454

If these figures are accurate, a single key inserted once has a half life of 
approx 3 weeks; a single key inserted 3 times has a half life of approx 38 
weeks (an earlier stat was 0.96, which is 17 weeks); an MHK has a half life of 
13 weeks.
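
For reference, the arithmetic behind those half-lives, under the assumption
that retention decays exponentially with a constant weekly survival rate (a
simplification, not something the data proves). A trivial Java check:

public class HalfLife {
    // half-life in weeks = ln(0.5) / ln(weekly success rate)
    static double halfLifeWeeks(double weeklySuccessRate) {
        return Math.log(0.5) / Math.log(weeklySuccessRate);
    }
    public static void main(String[] args) {
        System.out.println(halfLifeWeeks(0.7818)); // single insert  -> ~2.8 weeks
        System.out.println(halfLifeWeeks(0.9818)); // triple insert  -> ~37.7 weeks
        System.out.println(halfLifeWeeks(0.9455)); // MHK, any of 3  -> ~12.4 weeks
        System.out.println(halfLifeWeeks(0.96));   // earlier stat   -> ~17.0 weeks
    }
}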

Now, there is not enough data here. I need to write a new tester which compares
fork on cacheable enabled against disabled, and fetches a lot more blocks. In
fact it turns out we have only fetched data up to 2010/03/17. :|

So the most immediate thing is to rewrite the insert tester: make it insert
more blocks, make it compare inserts with the fork on cacheable flag against
inserts without it, make absolutely sure it works, make sure it emails me
whenever it can't run, etc.

But let's suppose, for the sake of argument, that the above figures are not
greatly affected by fork on cacheable.

Is this at least plausible?

Well, fork on cacheable was enabled by default on Feb 6th. We have a 1 week
interval, and the tester runs whatever I'm working on, so data from the 14th
onwards should be interesting. Over this period we have 32 successes and 5
failures, for a block-level success rate of 86%. So fork on cacheable seems to
have helped a bit, but we need more data to be sure. To break it down:

Blocks inserted once for the MHK test: 4 failed, 23 succeeded = 85%
Block triple-inserted: 1 failed, 9 succeeded = 90%

This seems pretty inconclusive. So we really need more data.

I do however have a proposal, based on the assumption that triple-insert still
makes a big difference. Let's assume it does, and then see what follows...

Firstly, inserts do not create failure table entries, so they are always
routed the same way. In the few minutes it takes to insert a block 3 times
(the tests were always with single blocks, so the repeat inserts happened
straight after the first one), the set of peers is unlikely to have changed
much. That leaves:
- Backoff.
- Rejections.

Either of these can result in the data not being accepted: either not at
enough nodes, or not at the nodes where other nodes will look for the data.

The solution would seem to be to emulate triple insertion by making inserts 3 
times as likely to be accepted.

With rejections:

Some reject reasons are randomised. We can roll the dice 3 times for them.
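
As a sketch (the helper below is hypothetical, not existing code): if a
randomised reject reason accepts with probability p, giving inserts three
independent rolls raises that to 1 - (1-p)^3, e.g. 0.7 becomes about 0.973.

static boolean acceptWithTripleRoll(java.util.Random random, double acceptProbability) {
    // Accept if any of 3 independent rolls would have accepted.
    for (int i = 0; i < 3; i++) {
        if (random.nextDouble() < acceptProbability) return true;
    }
    return false;
}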

But output bandwidth liability limiting is the big one. How do we adapt it to
favour inserts, as evanbd originally suggested? A larger window for inserts
might be an option. IMHO the best solution is not to hack it with window sizes
but to target a ratio (see the rough sketch after this list):
- Track the average number of requests of each type (CHK/SSK req/ins) accepted 
recently. (This should not be just tracking hits over the last minute because 
that will result in oscillation; I prefer taking the interval between accepts, 
converting it to a frequency, taking the log and feeding it to a Kalman filter).
- If the "space left" for more requests is less than 1 CHK insert, 1 SSK 
insert, 1 CHK request and 1 SSK request, accept the request. This is already 
the case.
- If there is no space left, reject the request. This is also already the case.
- Otherwise, consider the frequencies of recently accepted requests. Decide 
whether to accept based on maintaining a target ratio. That target ratio should 
be 1/1/1/1 for the equally-likely-to-accept-any-type behaviour that we aim for
at the moment. If we want to prefer inserts, we could change it to 1/1/3/3.
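
To make that concrete, here is a rough sketch of what the accept decision
could look like. It is illustrative only, not the real NodeStats code: it uses
a simple decaying per-type accept count instead of the log-frequency filter
described above, and all the names and the "space" units are made up.

import java.util.EnumMap;

public class RatioAcceptSketch {

    enum Type { CHK_REQUEST, SSK_REQUEST, CHK_INSERT, SSK_INSERT }

    // Target ratio: 1/1/1/1 for the current behaviour, 1/1/3/3 to prefer inserts.
    private final EnumMap<Type, Double> target = new EnumMap<Type, Double>(Type.class);
    // Decaying count of recently accepted requests per type.
    private final EnumMap<Type, Double> recentAccepts = new EnumMap<Type, Double>(Type.class);
    private static final double DECAY = 0.95;

    public RatioAcceptSketch(boolean preferInserts) {
        double insertWeight = preferInserts ? 3.0 : 1.0;
        target.put(Type.CHK_REQUEST, 1.0);
        target.put(Type.SSK_REQUEST, 1.0);
        target.put(Type.CHK_INSERT, insertWeight);
        target.put(Type.SSK_INSERT, insertWeight);
        for (Type t : Type.values()) recentAccepts.put(t, 0.0);
    }

    // spaceLeft: liability "space" still available.
    // spaceForOneOfEach: space needed for one request of every type.
    public synchronized boolean shouldAccept(Type type, long spaceLeft, long spaceForOneOfEach) {
        if (spaceLeft <= 0) return false;        // no space left: reject
        if (spaceLeft >= spaceForOneOfEach) {    // room for one of each: accept
            recordAccept(type);
            return true;
        }
        // Marginal region: accept only if this type is at or below its share
        // of the target ratio among recently accepted requests.
        double totalTarget = 0, totalRecent = 0;
        for (Type t : Type.values()) {
            totalTarget += target.get(t);
            totalRecent += recentAccepts.get(t);
        }
        double targetShare = target.get(type) / totalTarget;
        double actualShare = (totalRecent == 0) ? 0 : recentAccepts.get(type) / totalRecent;
        if (actualShare <= targetShare) {
            recordAccept(type);
            return true;
        }
        return false;
    }

    private void recordAccept(Type type) {
        for (Type t : Type.values()) recentAccepts.put(t, recentAccepts.get(t) * DECAY);
        recentAccepts.put(type, recentAccepts.get(type) + 1.0);
    }
}

With a 1/1/3/3 target, an insert only gets refused in the marginal region once
inserts have recently taken roughly three times as many accepts as requests.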

With backoff:

IMHO rejections are more likely to be the culprit. There just isn't that much 
backoff any more. However, we could allow an insert to be routed to a backed 
off peer provided the backoff-time-remaining is under some arbitrary threshold.
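
Something like this, with hypothetical names and an arbitrary threshold (this
is not the real PeerNode API):

// Let an insert route to a backed-off peer if the backoff is nearly over.
static final long BACKOFF_IGNORE_THRESHOLD_MS = 30 * 1000; // arbitrary, needs tuning

static boolean canRouteInsertToBackedOffPeer(long backoffTimeRemainingMs) {
    return backoffTimeRemainingMs <= BACKOFF_IGNORE_THRESHOLD_MS;
}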

Now, can we test these proposals? Yes.

We need a new MHK tester to get more data, and determine whether triple 
insertion still helps a lot. IMHO there is no obvious reason why it would have 
degenerated. We need to insert and request a larger number of blocks (rather 
than 3+1 per day), and we need to test with fork on cacheable vs without it. We 
should probably also use a 2 week period rather than a 1 week period, to get 
more detailed numbers. We could also add two more per-insert flags to test:
- Ignore low backoff: If enabled, route inserts to nodes with backoff time 
remaining under some threshold. This is easy to implement.
- Prefer inserts: If enabled, target a 1/1/3/3 ratio rather than a 1/1/1/1 
ratio. To implement this using the current kludge, we would need to deduct
the space used by 2 inserts of each type from the space used when we are
considering whether to accept an insert (see the sketch after this list).
However, IMHO the current kludge
probably doesn't work very well. It would likely be better to change it as 
above, then we could just have a different target ratio. But for testing 
purposes we could reasonably just try the kludge.
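
For what the kludge version of "prefer inserts" might look like, here is a
sketch. It assumes acceptance is simply "does this request fit in the
remaining liability window"; the sizes and names are placeholders, not the
actual code.

static boolean shouldAcceptWithInsertKludge(boolean isInsert, long spaceUsed, long limit,
        long thisRequestSize, long chkInsertSize, long sskInsertSize) {
    long effectiveUsed = spaceUsed;
    if (isInsert) {
        // Pretend 2 CHK inserts and 2 SSK inserts of space are free,
        // so inserts get extra headroom relative to requests.
        effectiveUsed -= 2 * chkInsertSize + 2 * sskInsertSize;
        if (effectiveUsed < 0) effectiveUsed = 0;
    }
    return effectiveUsed + thisRequestSize <= limit;
}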

Of course, the real solution is probably to rework load management so we don't 
misroute, or misroute much less (especially on inserts).
