Tests showed ages ago that triple-inserting the same block gives 90%+ persistence after a week, instead of 70%+ for a single insert. There is a (probably statistically insignificant) improvement even relative to inserting 3 separate blocks. I had thought this was due to not forking on cacheable, but we fork on cacheable now and the numbers are the same.
Unfortunately I don't have any data more recent than 2010-05-03, due mostly to build problems and other bugs resulting from local changes. However, the fork on cacheable change went in well before that. Here is the output for all the tests up to that day:

Lines where insert failed or no fetch:
  too short: 10
  broken: 0
  no number: 1
  no url: 0
  no fetch: 48
Total attempts where insert succeeded and fetch executed: 110
Single keys succeeded: 108
MHKs succeeded: 104
Single key individual fetches: 330
Single key individual fetches succeeded: 258
Success rate for individual keys (from MHK inserts): 0.78 (= 258/330)
Success rate for the single key triple-inserted: 0.98 (= 108/110)
Success rate for the MHK (success = any of the 3 different keys worked): 0.95 (= 104/110)

If these figures are accurate, a single key inserted once has a half-life of approx 3 weeks; a single key inserted 3 times has a half-life of approx 38 weeks (an earlier stat was 0.96, which gives 17 weeks); and an MHK has a half-life of approx 13 weeks.

Now, there is not enough data here. I need to write a new tester which compares fork on cacheable enabled against disabled, and fetches a lot more blocks. In fact it turns out we have only fetched data up to 2010/03/17. :| So the most immediate thing is to rewrite the insert tester: make it insert more blocks, make it compare with the fork on cacheable flag vs without it, make absolutely sure it works, make sure it emails me whenever it can't run, etc.

But let's suppose, for the sake of argument, that the above figures are not greatly affected by fork on cacheable. Is this at least plausible? Well, fork on cacheable was enabled by default on Feb 6th. We have a 1 week interval, and the tester runs whatever I'm working on, so data from the 14th onwards should be interesting. Out of this period we have 32 successes and 5 failures, for a block-level success rate of 86%. So fork on cacheable seems to have helped a bit, but we need more data to be sure.
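For reference, the half-life arithmetic above follows from assuming exponential decay with a constant weekly survival probability: if a block survives one week with probability p, it survives t weeks with probability p^t, which drops to 1/2 at t = ln(1/2)/ln(p). A quick sketch (the function name is mine, not from any tester code):

```python
import math

def half_life_weeks(p: float) -> float:
    """Half-life in weeks, given the one-week survival probability p,
    assuming exponential decay (survival after t weeks = p**t)."""
    return math.log(0.5) / math.log(p)

print(half_life_weeks(258 / 330))  # single key, once: ~2.8 weeks
print(half_life_weeks(108 / 110))  # triple-inserted:  ~37.8 weeks
print(half_life_weeks(104 / 110))  # MHK (any of 3):   ~12.4 weeks
print(half_life_weeks(0.96))       # earlier 0.96 stat: ~17 weeks
```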
To break it down:

Blocks inserted once for MHK test: 4 failed, 23 succeeded = 85%
Block triple-inserted: 1 failed, 9 succeeded = 90%

This seems pretty inconclusive, so we really need more data. I do however have a proposal, based on the assumption that triple insertion still makes a big difference. Let's assume it does, and then see what follows...

Firstly, inserts do not create failure table entries, so they are always routed the same. In the few minutes it takes to insert a block 3 times (tests were always with single blocks, so straight after the first insert), the set of peers is unlikely to have changed much. That leaves:
- Backoff.
- Rejections.
Either can result in the data not being accepted: either not at enough nodes, or not at the nodes where other nodes will look for it.

The solution would seem to be to emulate triple insertion by making inserts 3 times as likely to be accepted. With rejections: some reject reasons are randomised, and we can roll the dice 3 times for them. But output bandwidth liability limiting is the big one. How do we adapt it to favour inserts, as evanbd originally suggested? A larger window for inserts might be an option. IMHO the best solution is not to hack it with window sizes but to target a ratio:
- Track the average number of requests of each type (CHK/SSK request/insert) accepted recently. (This should not just be tracking hits over the last minute, because that will result in oscillation; I prefer taking the interval between accepts, converting it to a frequency, taking the log and feeding it to a klein filter.)
- If the "space left" for more requests is enough for at least 1 CHK insert, 1 SSK insert, 1 CHK request and 1 SSK request, accept the request. This is already the case.
- If there is no space left, reject the request. This is also already the case.
- Otherwise, consider the frequencies of recently accepted requests, and decide whether to accept based on maintaining a target ratio.
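The frequency-tracking step above could look something like the following sketch. This is not Freenet code; the class name is mine, and a plain exponential low-pass filter stands in for the filter named in the text. The point is the interval-to-log-frequency smoothing, which avoids the oscillation you get from counting hits per minute:

```python
import math
import time

class AcceptRateTracker:
    """Tracks how often requests of one type (e.g. CHK insert) are
    accepted: take the interval between accepts, convert it to a
    frequency, take the log, and smooth the log with a simple
    exponential low-pass filter."""

    def __init__(self, smoothing: float = 0.1):
        self.smoothing = smoothing
        self.last_accept = None
        self.log_freq = None  # smoothed log(accepts per second)

    def on_accept(self, now: float = None) -> None:
        """Record one accepted request at time `now` (seconds)."""
        now = time.monotonic() if now is None else now
        if self.last_accept is not None:
            interval = max(now - self.last_accept, 1e-6)
            sample = math.log(1.0 / interval)  # log of instantaneous frequency
            if self.log_freq is None:
                self.log_freq = sample
            else:
                self.log_freq += self.smoothing * (sample - self.log_freq)
        self.last_accept = now

    def frequency(self) -> float:
        """Smoothed accepts-per-second estimate (0.0 until warmed up)."""
        return 0.0 if self.log_freq is None else math.exp(self.log_freq)
```

One tracker per request type (CHK/SSK request/insert) would give the per-type frequencies the ratio decision needs.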
That target ratio should be 1/1/1/1 for the equally-likely-to-accept-any-type behaviour we aim for at the moment. If we want to prefer inserts, we could change it to 1/1/3/3.

With backoff: IMHO rejections are more likely to be the culprit; there just isn't that much backoff any more. However, we could allow an insert to be routed to a backed-off peer provided the backoff time remaining is under some arbitrary threshold.

Now, can we test these proposals? Yes. We need a new MHK tester to get more data and determine whether triple insertion still helps a lot (IMHO there is no obvious reason why it would have degenerated). We need to insert and request a larger number of blocks (rather than 3+1 per day), and we need to test with fork on cacheable vs without it. We should probably also use a 2 week period rather than a 1 week period, to get more detailed numbers. However, we can add two more per-insert flags which we could test:
- Ignore low backoff: if enabled, route inserts to nodes with backoff time remaining under some threshold. This is easy to implement.
- Prefer inserts: if enabled, target a 1/1/3/3 ratio rather than a 1/1/1/1 ratio. To implement this using the current kludge, we would need to deduct the space used by 2 inserts of each type from the space used, when considering whether to accept an insert.

However, IMHO the current kludge probably doesn't work very well. It would likely be better to change it as above, so we could just use a different target ratio; but for testing purposes we could reasonably just try the kludge. Of course, the real solution is probably to rework load management so that we don't misroute, or misroute much less (especially on inserts).
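One way the ratio-based accept decision could work, as a minimal sketch: when space is contested, accept a request type outright if its recent share of accepts is below its target share, and otherwise accept with reduced probability. All names here are illustrative, not Freenet's; the probabilistic rule is one possible reading of "maintaining a target ratio", not the only one:

```python
import random

# Target acceptance ratios per request type: 1/1/3/3 prefers inserts,
# 1/1/1/1 is the current equal treatment. (Hypothetical names.)
TARGET = {"chk_request": 1, "ssk_request": 1, "chk_insert": 3, "ssk_insert": 3}

def should_accept(req_type, recent_freq, rng=random):
    """Decide whether to accept a request of req_type so that the mix
    of accepted requests tends toward TARGET. recent_freq maps each
    type to its smoothed accepts-per-second estimate."""
    total_freq = sum(recent_freq.values())
    total_target = sum(TARGET.values())
    if total_freq == 0:
        return True  # nothing accepted recently; no basis to reject
    actual_share = recent_freq[req_type] / total_freq
    target_share = TARGET[req_type] / total_target
    if actual_share <= target_share:
        return True  # under-represented relative to target: accept
    # Over-represented: accept with probability shrinking as the
    # actual share exceeds the target share.
    return rng.random() < target_share / actual_share
```

With equal recent frequencies and a 1/1/3/3 target, inserts are always accepted (their share, 25%, is below the 37.5% target) while requests are accepted only half the time, which is the intended bias.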