Re: [freenet-dev] Improving insert persistence

2010-06-29 Thread Matthew Toseland
On Tuesday 29 June 2010 16:54:42 Evan Daniel wrote:
> On Tue, Jun 29, 2010 at 11:45 AM, Robert Hailey
>  wrote:
> >
> > On Jun 29, 2010, at 8:26 AM, Matthew Toseland wrote:
> >
> >> Tests showed ages ago that triple inserting the same block gives 90%+
> >> persistence after a week instead of 70%+. There is a (probably 
> >> statistically
> >> insignificant) improvement relative even to inserting 3 separate blocks.
> >> I had thought it was due to not forking on cacheable, but we fork on
> >> cacheable now and the numbers are the same.
> >> [...]
> >>
> >> With backoff:
> >>
> >> IMHO rejections are more likely to be the culprit. There just isn't that
> >> much backoff any more. However, we could allow an insert to be routed to a
> >> backed off peer provided the backoff-time-remaining is under some arbitrary
> >> threshold.
> >>
> >> Now, can we test these proposals? Yes.
> >>
> >> We need a new MHK tester to get more data, and determine whether triple
> >> insertion still helps a lot. IMHO there is no obvious reason why it would
> >> have degenerated. We need to insert and request a larger number of blocks
> >> (rather than 3+1 per day), and we need to test with fork on cacheable vs
> >> without it. We should probably also use a 2 week period rather than a 1 
> >> week
> >> period, to get more detailed numbers. However, we can add two more
> >> per-insert flags which we could test:
> >> - Ignore low backoff: If enabled, route inserts to nodes with backoff time
> >> remaining under some threshold. This is easy to implement.
> >> - Prefer inserts: If enabled, target a 1/1/3/3 ratio rather than a 1/1/1/1
> >> ratio. To implement this using the current kludge, we would need to deduct
> >> the space used by 2 inserts of each type from the space used, when we are
> >> considering whether to accept an insert. However IMHO the current kludge
> >> probably doesn't work very well. It would likely be better to change it as
> >> above, then we could just have a different target ratio. But for testing
> >> purposes we could reasonably just try the kludge.
> >>
> >> Of course, the real solution is probably to rework load management so we
> >> don't misroute, or misroute much less (especially on inserts).
> >
> > About persistence... it logically must be confined to these areas.
> >
> > 1) insertion logic
> > 2) network change over time
> > 3) fetch logic
> >
> > If there is a major issue with 2 or 3, then beefing up 1 may not be a "good"
> > solution. Then again, I like your ideas more than just chalking it up to
> > "bad network topology"...
> 
> Bad topology is not confined to those areas.  The insert / fetch logic
> can be locally correct, and the network static, and bad topology will
> still produce poor performance.

True, but opennet should produce good topology, shouldn't it? Generally the 
stats page seems to suggest routing is working?



Re: [freenet-dev] Improving insert persistence

2010-06-29 Thread Evan Daniel
On Tue, Jun 29, 2010 at 5:15 PM, Matthew Toseland
 wrote:
> On Tuesday 29 June 2010 16:54:42 Evan Daniel wrote:
>> On Tue, Jun 29, 2010 at 11:45 AM, Robert Hailey
>>  wrote:
>> >
>> > On Jun 29, 2010, at 8:26 AM, Matthew Toseland wrote:
>> >
>> >> Tests showed ages ago that triple inserting the same block gives 90%+
>> >> persistence after a week instead of 70%+. There is a (probably 
>> >> statistically
>> >> insignificant) improvement relative even to inserting 3 separate blocks.
>> >> I had thought it was due to not forking on cacheable, but we fork on
>> >> cacheable now and the numbers are the same.
>> >> [...]
>> >>
>> >> With backoff:
>> >>
>> >> IMHO rejections are more likely to be the culprit. There just isn't that
>> >> much backoff any more. However, we could allow an insert to be routed to a
>> >> backed off peer provided the backoff-time-remaining is under some 
>> >> arbitrary
>> >> threshold.
>> >>
>> >> Now, can we test these proposals? Yes.
>> >>
>> >> We need a new MHK tester to get more data, and determine whether triple
>> >> insertion still helps a lot. IMHO there is no obvious reason why it would
>> >> have degenerated. We need to insert and request a larger number of blocks
>> >> (rather than 3+1 per day), and we need to test with fork on cacheable vs
>> >> without it. We should probably also use a 2 week period rather than a 1 
>> >> week
>> >> period, to get more detailed numbers. However, we can add two more
>> >> per-insert flags which we could test:
>> >> - Ignore low backoff: If enabled, route inserts to nodes with backoff time
>> >> remaining under some threshold. This is easy to implement.
>> >> - Prefer inserts: If enabled, target a 1/1/3/3 ratio rather than a 1/1/1/1
>> >> ratio. To implement this using the current kludge, we would need to deduct
>> >> the space used by 2 inserts of each type from the space used, when we are
>> >> considering whether to accept an insert. However IMHO the current kludge
>> >> probably doesn't work very well. It would likely be better to change it as
>> >> above, then we could just have a different target ratio. But for testing
>> >> purposes we could reasonably just try the kludge.
>> >>
>> >> Of course, the real solution is probably to rework load management so we
>> >> don't misroute, or misroute much less (especially on inserts).
>> >
>> > About persistence... it logically must be confined to these areas.
>> >
>> > 1) insertion logic
>> > 2) network change over time
>> > 3) fetch logic
>> >
>> > If there is a major issue with 2 or 3, then beefing up 1 may not be a 
>> > "good"
>> > solution. Then again, I like your ideas more than just chalking it up to
>> > "bad network topology"...
>>
>> Bad topology is not confined to those areas.  The insert / fetch logic
>> can be locally correct, and the network static, and bad topology will
>> still produce poor performance.
>
> True but opennet should produce good topology shouldn't it? Generally the 
> stats page seems to suggest routing is working?
>

True in theory.  Stats page suggests routing basically works, and is
not inconsistent with good overall topology.  I have enough data from
probe requests to do serious topology analysis, but have not yet done
so.  At this point I would say that the topology is assumed to be
good, but that we aren't completely certain.

Evan Daniel



[freenet-dev] Improving insert persistence

2010-06-29 Thread Matthew Toseland
Tests showed ages ago that triple inserting the same block gives 90%+ 
persistence after a week instead of 70%+. There is a (probably statistically 
insignificant) improvement relative even to inserting 3 separate blocks.
I had thought it was due to not forking on cacheable, but we fork on cacheable 
now and the numbers are the same.

Unfortunately I don't have any data more recent than 2010-05-03, due mostly to 
build problems and other bugs resulting from local changes. However, the fork 
on cacheable change went in way before that. Here is the output for all the 
tests up to that day:

Lines where insert failed or no fetch: too short: 10, broken: 0, no number: 1, 
no url: 0, no fetch: 48
Total attempts where insert succeeded and fetch executed: 110
Single keys succeeded: 108
MHKs succeeded: 104
Single key individual fetches: 330
Single key individual fetches succeeded: 258
Success rate for individual keys (from MHK inserts): 258/330 = 0.782
Success rate for the single key triple inserted: 108/110 = 0.982
Success rate for the MHK (success = any of the 3 different keys worked): 
104/110 = 0.945
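
(A sanity check on these numbers: if the three MHK keys failed independently at 
the per-key rate of 0.782, "any of the 3" should succeed at about 
1 - (1 - 0.782)^3 = 0.99. The observed 0.945 is noticeably lower, which hints 
that failures of the three keys are correlated; it is also below the 0.982 for 
the single key triple-inserted, consistent with the opening paragraph.)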

If these figures are accurate, a single key inserted once has a half-life of 
approx 3 weeks; a single key inserted 3 times has a half-life of approx 38 
weeks (an earlier stat was 0.96, which is 17 weeks); an MHK has a half-life of 
approx 13 weeks.
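
(For reference, these half-lives follow from assuming a constant weekly 
survival probability p, giving half-life = ln(0.5) / ln(p) weeks. A minimal 
Java sketch of the arithmetic, reproducing the figures above:)

    // Half-life in weeks, assuming a block survives each week
    // independently with probability p (the 1-week success rate).
    public class HalfLife {
        static double halfLifeWeeks(double weeklySuccessRate) {
            return Math.log(0.5) / Math.log(weeklySuccessRate);
        }
        public static void main(String[] args) {
            System.out.println(halfLifeWeeks(0.782)); // ~2.8 weeks, single insert
            System.out.println(halfLifeWeeks(0.982)); // ~38 weeks, triple insert
            System.out.println(halfLifeWeeks(0.945)); // ~12-13 weeks, MHK
            System.out.println(halfLifeWeeks(0.96));  // ~17 weeks, earlier stat
        }
    }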

Now, there is not enough data here, so I need to write a new tester which 
compares the fork on cacheable flag enabled against disabled and fetches a lot 
more blocks. In fact it turns out we have only fetched data up to 2010/03/17. :|

So the most immediate thing is to rewrite the insert tester: make it insert 
more blocks, make it compare fork on cacheable against no fork on cacheable, 
make absolutely sure it works, make sure it emails me whenever it can't run, etc.

But let's suppose, for the sake of argument, that the above figures are not 
greatly affected by fork on cacheable.

Is this at least plausible?

Well, fork on cacheable was enabled by default on Feb 6th. We have a 1 week 
interval, and the tester runs whatever I'm working on, so data from the 14th 
onwards should be interesting. Over this period we have 32 successes and 5 
failures, for a block-level success rate of 86% (32/37). So fork on cacheable 
seems to have helped a bit, but we need more data to be sure. To break it down:

Blocks inserted once for the MHK test: 4 failed, 23 succeeded = 85% (23/27)
Blocks triple-inserted: 1 failed, 9 succeeded = 90% (9/10)

This seems pretty inconclusive. So we really need more data.

I do however have a proposal based on the assumption that triple-insert still 
makes a big difference. Let's assume it does, and then see what follows...

Firstly, inserts do not create failure table entries, so they are always routed 
the same. In the few minutes it takes to insert a block 3 times (the tests 
always used single blocks, so the re-inserts followed straight after the first 
insert), the set of peers is unlikely to have changed much. That leaves:
- Backoff.
- Rejections.

Both result in the data not being accepted: either at too few nodes, or not at 
the nodes where other nodes will look for the data.

The solution would seem to be to emulate triple insertion by making inserts 3 
times as likely to be accepted.

With rejections:

Some reject reasons are randomised. We can roll the dice 3 times for them.
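
(Concretely: if a randomised reject reason fires with probability r, giving an 
insert three independent rolls and rejecting only if all three fail turns the 
effective reject probability into r^3, e.g. 0.3 becomes 0.027. A sketch, not 
actual node code:)

    import java.util.Random;

    // Sketch: an insert gets three independent chances to pass a
    // randomised reject check, mimicking what triple insertion gets
    // from three separate attempts.
    public class TripleRoll {
        private static final Random random = new Random();

        // true = this roll says reject
        static boolean rollReject(double rejectProbability) {
            return random.nextDouble() < rejectProbability;
        }

        // Reject the insert only if all three rolls reject.
        static boolean rejectInsert(double rejectProbability) {
            for (int i = 0; i < 3; i++) {
                if (!rollReject(rejectProbability)) return false;
            }
            return true;
        }
    }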

But output bandwidth liability limiting is the big one. How do we adapt it to 
favour inserts, as evanbd originally suggested? A larger window for inserts 
might be an option. IMHO the best solution is not to hack it with window sizes 
but to target a ratio:
- Track the average number of requests of each type (CHK/SSK req/ins) accepted 
recently. (This should not be just tracking hits over the last minute because 
that will result in oscillation; I prefer taking the interval between accepts, 
converting it to a frequency, taking the log and feeding it to a klein filter).
- If the "space left" for more requests is less than 1 CHK insert, 1 SSK 
insert, 1 CHK request and 1 SSK request, accept the request. This is already 
the case.
- If there is no space left, reject the request. This is also already the case.
- Otherwise, consider the frequencies of recently accepted requests, and decide 
whether to accept based on maintaining a target ratio (see the sketch below). 
That target ratio should be 1/1/1/1 for the equally-likely-to-accept-any-type 
behaviour that we aim for at the moment. If we want to prefer inserts, we could 
change it to 1/1/3/3.
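
(A sketch of what that middle case could look like. Two loudly-flagged 
assumptions: I am reading the "klein filter" step as a simple first-order 
exponential smoother over log-frequency, and the decision rule as "accept if 
this type's recent share of accepts is below its target share". All class and 
constant names here are illustrative, not the node's actual code:)

    // Sketch of ratio-targeted acceptance for the four request types.
    public class RatioTargetAccepter {
        public enum Type { CHK_REQUEST, SSK_REQUEST, CHK_INSERT, SSK_INSERT }

        private final double[] targetRatio;       // e.g. {1,1,1,1} or {1,1,3,3}
        private final double[] smoothedLogFreq = new double[4];
        private final long[] lastAcceptTime = new long[4];
        private static final double ALPHA = 0.1;  // smoothing constant, illustrative

        public RatioTargetAccepter(double[] targetRatio) {
            this.targetRatio = targetRatio;
        }

        // Call whenever a request of this type is accepted: convert the
        // interval since the last accept to a frequency, take the log,
        // and feed it to the smoothing filter.
        public synchronized void onAccept(Type t, long nowMillis) {
            int i = t.ordinal();
            if (lastAcceptTime[i] != 0) {
                double intervalSeconds =
                    Math.max((nowMillis - lastAcceptTime[i]) / 1000.0, 0.001);
                double logFreq = Math.log(1.0 / intervalSeconds);
                smoothedLogFreq[i] =
                    ALPHA * logFreq + (1.0 - ALPHA) * smoothedLogFreq[i];
            }
            lastAcceptTime[i] = nowMillis;
        }

        // Middle case only: more space than the reserved one-of-each
        // minimum, but some space left. Accept iff this type is running
        // below its target share of recent accepts.
        public synchronized boolean shouldAccept(Type t) {
            double totalFreq = 0.0, totalTarget = 0.0;
            double[] freq = new double[4];
            for (int i = 0; i < 4; i++) {
                freq[i] = Math.exp(smoothedLogFreq[i]);
                totalFreq += freq[i];
                totalTarget += targetRatio[i];
            }
            int i = t.ordinal();
            return freq[i] / totalFreq <= targetRatio[i] / totalTarget;
        }
    }

(With a target of {1,1,3,3}, each insert type keeps being accepted until it 
makes up more than 3/8 of recent accepts, which is the 3-times-as-likely bias 
the triple-insert numbers suggest we want.)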

With backoff:

IMHO rejections are more likely to be the culprit. There just isn't that much 
backoff any more. However, we could allow an insert to be routed to a backed 
off peer provided the backoff-time-remaining is under some arbitrary threshold.
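
(The routing-side check is small, which is why this is easy to implement as a 
per-insert flag. Something like the following, where the threshold is the 
arbitrary constant to be tuned and the names are illustrative:)

    // Sketch: when routing an insert, treat a peer as routable if its
    // backoff has expired, or will expire within a small threshold.
    public class BackoffCheck {
        static final long LOW_BACKOFF_THRESHOLD_MS = 5000; // arbitrary, to tune

        static boolean routableForInsert(long backoffExpiryMillis, long nowMillis,
                                         boolean ignoreLowBackoff) {
            long remaining = backoffExpiryMillis - nowMillis;
            if (remaining <= 0) return true; // not backed off
            return ignoreLowBackoff && remaining < LOW_BACKOFF_THRESHOLD_MS;
        }
    }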

Now, can we test these proposals? Yes.

We need a new MHK tester to get more data, and determine whether triple 
insertion still helps a lot. IMHO there is no obvious reason why it would have 
degenerated. We need to insert and request a larger number of blocks (rather 
than 3+1 per day), and we need to test with fork on cacheable vs without it. We 
should probably also use a 2 week period rather than a 1 week period, to get 
more detailed numbers. However, we can add two more per-insert flags which we 
could test:
- Ignore low backoff: If enabled, route inserts to nodes with backoff time 
remaining under some threshold. This is easy to implement.
- Prefer inserts: If enabled, target a 1/1/3/3 ratio rather than a 1/1/1/1 
ratio. To implement this using the current kludge, we would need to deduct the 
space used by 2 inserts of each type from the space used, when we are 
considering whether to accept an insert. However IMHO the current kludge 
probably doesn't work very well. It would likely be better to change it as 
above, then we could just have a different target ratio. But for testing 
purposes we could reasonably just try the kludge.

Of course, the real solution is probably to rework load management so we don't 
misroute, or misroute much less (especially on inserts).

Re: [freenet-dev] Improving insert persistence

2010-06-29 Thread Evan Daniel
On Tue, Jun 29, 2010 at 11:45 AM, Robert Hailey
 wrote:
>
> On Jun 29, 2010, at 8:26 AM, Matthew Toseland wrote:
>
>> Tests showed ages ago that triple inserting the same block gives 90%+
>> persistence after a week instead of 70%+. There is a (probably statistically
>> insignificant) improvement relative even to inserting 3 separate blocks.
>> I had thought it was due to not forking on cacheable, but we fork on
>> cacheable now and the numbers are the same.
>> [...]
>>
>> With backoff:
>>
>> IMHO rejections are more likely to be the culprit. There just isn't that
>> much backoff any more. However, we could allow an insert to be routed to a
>> backed off peer provided the backoff-time-remaining is under some arbitrary
>> threshold.
>>
>> Now, can we test these proposals? Yes.
>>
>> We need a new MHK tester to get more data, and determine whether triple
>> insertion still helps a lot. IMHO there is no obvious reason why it would
>> have degenerated. We need to insert and request a larger number of blocks
>> (rather than 3+1 per day), and we need to test with fork on cacheable vs
>> without it. We should probably also use a 2 week period rather than a 1 week
>> period, to get more detailed numbers. However, we can add two more
>> per-insert flags which we could test:
>> - Ignore low backoff: If enabled, route inserts to nodes with backoff time
>> remaining under some threshold. This is easy to implement.
>> - Prefer inserts: If enabled, target a 1/1/3/3 ratio rather than a 1/1/1/1
>> ratio. To implement this using the current kludge, we would need to deduct
>> the space used by 2 inserts of each type from the space used, when we are
>> considering whether to accept an insert. However IMHO the current kludge
>> probably doesn't work very well. It would likely be better to change it as
>> above, then we could just have a different target ratio. But for testing
>> purposes we could reasonably just try the kludge.
>>
>> Of course, the real solution is probably to rework load management so we
>> don't misroute, or misroute much less (especially on inserts).
>
> About persistence... it logically must be confined to these areas.
>
> 1) insertion logic
> 2) network change over time
> 3) fetch logic
>
> If there is a major issue with 2 or 3, then beefing up 1 may not be a "good"
> solution. Then again, I like your ideas more than just chalking it up to
> "bad network topology"...

Bad topology is not confined to those areas.  The insert / fetch logic
can be locally correct, and the network static, and bad topology will
still produce poor performance.

Evan Daniel



Re: [freenet-dev] Improving insert persistence

2010-06-29 Thread Robert Hailey

On Jun 29, 2010, at 8:26 AM, Matthew Toseland wrote:

> Tests showed ages ago that triple inserting the same block gives 90% 
> + persistence after a week instead of 70%+. There is a (probably  
> statistically insignificant) improvement relative even to inserting  
> 3 separate blocks.
> I had thought it was due to not forking on cacheable, but we fork on  
> cacheable now and the numbers are the same.
> [...]
>
> With backoff:
>
> IMHO rejections are more likely to be the culprit. There just isn't  
> that much backoff any more. However, we could allow an insert to be  
> routed to a backed off peer provided the backoff-time-remaining is  
> under some arbitrary threshold.
>
> Now, can we test these proposals? Yes.
>
> We need a new MHK tester to get more data, and determine whether  
> triple insertion still helps a lot. IMHO there is no obvious reason  
> why it would have degenerated. We need to insert and request a  
> larger number of blocks (rather than 3+1 per day), and we need to  
> test with fork on cacheable vs without it. We should probably also  
> use a 2 week period rather than a 1 week period, to get more  
> detailed numbers. However, we can add two more per-insert flags  
> which we could test:
> - Ignore low backoff: If enabled, route inserts to nodes with  
> backoff time remaining under some threshold. This is easy to  
> implement.
> - Prefer inserts: If enabled, target a 1/1/3/3 ratio rather than a  
> 1/1/1/1 ratio. To implement this using the current kludge, we would  
> need to deduct the space used by 2 inserts of each type from the  
> space used, when we are considering whether to accept an insert.  
> However IMHO the current kludge probably doesn't work very well. It  
> would likely be better to change it as above, then we could just  
> have a different target ratio. But for testing purposes we could  
> reasonably just try the kludge.
>
> Of course, the real solution is probably to rework load management  
> so we don't misroute, or misroute much less (especially on inserts).

About persistence... it logically must be confined to these areas.

1) insertion logic
2) network change over time
3) fetch logic

If there is a major issue with 2 or 3, then beefing up 1 may not be a  
"good" solution. Then again, I like your ideas more than just chalking  
it up to "bad network topology"...

--
Robert Hailey




