[freenet-dev] Freenet Summer of Code wrap-up

2009-10-31 Thread Matthew Toseland
Apologies for being the absolute last wrap-up! This year went really well: we 
had 5 students, and they all (more or less) deserved their passes. We had a 
much stronger selection process than in the first 2 years, requiring some 
demonstration in the form of code: a new feature or a bugfix. So even though 
most of the students were completely new to us, they did pretty well. One of 
our students had a work conflict, but it was resolved satisfactorily.

For the Googlers reading this, Freenet is an anonymous peer-to-peer system with 
support for forums, browsing the internal web, filesharing, etc., with a focus 
on security and the option of running in "darknet" or friend-to-friend mode. It 
is intended (at least by me) for people in hostile environments (China, Iran, 
etc.) to express themselves freely, but it is currently mostly used in the West 
by geeks, filesharers, and the like.

infinity0 and mikeb worked together on a new search plugin, which we have now 
deployed. infinity0's work was primarily on a new index format (which works, 
though the spider needs more work) and on distributed indexing (which doesn't 
work yet), while mikeb mainly improved the user interface and added essential 
features (a simple non-infringing page-ranking algorithm, booleans, phrase 
matching, etc.). kurmi worked on new filtering code for various formats, 
particularly a vastly improved CSS filter, which needed considerable work to 
sort out all the parsing perversities but is now merged (Freenet has to be very 
careful not to send the browser anything that might give away the user's IP via 
a web bug). ljb worked on more friend-to-friend functionality; his work is 
included in current builds. sashee worked on making the web interface more 
dynamic using Google Web Toolkit, including solving a long-standing problem 
with image loading blocking the browser (Freenet has quite high latency!); this 
branch has not yet been merged, but hopefully will be within the next 6 months 
or so, as it needs some tuning and debugging for slow browsers.
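
(To illustrate the web-bug problem kurmi's filter has to solve: any url() in a 
stylesheet that points outside Freenet would make the browser fetch it 
directly, revealing the user's IP. The sketch below is purely illustrative; the 
names and the regex approach are my own assumptions, and the real filter has to 
parse CSS properly rather than pattern-match.)

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /** Illustrative sketch only: blank out any CSS url(...) that could reach
     *  the outside world and act as a web bug. A real filter must parse CSS
     *  properly; a regex like this is far too naive for hostile input. */
    public class CssUrlSketch {
        private static final Pattern URL =
            Pattern.compile("url\\(\\s*['\"]?([^'\")]*)['\"]?\\s*\\)",
                            Pattern.CASE_INSENSITIVE);

        public static String sanitize(String css) {
            Matcher m = URL.matcher(css);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String target = m.group(1).trim();
                // Anything with a scheme (http:, ftp:, ...) or a
                // protocol-relative form would be fetched directly by the
                // browser, leaking the user's IP address.
                boolean external = target.contains(":") || target.startsWith("//");
                m.appendReplacement(out,
                    external ? "url(\"\")" : Matcher.quoteReplacement(m.group()));
            }
            m.appendTail(out);
            return out.toString();
        }
    }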

Some of our students achieved less than expected IMHO, but in most cases there 
was a lot more work involved in the project than I expected, and the students 
did really well. I have talked to most of them in the last month, well after 
the programme finished, and hopefully some of them will continue to contribute 
at least occasionally. Best year yet; many thanks to Google!


[freenet-dev] New version of XMLSpider

2009-10-31 Thread Matthew Toseland
XMLSpider version 42 is now available. This build was released from the master 
branch, not the temporary stable branch that the previous build came from, but 
there isn't much difference between them now. **Please let me know if it 
doesn't work!** Version 41 doesn't work with build 1238 because of an API 
change relating to content filtering.


[freenet-dev] Update on getting rid of emu

2009-10-31 Thread bo-le
On Saturday, 31 October 2009 17:28:38, Zero3 wrote:
> Matthew Toseland wrote:
> > On Friday 30 October 2009 17:10:02 Zero3 wrote:
> >> Matthew Toseland wrote:
> >>> If this line of reasoning is correct, we need to choose an
> >>> end-user-oriented issue tracker or forums system (either way ideally
> >>> gratis and hosted) to complement Uservoice. Suggestions?
> >>
> >> It would make sense to find a tracker that both users and devs can use.
> >> Saves the overhead of moving things from e.g. a forums system to a bug
> >> tracker.
> >
> > Is it possible? Is Trac something that end users can use?
> 
> I don't think Trac is the perfect solution (not as it is right now, at
> least), but it seems much better than our current solution (Mantis +
> Wikka Wakka).
> 
> Pidgin (see http://developer.pidgin.im/) has an interesting
> implementation directly in their website. The bar at the top provides
> easy access to some of the most used features: Wiki (starts here),
> Timeline (aka "what's new?"), Roadmap and Search. Judging by the official
> Trac site (http://trac.edgewall.org/), it seems possible to add other
> items like "New ticket" and "Browse source" as well.
> 
> I don't have any personal experience with Trac though. Perhaps someone
> else around here has, and can give us some recommendations?
> 
This may fit our needs better than Trac: http://basieproject.org/

> - Zero3



[freenet-dev] Update on getting rid of emu

2009-10-31 Thread Zero3
Matthew Toseland wrote:
> On Friday 30 October 2009 17:10:02 Zero3 wrote:
>> Matthew Toseland wrote:
>>> If this line of reasoning is correct, we need to choose an 
>>> end-user-oriented issue tracker or forums system (either way ideally gratis 
>>> and hosted) to complement Uservoice. Suggestions?
>> It would make sense to find a tracker that both users and devs can use. 
>> Saves the overhead of moving things from e.g. a forums system to a bug 
>> tracker.
> 
> Is it possible? Is Trac something that end users can use?

I don't think Trac is the perfect solution (not as it is right now, at 
least), but it seems much better than our current solution (Mantis + 
Wikka Wakka).

Pidgin (see http://developer.pidgin.im/) has an interesting 
implementation directly in their website. The bar at the top provides 
easy access to some of the most used features: Wiki (starts here), 
Timeline (aka "what's new?"), Roadmap and Search. Judging by the official 
Trac site (http://trac.edgewall.org/), it seems possible to add other 
items like "New ticket" and "Browse source" as well.

I don't have any personal experience with Trac though. Perhaps someone 
else around here has, and can give us some recommendations?

- Zero3



[freenet-dev] Solution for now...

2009-10-31 Thread Matthew Toseland
On Saturday 31 October 2009 15:47:07 Matthew Toseland wrote:
> On Saturday 31 October 2009 01:19:18 Matthew Toseland wrote:
> > [...] However, we could do the don't-route-to-newbies trick with 
> > short-lived bundles, using a fixed path for the bundle's
> > lifetime. 10 bundles each renewed once an hour beats hundreds of requests 
> > per hour! Long-lived bundles would probably have to automatically move to 
> > new nodes, and therefore could perhaps be traced back to source eventually 
> > - if the attacker managed to hook one, or more likely trace a stream of 
> > requests back to one.
> > 
> > Bundling is a lot more work, a lot more tuning, but of course more secure. 
> > It would replace the current no cache for a few hops, and would still check 
> > the local datastore.
> > 
> > Encrypted tunnels are a further evolution of bundling: We send out various 
> > randomly routed "anchors", which rendezvous to create a tunnel, which is a 
> > short encrypted (using a shared secret scheme) path to a random start node. 
> > This has most of the same issues as bundling, although it doesn't check the 
> > local datastore, and it provides a reasonable degree of protection against 
> > relatively nearby attackers.
> > 
> > Note that if Mallory cannot connect the requests, he can do very little. 
> > Randomising inserted data encryption keys will help a lot, but it is tricky 
> > and expensive with reinserts and is impossible with requests. We could use 
> > tunnels, random routing etc only on the top block etc, but they would still 
> > need to be not cached on the originator and therefore the next few nodes 
> > too.
> > 
> Okay, I propose to implement the following:
> - On all requests, when the HTL goes low enough that we start caching, we 
> should allow the request to come back to those nodes which it has already 
> visited, in case they are ideal/sink nodes for the request. Most likely this 
> will be implemented by creating a new UID for the request. This solves the 
> performance problem.

In fact, this is only necessary on inserts. Requests don't need to go back over 
the nodes they've already been to. Or do they, for better caching if the data 
is found later?
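
(A minimal sketch of the new-UID idea under discussion; the class and the 
threshold constant are hypothetical, not the node's real code:)

    import java.security.SecureRandom;

    /** Hypothetical sketch: once an insert's HTL drops into the caching
     *  phase, give it a fresh UID, so nodes visited during the no-cache
     *  phase no longer reject it and it can settle on the actual sink
     *  nodes for the key. */
    class InsertState {
        static final int CACHING_STARTS_AT_HTL = 15; // assumed threshold
        private static final SecureRandom RANDOM = new SecureRandom();

        private long uid = RANDOM.nextLong();
        private boolean renewed = false;

        /** Called whenever the HTL is decremented. */
        void onHtlDecremented(int newHtl) {
            if (!renewed && newHtl <= CACHING_STARTS_AT_HTL) {
                uid = RANDOM.nextLong(); // visited-node checks start over
                renewed = true;
            }
        }

        long uid() { return uid; }
    }

Whether plain requests need this too, or only inserts, is exactly the open 
question above.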


> - A flag on requests at high HTL. If set, this will cause the request or 
> insert to be random-routed, as well as not cached, while at high HTL. Of 
> course, we will still check the datastore, as we do now. The flag will 
> initially be a config option, not enabled by any security level, so we can do 
> some performance testing. In future it will hopefully be enabled at seclevel 
> NORMAL and above. The flag will disappear after we go to low HTL, so it only 
> gives away the seclevel while the request is at high HTL. This improves 
> security for the more paranoid, at a small performance cost.
> 
> In future, instead of random routing for individual requests, we should 
> implement bundling, and eventually encrypted tunnels, which might be used 
> only on some requests at certain seclevels. However, this will be 
> considerably more work because of the tuning needed.


[freenet-dev] Solution for now...

2009-10-31 Thread Matthew Toseland
On Saturday 31 October 2009 01:19:18 Matthew Toseland wrote:
> [...] Bundling is a lot more work, a lot more tuning, but of course more secure. It
> would replace the current no cache for a few hops, and would still check the 
> local datastore.
> 
> Encrypted tunnels are a further evolution of bundling: We send out various 
> randomly routed "anchors", which rendezvous to create a tunnel, which is a 
> short encrypted (using a shared secret scheme) path to a random start node. 
> This has most of the same issues as bundling, although it doesn't check the 
> local datastore, and it provides a reasonable degree of protection against 
> relatively nearby attackers.
> 
> Note that if Mallory cannot connect the requests, he can do very little. 
> Randomising inserted data encryption keys will help a lot, but it is tricky 
> and expensive with reinserts and is impossible with requests. We could use 
> tunnels, random routing etc only on the top block etc, but they would still 
> need to be not cached on the originator and therefore the next few nodes too.
> 
Okay, I propose to implement the following:
- On all requests, when the HTL goes low enough that we start caching, we 
should allow the request to come back to those nodes which it has already 
visited, in case they are ideal/sink nodes for the request. Most likely this 
will be implemented by creating a new UID for the request. This solves the 
performance problem.
- A flag on requests at high HTL. If set, this will cause the request or insert 
to be random-routed, as well as not cached, while at high HTL. Of course, we 
will still check the datastore, as we do now. The flag will initially be a 
config option, not enabled by any security level, so we can do some performance 
testing. In future it will hopefully be enabled at seclevel NORMAL and above. 
The flag will disappear after we go to low HTL, so it only gives away the 
seclevel while the request is at high HTL. This improves security for the more 
paranoid, at a small performance cost. (A sketch of the idea follows below.)
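
(A sketch of how the proposed flag might interact with routing and caching; 
all names and constants are assumed for illustration, not a patch against the 
real node:)

    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    /** Hypothetical sketch of the proposed flag: while HTL is above the
     *  caching threshold and the flag is set, route to a random peer rather
     *  than greedily by location, and cache nothing. */
    class HighHtlRouting {
        static final int CACHE_THRESHOLD_HTL = 15; // assumed threshold

        static class Peer { double location; }
        static class Request {
            int htl;
            double target;                // key's position on the [0,1) keyspace
            boolean randomRouteAtHighHtl; // the proposed, initially config-only flag
        }

        private final Random random = new Random();

        Peer selectPeer(Request req, List<Peer> peers) {
            if (req.randomRouteAtHighHtl && req.htl > CACHE_THRESHOLD_HTL) {
                // Each random hop blurs the attacker's estimate of where
                // the request entered the keyspace.
                return peers.get(random.nextInt(peers.size()));
            }
            // Normal greedy routing: the peer closest to the key's location.
            return peers.stream()
                    .min(Comparator.comparingDouble(
                            (Peer p) -> distance(p.location, req.target)))
                    .orElseThrow(IllegalStateException::new);
        }

        boolean shouldCache(Request req) {
            return req.htl <= CACHE_THRESHOLD_HTL; // no-cache phase kept as today
        }

        static double distance(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d); // the keyspace is circular
        }
    }

Note that the datastore check is deliberately untouched: it happens regardless 
of routing mode, matching the behaviour described above.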

In future, instead of random routing for individual requests, we should 
implement bundling, and eventually encrypted tunnels, which might be used only 
on some requests at certain seclevels. However, this will be considerably more 
work because of the tuning needed.



[freenet-dev] Current security measures may be harming performance; better measures may help

2009-10-31 Thread Ed Tomlinson
On Friday 30 October 2009 21:19:18 Matthew Toseland wrote:
> Currently, requests are always routed the same way, but at high HTL we do not 
> cache either replies to requests or incoming inserts.
> 
> Specifically, at HTL 18 and 17 we do not cache returned data from requests 
> (though we do check the datastore), and at HTL 18, 17 and 16 we do not cache 
> data from inserts. On average we spend 2 hops at HTL 18, including the 
> originator, so on average for an insert it is 4 hops before we cache, with a 
> minimum of 3 (or is it a minimum of 2? afaics we start at htl 18 and then we 
> may decrement it when sending to the next hop, so a minimum of 3).
> 
> Decrement at HTL 18 is probabilistic, with a 50% probability.
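
(The rules above restated as code, purely as a sketch; the class and method 
names are assumptions:)

    import java.util.Random;

    /** Sketch of the HTL rules as described above (not the node's actual code). */
    class HtlPolicySketch {
        static final int MAX_HTL = 18;
        private final Random random = new Random();

        /** At HTL 18 the decrement is probabilistic (50%), so a request
         *  spends 2 hops at HTL 18 on average before dropping to 17. */
        int decrement(int htl) {
            if (htl == MAX_HTL && random.nextBoolean())
                return htl;             // no decrement this hop
            return Math.max(htl - 1, 0);
        }

        boolean cacheRequestData(int htl) { return htl <= 16; } // not at 18, 17
        boolean cacheInsertData(int htl)  { return htl <= 15; } // not at 18, 17, 16
    }

With a 50% decrement the expected stay at HTL 18 is two hops, so an insert 
averages 2 + 1 + 1 = 4 hops (HTL 18, 17, 16) before caching starts, matching 
the arithmetic above.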
> 
> Simulations suggest that the "ideal" node is likely found around HTL 14 to 
> 15. So a significant proportion of requests and inserts will go past it while 
> still in the no caching phase. This may partly explain poor data retention, 
> which appears to affect some proportion of keys much more than the others.
> 
> Hence we might get better data retention if we e.g. random routed while in 
> the no-cache phase.
> 
> But here is another reason for random routing while in the no-cache phase:
> 
> Let's assume that we only care about remote attackers. Generally they are much 
> more scary. So we are talking about the mobile attacker source tracing 
> attack. This means that a bad guy is a long way away, and he gets a few 
> requests by chance which were part of the same splitfile insert or request 
> originated by you. He is able to determine that they are part of the same, 
> interesting, splitfile. For each request, he knows 1) that it was routed to 
> him, and 2) its target location. He can thus determine where on the keyspace 
> the request could have come from. This is a bit vague due to backoff etc, but 
> he can nonetheless identify an area where the originator is most likely 
> present, starting at his location and extending in one direction or the 
> other. In fact, he can identify the opposite end of it as the most likely 
> location of the originator. So he then tries to get peers closer to this 
> location, by announcement, path folding, changing his own location etc. If he 
> is right, he will then get requests from this source much more quickly. And 
> so he can keep on moving until he reaches the originator. It has been 
> suggested that we could mark requests so that they will not be routed to new 
> connections - the problem is this doesn't work for long-lived requests e.g. 
> big inserts.
> 
> The number of samples the attacker gets is proportional to the number of hops 
> from the originator to the "ideal" node, on average, since samples after the 
> "ideal" are much less informative. It is also proportional to the number of 
> requests sent, and inversely to the size of the network.
> 
> Random routing while the HTL is high, not to any specific location but to a 
> random peer at each hop (subject to e.g. backoff), would make the pre-ideal 
> samples much less useful, because they will each have effectively started at 
> a random node. Not a truly random node (even if we route randomly at each 
> hop, we won't have had enough hops for it to be a random node across the 
> whole keyspace), but it will still mean the picture is much more vague, and 
> the attacker will need a lot more samples. The post-ideal sample remains 
> useless. If the request reaches the attacker while it is still in the random 
> routing phase, this provides a useful sample to the attacker, but likely a 
> much less useful one than in the routed stage.
> 
> So, just maybe, we could improve data persistence (if not necessarily overall 
> performance), and maintain the current no-cache-at-high-htl, and improve 
> security, by random routing as well as not caching while HTL is high. Worth 
> simulating perhaps?
> 
> The next obvious solution is some form of bundling: Even if the bundle is not 
> encrypted, routing a large bunch of requests together for some distance gives 
> one sample instead of many. Short-lived bundles have the disadvantage that 
> there are many of them so the attacker gets more samples if they happen to 
> cross his path. However, we could do the don't-route-to-newbies trick with 
> short-lived bundles, using a fixed path for the bundle's lifetime. 10 bundles 
> each renewed once an hour beats hundreds of requests per hour! Long-lived 
> bundles would probably have to automatically move to new nodes, and therefore 
> could perhaps be traced back to source eventually - if the attacker managed 
> to hook one, or more likely trace a stream of requests back to one.
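
(A toy sketch of the short-lived bundle idea; the structure is entirely 
hypothetical:)

    import java.util.ArrayList;
    import java.util.List;

    /** Toy sketch: requests in a bundle share one fixed first hop for the
     *  bundle's lifetime, so a distant observer sees one routing decision
     *  per hour instead of one per request. */
    class ShortLivedBundle {
        static final long LIFETIME_MS = 60L * 60L * 1000L; // renewed hourly

        final String firstHopPeer; // fixed path entry point for this bundle
        final long createdAt = System.currentTimeMillis();
        private final List<String> keys = new ArrayList<>();

        ShortLivedBundle(String firstHopPeer) { this.firstHopPeer = firstHopPeer; }

        boolean expired() {
            return System.currentTimeMillis() - createdAt > LIFETIME_MS;
        }

        /** All requests leave via the same peer until the bundle is renewed. */
        void route(String key) {
            keys.add(key); // then forward the batch via firstHopPeer
        }
    }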
> 
> Bundling is a lot more work, a lot more tuning, but of course more secure. It 
> would replace the current no cache for a few hops, and would still check the 
> local datastore.
> 
> Encrypted tunnels are a further evolution of bundling: We send out various 
> randomly routed "anchors", which rendezvous to create a tunnel, which is a 
> short encrypted (using a shared secret scheme) path to a random start node. 
> [...]

[freenet-dev] Update on getting rid of emu

2009-10-31 Thread Matthew Toseland
On Friday 30 October 2009 19:21:51 xor wrote:
> On Friday 30 October 2009 16:29:57 Matthew Toseland wrote:
> >
> > If we don't keep the bug tracker:
> 
> > - We will need to do a "spring clean": Keep the current bug tracker up for
> >   a while but read-only, and *manually* migrate any important bugs and
> >   issues to the new tracker.
> > - This will be significant work.
> > - It will involve going over the bugs, dumping those which are out of date,
> >   abandoned etc, and rewriting those bugs and feature issues that are still
> >   valid. Trac's wiki functionality may be useful for this, although it loses
> >   the ability to link bugs formally.
> > - It may be a useful exercise in terms of prioritising and de-junking.
> >
> 
> Migrating it in read-only mode is insane, because there would be no sane way 
> to monitor the migration progress: which bugs have been reviewed and 
> migrated?

Fair point. Devs should have the ability to close bugs after they have been 
migrated.
> 
> Instead, we should clean up Mantis itself until ALL bugs there are suitable 
> for migration, and then migrate. 
> 
> Besides, as I've said numerous times, we MUST migrate all bugs, because from 
> a software engineering point of view the content of a bug tracker is the most 
> valuable data of a project besides the source code and the documentation. 
> Deleting any of it without reviewing it is just unprofessional, so we must 
> make very sure that no issues in Mantis get lost.
> 
> BTW: When did someone last back up its database? Does emu have a backup 
> system?

