On Saturday 31 October 2009 01:19:18 Matthew Toseland wrote:
> Currently, requests are always routed the same way, but at high HTL we do not 
> cache either replies to requests or incoming inserts.
> 
> Specifically, at HTL 18 and 17 we do not cache returned data from requests 
> (though we do check the datastore), and at HTL 18, 17 and 16 we do not cache 
> data from inserts. On average a request spends 2 hops at HTL 18, including 
> the originator, so an insert travels 4 hops on average before we cache, with 
> a minimum of 3: we start at HTL 18 and only decrement when forwarding to the 
> next hop, so the floor is one hop each at 18, 17 and 16.
> 
> Decrement at HTL 18 is probabilistic, with a 50% probability.
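> 
> To pin down the mechanics, here is a minimal sketch of the decrement and 
> cache-eligibility rules described above. All names are illustrative, not the 
> actual fred code:
> 
>     // Sketch only: class, constants and method names are hypothetical.
>     class HTLPolicy {
>         static final int MAX_HTL = 18;
>         static final double DECREMENT_AT_MAX_PROB = 0.5;
> 
>         // At HTL 18 we only decrement with 50% probability; below
>         // that, every hop decrements.
>         static int decrementHTL(int htl, java.util.Random rand) {
>             if (htl == MAX_HTL && rand.nextDouble() >= DECREMENT_AT_MAX_PROB)
>                 return htl; // stay at 18 half the time
>             return htl - 1;
>         }
> 
>         // Requests don't cache returned data at HTL 18-17; inserts
>         // don't cache at HTL 18-16. The datastore is checked regardless.
>         static boolean shouldCache(int htl, boolean isInsert) {
>             return htl <= (isInsert ? 15 : 16);
>         }
>     }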
> 
> Simulations suggest that the "ideal" node is likely found around HTL 14 to 
> 15. So a significant proportion of requests and inserts will go past it while 
> still in the no-caching phase. This may partly explain poor data retention, 
> which appears to affect some keys much more than others.
> 
> Hence we might get better data retention if we, for example, routed randomly 
> while in the no-cache phase.
> 
> But here is another reason for random routing while in the no-cache phase:
> 
> Let's assume that we only care about remote attackers; generally they are 
> much scarier. So we are talking about the mobile attacker source tracing 
> attack. This means that a bad guy is a long way away, and he gets a few 
> requests by chance which were part of the same splitfile insert or request 
> originated by you. He is able to determine that they are part of the same, 
> interesting, splitfile. For each request, he knows 1) that it was routed to 
> him, and 2) its target location. He can thus determine where on the keyspace 
> the request could have come from. This is a bit vague due to backoff etc., but 
> he can nonetheless identify an area where the originator is most likely 
> present, starting at his location and extending in one direction or the 
> other. In fact, he can identify the opposite end of it as the most likely 
> location of the originator. So he then tries to get peers closer to this 
> location, by announcement, path folding, changing his own location etc. If he 
> is right, he will then get requests from this source much more quickly. And 
> so he can keep on moving until he reaches the originator. It has been 
> suggested that we could mark requests so that they will not be routed to new 
> connections - the problem is that this doesn't work for long-lived requests, 
> e.g. big inserts.
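> 
> To make "where on the keyspace" concrete: locations live on a circle [0, 1), 
> and routing greedily minimises circular distance to the key. The attacker's 
> per-sample information ultimately comes down to this function (continuing 
> the illustrative sketch above):
> 
>     // Circular distance on the [0,1) keyspace. Routing prefers the peer
>     // minimising this distance to the key, so a request arriving at the
>     // attacker constrains where it can plausibly have originated.
>     static double keyspaceDistance(double a, double b) {
>         double d = Math.abs(a - b);
>         return Math.min(d, 1.0 - d);
>     }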
> 
> The number of samples the attacker gets is proportional to the number of hops 
> from the originator to the "ideal" node, on average, since samples after the 
> "ideal" are much less informative. It is also proportional to the number of 
> requests sent, and inversely to the size of the network.
> 
> Random routing while the HTL is high, not to any specific location but to a 
> random peer at each hop (subject to e.g. backoff), would make the pre-ideal 
> samples much less useful, because each request will effectively have started 
> at a random node. Not a truly random node: a few random hops are not enough 
> to make the starting point uniform across the whole keyspace, but the 
> picture the attacker gets is still much more vague, and he will need a lot 
> more samples. The post-ideal samples remain useless. If the request reaches 
> the attacker while it is still in the random routing phase, that does give 
> him a sample, but likely a much less useful one than in the routed stage.
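> 
> As a sketch of what per-hop random routing might look like (peer selection 
> is heavily simplified; Peer, shouldCache and keyspaceDistance are the 
> illustrative helpers from the sketches above, not real fred APIs):
> 
>     interface Peer { boolean isBackedOff(); double location(); }
> 
>     // Random peer while in the high-HTL no-cache phase, greedy
>     // closest-to-target routing once caching begins.
>     static Peer selectNextHop(java.util.List<Peer> peers, double target,
>                               int htl, boolean isInsert,
>                               java.util.Random rand) {
>         java.util.List<Peer> ok = new java.util.ArrayList<Peer>();
>         for (Peer p : peers)
>             if (!p.isBackedOff()) ok.add(p);
>         if (ok.isEmpty()) return null;
>         if (!shouldCache(htl, isInsert))
>             return ok.get(rand.nextInt(ok.size())); // random phase
>         Peer best = ok.get(0);
>         for (Peer p : ok)
>             if (keyspaceDistance(p.location(), target)
>                     < keyspaceDistance(best.location(), target))
>                 best = p;
>         return best;
>     }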
> 
> So, just maybe, we could improve data persistence (if not necessarily overall 
> performance), maintain the current no-cache-at-high-HTL behaviour, and improve 
> security, by random routing as well as not caching while HTL is high. Worth 
> simulating perhaps?
> 
> The next obvious solution is some form of bundling: Even if the bundle is not 
> encrypted, routing a large batch of requests together for some distance gives 
> the attacker one sample instead of many. Short-lived bundles have the 
> disadvantage that there are many of them, so the attacker gets more samples 
> if they happen to cross his path. However, we could do the 
> don't-route-to-newbies trick with short-lived bundles, using a fixed path for 
> the bundle's lifetime. 10 bundles each renewed once an hour beat hundreds of 
> requests per hour! Long-lived 
> bundles would probably have to automatically move to new nodes, and therefore 
> could perhaps be traced back to source eventually - if the attacker managed 
> to hook one, or more likely trace a stream of requests back to one.
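> 
> A rough sketch of what a short-lived bundle might carry; this structure is 
> entirely hypothetical, just to pin down "fixed path for the bundle's 
> lifetime" with periodic renewal:
> 
>     // Hypothetical: all requests attached to a bundle follow the same
>     // fixed first hop until the bundle expires; then a fresh bundle
>     // (and hence a fresh path) is created.
>     class Bundle {
>         final Peer fixedNextHop; // chosen once, reused for the lifetime
>         final long expiresAt;    // e.g. creation time + 1 hour
>         Bundle(Peer nextHop, long lifetimeMs) {
>             this.fixedNextHop = nextHop;
>             this.expiresAt = System.currentTimeMillis() + lifetimeMs;
>         }
>         boolean expired() { return System.currentTimeMillis() > expiresAt; }
>     }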
> 
> Bundling is a lot more work and a lot more tuning, but of course more secure. 
> It would replace the current no-cache-for-a-few-hops behaviour, and would 
> still check the local datastore.
> 
> Encrypted tunnels are a further evolution of bundling: We send out various 
> randomly routed "anchors", which rendezvous to create a tunnel, which is a 
> short encrypted (using a shared secret scheme) path to a random start node. 
> This has most of the same issues as bundling, although it doesn't check the 
> local datastore, and it provides a reasonable degree of protection against 
> relatively nearby attackers.
> 
> Note that if Mallory cannot connect the requests, he can do very little. 
> Randomising inserted data encryption keys would help a lot, but it is tricky 
> and expensive for reinserts and impossible for requests. We could use 
> tunnels, random routing etc. only on the top block, but it would still need 
> to go uncached on the originator and therefore on the next few nodes too.
> 
Okay, I propose to implement the following:
- On all requests, when the HTL goes low enough that we start caching, we 
should allow the request to return to nodes it has already visited, in case 
they are ideal/sink nodes for the key. Most likely this will be implemented by 
creating a new UID for the request (see the sketch after this list). This 
solves the performance problem.
- A flag on requests at high HTL. If set, the request or insert is 
random-routed as well as not cached while at high HTL. Of course, we will 
still check the datastore, as we do now. Initially the flag will be a config 
option, not enabled by any security level, so we can do some performance 
testing. In future it will hopefully be enabled at seclevel NORMAL and above. 
The flag disappears once the request reaches low HTL, so it only gives away 
the seclevel while the request is at high HTL. This improves security for the 
more paranoid, at a small performance cost.
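
As a rough sketch of the first point, reusing the illustrative shouldCache() 
from the quoted sketch above (Request, its fields and the method name are all 
hypothetical, not the actual fred request handling):

    class Request {
        long uid; int htl; boolean isInsert; boolean randomRouteFlag;
    }

    java.util.Random random = new java.util.Random();

    // When a request leaves the high-HTL no-cache phase, give it a fresh
    // UID so nodes it already visited (which may be sink nodes for the
    // key) accept it again instead of rejecting it as a loop. The
    // random-route flag is dropped at the same point, so it only leaks
    // the seclevel while the request is at high HTL.
    void onDecrementHTL(Request req, int newHTL) {
        boolean wasHighHTL = !shouldCache(req.htl, req.isInsert);
        boolean nowCaching = shouldCache(newHTL, req.isInsert);
        if (wasHighHTL && nowCaching) {
            req.uid = random.nextLong(); // visited nodes see a "new" request
            req.randomRouteFlag = false;
        }
        req.htl = newHTL;
    }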

In future, instead of random routing for individual requests, we should 
implement bundling, and eventually encrypted tunnels, which might be used only 
on some requests at certain seclevels. However, this will be considerably more 
work because of the tuning needed.