On Saturday 31 October 2009 15:47:07 Matthew Toseland wrote:
> On Saturday 31 October 2009 01:19:18 Matthew Toseland wrote:
> > Currently, requests are always routed the same way, but at high HTL we do 
> > not cache either replies to requests or incoming inserts.
> > 
> > Specifically, at HTL 18 and 17 we do not cache returned data from requests 
> > (though we do check the datastore), and at HTL 18, 17 and 16 we do not 
> > cache data from inserts. On average we spend 2 hops at HTL 18, including 
> > the originator, so on average an insert travels 4 hops before we cache, 
> > with a minimum of 3 (not 2: afaics we start at HTL 18 and may only decrement 
> > when sending to the next hop, so the minimum is 3).
> > 
> > Decrement at HTL 18 is probabilistic, with a 50% probability.
> > 
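(A back-of-the-envelope check of the hop arithmetic above - a sketch, not the 
real code, assuming the 50% probabilistic decrement at HTL 18 and that inserts 
only start being cached once HTL has dropped to 15:)

import java.util.Random;

// Sketch only, not Freenet code. Assumptions: max HTL is 18, the decrement
// at 18 is probabilistic (50%), and inserts are not cached at HTL 18, 17 or
// 16, so the first caching node is the one reached at HTL 15.
public class HtlSketch {
    public static void main(String[] args) {
        Random rand = new Random();
        final int trials = 1000000;
        long total = 0;
        for (int i = 0; i < trials; i++) {
            int htl = 18;
            int nonCachingNodes = 0;
            // Walk node by node until we reach one that would cache (HTL <= 15).
            while (htl > 15) {
                nonCachingNodes++; // this node does not cache the insert
                if (htl == 18) {
                    if (rand.nextBoolean()) htl--; // probabilistic decrement at max HTL
                } else {
                    htl--;
                }
            }
            total += nonCachingNodes;
        }
        // Prints roughly 4.00: on average 4 non-caching nodes per insert
        // (including the originator), with a minimum of 3.
        System.out.printf("average non-caching nodes per insert: %.2f%n",
                (double) total / trials);
    }
}
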
> > Simulations suggest that the "ideal" node is likely found around HTL 14 to 
> > 15. So a significant proportion of requests and inserts will go past it 
> > while still in the no-cache phase. This may partly explain poor data 
> > retention, which appears to affect some proportion of keys much more than 
> > others.
> > 
> > Hence we might get better data retention if we e.g. routed randomly while 
> > in the no-cache phase.
> > 
> > But here is another reason for random routing while in the no-cache phase:
> > 
> > Let's assume that we only care about remote attackers. Generally they are 
> > much more scary. So we are talking about the mobile attacker source tracing 
> > attack. This means that a bad guy is a long way away, and he gets a few 
> > requests by chance which were part of the same splitfile insert or request 
> > originated by you. He is able to determine that they are part of the same, 
> > interesting, splitfile. For each request, he knows 1) that it was routed to 
> > him, and 2) its target location. He can thus determine where on the 
> > keyspace the request could have come from. This is a bit vague due to 
> > backoff etc, but he can nonetheless identify an area where the originator 
> > is most likely present, starting at his location and extending in one 
> > direction or the other. In fact, he can identify the opposite end of it as 
> > the most likely location of the originator. So he then tries to get peers 
> > closer to this location, by announcement, path folding, changing his own 
> > location etc. If he is right, he will then get requests from this source 
> > much more quickly. And so he can keep on moving until he reaches the 
> > originator. It has been suggested that we could mark requests so that they 
> > will not be routed to new connections - the problem is this doesn't work 
> > for long-lived requests e.g. big inserts.
> > 
> > The number of samples the attacker gets is proportional to the number of 
> > hops from the originator to the "ideal" node, on average, since samples 
> > after the "ideal" are much less informative. It is also proportional to the 
> > number of requests sent, and inversely proportional to the size of the network.
> > 
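(Very roughly, and purely as shorthand for the above rather than anything 
measured: samples ~ requests * hops_before_ideal / network_size.)
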
> > Random routing while the HTL is high, not to any specific location but to a 
> > random peer at each hop (subject to e.g. backoff), would make the pre-ideal 
> > samples much less useful, because each one will effectively have started at a 
> > random node. Not a truly random node: with only a few hops of random routing, 
> > especially if we pick a random peer at each hop, the starting point won't be 
> > random across the whole keyspace. But it will still make the picture much more 
> > vague, and the attacker will need a lot more samples. The post-ideal sample 
> > remains useless. If the request reaches the attacker while it is still in the 
> > random routing phase, this provides a useful sample to the attacker, but 
> > likely a much less useful one than in the routed stage.
> > 
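(A minimal sketch of what "route to a random peer while HTL is high" could look 
like in the peer selection step - invented names, and ignoring failure tables, 
per-key exclusions and everything else the real routing code does:)

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch only: while HTL is in the no-cache band, pick any non-backed-off
// peer uniformly at random; once HTL drops low enough that we cache, fall
// back to normal closest-to-target routing. PeerNode is a stand-in with
// just the two methods the sketch needs.
class RandomRoutingSketch {
    interface PeerNode {
        boolean isBackedOff();
        double distanceTo(double targetLocation); // circular keyspace distance
    }

    // Insert case: 18, 17 and 16 don't cache; for requests this would be 17.
    private static final int NO_CACHE_HTL = 16;
    private final Random random = new Random();

    PeerNode selectPeer(List<PeerNode> peers, double targetLocation, int htl) {
        List<PeerNode> candidates = new ArrayList<PeerNode>();
        for (PeerNode p : peers) {
            if (!p.isBackedOff()) candidates.add(p);
        }
        if (candidates.isEmpty()) return null;

        if (htl >= NO_CACHE_HTL) {
            // High-HTL phase: random routing, so the attacker's pre-ideal
            // samples effectively start from a (semi-)random node.
            return candidates.get(random.nextInt(candidates.size()));
        }
        // Normal phase: greedy routing towards the target location.
        PeerNode best = candidates.get(0);
        for (PeerNode p : candidates) {
            if (p.distanceTo(targetLocation) < best.distanceTo(targetLocation))
                best = p;
        }
        return best;
    }
}
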
> > So, just maybe, by random routing as well as not caching while HTL is high, 
> > we could keep the current no-cache-at-high-HTL behaviour, improve data 
> > persistence (if not necessarily overall performance), and improve security. 
> > Worth simulating perhaps?
> > 
> > The next obvious solution is some form of bundling: Even if the bundle is 
> > not encrypted, routing a large bunch of requests together for some distance 
> > gives one sample instead of many. Short-lived bundles have the disadvantage 
> > that there are many of them so the attacker gets more samples if they 
> > happen to cross his path. However, we could do the don't-route-to-newbies 
> > trick with short-lived bundles, using a fixed path for the bundle's 
> > lifetime. 10 bundles each renewed once an hour beats hundreds of requests 
> > per hour! Long-lived bundles would probably have to automatically move to 
> > new nodes, and therefore could perhaps be traced back to source eventually 
> > - if the attacker managed to hook one, or more likely trace a stream of 
> > requests back to one.
> > 
> > Bundling is a lot more work, a lot more tuning, but of course more secure. 
> > It would replace the current no-cache-for-the-first-few-hops behaviour, and 
> > would still check the local datastore.
> > 
> > Encrypted tunnels are a further evolution of bundling: We send out various 
> > randomly routed "anchors", which rendezvous to create a tunnel, which is a 
> > short encrypted (using a shared secret scheme) path to a random start node. 
> > This has most of the same issues as bundling, although it doesn't check the 
> > local datastore, and it provides a reasonable degree of protection against 
> > relatively nearby attackers.
> > 
> > Note that if Mallory cannot connect the requests, he can do very little. 
> > Randomising inserted data encryption keys will help a lot, but it is tricky 
> > and expensive with reinserts and is impossible with requests. We could use 
> > tunnels, random routing etc. only on the top block(s), but those would still 
> > need to not be cached on the originator, and therefore on the next few nodes 
> > too.
> > 
> Okay, I propose to implement the following:
> - On all requests, when the HTL goes low enough that we start caching, we 
> should allow the request to come back to those nodes which it has already 
> visited, in case they are ideal/sink nodes for the request. Most likely this 
> will be implemented by creating a new UID for the request. This solves the 
> performance problem.

In fact, this is only necessary on inserts. Requests don't need to go back over 
the nodes they've already been to. Or do they, for better caching if the data 
is found later?
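
(A sketch of the new-UID idea, with made-up names; whether it should apply to 
requests as well as inserts is the open question above:)

import java.util.Random;

// Sketch only. While in the high-HTL no-cache phase an insert keeps its
// original UID, so loop detection keeps it away from nodes it has already
// visited. Once HTL drops into the caching range we swap in a fresh UID,
// so the insert is allowed back onto those earlier nodes in case one of
// them is the ideal/sink node for the key.
class UidSwapSketch {
    private static final int FIRST_CACHING_HTL = 15; // inserts cache from here down
    private static final Random random = new Random();

    static long maybeSwapUid(long currentUid, int oldHtl, int newHtl) {
        boolean justStartedCaching =
                oldHtl > FIRST_CACHING_HTL && newHtl <= FIRST_CACHING_HTL;
        if (justStartedCaching)
            return random.nextLong(); // fresh UID: earlier nodes no longer reject it
        return currentUid;
    }
}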


> - A flag on requests at high HTL. If set, this will cause the request or 
> insert to be random-routed as well as not cached while in high HTL. Of 
> course, we will still check the datastore, as we do now. The flag initially 
> will be a config option and not enabled by any security level, so we can do 
> some performance testing. In future it will hopefully be enabled at seclevel 
> NORMAL and above. The flag will disappear once we go to low HTL, so it only 
> gives away the seclevel while the request is at high HTL. This improves 
> security for the more paranoid, at a small performance cost.
> 
> In future, instead of random routing for individual requests, we should 
> implement bundling, and eventually encrypted tunnels, which might be used 
> only on some requests at certain seclevels. However, this will be 
> considerably more work because of the tuning needed.
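
Regarding the high-HTL flag: a possible shape for it, again with invented names, 
might be something like the following - the point being that the flag is never 
forwarded once the request has left the no-cache phase, so downstream nodes 
cannot infer the originator's seclevel from it.

// Sketch only: lifecycle of the proposed per-request high-HTL flag.
class HighHtlFlagSketch {
    static final int NO_CACHE_HTL = 16; // insert case; requests would use 17

    /** Random-route (and don't cache) at this hop? */
    static boolean randomRouteThisHop(boolean flagFromPreviousHop, int htl) {
        return flagFromPreviousHop && htl >= NO_CACHE_HTL;
    }

    /** Forward the flag to the next hop at all? */
    static boolean forwardFlag(boolean flagFromPreviousHop, int nextHopHtl) {
        return flagFromPreviousHop && nextHopHtl >= NO_CACHE_HTL;
    }
}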