Currently, requests are always routed the same way, but at high HTL we do not 
cache either replies to requests or incoming inserts.

Specifically, at HTL 18 and 17 we do not cache returned data from requests 
(though we do check the datastore), and at HTL 18, 17 and 16 we do not cache 
data from inserts. The decrement from HTL 18 is probabilistic, with 50% 
probability at each hop, so on average we spend 2 hops at HTL 18, including 
the originator. For an insert that means on average 4 hops before we cache, 
with a minimum of 3: we start at HTL 18 and can only decrement when 
forwarding to the next hop, so at least one hop is spent at each of HTL 18, 
17 and 16.
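
A minimal sketch of these rules, assuming illustrative constants and method 
names rather than the actual node code:

    // Sketch of the no-cache-at-high-HTL rules and the probabilistic
    // decrement described above. Constants and names are illustrative
    // assumptions, not the node's actual code.
    public final class HtlPolicy {
        static final int MAX_HTL = 18;
        static final java.util.Random RNG = new java.util.Random();

        // Replies to requests are not cached at HTL 18 or 17.
        static boolean cacheRequestReply(int htl) {
            return htl <= MAX_HTL - 2; // cache at 16 and below
        }

        // Insert data is not cached at HTL 18, 17 or 16.
        static boolean cacheInsertData(int htl) {
            return htl <= MAX_HTL - 3; // cache at 15 and below
        }

        // HTL 18 only decrements with 50% probability at each hop;
        // below 18 it always decrements.
        static int decrement(int htl) {
            if (htl == MAX_HTL && RNG.nextBoolean()) return htl;
            return htl - 1;
        }
    }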

Simulations suggest that the "ideal" node is likely found around HTL 14 to 15. 
So a significant proportion of requests and inserts will go past it while still 
in the no-cache phase. This may partly explain poor data retention, which 
appears to affect some proportion of keys much more than others.

Hence we might get better data retention if we, for example, routed randomly 
while in the no-cache phase.

But here is another reason for random routing while in the no-cache phase:

Let's assume that we only care about remote attackers; generally they are 
much scarier. So we are talking about the mobile-attacker source-tracing 
attack. This means that a bad guy is a long way away, and he gets a few 
requests by chance which were part of the same splitfile insert or request 
originated by you. He is able to determine that they are part of the same, 
interesting, splitfile. For each request, he knows 1) that it was routed to 
him, and 2) its target location. He can thus determine where on the keyspace 
the request could have come from. This is a bit vague due to backoff etc., 
but he can nonetheless identify an area where the originator is most likely 
present, starting at his own location and extending in one direction or the 
other. In fact, he can identify the opposite end of that area as the most 
likely location of the originator. So he then tries to get peers closer to 
this location, by announcement, path folding, changing his own location etc. 
If he is right, he will then get requests from this source much more quickly. 
And so he can keep on moving until he reaches the originator. It has been 
suggested that we could mark requests so that they will not be routed to new 
connections - the problem is this doesn't work for long-lived requests, e.g. 
big inserts.
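
As a toy illustration of why this converges, suppose each matching sample 
gives the attacker a noisy estimate of the offset to the originator, and he 
moves partway toward it each time. This is a purely illustrative model, 
ignoring keyspace wrap-around and everything else about the real network:

    import java.util.Random;

    // Toy 1-D model of the mobile attacker homing in on an originator.
    // Every number and assumption here is illustrative only.
    public class MobileAttackerToy {
        public static void main(String[] args) {
            Random rng = new Random(1);
            double originator = 0.73; // hypothetical originator location
            double attacker = 0.10;   // attacker starts far away
            for (int round = 1; round <= 15; round++) {
                // Each sample yields a noisy estimate of the offset to
                // the source (backoff etc. modelled as Gaussian noise).
                double estimate = (originator - attacker)
                        + rng.nextGaussian() * 0.15;
                attacker += 0.5 * estimate; // move partway toward it
                System.out.printf("round %2d: attacker at %.3f%n",
                        round, attacker);
            }
        }
    }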

The number of samples the attacker gets is, on average, proportional to the 
number of hops from the originator to the "ideal" node, since samples after 
the "ideal" node are much less informative. It is also proportional to the 
number of requests sent, and inversely proportional to the size of the 
network.
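
In other words, up to a constant of proportionality:

    E[\text{samples}] \propto \frac{r \, h}{n}

where r is the number of requests sent, h is the average number of hops from 
the originator to the "ideal" node, and n is the size of the network.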

Random routing while the HTL is high - not to any specific location, but to 
a random peer at each hop (subject to e.g. backoff) - would make the 
pre-ideal samples much less useful, because each will effectively have 
started at a random node. Not a truly random node: since we route randomly 
one hop at a time, there won't have been enough hops for the start to be 
random across the whole keyspace. But it still means the picture is much 
more vague, and the attacker will need a lot more samples. The post-ideal 
samples remain useless. If a request reaches the attacker while it is still 
in the random-routing phase, that does give him a sample, but likely a much 
less useful one than in the routed stage.
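
A sketch of what the per-hop decision might look like; the threshold and the 
Peer type are assumptions for illustration, not the node's actual routing 
code:

    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // Route to a uniformly random usable peer while HTL is in the
    // (assumed) no-cache phase, and greedily towards the target below it.
    public final class HighHtlRouting {
        static final int NO_CACHE_HTL = 16; // assumed: HTL 18..16
        static final Random RNG = new Random();

        interface Peer {
            double location();
            boolean backedOff();
        }

        static Peer selectPeer(List<Peer> peers, double target, int htl) {
            List<Peer> usable =
                    peers.stream().filter(p -> !p.backedOff()).toList();
            if (usable.isEmpty()) return null;
            if (htl >= NO_CACHE_HTL) {
                // Random-routing phase: ignore locations entirely.
                return usable.get(RNG.nextInt(usable.size()));
            }
            // Normal phase: closest peer to the target location.
            return usable.stream()
                    .min(Comparator.comparingDouble(
                            (Peer p) -> dist(p.location(), target)))
                    .orElse(null);
        }

        // Circular distance on the [0,1) keyspace.
        static double dist(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);
        }
    }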

So, just maybe, by routing randomly as well as not caching while HTL is 
high, we could improve data persistence (if not necessarily overall 
performance), keep the current no-cache-at-high-HTL behaviour, and improve 
security. Worth simulating, perhaps?

The next obvious solution is some form of bundling: even if the bundle is 
not encrypted, routing a large bunch of requests together for some distance 
gives one sample instead of many. Short-lived bundles have the disadvantage 
that there are many of them, so the attacker gets more samples if they 
happen to cross his path. However, we could do the don't-route-to-newbies 
trick with short-lived bundles, using a fixed path for the bundle's 
lifetime. 10 bundles each renewed once an hour beats hundreds of requests 
per hour! Long-lived bundles would probably have to automatically move to 
new nodes, and therefore could perhaps be traced back to source eventually - 
if the attacker managed to hook one, or more likely to trace a stream of 
requests back to one.
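
A sketch of the short-lived variant, where the fixed first hop and the 
hourly renewal are the whole point (the names, the Peer placeholder and the 
renewal interval are assumptions):

    // Every request in a bundle follows one fixed first hop for the
    // bundle's lifetime, so a distant attacker sees at most one sample
    // per bundle rather than one per request.
    public final class Bundle {
        interface Peer {}

        private static final long LIFETIME_MS = 60L * 60 * 1000; // 1 hour

        private final Peer fixedFirstHop; // chosen once, never re-routed
        private final long createdAt = System.currentTimeMillis();

        Bundle(Peer firstHop) { this.fixedFirstHop = firstHop; }

        // When this returns true, create a fresh bundle on a new path.
        boolean expired() {
            return System.currentTimeMillis() - createdAt > LIFETIME_MS;
        }

        // All requests in this bundle take the same first hop.
        Peer firstHopFor(Object request) {
            return fixedFirstHop;
        }
    }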

Bundling is a lot more work and a lot more tuning, but of course more 
secure. It would replace the current no-cache-for-a-few-hops behaviour, and 
would still check the local datastore.

Encrypted tunnels are a further evolution of bundling: we send out various 
randomly routed "anchors", which rendezvous to create a tunnel - a short 
encrypted path (using a shared-secret scheme) to a random start node. This 
has most of the same issues as bundling, although it doesn't check the local 
datastore, and it provides a reasonable degree of protection against 
relatively nearby attackers.

Note that if Mallory cannot connect the requests, he can do very little. 
Randomising inserted data encryption keys would help a lot, but it is tricky 
and expensive with reinserts, and impossible with requests. We could use 
tunnels, random routing etc. only on the top block, but it would still need 
to not be cached on the originator, and therefore on the next few nodes too.
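
To make the insert-side idea concrete: if the routing key were derived from 
a ciphertext produced under a fresh random session key, every insert of the 
same data would yield a different key. A sketch under that assumption (not 
the node's actual key scheme):

    import java.security.MessageDigest;
    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.IvParameterSpec;

    // Encrypt the same block twice under fresh random session keys; the
    // two hash-derived routing keys differ, so a distant observer cannot
    // link the inserts by key. The cost: the requester must get the
    // session key out of band, and reinserts no longer map to the same
    // key - which is exactly why this is tricky with reinserts and
    // impossible with requests.
    public final class RandomisedInsert {
        public static void main(String[] args) throws Exception {
            byte[] plaintext =
                    "the same data, inserted twice".getBytes("UTF-8");
            for (int i = 1; i <= 2; i++) {
                KeyGenerator kg = KeyGenerator.getInstance("AES");
                kg.init(128);
                SecretKey sessionKey = kg.generateKey(); // fresh each time
                byte[] iv = new byte[16];
                new SecureRandom().nextBytes(iv);
                Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
                c.init(Cipher.ENCRYPT_MODE, sessionKey,
                        new IvParameterSpec(iv));
                byte[] ciphertext = c.doFinal(plaintext);
                byte[] routingKey = MessageDigest.getInstance("SHA-256")
                        .digest(ciphertext);
                System.out.printf("insert %d routing key: %064x%n", i,
                        new java.math.BigInteger(1, routingKey));
            }
        }
    }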