Currently, requests are always routed the same way, but at high HTL we do not 
cache either replies to requests or incoming inserts.

Specifically, at HTL 18 and 17 we do not cache returned data from requests 
(though we do check the datastore), and at HTL 18, 17 and 16 we do not cache 
data from inserts. The decrement from HTL 18 is probabilistic, with 50% 
probability at each hop, so on average we spend 2 hops at HTL 18, including 
the originator. For an insert that means on average 4 hops before we cache, 
with a minimum of 3: we start at HTL 18 and can only decrement when 
forwarding to the next hop, so at least one hop is spent at each of HTL 18, 
17 and 16.
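
A minimal sketch of these rules, assuming illustrative constants and method 
names rather than the actual node code:

    // Sketch of the no-cache-at-high-HTL rules and the probabilistic
    // decrement described above. Constants and names are illustrative
    // assumptions, not the node's actual code.
    public final class HtlPolicy {
        static final int MAX_HTL = 18;
        static final java.util.Random RNG = new java.util.Random();

        // Replies to requests are not cached at HTL 18 or 17.
        static boolean cacheRequestReply(int htl) {
            return htl <= MAX_HTL - 2; // cache at 16 and below
        }

        // Insert data is not cached at HTL 18, 17 or 16.
        static boolean cacheInsertData(int htl) {
            return htl <= MAX_HTL - 3; // cache at 15 and below
        }

        // HTL 18 only decrements with 50% probability at each hop;
        // below 18 it always decrements.
        static int decrement(int htl) {
            if (htl == MAX_HTL && RNG.nextBoolean()) return htl;
            return htl - 1;
        }
    }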

Simulations suggest that the "ideal" node is likely found around HTL 14 to 15. 
So a significant proportion of requests and inserts will go past it while still 
in the no-cache phase. This may partly explain poor data retention, which 
appears to affect some proportion of keys much more than others.

Hence we might get better data retention if we, for example, routed randomly 
while in the no-cache phase.

But here is another reason for random routing while in the no-cache phase:

Let's assume that we only care about remote attackers; generally they are 
much scarier. So we are talking about the mobile-attacker source-tracing 
attack. This means that a bad guy is a long way away, and he gets a few 
requests by chance which were part of the same splitfile insert or request 
originated by you. He is able to determine that they are part of the same, 
interesting, splitfile. For each request, he knows 1) that it was routed to 
him, and 2) its target location. He can thus determine where on the keyspace 
the request could have come from. This is a bit vague due to backoff etc., 
but he can nonetheless identify an area where the originator is most likely 
present, starting at his own location and extending in one direction or the 
other. In fact, he can identify the opposite end of that area as the most 
likely location of the originator. So he then tries to get peers closer to 
this location, by announcement, path folding, changing his own location etc. 
If he is right, he will then get requests from this source much more quickly. 
And so he can keep on moving until he reaches the originator. It has been 
suggested that we could mark requests so that they will not be routed to new 
connections - the problem is this doesn't work for long-lived requests, e.g. 
big inserts.
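
As a toy illustration of why this converges, suppose each matching sample 
gives the attacker a noisy estimate of the offset to the originator, and he 
moves partway toward it each time. This is a purely illustrative model, 
ignoring keyspace wrap-around and everything else about the real network:

    import java.util.Random;

    // Toy 1-D model of the mobile attacker homing in on an originator.
    // Every number and assumption here is illustrative only.
    public class MobileAttackerToy {
        public static void main(String[] args) {
            Random rng = new Random(1);
            double originator = 0.73; // hypothetical originator location
            double attacker = 0.10;   // attacker starts far away
            for (int round = 1; round <= 15; round++) {
                // Each sample yields a noisy estimate of the offset to
                // the source (backoff etc. modelled as Gaussian noise).
                double estimate = (originator - attacker)
                        + rng.nextGaussian() * 0.15;
                attacker += 0.5 * estimate; // move partway toward it
                System.out.printf("round %2d: attacker at %.3f%n",
                        round, attacker);
            }
        }
    }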

The number of samples the attacker gets is, on average, proportional to the 
number of hops from the originator to the "ideal" node, since samples after 
the "ideal" node are much less informative. It is also proportional to the 
number of requests sent, and inversely proportional to the size of the 
network.
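
In other words, up to a constant of proportionality:

    E[\text{samples}] \propto \frac{r \, h}{n}

where r is the number of requests sent, h is the average number of hops from 
the originator to the "ideal" node, and n is the size of the network.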

Random routing while the HTL is high - not to any specific location, but to 
a random peer at each hop (subject to e.g. backoff) - would make the 
pre-ideal samples much less useful, because each will effectively have 
started at a random node. Not a truly random node: since we route randomly 
one hop at a time, there won't have been enough hops for the start to be 
random across the whole keyspace. But it still means the picture is much 
more vague, and the attacker will need a lot more samples. The post-ideal 
samples remain useless. If a request reaches the attacker while it is still 
in the random-routing phase, that does give him a sample, but likely a much 
less useful one than in the routed stage.
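
A sketch of what the per-hop decision might look like; the threshold and the 
Peer type are assumptions for illustration, not the node's actual routing 
code:

    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // Route to a uniformly random usable peer while HTL is in the
    // (assumed) no-cache phase, and greedily towards the target below it.
    public final class HighHtlRouting {
        static final int NO_CACHE_HTL = 16; // assumed: HTL 18..16
        static final Random RNG = new Random();

        interface Peer {
            double location();
            boolean backedOff();
        }

        static Peer selectPeer(List<Peer> peers, double target, int htl) {
            List<Peer> usable =
                    peers.stream().filter(p -> !p.backedOff()).toList();
            if (usable.isEmpty()) return null;
            if (htl >= NO_CACHE_HTL) {
                // Random-routing phase: ignore locations entirely.
                return usable.get(RNG.nextInt(usable.size()));
            }
            // Normal phase: closest peer to the target location.
            return usable.stream()
                    .min(Comparator.comparingDouble(
                            (Peer p) -> dist(p.location(), target)))
                    .orElse(null);
        }

        // Circular distance on the [0,1) keyspace.
        static double dist(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);
        }
    }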

So, just maybe, by routing randomly as well as not caching while HTL is 
high, we could improve data persistence (if not necessarily overall 
performance), keep the current no-cache-at-high-HTL behaviour, and improve 
security. Worth simulating, perhaps?

The next obvious solution is some form of bundling: even if the bundle is 
not encrypted, routing a large bunch of requests together for some distance 
gives one sample instead of many. Short-lived bundles have the disadvantage 
that there are many of them, so the attacker gets more samples if they 
happen to cross his path. However, we could do the don't-route-to-newbies 
trick with short-lived bundles, using a fixed path for the bundle's 
lifetime. 10 bundles each renewed once an hour beats hundreds of requests 
per hour! Long-lived bundles would probably have to automatically move to 
new nodes, and therefore could perhaps be traced back to source eventually - 
if the attacker managed to hook one, or more likely to trace a stream of 
requests back to one.
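
A sketch of the short-lived variant, where the fixed first hop and the 
hourly renewal are the whole point (the names, the Peer placeholder and the 
renewal interval are assumptions):

    // Every request in a bundle follows one fixed first hop for the
    // bundle's lifetime, so a distant attacker sees at most one sample
    // per bundle rather than one per request.
    public final class Bundle {
        interface Peer {}

        private static final long LIFETIME_MS = 60L * 60 * 1000; // 1 hour

        private final Peer fixedFirstHop; // chosen once, never re-routed
        private final long createdAt = System.currentTimeMillis();

        Bundle(Peer firstHop) { this.fixedFirstHop = firstHop; }

        // When this returns true, create a fresh bundle on a new path.
        boolean expired() {
            return System.currentTimeMillis() - createdAt > LIFETIME_MS;
        }

        // All requests in this bundle take the same first hop.
        Peer firstHopFor(Object request) {
            return fixedFirstHop;
        }
    }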

Bundling is a lot more work and a lot more tuning, but of course more 
secure. It would replace the current no-cache-for-a-few-hops behaviour, and 
would still check the local datastore.

Encrypted tunnels are a further evolution of bundling: we send out various 
randomly routed "anchors", which rendezvous to create a tunnel - a short 
encrypted path (using a shared-secret scheme) to a random start node. This 
has most of the same issues as bundling, although it doesn't check the local 
datastore, and it provides a reasonable degree of protection against 
relatively nearby attackers.

Note that if Mallory cannot connect the requests, he can do very little. 
Randomising inserted data encryption keys would help a lot, but it is tricky 
and expensive with reinserts, and impossible with requests. We could use 
tunnels, random routing etc. only on the top block, but it would still need 
to not be cached on the originator, and therefore on the next few nodes too.
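
To make the insert-side idea concrete: if the routing key were derived from 
a ciphertext produced under a fresh random session key, every insert of the 
same data would yield a different key. A sketch under that assumption (not 
the node's actual key scheme):

    import java.security.MessageDigest;
    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.IvParameterSpec;

    // Encrypt the same block twice under fresh random session keys; the
    // two hash-derived routing keys differ, so a distant observer cannot
    // link the inserts by key. The cost: the requester must get the
    // session key out of band, and reinserts no longer map to the same
    // key - which is exactly why this is tricky with reinserts and
    // impossible with requests.
    public final class RandomisedInsert {
        public static void main(String[] args) throws Exception {
            byte[] plaintext =
                    "the same data, inserted twice".getBytes("UTF-8");
            for (int i = 1; i <= 2; i++) {
                KeyGenerator kg = KeyGenerator.getInstance("AES");
                kg.init(128);
                SecretKey sessionKey = kg.generateKey(); // fresh each time
                byte[] iv = new byte[16];
                new SecureRandom().nextBytes(iv);
                Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
                c.init(Cipher.ENCRYPT_MODE, sessionKey,
                        new IvParameterSpec(iv));
                byte[] ciphertext = c.doFinal(plaintext);
                byte[] routingKey = MessageDigest.getInstance("SHA-256")
                        .digest(ciphertext);
                System.out.printf("insert %d routing key: %064x%n", i,
                        new java.math.BigInteger(1, routingKey));
            }
        }
    }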