Currently, requests are always routed the same way, but at high HTL we do not cache either replies to requests or incoming inserts.
Specifically, at HTL 18 and 17 we do not cache returned data from requests (though we do check the datastore), and at HTL 18, 17 and 16 we do not cache data from inserts. On average we spend 2 hops at HTL 18, including the originator, so for an insert it is on average 4 hops before we cache, with a minimum of 3 (not 2: AFAICS we start at HTL 18 and only decrement when sending to the next hop, so the minimum is 3). The decrement at HTL 18 is probabilistic, with a 50% probability.

Simulations suggest that the "ideal" node is likely found around HTL 14 to 15, so a significant proportion of requests and inserts will go past it while still in the no-caching phase. This may partly explain poor data retention, which appears to affect some proportion of keys much more than others. Hence we might get better data retention if we e.g. random routed while in the no-cache phase.

But here is another reason for random routing while in the no-cache phase. Let's assume that we only care about remote attackers; generally they are much more scary. So we are talking about the mobile attacker source tracing attack: a bad guy is a long way away, and he gets a few requests by chance which were part of the same splitfile insert or request originated by you. He is able to determine that they are part of the same, interesting, splitfile. For each request, he knows 1) that it was routed to him, and 2) its target location. He can thus determine where on the keyspace the request could have come from. This is a bit vague due to backoff etc, but he can nonetheless identify an area where the originator is most likely present, starting at his location and extending in one direction or the other. In fact, he can identify the opposite end of it as the most likely location of the originator. So he then tries to get peers closer to this location, by announcement, path folding, changing his own location, etc.
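The hop arithmetic above can be checked with a small simulation (a sketch only; the no-cache thresholds and the 50% probabilistic decrement at HTL 18 are taken from the description above, everything else is simplified):

```python
import random

MAX_HTL = 18
NO_CACHE_INSERT = {18, 17, 16}   # inserts are not cached at these HTLs
NO_CACHE_REQUEST = {18, 17}      # request replies are not cached at these HTLs

def hops_before_first_cache(is_insert: bool, trials: int = 100_000) -> float:
    """Average number of nodes visited (originator included) before data is cached."""
    no_cache = NO_CACHE_INSERT if is_insert else NO_CACHE_REQUEST
    total = 0
    for _ in range(trials):
        htl, hops = MAX_HTL, 0
        while htl in no_cache:
            hops += 1
            # At HTL 18 the decrement is probabilistic (50%); below that, always.
            if htl < MAX_HTL or random.random() < 0.5:
                htl -= 1
        total += hops
    return total / trials

print(hops_before_first_cache(is_insert=True))   # roughly 4 hops on average
print(hops_before_first_cache(is_insert=False))  # roughly 3 hops on average
```

The geometric wait at HTL 18 contributes 2 hops on average, plus one hop each for the remaining no-cache HTLs, matching the "4 hops before we cache" figure for inserts.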
If he is right, he will then get requests from this source much more quickly, and so he can keep moving until he reaches the originator. It has been suggested that we could mark requests so that they will not be routed to new connections; the problem is this doesn't work for long-lived requests, e.g. big inserts.

The number of samples the attacker gets is on average proportional to the number of hops from the originator to the "ideal" node, since samples after the "ideal" node are much less informative. It is also proportional to the number of requests sent, and inversely proportional to the size of the network.

Random routing while the HTL is high - not to any specific location, but to a random peer at each hop (subject to e.g. backoff) - would make the pre-ideal samples much less useful, because each will effectively have started at a random node. Not a truly random node: even if we route randomly at each hop, we won't have had enough hops for it to be a random node across the whole keyspace. But it will still make the picture much more vague, and the attacker will need a lot more samples. The post-ideal samples remain useless. If the request reaches the attacker while it is still in the random routing phase, this provides a useful sample, but likely a much less useful one than in the routed stage.

So, just maybe, we could improve data persistence (if not necessarily overall performance), maintain the current no-cache-at-high-HTL behaviour, and improve security, by random routing as well as not caching while HTL is high. Worth simulating perhaps?

The next obvious solution is some form of bundling: even if the bundle is not encrypted, routing a large bunch of requests together for some distance gives the attacker one sample instead of many. Short-lived bundles have the disadvantage that there are many of them, so the attacker gets more samples if they happen to cross his path.
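The proposed routing decision could look something like the following sketch (the names, the threshold constant and the backoff model are hypothetical; real peer selection in fred is considerably more involved):

```python
import random
from dataclasses import dataclass

NO_CACHE_HTL = 16  # hypothetical threshold: above this, don't cache and route randomly

@dataclass
class Peer:
    location: float          # position on the [0, 1) keyspace circle
    backed_off: bool = False

def circular_distance(a: float, b: float) -> float:
    """Distance between two locations on the [0, 1) keyspace circle."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def select_peer(peers, target_location: float, htl: int):
    """Pick the next hop: random while in the no-cache phase, greedy afterwards."""
    candidates = [p for p in peers if not p.backed_off]
    if not candidates:
        return None
    if htl > NO_CACHE_HTL:
        # High HTL: route to a random non-backed-off peer, so the attacker's
        # early samples say little about where the originator actually is.
        return random.choice(candidates)
    # Normal phase: greedy routing to the peer closest to the target location.
    return min(candidates, key=lambda p: circular_distance(p.location, target_location))
```

The only change from plain greedy routing is the extra branch while HTL is above the no-cache threshold, so the cost of trying this in simulation should be small.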
However, we could do the don't-route-to-newbies trick with short-lived bundles, using a fixed path for the bundle's lifetime: 10 bundles, each renewed once an hour, beats hundreds of requests per hour! Long-lived bundles would probably have to move automatically to new nodes, and therefore could perhaps be traced back to the source eventually - if the attacker managed to hook one, or more likely to trace a stream of requests back to one. Bundling is a lot more work and a lot more tuning, but of course more secure. It would replace the current no-cache-for-a-few-hops behaviour, and would still check the local datastore.

Encrypted tunnels are a further evolution of bundling: we send out various randomly routed "anchors", which rendezvous to create a tunnel - a short encrypted (using a shared secret scheme) path to a random start node. This has most of the same issues as bundling, although it doesn't check the local datastore, and it provides a reasonable degree of protection against relatively nearby attackers.

Note that if Mallory cannot connect the requests, he can do very little. Randomising inserted data encryption keys would help a lot, but it is tricky and expensive with reinserts and is impossible with requests. We could use tunnels, random routing etc. only on the top block and so on, but those blocks would still need to be not cached on the originator, and therefore on the next few nodes too.
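The sample-counting argument for short-lived bundles above can be made concrete with hypothetical numbers (all figures are illustrative, not measurements):

```python
# Hypothetical traffic figures illustrating the bundling argument.
requests_per_hour = 300          # e.g. a busy splitfile insert, unbundled
bundles = 10                     # concurrent short-lived bundles
bundle_renewals_per_hour = 1     # each bundle picks a fresh fixed path hourly

# Unbundled: every request that crosses the attacker is a distinct sample.
samples_unbundled = requests_per_hour

# Bundled: requests sharing a bundle share one path, so the attacker gets
# at most one useful sample per bundle per path renewal.
samples_bundled = bundles * bundle_renewals_per_hour

print(samples_unbundled, samples_bundled)  # 300 vs 10
```

The attacker's sample rate drops from one per request to one per bundle-path, which is the whole point of keeping the path fixed for the bundle's lifetime.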