I and others have recently been coming to the conclusion that premix routing is not the next-generation security enhancement that we had hoped it would be.
The objective here is to make various attacks harder. The two attacks we are most interested in:

- Correlation attacks: What is my peer doing? This relies on recognising a splitfile (or a series of FMS posts, for example) and doing statistics on it to see how likely it is that the peer is the originator.

- Adaptive search: The attacker picks up a few requests from a splitfile (e.g. a big insert). Each one gives him a predecessor sample, and the precise key being fetched gives him a vague idea of where the originator is. On opennet, and to some degree on darknet, he can use this information to move closer to the target. If he is successful, the number and accuracy of samples increases as he moves closer to the target, leading to exponential growth in the effectiveness of the attack. Hence for a big insert or a long-lived identity there is a good chance of finding the target in a relatively short period.

A partial solution to the latter is the recent obfuscated inserts proposal. This prevents the attacker from moving towards the target until after the key for the insert has been announced (or the top block inserted, in the case of a well-known SSK), even if he can guess the content. But he may still be able to get a good idea of the originator's location if he has logged enough blocks.

Encrypted tunnels (e.g. premix routing) would also solve the datastore probing attack mentioned in The Register a while ago, and may enable some significant optimisations such as Bloom filters (which would likely save quite a few hops on many requests).

Basically, premix routing as we have envisaged it until recently would divide the (darknet) network up into semi-permanent cells. Within each cell, all nodes are reasonably densely/directly connected, and all nodes are kept up to date on the status of the other nodes in the cell. When we make a new request, we choose two nodes from the cell at random and create an encrypted tunnel from ourselves (the start point) to the first node and then to the second node. The requests then exit the tunnel and begin their journey. Hence a request from one node in a cell is indistinguishable (we hope) from a request from any other node, and you have a definite anonymity set: the cell.

The main disadvantages:

- It will cost around twice the diameter of the cell, in hops, at the beginning of a request. Note that this is an encrypted tunnel, so caches cannot be checked during these hops.

- It only protects the beginning of the request, *not* the main request routing phase. You can hide within the cell, but the attacker will eventually be able to trace your requests back to the cell. If the cell is small enough, and/or the attacker powerful enough, he may simply seize every node in the cell...

- We can enlarge the cell, but only at the expense of increasing the number of hops at the beginning of the request, as well as the required co-ordination traffic.

Hence a better approach is needed. Michael Rogers and I discussed this fairly extensively at the anonymity summit on Monday. One paper presented there showed fairly convincingly that no current structured peer-to-peer system is likely to achieve better than c/n security, so that may be an acceptable goal. By this I mean a roughly c/n chance of a tunnel being compromised, given that c nodes out of n have been compromised. Tor (not a structured network; every node knows about all nodes) achieves roughly c^2/n^2 in theory, although there are attacks that may reduce that to c/n.
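To make the c/n vs c^2/n^2 comparison concrete, here is a trivial sketch (in Java, purely illustrative; the node counts are made up) of the rough per-tunnel compromise probability under each model:

    // Rough illustration of the c/n vs c^2/n^2 point above.
    // The numbers are hypothetical; only the gap between the two models matters.
    public class CompromiseOdds {
        public static void main(String[] args) {
            double n = 10000; // total nodes (made up)
            double c = 100;   // compromised nodes (made up)
            System.out.printf("structured (c/n):    %.4f%n", c / n);
            System.out.printf("Tor-style (c^2/n^2): %.6f%n", (c / n) * (c / n));
        }
    }

With 1% of the network compromised, the structured bound gives roughly a 1% chance per tunnel, whereas the Tor-style bound gives roughly 0.01%.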
Unencrypted tunnels: We would simply bundle a load of requests together and route them together (randomly, or to a random location) for a few hops. After that, they go on their way as normal. If the attacker doesn't manage to see the requests during the tunnel phase, then instead of getting one predecessor sample for every request (a 4GB insert = 270,000 requests), he would get one predecessor sample per tunnel, and each tunnel would hopefully carry very many requests. However, if he controls any of the nodes involved in the tunnel setup (probability around [tunnel length] * c / n), he gains a much stronger predecessor sample. One advantage of unencrypted tunnels, apart from simplicity, is that we can check the datastore on each node on the tunnel; on the other hand, we can't use Bloom filters, and it doesn't solve The Register's datastore probing attack, as we will still have to cache local requests locally.

Encrypted tunnels via rendezvous at a location: We would create a key, split it into 2 or 3 parts (which can be XORed together or concatenated to recover the original; a rough sketch of the XOR variant is at the end of this mail), and choose a location and a unique ID. The first part would be routed directly to the target location. The second and third (or more) parts would be random-routed for some hops and then routed to the target location. Once they exit the random routing phase they look exactly the same as the first part. Each "anchor" continues until it rendezvouses with the others, or runs out of hops (randomly, or with an HTL, or some hybrid). If all parts meet, a tunnel completion message (including the hash of the complete key) is sent back to the originating node along the shortest path (either explicitly identify the first one, but we lose some data; or have a random number which starts the same for all three but is incremented on each hop; or even go by the first one). Then we use this to set up an encrypted tunnel to send requests through. We should have an anonymity set roughly equal to the whole network. One problem is that we may have to go a significant number of hops. In terms of attacks, we give the attacker more location samples (on the direct-routed path) and more predecessor samples (on the random-routed prefixes). That's the main disadvantage: the search will visit lots of nodes, giving the attacker many opportunities to intercept it. If he manages to get all the pieces, he can decrypt the requests.

Encrypted tunnels via random routing and rendezvous: Partly as a result of reading the Rumour Riding paper, Michael has proposed that we simply rely on random walks, instead of routing towards a specific location. The advantage is that it doesn't provide location samples, and we can make the paths shorter than in the previous option (thus we can tune the trade-off between the part of the network that could have originated the request and the cost). However, it is likely to be less reliable, and therefore to require many random walks, so many nodes have an opportunity to get a predecessor sample and/or the keys. Also, IMHO, it will frequently produce tunnels that are *too* short. I'll let Michael elaborate on it.
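For the rendezvous-at-a-location option, the key-splitting step could look something like the following. This is a minimal sketch of the XOR variant mentioned above; the class and method names are made up for illustration, not actual node code:

    import java.security.SecureRandom;

    // Sketch of splitting a tunnel key into XOR shares, as described above:
    // XORing all the shares back together recovers the original key.
    public class TunnelKeySplitSketch {
        private static final SecureRandom RNG = new SecureRandom();

        // Split 'key' into 'parts' shares (2 or 3 in the proposal).
        static byte[][] split(byte[] key, int parts) {
            byte[][] shares = new byte[parts][key.length];
            byte[] last = key.clone();
            for (int i = 0; i < parts - 1; i++) {
                RNG.nextBytes(shares[i]);            // random share
                for (int j = 0; j < key.length; j++)
                    last[j] ^= shares[i][j];         // fold it into the final share
            }
            shares[parts - 1] = last;
            return shares;
        }

        // Recombine at the rendezvous node: XOR every share together.
        static byte[] combine(byte[][] shares) {
            byte[] key = new byte[shares[0].length];
            for (byte[] share : shares)
                for (int j = 0; j < key.length; j++)
                    key[j] ^= share[j];
            return key;
        }
    }

No single share reveals anything about the key on its own, which is the property we want: the attacker has to intercept every anchor before he can decrypt the requests.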
