[freenet-dev] Temporary insert obfuscation: The best of both worlds

Matthew Toseland Sat, 7 Jun 2008 00:46:54 +0100

On Saturday 07 June 2008 00:06, Matthew Toseland wrote:
> PROBLEM:
> If an attacker can identify that each block belongs to the same stream of 
> requests or inserts, he can move towards the originator increasingly 
rapidly: 
> each request gives him a bearing on the direction (in the keyspace) of the 
> originator, and he can use this information to connect to nodes closer to 
the 
> originator. Each time he does this, the amount of the stream that he sees 
> increases, and therefore his search accelerates. This attack is easiest on 
> opennet, but it may also be possible on darknet for some attackers (but much 
> slower!).
> 
> PARTIAL SOLUTION:
> We have a partial solution in that we don't insert the top block, or 
generate 
> the key, until after we have inserted all the data below it. However, in 
many 
> instances the data will be guessable (or guessable to within some computible 
> entropy e.g. all but a few bytes) and therefore the attacker can still 
> identify the blocks.
> 
> BAD SOLUTION:
> One proposed solution (for inserts) is to insert each splitfile encrypted 
with 
> a random key. This would help a lot with this attack (for inserts), but it 
> would cause much more data duplication, entirely losing the benefits of 
> convergent encryption (CHKs). To what degree convergent encryption is useful 
> is an open question, but there is a better way...
> 
> NEW PROPOSED SOLUTION:
> When Alice inserts a file, her node runs FEC encoding as usual. Then a 
random 
> obfuscation key is chosen, and each block is encoded, both with the random 
> obfuscation key, and with the normal CHK convergent encryption. Alice's node 
> inserts only the obfuscated blocks, but it computes the CHKs for the 
> non-obfuscated blocks if they were inserted.
> 
> The top block includes pointers to the top level of the splitfile for both 
the 
> obfuscated and non-obfuscated versions, but only the obfuscated version has 
> been inserted *by Alice*.
> 
> Alice announces the key on a public forum. Bob (hopefully there will be many 
> Bob's) starts to download it. For each block in the splitfile, Bob's node 
> tries the non-obfuscated block first, maybe giving it a 3-try head start. 
> Then when this fails it tries the obfuscated block. When each obfuscated 
> block is fetched, there is a chance that the non-obfuscated version will be 
> inserted by Bob's node.
> 
> Thus, Alice is protected by Bob. Most likely Alice is in the greater danger: 
> generally you want to go for the source of the data. The attacker will then 
> only gain a small amount of information from a splitfile insert: even though 
> he can identify the blocks in retrospect, he can't move towards the insertor 
> during the insert, so he gathers much less information. It will probably 
take 
> more than one splitfile insert to trace Alice. Of course, splitfile inserts 
> aren't the only thing that gives him bearings on Alice's location, her FMS 
> posts etc (e.g. announcing her files) will also betray her in sufficient 
> quantity. It would be good to have some sort of guesstimate that you have to 
> change identity every X messages or inserts to be reasonably safe...
> 
> Sadly protecting requestors in this way is impossible afaics (although in 
the 
> long term, premix routing and/or tunnels will help). And in the long run we 
> don't fetch the obfuscated data, only the non-obfuscated data, which will 
> collide for multiple inserts of the same data, so we don't waste any space. 
> Obviously this whole mechanism would be optional so that you can turn it off 
> for a slightly faster / less demanding insert (but with less security).
> 
> Problems:
> 
> ** Whether to include the non-obfuscated blocks in the splitfile metadata. 
**
> 
> Inserting the non-obfuscated blocks when we get the obfuscated ones, and 
> trying the non-obfuscated ones before the obfuscated ones, is intended to 
> result in the non-obfuscated blocks being preferred, and the obfuscated ones 
> falling out of the network. 
> 
> We have two options:
> 
> 1. We could have a parallel metadata structure, with only the obfuscated 
keys 
> in the obfuscated metadata, and only the non-obfuscated keys in the 
> non-obfuscated metadata, and the top block referring to both, but Alice only 
> inserting the obfuscated metadata.
> CON:
> - A requestor will not know the obfuscated blocks' keys until he has decoded 
> them. Thus, he cannot request them, until some requestor manages to download 
> the entire (obfuscated) file and insert the non-obfuscated metadata.
> 
> 2. We could include the keys of both the obfuscated and non-obfuscated 
blocks 
> in the obfuscated metadata.
> PRO:
> - We can immediately fetch the non-obfuscated blocks from day one. So we can 
> further propagate the blocks inserted by our fellow requestors, and not 
> propagate the obfuscated blocks, maximising efficiency and resulting in 
> faster downloads when a file is new.
> - Also, if the same file has been inserted previously, we can immediately 
pick 
> up the blocks inserted for the other file.
> CON:
> - The obfuscated metadata will be twice the size because twice the number of 
> keys. And it will not collide, because it is obfuscated.
> MITIGATION:
> - However, we can have non-obfuscated metadata in parallel. This could be 
> inserted by any requestor, and would be the same size as splitfile metadata 
> is now. Bob's node would request both the obfuscated and the non-obfuscated 
> metadata initially, but if the non-obfuscated metadata was found, there 
would 
> be no need to fetch the lower layers of the obfuscated metadata unless they 
> were needed i.e. problems were encountered fetching the data.
> 
> IMHO we should implement the second option, with the mitigation so that 
> long-term very few blocks are fetched from the obfuscated data, and it falls 
> off the network.
> 
> ** Possible optimisation: Timestamps **
> 
> We could include a couple of timestamps in the top level metadata:
> - Before time T1, fetch the non-obfuscated data as often as the obfuscated 
> data.
> - After time T2, don't fetch the non-obfuscated data at all (or only if 
really 
> desperate).
> 
> Obviously this would have to be optional, and the original time of insertion 
> would have to be obfuscated. And this will only work if we are sure that our 
> data will be downloaded by several others shortly after we announce it.
>

And if we combine this with some form of tunneling, we can get reasonably
respectable privacy for insertors: the insertor will need to be close enough
to do a correlation attack (how close this is is an open question).

Various forms of tunneling have been proposed, mrogers did a load of work on
it a while back. Here are the versions that spring to mind:

0. Undirected random routing for a few hops at the beginning of each insert.

This is quite vulnerable to an attacker close enough to do a correlation
attack. However, it doesn't give away which requests belong together, and of
course each sample from the main phase will give virtually no information.
Mallory needs to catch inserts in the random routing phase.

1. Route to a random location for a few hops at the beginning of each insert.

We choose a random location for the request or insert, and we route to it, for
a small number of random hops, possibly followed by a small HTL countdown.

This isn't strictly tunneling. It's fairly weak against an attacker close
enough to do a correlation attack. But an attacker trying to do a global
adaptive search, as the above, even once he's identified the blocks belonging
to the insert in his logs, they will provide no useful bearings, because each
request will have started in a different place. His best chance then is to
catch an insert in the first (routing to random location) phase. That will
give him a usable bearing. But requests spend more time in the main search
phase than the randomization phase.

2. Route a bundle of requests all towards the same location.

Choose a random location for a bundle of requests (probably all part of the
same splitfile, perhaps the whole splitfile), and route them all towards that
location. We keep them together i.e. "bundled", and so we minimise the number
of samples the attacker can take. The downside is it's obvious that the
requests belong together if they're in the same bundle. We may want to change
the random target location every so often even within a splitfile, to prevent
Mallory from tracing back far enough to reach the bundle phase. The tradeoff
is that every time we change the start location, we give Mallory another
chance to catch us in the bundle phase; if he does, he can take a sample of
our real location, and he can move slightly towards us. But if Mallory isn't
on the path the bundle passes through, he has to trace backwards until he
reaches it; so there will be an optimal number of requests between changing
location.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL:
<https://emu.freenetproject.org/pipermail/devl/attachments/20080607/b5bc382b/attachment.pgp>

[freenet-dev] Temporary insert obfuscation: The best of both worlds

Reply via email to