Okay, the first thing I propose to implement is this:
- Implement a client-cache. It will be encrypted; if physical security is set
to LOW the key will be stored on disk, otherwise it will be regenerated on
each startup.
- Do not cache data returned by a request if the HTL was equal to the maximum
when we started the request, i.e. both for local requests (even if we forward
them to many nodes due to bad topology) and for requests which we are
forwarding at max HTL. (See the sketch below.)
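
A rough sketch of what those two rules might look like, using hypothetical
names rather than anything already in fred:

// Sketch only: class and method names are made up for illustration.
import java.security.SecureRandom;

public class CachingPolicySketch {

    static final short MAX_HTL = 18; // proposed new maximum (currently 10)

    /** Second rule: never put data in the shared store/cache if the request
     *  was at max HTL when we started handling it, local or forwarded. */
    static boolean cacheInSharedStore(short htlWhenWeStartedTheRequest) {
        return htlWhenWeStartedTheRequest < MAX_HTL;
    }

    /** First rule: the client-cache key is kept on disk only at LOW physical
     *  security; otherwise it is regenerated on every startup, making the
     *  previous contents of the client-cache unreadable. */
    static byte[] clientCacheKey(boolean physicalSecurityLow, byte[] keyOnDisk) {
        if (physicalSecurityLow && keyOnDisk != null)
            return keyOnDisk;
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        return key;
    }
}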

Clearly this is an improvement on the current situation security-wise,
considered on its own. For one thing, it finally eliminates the Register
attack. And the nodes we forward to can already tell the HTL was at maximum,
whether we forward it at max or at max-1; the only case where they gain new
information is if we sent it to many nodes because of bad topology, and even
then they can probably guess this from swap data.

On the other hand, right now the decrement probability at max HTL is 10%, the
decrement probability at min HTL is 25%, and max HTL is 10. We need to
increase the decrement probability at max or the data won't get cached
anywhere! And we need the max HTL to go up to compensate: we don't want to
reduce the overall HTL. There is a tradeoff between:
- the guaranteed distance from the originator when we cache (if we start
caching at some HTL level lower than max-1),
- the number of nodes we don't cache on (too many = bad performance), and
- the security gain from staying at max HTL for longer: a lower probability
that the predecessor was the originator or cached the data. Right now a 10%
pDecrement means the HTL being max only tells you there was a 10% chance of
the predecessor being the originator; however, correlation attacks and
datastore probing are much more powerful.
I suggest:

50% pDecrement / 18 max HTL / cache at 16

Guaranteed 2 and average 4 hops away when we cache, but the pDecrement is a
bit high. This is less of a problem 2 hops away, though. The 50% of blocks
that do get cached could have come from any node within 2 hops, although
correlating blocks from different nodes will still give a back-bearing.

Caching at 17 would give more caching at the cost of less security (caching a
minimum of 1 hop and an average of 3 hops away, instead of 2 and 4).
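
To make the arithmetic behind those distances explicit (this is my
back-of-the-envelope reading, not the exact routing model): HTL can drop at
most one level per hop, so caching starts at least maxHTL - cacheThreshold
hops from the originator, and on average the request spends roughly an extra
1/pDecrement hops at the max-HTL plateau on top of that.

// Back-of-the-envelope check of the distances quoted above; treat the
// formula as an approximation, not the exact routing model.
public class CacheDistanceSketch {
    static void distances(int maxHtl, int cacheAt, double pDecrement) {
        int guaranteedHops = maxHtl - cacheAt;          // HTL drops at most 1 per hop
        double averageHops = guaranteedHops + 1.0 / pDecrement; // extra hops at max HTL
        System.out.printf("cache at %d: guaranteed %d hops, average %.0f hops%n",
                cacheAt, guaranteedHops, averageHops);
    }
    public static void main(String[] args) {
        distances(18, 16, 0.5); // guaranteed 2, average 4
        distances(18, 17, 0.5); // guaranteed 1, average 3
    }
}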

But there are other options.

A wildcard is random routing while HTL is at max. This might improve caching
at the cost of sometimes causing more hops; on the other hand it would
sometimes improve data reachability. The security impact is debatable: it
might help blur the originator against e.g. a mobile attacker doing tracing,
but on the other hand it might increase the chance of the attacker seeing the
first request?

Anyway, this does reduce performance a bit, especially if the client-cache is
ephemeral. So let's look at the next element.

Bloom filter sharing: Darknet peers will receive a Bloom filter for our SSK
and CHK stores and caches (unless the friends security level is HIGH), as
will opennet peers with sufficient uptime. Any node which has received a
Bloom filter may make a request resembling a GetOfferedKey, which means that
it can effectively probe our datastore to confirm the guess it made from the
Bloom filter. We can then track what proportion of these requests are
answered and work out whether there is something grossly wrong with the
Bloom filter.
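
A sketch of that sanity check, with made-up names (the expected
false-positive rate would come from the filter's size and hash count): count
the filter-driven probes we receive and compare the miss rate against what
false positives alone would explain.

// Hypothetical monitor: if peers' filter-driven probes miss far more often
// than the Bloom filter's false-positive rate predicts, the filter we sent
// them is probably stale or broken.
public class BloomFilterProbeMonitor {

    private final double expectedFalsePositiveRate;
    private long probes;
    private long misses;

    public BloomFilterProbeMonitor(double expectedFalsePositiveRate) {
        this.expectedFalsePositiveRate = expectedFalsePositiveRate;
    }

    /** Record one GetOfferedKey-style probe and whether we actually had the block. */
    public synchronized void record(boolean answered) {
        probes++;
        if (!answered) misses++;
    }

    /** True if the observed miss rate is grossly above what false positives explain. */
    public synchronized boolean grosslyWrong(long minProbes, double slackFactor) {
        if (probes < minProbes) return false; // not enough data to judge yet
        double missRate = (double) misses / probes;
        return missRate > expectedFalsePositiveRate * slackFactor;
    }
}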

Does this make it possible to reconstruct who has requested what after the
event? An attacker targeting a single node can get connected to its
likely/known peers, download their Bloom filters, and look for entries
corresponding to the keys she is interested in. She (this is probably a Sybil
attack) doesn't have to rely on timing attacks, and she knows that, of the
requests routed via a given node, there is a pDecrement probability of the
key being stored there. Compare that to the current situation, though: right
now there is a 100% chance that all the keys are stored on the target node,
and a 100% chance that each key routed to each peer node is cached there.
IMHO not having to rely on timing attacks is not a significant gain for a
smart attacker, and the initial transfer of the Bloom filters is pretty big.
If an attacker wants to probe for *a lot* of keys, or is worried about
HTL=min requests to probe the datastore being detected, having the Bloom
filter is helpful. But on the whole this is probably an improvement.

Transitional arrangements may however be a problem. Right now everyone's
datastore is full of data they have inserted or requested. We do not want to
expose all that to their opennet peers! Nodes created after the caching
changes can reasonably share their Bloom filters, but what about nodes
created before them? Sharing the Bloom filter for the store but not the cache
*might* be safe. Sharing the filter for the cache would only be safe after it
has been fully rewritten, or at least largely rewritten... for a big store,
this would be a long time, especially with the salted-hash store overwriting
existing keys much of the time. For my 250GB cache, filling it up (with LRU)
would nominally take 58 days at the current 0.42 writes/sec, or 102 days at
the recent 0.24 writes/sec; we would have to wait several times that for it
to be safe to expose Bloom filters. At the same time, we don't want to wipe
everyone's datastores, nor encourage people to do so. Reinserting the 64GB
currently cached would take 63 days, but if the store had been fuller this
wouldn't be an option, and in any case the load consequences would be rather
humongous.
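
For what it's worth, the arithmetic behind those rewrite times (the write
count here is backed out of the figures above, roughly 2.1 million writes to
turn the cache over once, rather than measured directly):

// Sanity check of the turnover figures quoted above.
public class StoreTurnoverSketch {
    static double daysToRewrite(long writesNeeded, double writesPerSecond) {
        return writesNeeded / (writesPerSecond * 86400.0);
    }
    public static void main(String[] args) {
        long writesNeeded = 2_100_000L; // inferred from the quoted 58-day figure
        System.out.printf("at 0.42 writes/sec: %.0f days%n",
                daysToRewrite(writesNeeded, 0.42)); // ~58 days
        System.out.printf("at 0.24 writes/sec: %.0f days%n",
                daysToRewrite(writesNeeded, 0.24)); // ~101 days
    }
}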

Solutions? Is the best we can do to share Bloom filters only for fresh nodes
created after the storage changes, maybe plus those with the network security
level set to LOW? That will mean it takes a long time to see any tangible
performance benefit from Bloom filters... :|