Okay, the first thing I propose to implement is this:

- Implement a client-cache. This will be encrypted; if physical security is set to LOW then the key will be stored on disk, otherwise it will be reset on each startup.
- Do not cache data returned by a request if the HTL was equal to the maximum when we started the request. I.e. both for local requests (even if we forward them to many nodes due to bad topology) and for requests which we are forwarding at max HTL. (A rough sketch of this decision follows below.)
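To make that concrete, here is a minimal sketch of the decision as I picture it. The class, enum and method names are made up for illustration, they are not the actual fred code:

    // Illustrative sketch only, not the real fred classes.
    public class ProposedCachingPolicy {

        enum PhysicalSecLevel { LOW, NORMAL, HIGH, MAXIMUM }

        private final short maxHTL;

        public ProposedCachingPolicy(short maxHTL) {
            this.maxHTL = maxHTL;
        }

        /** The client-cache key is only persisted to disk at physical seclevel LOW;
         *  otherwise it is regenerated on every startup, so the cache is ephemeral. */
        public boolean persistClientCacheKey(PhysicalSecLevel level) {
            return level == PhysicalSecLevel.LOW;
        }

        /** Should a block returned by a request go into the shared store/cache?
         *  Not if the request started at max HTL - this covers both local requests
         *  and requests we forwarded at max HTL. */
        public boolean cacheInSharedStore(short htlWhenRequestStarted) {
            return htlWhenRequestStarted < maxHTL;
        }

        /** Locally requested data goes into the encrypted client-cache instead. */
        public boolean cacheInClientCache(boolean localRequest) {
            return localRequest;
        }
    }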
Clearly this is an improvement on the current situation security-wise, considered on its own. It finally eliminates the Register attack, for one thing. And the nodes we forward to already know the HTL was at maximum if we forward it at maximum or at max-1; the only case where they get new information is if we sent it to many nodes because of bad topology, and even then they can probably guess this from swap data.

On the other hand, right now the decrement probability at max HTL is 10%, the decrement probability at min is 25%, and max HTL is 10. We need to increase the decrement probability or the data won't get cached anywhere! And we need the max HTL to go up to compensate: we don't want to reduce the overall HTL (there is a back-of-the-envelope check of this below).

There is a tradeoff between:
- the guaranteed distance from the originator when we cache (if we start caching at some HTL level lower than max-1),
- the number of nodes we don't cache on (too many = bad performance), and
- the security gain from staying at max HTL for longer (a lower probability that the predecessor was the originator or cached the data: right now a 10% pDecrement means the HTL being max only tells you there was roughly a 10% chance of the predecessor being the originator; correlation attacks and datastore probing are much more powerful, however).

I suggest: 50% pDecrement / 18 max HTL / cache at 16. That gives a guaranteed 2 hops and an average of 4 hops away from the originator when we cache, but pDecrement is a bit high. This is less of a problem 2 hops away, though: the 50% of blocks that did get cached could have come from any node within 2 hops, although correlating blocks from different nodes will give a back-bearing. Cache at 17 would give more caching at the cost of less security (caching 1 hop away minimum and 3 on average, instead of 2 and 4). But there are other options.

A wildcard is random routing while the HTL is at max. This might improve caching at the cost of sometimes causing more hops; on the other hand it would sometimes improve data reachability. The security impact is debatable: it might help blur the originator against e.g. mobile-attacker tracing, but on the other hand it might increase the chance of them seeing the first request? Anyway, this proposal does reduce performance a bit, especially if the client-cache is ephemeral. So let's look at the next element.

Bloom filter sharing: Darknet peers will receive Bloom filters for our SSK and CHK store and cache (unless the friends security level is HIGH), as will opennet peers with sufficient uptime. Any node which has received a Bloom filter may make a request resembling a GetOfferedKey, which means it can effectively probe our datastore to confirm a guess it made from the Bloom filter. We can then track what proportion of these requests are answered and work out whether there is something grossly wrong with the Bloom filter (a sketch of this follows below).

Does this make it possible to reconstruct who has requested what after the event? An attacker targeting a single node can get connected to its likely/known peers, download their Bloom filters, and check them for the keys she is interested in. She (this is probably a Sybil attack) doesn't have to rely on timing attacks, and she knows that, of the keys routed to a given node, there is roughly a pDecrement probability that each one got cached there. Compare that to the current situation, though: right now there is a 100% chance that all the keys are stored on the target node, and a 100% chance that each key routed to each peer node is cached there.
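Regarding "we don't want to reduce the overall HTL", here is the back-of-the-envelope check I mentioned. It assumes a simplified model (probabilistic decrement only while at max HTL, exactly one decrement per hop below that, and the unchanged min-HTL behaviour ignored), which may not match exactly where fred applies the decrement, but it shows that 50% / 18 keeps the expected route length in the same ballpark as the current 10% / 10:

    // Back-of-the-envelope check under a simplified model: a request spends on
    // average 1/pDecrement hops at max HTL, then decrements once per hop down
    // to HTL 1; the min-HTL behaviour is the same in both cases so it is ignored.
    public class HtlArithmetic {
        static double expectedRouteLength(int maxHTL, double pDecrementAtMax) {
            return 1.0 / pDecrementAtMax   // expected hops spent at max HTL
                 + (maxHTL - 1);           // then one hop per remaining HTL level down to 1
        }
        public static void main(String[] args) {
            System.out.println("current  (max 10, 10%): " + expectedRouteLength(10, 0.10)); // 19.0
            System.out.println("proposed (max 18, 50%): " + expectedRouteLength(18, 0.50)); // 19.0
        }
    }

Both come out at around 19 hops before the min-HTL stage, so the overall HTL is preserved.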
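On the "track what proportion of these requests are answered" point, this is roughly the bookkeeping I have in mind, sketched with hypothetical names (not existing fred code): compare the hit rate of filter-matching probes against what the filter's false-positive rate would predict, and flag the filter if they diverge badly.

    // Hypothetical sketch: detect a grossly wrong Bloom filter by comparing the
    // fraction of filter-matching probes we can actually answer against the
    // filter's nominal false-positive rate.
    public class BloomProbeStats {
        private long probes;    // requests for keys the peer's copy of our filter matched
        private long answered;  // of those, how many we actually had in the store/cache

        public synchronized void record(boolean answeredFromStore) {
            probes++;
            if (answeredFromStore) answered++;
        }

        /** True if the observed hit rate is far below what we would expect from a
         *  filter whose only misses are its false positives. */
        public synchronized boolean looksGrosslyWrong(double falsePositiveRate, long minProbes) {
            if (probes < minProbes) return false;          // not enough data yet
            double hitRate = (double) answered / probes;
            double expected = 1.0 - falsePositiveRate;     // e.g. fpRate 0.01 -> expect ~99% hits
            return hitRate < expected / 2;                 // arbitrary threshold for "grossly wrong"
        }
    }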
IMHO not having to rely on timing attacks is not a significant gain for a smart attacker, and the initial transfer of the Bloom filters is pretty big. If an attacker wants to probe for *a lot* of keys, or is worried that HTL=min requests to probe the datastore would be detected, then having the Bloom filter is helpful. But on the whole this is probably an improvement IMHO.

Transitional arrangements may however be a problem. Right now everyone's datastore is full of data they have inserted or requested, and we do not want to expose all that to their opennet peers! Nodes created after the caching changes can reasonably share their Bloom filters, but nodes created before this? Sharing the Bloom filter for the store but not the cache *might* be safe. Sharing the filter for the cache would only be safe after it has been fully rewritten, or at least largely rewritten... for a big store, this would be a long time, especially with the salted hash store overwriting existing keys much of the time. For my 250GB cache, filling it up (with LRU) would nominally take 58 days at the current 0.42 writes/sec, or 102 days at the recent 0.24 writes/sec; we would have to wait several times that for it to be safe to expose Bloom filters. At the same time, we don't want to wipe everyone's datastore, nor encourage them to do so. Reinserting the 64GB currently cached would take 63 days, but if the store had been fuller this wouldn't be an option, and in any case the load consequences would be rather humongous.

Solutions? Is the best we can do to only share Bloom filters on fresh nodes created after the storage changes, maybe plus those with network seclevel set to LOW? That will mean it takes a long time to see any tangible performance benefit from Bloom filters... :|
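To spell that last suggestion out as a hypothetical sketch (again, made-up names, not existing code): only advertise filters for stores created after the caching changes, with network seclevel LOW as an explicit opt-in, and keep the friends-seclevel and opennet-uptime conditions from above.

    // Hypothetical sketch of the transitional sharing rule discussed above.
    public class BloomSharingPolicy {

        enum SecLevel { LOW, NORMAL, HIGH, MAXIMUM }

        /** storeIsPostTransition: the store/cache was created after the caching
         *  changes, so it never held data cached purely because we requested it. */
        public boolean shareWithDarknetPeer(boolean storeIsPostTransition,
                                            SecLevel friendsLevel,
                                            SecLevel networkLevel) {
            if (friendsLevel == SecLevel.HIGH) return false;
            return storeIsPostTransition || networkLevel == SecLevel.LOW;
        }

        public boolean shareWithOpennetPeer(boolean storeIsPostTransition,
                                            boolean sufficientUptime,
                                            SecLevel networkLevel) {
            if (!sufficientUptime) return false;
            return storeIsPostTransition || networkLevel == SecLevel.LOW;
        }
    }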
