Okay, the first thing I propose to implement is this:

- Implement a client-cache. This will be encrypted; if physical security is set to LOW then the key will be stored on disk, otherwise it will be reset on each startup.
- Do not cache data returned by a request if the HTL was equal to the maximum when we started the request. I.e. both for local requests (even if we forward them to many nodes due to bad topology) and for requests which we are forwarding at max HTL. (A rough sketch of this decision follows below.)
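To make that concrete, here is a minimal sketch of the decision as I picture it. The class, enum and method names are made up for illustration, they are not the actual fred code:

    // Illustrative sketch only, not the real fred classes.
    public class ProposedCachingPolicy {

        enum PhysicalSecLevel { LOW, NORMAL, HIGH, MAXIMUM }

        private final short maxHTL;

        public ProposedCachingPolicy(short maxHTL) {
            this.maxHTL = maxHTL;
        }

        /** The client-cache key is only persisted to disk at physical seclevel LOW;
         *  otherwise it is regenerated on every startup, so the cache is ephemeral. */
        public boolean persistClientCacheKey(PhysicalSecLevel level) {
            return level == PhysicalSecLevel.LOW;
        }

        /** Should a block returned by a request go into the shared store/cache?
         *  Not if the request started at max HTL - this covers both local requests
         *  and requests we forwarded at max HTL. */
        public boolean cacheInSharedStore(short htlWhenRequestStarted) {
            return htlWhenRequestStarted < maxHTL;
        }

        /** Locally requested data goes into the encrypted client-cache instead. */
        public boolean cacheInClientCache(boolean localRequest) {
            return localRequest;
        }
    }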
Clearly this is an improvement on the current situation security-wise, considered on its own. It finally eliminates the Register attack, for one thing. And the nodes we forward to already know the HTL was at maximum if we forward it at maximum or at max-1; the only case where they get new information is if we sent it to many nodes because of bad topology, and even then they can probably guess this from swap data.

On the other hand, right now the decrement probability at max HTL is 10%, the decrement probability at min is 25%, and max HTL is 10. We need to increase the decrement probability or the data won't get cached anywhere! And we need the max HTL to go up to compensate: we don't want to reduce the overall HTL (there is a back-of-the-envelope check of this below).

There is a tradeoff between:
- the guaranteed distance from the originator when we cache (if we start caching at some HTL level lower than max-1),
- the number of nodes we don't cache on (too many = bad performance), and
- the security gain from staying at max HTL for longer (a lower probability that the predecessor was the originator or cached the data: right now a 10% pDecrement means the HTL being max only tells you there was roughly a 10% chance of the predecessor being the originator; correlation attacks and datastore probing are much more powerful, however).

I suggest: 50% pDecrement / 18 max HTL / cache at 16. That gives a guaranteed 2 hops and an average of 4 hops away from the originator when we cache, but pDecrement is a bit high. This is less of a problem 2 hops away, though: the 50% of blocks that did get cached could have come from any node within 2 hops, although correlating blocks from different nodes will give a back-bearing. Cache at 17 would give more caching at the cost of less security (caching 1 hop away minimum and 3 on average, instead of 2 and 4). But there are other options.

A wildcard is random routing while the HTL is at max. This might improve caching at the cost of sometimes causing more hops; on the other hand it would sometimes improve data reachability. The security impact is debatable: it might help blur the originator against e.g. mobile-attacker tracing, but on the other hand it might increase the chance of them seeing the first request? Anyway, this proposal does reduce performance a bit, especially if the client-cache is ephemeral. So let's look at the next element.

Bloom filter sharing: Darknet peers will receive Bloom filters for our SSK and CHK store and cache (unless the friends security level is HIGH), as will opennet peers with sufficient uptime. Any node which has received a Bloom filter may make a request resembling a GetOfferedKey, which means it can effectively probe our datastore to confirm a guess it made from the Bloom filter. We can then track what proportion of these requests are answered and work out whether there is something grossly wrong with the Bloom filter (a sketch of this follows below).

Does this make it possible to reconstruct who has requested what after the event? An attacker targeting a single node can get connected to its likely/known peers, download their Bloom filters, and check them for the keys she is interested in. She (this is probably a Sybil attack) doesn't have to rely on timing attacks, and she knows that, of the keys routed to a given node, there is roughly a pDecrement probability that each one got cached there. Compare that to the current situation, though: right now there is a 100% chance that all the keys are stored on the target node, and a 100% chance that each key routed to each peer node is cached there.
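Regarding "we don't want to reduce the overall HTL", here is the back-of-the-envelope check I mentioned. It assumes a simplified model (probabilistic decrement only while at max HTL, exactly one decrement per hop below that, and the unchanged min-HTL behaviour ignored), which may not match exactly where fred applies the decrement, but it shows that 50% / 18 keeps the expected route length in the same ballpark as the current 10% / 10:

    // Back-of-the-envelope check under a simplified model: a request spends on
    // average 1/pDecrement hops at max HTL, then decrements once per hop down
    // to HTL 1; the min-HTL behaviour is the same in both cases so it is ignored.
    public class HtlArithmetic {
        static double expectedRouteLength(int maxHTL, double pDecrementAtMax) {
            return 1.0 / pDecrementAtMax   // expected hops spent at max HTL
                 + (maxHTL - 1);           // then one hop per remaining HTL level down to 1
        }
        public static void main(String[] args) {
            System.out.println("current  (max 10, 10%): " + expectedRouteLength(10, 0.10)); // 19.0
            System.out.println("proposed (max 18, 50%): " + expectedRouteLength(18, 0.50)); // 19.0
        }
    }

Both come out at around 19 hops before the min-HTL stage, so the overall HTL is preserved.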
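On the "track what proportion of these requests are answered" point, this is roughly the bookkeeping I have in mind, sketched with hypothetical names (not existing fred code): compare the hit rate of filter-matching probes against what the filter's false-positive rate would predict, and flag the filter if they diverge badly.

    // Hypothetical sketch: detect a grossly wrong Bloom filter by comparing the
    // fraction of filter-matching probes we can actually answer against the
    // filter's nominal false-positive rate.
    public class BloomProbeStats {
        private long probes;    // requests for keys the peer's copy of our filter matched
        private long answered;  // of those, how many we actually had in the store/cache

        public synchronized void record(boolean answeredFromStore) {
            probes++;
            if (answeredFromStore) answered++;
        }

        /** True if the observed hit rate is far below what we would expect from a
         *  filter whose only misses are its false positives. */
        public synchronized boolean looksGrosslyWrong(double falsePositiveRate, long minProbes) {
            if (probes < minProbes) return false;          // not enough data yet
            double hitRate = (double) answered / probes;
            double expected = 1.0 - falsePositiveRate;     // e.g. fpRate 0.01 -> expect ~99% hits
            return hitRate < expected / 2;                 // arbitrary threshold for "grossly wrong"
        }
    }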
IMHO not having to rely on timing attacks is not a significant gain for a smart attacker, and the initial transfer of the Bloom filters is pretty big. If an attacker wants to probe for *a lot* of keys, or is worried that HTL=min requests to probe the datastore would be detected, then having the Bloom filter is helpful. But on the whole this is probably an improvement IMHO.

Transitional arrangements may however be a problem. Right now everyone's datastore is full of data they have inserted or requested, and we do not want to expose all that to their opennet peers! Nodes created after the caching changes can reasonably share their Bloom filters, but nodes created before this? Sharing the Bloom filter for the store but not the cache *might* be safe. Sharing the filter for the cache would only be safe after it has been fully rewritten, or at least largely rewritten... for a big store, this would be a long time, especially with the salted hash store overwriting existing keys much of the time. For my 250GB cache, filling it up (with LRU) would nominally take 58 days at the current 0.42 writes/sec, or 102 days at the recent 0.24 writes/sec; we would have to wait several times that for it to be safe to expose Bloom filters. At the same time, we don't want to wipe everyone's datastore, nor encourage them to do so. Reinserting the 64GB currently cached would take 63 days, but if the store had been fuller this wouldn't be an option, and in any case the load consequences would be rather humongous.

Solutions? Is the best we can do to only share Bloom filters on fresh nodes created after the storage changes, maybe plus those with network seclevel set to LOW? That will mean it takes a long time to see any tangible performance benefit from Bloom filters... :|
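To spell that last suggestion out as a hypothetical sketch (again, made-up names, not existing code): only advertise filters for stores created after the caching changes, with network seclevel LOW as an explicit opt-in, and keep the friends-seclevel and opennet-uptime conditions from above.

    // Hypothetical sketch of the transitional sharing rule discussed above.
    public class BloomSharingPolicy {

        enum SecLevel { LOW, NORMAL, HIGH, MAXIMUM }

        /** storeIsPostTransition: the store/cache was created after the caching
         *  changes, so it never held data cached purely because we requested it. */
        public boolean shareWithDarknetPeer(boolean storeIsPostTransition,
                                            SecLevel friendsLevel,
                                            SecLevel networkLevel) {
            if (friendsLevel == SecLevel.HIGH) return false;
            return storeIsPostTransition || networkLevel == SecLevel.LOW;
        }

        public boolean shareWithOpennetPeer(boolean storeIsPostTransition,
                                            boolean sufficientUptime,
                                            SecLevel networkLevel) {
            if (!sufficientUptime) return false;
            return storeIsPostTransition || networkLevel == SecLevel.LOW;
        }
    }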
