Something a bit like "bundles", a bit like "tunnels". But different, because Freenet is different. Here I am assuming that we are inserting large predictable content, e.g. reinserting a big splitfile.
Setup
-----
Take a bunch of keys: a self-contained insert, e.g. a big splitfile including the top SSK, or a freesite insert including the top SSK and all the splitfiles under it. Maybe include the announcement you're gonna make as well. Pad the small keys to be the same size as the big keys, since there are likely mostly big keys. Pad the keys with extra redundancy until you get to a standard number of keys; this will be some standard formula like 1-7 * a power of 2.

Create two master keys, K and X. Use K and X to derive K_n and X_n for each block.

Pre-insert
----------
Do an "insert" of each block, encrypted with K_n, to H(X_n). The full 256-bit H(X_n) is included on the request, and that's what we route towards. The last 5 hops or so (probably determined by HTL) put the key into their pre-insert cache, including X_n. When all the keys have been pre-inserted, move to the next stage.

Reveal
------
Send a Reveal request for each block. This includes X_n (which proves ownership) and K_n. The first node on the chain which has the pre-insert in its pre-insert cache decrypts the block and does a normal insert. If the other nodes on the chain are still connected, it will prove to them that the insert has been completed, so they can delete it from their pre-insert caches.

This is potentially a long time after the original insert, so it's important that it be routed normally, not rely on remembering where we sent it last. This also explains why we need it to be cached on multiple nodes at the end of the route.

At this stage, with the most naive implementation, we have considerable resistance to MAST:
- Blocks cannot be identified at all in pre-insert.
- The attacker can only identify blocks in reveal that he saw in pre-insert. I.e. he needs to be on the path for the pre-inserts, in which case he knows what was inserted, and has a predecessor sample.
- Can he approach the target during the reveal stage? Depends on how fast it is.
- The reveal phase can be very fast.
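The per-block key derivation and routing-key construction above might look something like this (a sketch only: HMAC-SHA256 as the derivation function and 32-byte master keys are my assumptions, the post doesn't fix a KDF or wire format):

```python
import hashlib
import hmac
import os

def derive(master: bytes, n: int, label: bytes) -> bytes:
    # Derive a per-block key from a master key by HMAC over (label, block index).
    return hmac.new(master, label + n.to_bytes(4, "big"), hashlib.sha256).digest()

K = os.urandom(32)  # master encryption key
X = os.urandom(32)  # master ownership key

def block_keys(n: int) -> tuple[bytes, bytes]:
    # (K_n, X_n): K_n encrypts block n, X_n proves ownership at reveal time.
    return derive(K, n, b"K"), derive(X, n, b"X")

def preinsert_routing_key(n: int) -> bytes:
    # The pre-insert routes towards the full 256-bit H(X_n).
    _, X_n = block_keys(n)
    return hashlib.sha256(X_n).digest()
```

At reveal time a caching node recomputes H(X_n) from the revealed X_n and matches it against the routing key it stored, which is what makes X_n a proof of ownership.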
We don't need to limit the number of reveals going on at once, or at least the limit can be very large. Which means MAST is more or less impossible, and the reveal stage shouldn't take much longer than the slowest insert. Or we can allow nodes to queue confirmed pre-inserts, in which case we may or may not want any feedback on them.

However, we can improve considerably on this, and resist more advanced attacks than MAST.

FUNDAMENTAL DESIGN CHOICE:
--------------------------
Either:

1) Don't return an encryption key from the pre-insert. We only need K and X to reveal the insert, so a MassReveal is *really tiny*, allowing for fairly interesting methods to anonymize the Reveal. (We could use different K and X for different parts of the file if it's useful; this is doubtful.) Or:

2) Do return an encryption key from the pre-insert. The beginning stage of the Reveal costs a lot more bandwidth, which may rule out some protection strategies for the revealer. However, the attacker can only identify the block if he was one of the nodes that stored the block and returned an encryption key for it, and was also the node first reached by the reveal. How important is this? The second condition requires he have much higher penetration than the first, so probably it does matter... And we can prove that any intermediaries involved in revealing are honest, i.e. that a MassReveal isn't ignored, that it is sent to the nodes it's supposed to be sent to. Or:

3) Return an encryption key, but don't use it to encrypt K_n, only to sign the response. This allows us to verify that a reveal has been forwarded to one of the nodes that accepted the pre-insert, and is especially useful if DC-broadcasting MassReveals. Obviously the signatures would have to be broadcast (but not anonymously, so cheap enough for moderate-sized groups).

Can we avoid having to return an encryption key and still encrypt to the nodes?
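To make the size argument in option 1 concrete, a back-of-envelope estimate (all field sizes are my assumptions, not a wire format; the point is only that option 1 is constant-size while option 2 grows with the file):

```python
# Hypothetical field sizes: 32-byte master keys, 4-byte block count.
K_LEN, X_LEN, N_LEN = 32, 32, 4

def massreveal_size_opt1() -> int:
    # Option 1: a MassReveal is just (K, X, n), regardless of file size.
    return K_LEN + X_LEN + N_LEN

def massreveal_size_opt2(num_blocks: int, per_block_len: int = 32) -> int:
    # Option 2: some per-block material encrypted to the returned keys must
    # be carried, so the first stage of the Reveal grows with the file.
    return K_LEN + X_LEN + N_LEN + num_blocks * per_block_len
```

A 68-byte constant-size message is plausibly small enough for expensive anonymization schemes like DC-nets; a message linear in the block count maybe is not, which is the trade-off the "fundamental design choice" is weighing.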
That would seem to be a contradiction: the last few nodes are no different from any of the nodes on the path, and we can only make them different by returning a key. An attacker can return a key, but then he needs to be the one that gets found first, so he still has a fairly heavy penetration requirement.

BUT the limiting factor: if it's a big file, the attacker is likely to be lucky enough to receive both the original insert and the reveal. So he gets some samples. If he also gets the MassReveal he's in business, but how often is he going to see it? One way to improve this is to send more than one block to each endpoint; the downside is that the reveal is then slower, the upside is that there are fewer samples.

Protecting the reveal stage
---------------------------
There are various increasingly complex solutions to starting the reveal somewhere other than the originator. If we do #1 above, we can exploit the fact that a MassReveal is just K, X, and n (the number of blocks), i.e. it is tiny. However, even if we do #2 above, it's still small - just maybe not small enough for Dining Cryptographers.

1: Route a MassReveal randomly, as a single request, with no protection. The basic problem with this is that it might be on the same path as the original inserts.

2: Send a MassReveal through a rendezvous tunnel. These would be set up in advance by sending several "anchor" requests which are randomly routed, and each contain a share of a secret. When they meet, the shortest path becomes an encrypted tunnel, which no other node can decrypt.

3: Send a MassReveal through a global onion tunnel. This involves discovering specific nodes and encrypting to their pubkeys, similar to Tor.

4: Dining Cryptographers anonymous broadcast. A cell would be set up automatically. Nodes should be discovered collaboratively, i.e. the cell must not be a figment of one member's imagination, and not a pure chain. Ideally the cell would be distributed right across the network, at different locations. This means more hops, though.
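The "share of a secret" carried by the anchor requests in option 2 could be as simple as k-of-k XOR sharing (a sketch under that assumption; the post doesn't specify the sharing scheme):

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_shares(secret: bytes, k: int) -> list[bytes]:
    # k-of-k XOR sharing: any k-1 shares are uniformly random and reveal
    # nothing; only a node that sees all k anchor requests (the rendezvous
    # point where the paths meet) can reconstruct the tunnel secret.
    shares = [os.urandom(len(secret)) for _ in range(k - 1)]
    last = secret
    for s in shares:
        last = xor(last, s)
    return shares + [last]

def combine_shares(shares: list[bytes]) -> bytes:
    out = shares[0]
    for s in shares[1:]:
        out = xor(out, s)
    return out
```

Each anchor request would carry one share; the node where the randomly-routed anchors converge combines them and becomes the endpoint of an encrypted tunnel no intermediate node can decrypt.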
A message is anonymously broadcast, and the work is shared out by some sort of random allocation (e.g. a hashing rule), with some redundancy.

How do we ensure the reveal stage is executed correctly? An obvious attack is to accept reveals and not execute them:
- The original pre-insert response could include a nonce, or key. Most of the above assume a single node initially handles the MassReveal.
- Accountability issues? DoS by not doing anything?
- Redundancy?

How do we prevent using MassReveal as a DoS?
- Nodes can reject a reveal if they haven't seen it before.
- If we see too many such rejects, something bad is happening. We (the recipient of the MassReveal, which is being used to do the DoS) will then stop sending messages.
- This means we shouldn't send all of the reveals at once; we should send a subset first to see if we get a lot of rejects.

Further improvements
--------------------
We can do this multi-stage, like onion routing. I.e. do a pre-insert and then several stages of reveal (reinsert), each time going back to the pre-insert caches of different nodes. Of course this will be slow.

We can introduce delays/queueing/batching as a security feature (e.g. against traffic analysis), although that may mean we can't definitively say when an insert has finished except by fetching the data. But that's true anyway if there are malicious (lazy) nodes involved; right now inserts don't include any sort of verification.
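The "random allocation (e.g. a hashing rule), with some redundancy" for sharing out the broadcast reveal work could be a highest-random-weight rule, which every cell member can evaluate locally with no coordinator (a sketch; the actual rule is left unspecified in the post):

```python
import hashlib

def assign_block(block_key: bytes, members: list[str], redundancy: int) -> list[str]:
    # Rendezvous (highest-random-weight) hashing: score each member by
    # H(block_key || member) and take the top `redundancy` scorers. Every
    # member computes the same answer independently, and each block is
    # handled by several members in case some are lazy or offline.
    scored = sorted(
        members,
        key=lambda m: hashlib.sha256(block_key + m.encode()).digest(),
        reverse=True,
    )
    return scored[:redundancy]
```

Because the assignment is deterministic given the broadcast message, members can also check whether the nodes responsible for a block actually performed its reveal, which bears on the accountability questions above.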
_______________________________________________ Devl mailing list [email protected] https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
