On Saturday 22 Dec 2012 00:37:39 Arne Babenhauserheide wrote:
> A few comments:
> 
> On Friday, 21 December 2012, 00:31:25, Matthew Toseland wrote:
> > - Pad the small keys to be the same size as big keys, since there are likely
> > mostly big keys.
> > - Pad the keys with extra redundancy until you get to a standard number of
> > keys. This will be some standard formula like 1-7 * a power of 2.
> > - Use K and X to derive K_n and X_n for each block.
> 
> → anonymize the keys?

Exactly.
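
Something like this, maybe (just a sketch; the HMAC-based KDF and the class and 
method names are placeholders, not a spec):

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.ByteBuffer;

    // Sketch: derive the per-block keys K_n and X_n from the master pair
    // (K, X) and the block index, so that revealing K and X later is enough
    // to reconstruct every per-block key.
    final class PreInsertKeys {

        // HMAC-SHA256 used as a simple KDF; the real code might well use
        // something else, this only illustrates the derivation idea.
        private static byte[] derive(byte[] master, String label, long blockIndex)
                throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(master, "HmacSHA256"));
            mac.update(label.getBytes("UTF-8"));
            mac.update(ByteBuffer.allocate(8).putLong(blockIndex).array());
            return mac.doFinal();
        }

        static byte[] deriveKn(byte[] K, long n) throws Exception {
            return derive(K, "block-encryption-key", n);
        }

        static byte[] deriveXn(byte[] X, long n) throws Exception {
            return derive(X, "block-reveal-token", n);
        }
    }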
> 
> > into their pre-insert cache, including X_n. 
> 
> Why a special cache? Why not normal insert? 

We don't want the pre-inserted keys to stick around.

Hmmm, what about DoS attacks against the pre-insert cache? Maybe we should use 
the store instead; we do want to free up the slots once the insert has completed 
successfully, though (even with random replacement, this still makes the store 
more efficient).
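
Roughly what I have in mind for freeing the slots, if we do keep a separate 
cache (an entirely hypothetical class, just to pin the idea down; it doesn't 
answer the cache-vs-store question):

    import java.nio.ByteBuffer;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical bounded pre-insert cache keyed by X_n. An entry is removed
    // as soon as its reveal arrives, so pre-inserted keys don't stick around,
    // and the size bound caps what a DoS against the cache can cost us (at the
    // price of an attacker being able to flush other people's pre-inserts).
    final class PreInsertCache {

        private static final int MAX_ENTRIES = 10_000; // made-up bound

        private final Map<ByteBuffer, byte[]> entries =
            new LinkedHashMap<ByteBuffer, byte[]>(16, 0.75f, false) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<ByteBuffer, byte[]> eldest) {
                    return size() > MAX_ENTRIES; // evict the oldest pre-insert first
                }
            };

        synchronized void putPreInsert(byte[] xN, byte[] encryptedBlock) {
            entries.put(ByteBuffer.wrap(xN.clone()), encryptedBlock);
        }

        // Called when the reveal arrives: hands back the encrypted block and
        // frees the slot in one go.
        synchronized byte[] takeForReveal(byte[] xN) {
            return entries.remove(ByteBuffer.wrap(xN));
        }
    }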
> 
> > The first node on the chain which has the pre-insert in its pre-insert cache
> > decrypts the block and does a normal insert.
> 
> I see an attack here:
> 
> I insert many chunks which go to the same part of the keyspace. When I send 
> the reveal, I can start a distributed replacement attack.
> 
> Any chance to avoid that?

Ahhh, you mean you can bombard one keyspace location with lots of spam inserts, 
in the hope of driving a single document out, while avoiding the usual load 
management limits between you and the target?

Hmmm... no immediately obvious solutions... The attack is also possible even 
with "random route for the first N hops of an insert", a very simple measure 
that is planned for inserts and also has major security benefits. And it's even 
going to be true for any sort of tunneling, though it depends on how cheap it is 
to make new tunnels... Of course the attack is easy on opennet, as you can just 
keep announcing new nodes (another reason why we should introduce some limits on 
announcement, although ultimately this is futile).

How important is it? Multi-location, self-healing SSKs are planned (using PSKs 
of course), which may help, but they only multiply the number of targets by a 
small factor. Also, we can't have these mechanisms be specific to CHKs - SSKs 
have a greater need for protection.

I guess it comes down to how effective the attack will be, given the structure 
of our datastores. How many keys, on average, do we need to send to a node to 
displace a target document, given that they are close to the target location (so 
will be routed to the target node) and that we don't know the salt value for 
that store? Won't that be a significant fraction of the datastore?
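
For a back-of-the-envelope answer (my assumptions, not measurements): treat 
each spam insert, from the attacker's point of view, as overwriting one 
pseudo-random slot, since he doesn't know the salt. Then:

    // Toy model, not the salted store's exact behaviour: probability that a
    // target key is still present in a store of S slots after m spam inserts,
    // if each insert effectively overwrites one pseudo-random slot.
    final class EvictionEstimate {
        static double survivalProbability(double storeSlots, double spamInserts) {
            return Math.pow(1.0 - 1.0 / storeSlots, spamInserts);
        }
        // Example: survivalProbability(1_000_000, 700_000) is roughly 0.50,
        // i.e. on the order of 0.69 * S inserts are needed to displace one
        // target key with 50% probability: a significant fraction of the
        // datastore, as hoped.
    }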

Also, the most targeted keys are likely to be popular keys, in which case we 
only need a mechanism for them to be reinserted when found in somebody's cache. 
We have such mechanisms, although they may be imperfect.

This is a censorship attack and so we need to have a robust answer for it ...

If we can quantify it, e.g. say that it's possible but you'd need to 
continually commit a significant amount of bandwidth per targeted key depending 
on its popularity, that's enough for me ...

Also, anything that makes finding keys more efficient will help, e.g. bloom 
filter sharing.
> 
> > chain are still connected it will prove to them that the insert has been
> 
> How do they prove it? 

Forwarding a portion of the reveal request? Not all of it, since we don't want 
them to be able to decode the block; X_n is probably enough?
> 
> > There are various increasingly complex solutions to starting the reveal
> > somewhere other than the originator. If we do #1 above, we can exploit the
> > fact that MassReveal is just K, X, and n (the number of blocks), i.e. it is
> > tiny. However, even if we do #2 above, it's still small - just maybe not
> > small enough for Dining Cryptographers. 
> 
> Oh, that’s the how. 
> 
> Nice! 
> 
> http://en.wikipedia.org/wiki/Dining_cryptographers_problem

Right, we can use provably anonymous broadcast groups, provided the message is 
very small and provided that it can be announced all at once, before there is 
any churn in the membership of the group. The problem comes if we can't do the 
reveal for the whole file in one broadcast, or, if we do two stages, if a node 
receives both the original and the reveal, and so can connect the two broadcast 
groups. These sorts of attacks look very hard to pull off, though, IMHO.
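
(For reference, the core of a DC-net round really is tiny: each member XORs its 
pairwise shared pads, the sender additionally XORs in the message, and XORing 
everyone's output cancels the pads. A toy sketch, ignoring collisions and 
disruption handling:)

    import java.security.SecureRandom;
    import java.util.Arrays;

    // Minimal XOR-based dining-cryptographers round, just to show why the
    // broadcast message has to be very small: every member transmits one value
    // of the message's size per round.
    final class DcNetToy {

        static byte[] xor(byte[] a, byte[] b) {
            byte[] out = new byte[a.length];
            for (int i = 0; i < a.length; i++) out[i] = (byte) (a[i] ^ b[i]);
            return out;
        }

        public static void main(String[] args) {
            SecureRandom rng = new SecureRandom();
            int len = 16;

            // Pairwise shared pads between members 0-1, 1-2 and 2-0.
            byte[] pad01 = new byte[len], pad12 = new byte[len], pad20 = new byte[len];
            rng.nextBytes(pad01); rng.nextBytes(pad12); rng.nextBytes(pad20);

            byte[] message = Arrays.copyOf("reveal K and X".getBytes(), len); // member 0 sends

            byte[] out0 = xor(xor(pad01, pad20), message); // only the sender XORs the message in
            byte[] out1 = xor(pad01, pad12);
            byte[] out2 = xor(pad12, pad20);

            // Everyone publishes their output; the XOR of all outputs is the
            // message, but nothing identifies which member injected it.
            byte[] combined = xor(xor(out0, out1), out2);
            System.out.println(new String(combined).trim());
        }
    }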

Reveal could be infinitely scalable if we didn't need the preinsert targets to 
return encryption keys, i.e. we could just reveal K and X and let it fan out to 
enough nodes, but it looks to me like we need to encrypt the reveal keys 
individually for good security. If the attacker has a node in the group that 
receives K and X (the DC broadcast group perhaps, but also the set of nodes 
which translate K and X into individual reveal messages), he can identify *ALL* 
the blocks which went past him, even where he was not the replier. If only the 
first hop reached can decrypt it, he needs a significantly higher density: he 
can only decrypt a block if 1) he receives the reveal for that block before it 
reaches the first node which had sent a key back to the originator, and 2) he 
has himself sent a key back to the originator (regardless of whether he is the 
node reached first). We could just send one at a time, but that means more 
round-trips, and round-trips are *bad*...

So just giving K and X implies we probably want to partition the reveal into 
smaller groups; what are the tradeoffs?

Encrypting properly allows each of the preinsert repliers to decrypt its own 
block, but no other blocks (assuming there aren't any problems with HTL etc.; we 
might consider changing that for preinserts). The thing is, the preinsert 
repliers won't be on the path for the reveal unless they are close to the target 
location. So there is a density-of-attackers requirement here: first the 
attacker must be on the preinsert path (cm/n), then he must be on the reveal 
path (cm/n again, but not independent; it comes down to having a node near the 
target location), where c is the number of attacker nodes, m the total number of 
blocks inserted, and n the total number of nodes.
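
To put invented but plausible numbers on the cm/n estimate:

    // Expected number of blocks whose pre-insert replier is an attacker node,
    // using the cm/n estimate above. The numbers in the example are invented.
    final class AttackerDensity {
        static double expectedCompromisedPreInserts(double c, double m, double n) {
            return c * m / n;
        }
        // e.g. c = 20 attacker nodes, n = 10,000 nodes, m = 40,000 blocks gives
        // about 80 blocks where an attacker holds the pre-insert; each of those
        // only becomes a problem if the same attacker is also on the reveal
        // path for that block.
    }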
The limiting factor then is that for big inserts, we can't anonymize the reveal 
as a single unit. One solution is to divide the preinserts into a limited number 
of groups, each inserted separately, with the groups getting bigger for bigger 
inserts. Each group only needs two keys, so we can keep the broadcasted data 
small. It means the reveal will take longer to complete; big inserts take 
longer, which is normal, but it might affect reliability, whereas one key per 
node guarantees that the total time to reveal is only the time for a slow 
insert.

Another possibility would be to use a pyramid, like in splitfiles: pre-insert 
and then reveal the list of blocks to reveal! The pyramid would have a good 
fan-out factor, each layer being much bigger than the previous, so there should 
be very few levels. And we can use anonymous broadcast for the bottom layer. 
Concretely:

Pre-insert the actual blocks 0...100,000.
Each returns a pubkey.
Create a "reveal file" consisting of the instructions for revealing the blocks 
(a sketch of building it follows these steps).
Pre-insert the reveal file as blocks 0..100.
Anonymously broadcast instructions to reveal the reveal file, via DC broadcast, 
among a randomly (but robustly) constructed cell you happen to be momentarily a 
member of.
The instruction to reveal the reveal file is executed by some nodes in the 
cell, in accordance with the standard policy.
The nodes that receive the reveal of the reveal file, which originally got the 
pre-inserts of the reveal file, decrypt their chunks of the reveal file.
They then have their instructions for posting reveals.
They post the reveals, which match up with the original pre-inserts.
The nodes holding the original pre-inserts execute the final inserts of the 
actual blocks.
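
A sketch of what building the reveal file could look like, assuming each 
pre-insert target returned an RSA-style public key (class names and wire format 
invented for illustration, not a proposal for the actual encoding):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.security.PublicKey;
    import java.util.List;
    import javax.crypto.Cipher;

    // Illustration only: build the "reveal file" by encrypting each block's
    // reveal data to the public key that came back from that block's
    // pre-insert, so only the node holding pre-insert n can act on reveal n.
    final class RevealFileBuilder {

        static byte[] build(List<PublicKey> preInsertPubKeys,
                            List<byte[]> perBlockRevealData) throws Exception {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(preInsertPubKeys.size());
            for (int n = 0; n < preInsertPubKeys.size(); n++) {
                Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
                rsa.init(Cipher.ENCRYPT_MODE, preInsertPubKeys.get(n));
                byte[] enc = rsa.doFinal(perBlockRevealData.get(n)); // small, e.g. X_n/K_n
                out.writeInt(n);          // which pre-insert this entry is for
                out.writeInt(enc.length);
                out.write(enc);
            }
            out.flush();
            // The reveal file itself is then pre-inserted (blocks 0..100 above)
            // and revealed via the anonymous broadcast.
            return buf.toByteArray();
        }
    }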

Attacks: If you control the node receiving a chunk of the reveal file, and the 
node receiving a chunk of the original file, you can connect all the pre-inserts 
in that chunk, and can therefore get a predecessor sample from any node you 
control which receives actual blocks from that file.

Performance: Total reveal time is equal to a few slow inserts. This minimises 
the chance of nodes going offline etc.

On a broader view of performance, the number of blocks lost before revealing 
may be an issue. Redundancy helps though.

Timing might also give things away: an attacker can link two reveals to the same 
group because they were started at the same time. Delays and thresholds may help 
with this.
> 
> Best wishes,
> Arne
