On Saturday 29 Dec 2012 14:53:51 Arne Babenhauserheide wrote:
> On Saturday, 22 December 2012, 02:17:24, Matthew Toseland wrote:
> > Hmmm, what about DoS attacks against the pre-insert cache? Maybe we should
> > use the store; we do want to free up the slots when we've inserted
> > successfully though (in spite of random replacement this still makes the
> > store more efficient).
> 
> wouldn’t the pre-insert keys never be requested, so they would drop out anyway
> after 14 days on average? (following digger3’s stats)

Yes. I didn't think it was that bad though. :|
> 
> > > I insert many chunks which go to the same part of the keyspace. When I send
> > > the reveal, I can start a distributed replacement attack.
> > > 
> > > Any chance to avoid that?
> > 
> > Ahhh, you mean you can bombard one keyspace location with lots of spam
> > inserts, in the hopes of driving a single document out, and while avoiding
> > the usual load management limits between you and the target?
> 
> Yes, exactly that.
> 
> It allows attacking a given location without being detectable.
> 
> Though that’s essentially a condition of being anonymous, so I don’t know if 
> it is fixable…
> 
> Essentially you can pretend to be many different people at the same time. If 
> you can orchestrate the reveal, you could create very many inserts to some 
> location coming in one fast burst. And that could make it possible to kill a 
> document with known content before anyone can download it.

The attack isn't instant. You have to prepare it in advance and cannot retarget 
it. All that you get from the random routing is avoiding the bottlenecks 
between one node and the target location.

And yes, we will probably need random routing, for a whole set of reasons. Even 
with classic mixnet tunnels there will be multiple tunnels for an insert, and 
nothing to stop an attacker using more tunnels if he wants to.
> 
> > I guess it comes down to how effective will the attack be, given the
> > structure of our datastores? How many keys on average do we need to send to
> > a node to displace a target document, given they are close to the target
> > location so will be routed to the target node, and given we don't know the
> > salt value for that store? Won't that be a significant fraction of the
> > datastore?
> 
> That’s a very good point.
> 
> If we want to be reasonably sure that we can kill a given file, we might have 
> to displace 90% of the store or so.

Right. So we can simply impose a minimum store size of say 5GB, and this attack 
gets pretty hard.
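
Rough numbers, as a sanity check (a sketch only; it borrows the ~13,400
keys/GiB density Arne reports below and assumes pure random replacement):

import math

keys_per_gib = 13414               # Arne's observed density for a 1 GiB store
n = 5 * keys_per_gib               # ~67,000 slots in a 5GB store (roughly)

# inserts needed so that 1 - (1 - 1/n)**k >= 0.95, i.e. a 95% chance the
# target key has been displaced; works out to roughly 3 * n
k = math.ceil(math.log(0.05) / math.log(1.0 - 1.0 / n))
print(n, k)                        # ~67,000 slots, ~200,000 spam inserts

And every one of those inserts has to actually be stored at the target node,
not merely routed near it.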
> 
> I think we can assume that normal insert behaviour is distributed pretty 
> randomly across the keyspace, so a sudden surge of inserts to a given location 
> could be detected as an attack.

I'm not sure how easy it would be to detect.
> 
> And since that would very likely only happen with an attack, adding limits to 
> displacement in the store should kill those attacks but only marginally affect 
> regular inserts.
> 
> In my 1GiB store, I have a write rate of 0.08/s and 13,414 keys, so if I 
> interpret the data correctly, it takes 40h till the number of write requests 
> equals the number of keys. 

We have both the store and the cache. Both of these are in practice mostly 
specialised around our location.

The store only gets written on inserts, and only when we're the sink. So 
remotely overwriting it from far away on the network may be rather difficult, 
unless they already know our location and peers - which they might. Conceivably 
we could detect an attempt to overwrite the *store* simply because the attacker 
would have to customise the keys' locations to ensure they are stored. It's not 
so easy with the cache, but I guess the cache matters less? Or 
they could keep a plausible distribution of locations / low hit rate, and 
require far more keys to overwrite everything...
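
To make that detection idea slightly more concrete (purely a sketch, nothing the
node does today; the median comparison and the factor of 3 are made up): compare
how tightly the locations of recently sinked keys cluster around our own
location against the long-run behaviour, since keys customised to force storage
at one node should cluster unusually tightly.

def circular_distance(a, b):
    # distance on the [0, 1) keyspace circle
    d = abs(a - b)
    return min(d, 1.0 - d)

def suspiciously_clustered(recent_keys, historical_keys, our_location, factor=3.0):
    # Hypothetical heuristic: flag if recently sinked key locations sit much
    # closer to our location than they historically do. factor=3.0 is untuned.
    def median_distance(locations):
        dists = sorted(circular_distance(loc, our_location) for loc in locations)
        return dists[len(dists) // 2]
    if not recent_keys or not historical_keys:
        return False
    return median_distance(recent_keys) * factor < median_distance(historical_keys)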
> 
> To estimate the amount of keys displaced after 40h:
> 
> N: Number of keys in the store.
> First key: 100% chance of displacing an old key.
> Second: (1 - 1/N) chance of displacing an old key.
> …
> since I stumbled over the math, I implemented it as a recursive function:
> 
> def displacedold(n, steps, step=0, nth=None):
>      if step == steps: 
>          return nth
>      if step == 0: 
>          return displacedold(n, steps, step=step+1, nth=1 - 1.0/n)
>      return displacedold(n, steps, step=step+1, nth=nth*(1 - 1.0/n))
> 
> def displacedourkey(n, steps):
>     dis = 1.0/n
>     for step in range(steps-1):
>         dis += displacedold(n, step+1)/n
>     return dis
> 
> print displacedourkey(100, 100)
> → 0.63
> print displacedourkey(100, 200)
> → 0.86
> print displacedourkey(100, 300)
> → 0.95
> print displacedourkey(100, 400)
> → 0.98
> 
> This stays similar for bigger numbers of keys, but my algo is bad enough that 
> it quickly exceeds the maximum recursion depth :) 
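
(As an aside, unless I'm misreading it, the recursion isn't actually needed: the
chance that a particular key survives k uniformly random writes into an N-slot
store is (1 - 1/N)^k, so the displacement probability has a closed form. A quick
non-recursive check, which reproduces your numbers:

def displaced(n, steps):
    # probability our key has been overwritten after `steps` random writes
    # into a store of n slots: 1 - (1 - 1/n)**steps
    return 1.0 - (1.0 - 1.0 / n) ** steps

print(displaced(100, 100))          # ~0.63
print(displaced(100, 300))          # ~0.95
print(displaced(13414, 3 * 13414))  # ~0.95 for your 1 GiB store too

At your 0.08 writes/s, 3 × 13,414 writes is about 5.8 days, which matches the
~6 day figure below.)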
> 
> So to be 95% sure that you replaced a key, you have to create inserts of 3 
> times the size of the store, which in the case of my store would take about 6 
> days.
> 
> If you could triple the rate of writes to my store with the distributed insert 
> attack, you could decrease the lifetime to 2 days.
> 
> But that could be stopped by throttling the rate of change of write traffic to 
> the node (if that’s possible). Then an attacker would have to keep inserting 
> constantly and the attack would be thwarted.
> 
> The attack draws its strength from being able to insert first and then using 
> the reveal step to force a huge number of writes to a given location in a very 
> short time. Essentially it allows storing up bandwidth and then releasing it 
> in a much shorter timespan.

Yep.
> 
> And that is not possible if you have to keep up the insert traffic over a 
> longer timespan.
> 
> And that can be stopped by throttling the rate of change of write traffic to 
> the store. For example it would be possible to limit the writes per minute to 
> 3× the daily rate.

Good point. If you have to maintain traffic over a long period, you are better 
off doing regular inserts. Of course, the other thing to take into account is 
that the capacity of the node is limited, so many of the inserts will be 
redirected away from the originally targeted node.

If you are not targeting a specific, single node, you will need to use a lot 
more bandwidth, because it's hard to target the node exactly if you don't even 
know what its location is; so most of your inserts won't be stored. However, 
it doesn't matter as much if your inserts are redirected, since you're 
targeting a group of nodes, a location. But it will require a *lot* more 
bandwidth.

If you are targeting a specific, single node, on opennet you can surround it 
(maybe indirectly, e.g. with a layer of innocent nodes), although we're working 
on ways to reduce that.
> 
> Then displacing a key in my store with high probability (95%) would take at 
> least 2 days. And that is much longer than the time needed for completing an 
> upload, announcing the upload on Sone and someone downloading it.
> 
> And as soon as the upload has been downloaded, it is in the cache and the 
> attack is useless, because downloaders can heal the inserted key.

I'm not sure I follow your scenario here. It takes longer to displace a key 
than to reinsert it to a different key?
> 
> (please doublecheck my logic, though!)
> 
> > If we can quantify it, e.g. say that it's possible but you'd need to
> > continually commit a significant amount of bandwidth per targeted key
> > depending on its popularity, that's enough for me ...
> 
> Throttling the write rate gives that.
> 
> The attack tries to make all nodes around you send you inserts all of a 
> sudden, so the write rate has a massive spike. Smooth that spike and the 
> attack is mostly useless.

So if our store rate suddenly spikes, we route the inserts anyway, but randomly 
refuse to store? We can't selectively reject inserts because backoff doesn't 
care whether it's an insert or a request... I'm not sure whether exploiting 
such a countermeasure would be a cheap way to DoS requests by causing us to 
back off?
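
For concreteness, the sort of throttle we're talking about might look like this
(just a sketch; the rate constants are invented, and the backoff interaction
above is exactly the part it doesn't answer): keep routing every insert as
normal, but when the short-term store-write rate climbs above some multiple of
the long-term average, only actually write to the store with the probability
needed to hold the effective rate at that cap.

import random, time

class StoreWriteThrottle:
    # Sketch of "limit store writes to a multiple of the long-term rate":
    # the insert is still routed onward; we only sometimes skip the local
    # store write. All constants here are illustrative, not tuned.
    def __init__(self, factor=3.0, long_alpha=0.001, short_alpha=0.1):
        self.factor = factor            # allowed burst over the long-term rate
        self.long_alpha = long_alpha    # slow EWMA, tracks the "daily" rate
        self.short_alpha = short_alpha  # fast EWMA, tracks the current burst
        self.long_rate = None
        self.short_rate = None
        self.last = time.time()

    def should_store(self):
        now = time.time()
        dt = max(now - self.last, 1e-3)
        self.last = now
        inst = 1.0 / dt                 # instantaneous offered write rate
        if self.long_rate is None:      # crude cold start
            self.long_rate = self.short_rate = inst
        self.long_rate += self.long_alpha * (inst - self.long_rate)
        self.short_rate += self.short_alpha * (inst - self.short_rate)
        cap = self.factor * self.long_rate
        if self.short_rate <= cap:
            return True
        # accept with probability cap/short_rate so accepted writes ~ cap
        return random.random() < cap / self.short_rate

Skipping only the local write, and never the routing, is the part meant to
avoid touching backoff, though it does nothing to stop the attacker's traffic
from consuming bandwidth.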
> 
> > > > chain are still connected it will prove to them that the insert has been
> > > 
> > > How do they prove it?
> > 
> > Forwarding a portion of the reveal request? Not all of it, we don't want
> > them to be able to decode the block; X_n is probably enough?
> 
> I don’t know it well enough to say something about that. But I trust your 
> judgement here.
> 
> > > http://en.wikipedia.org/wiki/Dining_cryptographers_problem
> > 
> > Right, we can use provably anonymous broadcast groups provided the message
> > is very small, and provided that it can be announced all at once, before
> > there is any churn in the membership of the group. The problem comes if we
> > can't do the reveal for the whole file in one broadcast, or, if we do two
> > stages, if a node receives both the original and the reveal so can connect
> > the two broadcast groups. These sorts of things look very hard to execute
> > though IMHO?
> 
> I don’t know enough to answer that.
> 
> > previous, so there should be very few such levels. And we can use anonymous
> > broadcast for the bottom layer.
> 
> I’m a bit wary of broadcast, since that creates very many messages. Is it a 
> real broadcast (reaching all nodes in the network) or limited somehow?

I said "broadcast groups". We'd set up groups of nodes and use DC anonymous 
broadcast within a group. The groups would have a limited, fixed number of 
participants. This is very inefficient, so could only be used for broadcasting 
K and X, or the top level of a multi-level reveal structure. Also it has a 
smaller anonymity set than using tunnels. 
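
For anyone who hasn't seen the construction, the round itself is tiny; here's a
toy sketch of a single XOR-based dining-cryptographers round (it deliberately
ignores everything that's actually hard here: group setup, churn, and collisions
when two members transmit at once).

import os

def dc_net_round(msg_bytes, n_members, sender_index):
    # Toy, single-round dining-cryptographers broadcast over XOR.
    # Pad i is shared by members i and i+1 (mod n). Every member announces
    # the XOR of its two pads; the sender additionally XORs in the message.
    # The XOR of all announcements equals the message, and no announcement
    # reveals which member sent it.
    length = len(msg_bytes)
    pads = [os.urandom(length) for _ in range(n_members)]
    announcements = []
    for i in range(n_members):
        left, right = pads[i - 1], pads[i]
        out = bytes(a ^ b for a, b in zip(left, right))
        if i == sender_index:
            out = bytes(a ^ b for a, b in zip(out, msg_bytes))
        announcements.append(out)
    result = bytes(length)
    for out in announcements:
        result = bytes(a ^ b for a, b in zip(result, out))
    return result  # equals msg_bytes

assert dc_net_round(b"K and X go here", 5, 2) == b"K and X go here"

The anonymity comes from each announcement being indistinguishable from random
without the shared pads, which is also why it only scales to something as small
as K and X.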
> 
> But all in all, the scheme sounds pretty good to me. I don’t have sufficient 
> knowledge to really judge it completely, though.

Thanks!

My main concern is how this compares to a more traditional tunneling approach, 
given the reliability issues on Freenet and the fact that on darknet one hop of 
a tunnel will be many actual hops and will therefore often be very slow 
(bottlenecked by the slowest node). On opennet conceivably we could use direct 
connections for tunnels, like I2P and Tor do, but bear in mind that opennet has 
other problems and we don't want to encourage its usage too much; an attacker 
can dominate the keyspace/topology and hence control most tunnels, for example. 
I.e. we can kill Sybil on darknet, but not on opennet.

We have several options:
1. Rendezvous tunnels. Effectively only a single hop tunnel. Probably the 
cheapest, simplest option, but security is mediocre.
2. Tunnels like I2P or Tor. Essentially real-time. Use many tunnels and create 
new ones constantly throughout an insert; attacker can try to observe tunnel 
creation after he's seen the first one.
3. Tunnels like mixmaster. Store-and-forward, but not reliable, because of node 
uptime issues. Limit unreliability by having many backup routes at each point 
(which means the attacker is more likely to be one of them; these would 
probably have to be around a single location to limit extra routing?). And we'd 
still need to use multiple tunnels, though not as many. Chunk size may be 
limited by node uptime. We may need additional redundancy.
4. Preinsert/reveal. Expensive, real-time tunnels for the reveals. Preinserts 
travel twice (or 3x) the usual (storage) number of hops. May need additional 
redundancy, probably not as much as #3.

#3 and #4 only work for inserts. #1 and #2 could work for downloads too. It's 
probably worth implementing more than one strategy, so we have really good 
protection for bulk inserts, and adequate protection for downloads. However, in 
general, securing downloads is harder, and tolerance of slowness for downloads 
is likely less than for inserts.

#3 is probably safer than #2.

#4 is original, not well-studied, but it looks to me like it is comparable to 
#3 for security.

UI issues:
- #1, #2 no UI issues
- #3 significant UI issues: Takes time to finish the insert on 
store-and-forward, and it's probably best not to have any feedback?
- #4: Upload progress, then the reveal is very fast. There may or may not be 
feedback, but it should happen quickly after the preinsert upload phase 
finishes, so it's not a big deal.

PERFORMANCE DEPENDS ON NUMBER OF HOPS:

Preinsert/reveal needs to use twice the number of hops used for normal storage. 
This may well be too big; currently it's ~ 20. It's bigger than it needs to be 
partly because of wanting to store data in more than one place.

Other forms of tunnels use, approximately, multiples of the network diameter, 
which could be less than the number of hops for storage. Let's say 8 or 10.

So conceivably #3 could be faster than #4 because data travels fewer hops. 

In an ideal world we'd combine #3 and #4 so we have many tunnels, all of which 
are totally useless to an attacker (unless they are happy to trace every 
tunnel!), then one reveal. But that'd be rather slow! (Other problem is how do 
we know when to post the reveal? Hence it's not really fire and forget, which 
ideally we'd like to be true of #3; and anything that distributes the reveal 
will allow an attacker to identify the tunnels).

PREVIOUS DETAILED THOUGHTS ON #4:
(Though I'm leaning to #3 now)

The big advantage of this scheme when combined with tunnels is that the 
dangerous bit, the one that links all the blocks together - the reveal - gets 
tunneled; the preinserts aren't linkable at insert time, so they are hard to 
trace. So we get 
reasonable performance while being able to use expensive pan-network tunnels 
for the dangerous bit.

A powerful attacker could:
a) Intercept the reveal, know all the blocks, and then correlate the data he 
gathered from them in the preinsert phase, or
b) Try to trace the reveal. He would need to sabotage tunnel setup by 
dominating the keyspace, or correlate multiple reveal-tunnels.

Both a) and b) require that he be on the reveal path, which is unlikely, 
compared to being on the preinsert paths. a) additionally requires that he log 
many of the preinserts. In both cases he needs at least one node which actually 
received the block and returned an encryption key for it; if he does this too 
early on the path, he may not receive the reveal message. So in both 
strategies, he needs fairly significant network penetration.
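
To put hypothetical numbers on "fairly significant network penetration" (a
back-of-the-envelope sketch that treats every hop as an independent draw, which
real routing certainly isn't; the 5% attacker fraction and 1000-block file are
invented): if the attacker runs a fraction f of the nodes, the chance he sits
somewhere on an h-hop path is roughly 1 - (1 - f)^h. Being on *some* preinsert
path is then nearly certain for a large file, which is why the reveal path is
the binding constraint.

def p_on_path(f, hops):
    # chance an attacker controlling a fraction f of nodes appears somewhere
    # on a path of `hops` nodes, treating the hops as independent draws
    return 1.0 - (1.0 - f) ** hops

f = 0.05                                   # illustrative: 5% of nodes malicious
p_reveal = p_on_path(f, 10)                # ~0.40 for a ~10-hop reveal tunnel
p_one_preinsert = p_on_path(f, 20)         # ~0.64 for a ~20-hop preinsert
p_any_preinsert = 1.0 - (1.0 - p_one_preinsert) ** 1000  # ~1.0 over many blocks
print(p_reveal, p_one_preinsert, p_any_preinsert)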

In principle the performance cost is comparable to that of tunneling 
everything. In practice IMHO it will be dramatically greater for comparable 
security, because for speed and robustness we would need many tunnels for a 
single insert, and any single tunnel could be very slow, for the same reason 
that requests can be very slow: transferring lots of data through a bottleneck 
takes time. We'd be perpetually creating new tunnels throughout a long insert, 
and the attacker would take advantage of this. Or we'd use a more 
store-and-forward tunneling pattern, which probably makes more sense, but then 
we'd need lots of backup routes at each hop.

For preinsert/reveal, if we reveal the same set of blocks twice, we have still 
only inserted the data from the originator once. The attacker needs:
1) To intercept any original preinsert, and return a key, AND
2) To intercept any secondary preinsert, and return a key, AND
3) To intercept the initial reveal, AND
4) To intercept the second reveal.

OR:

1) To intercept any original preinsert, and return a key, AND
2) To intercept a secondary preinsert corresponding to a preinsert he also 
intercepted, and return a key, AND
3) To intercept the initial reveal

OR to trace either of the reveals. I wonder if it makes sense to tunnel the 
second reveal through the first one, to deny him predecessor samples? We'll 
still need to find the intermediaries via other tunnels, and he might be able 
to identify them at that point...
> 
> Best wishes,
> Arne
