Forwarded Message
Subject: Re: [tor-dev] Onion Service - Intropoint DoS Defenses
Date: Thu, 4 Jul 2019 20:38:48 +0200
From: juanjo
To: David Goulet
These experiments and the final note confirm what I thought about this rate
limiting feature from the start: it is missing important parts. OK, you
can protect the network a little, and the HS, but general
availability is not affected, so it does not actually help with that.
I want to make a proposal covering many things at the same time, but I
don't have much time to follow the guidelines for an official
proposal. Maybe in a few weeks?
Again, I repeat the things that should be done now:
- Authenticated rend signature. This would help a lot, I think.
- Mid-term: PoW for the client when the prop305 limit is reached, instead of
denying access? IDK; all of it always configurable.
- Deprecate old clients, or let the Hidden Service configure whether the IP
allows access for old-version clients (those not supporting the new anti-DoS
features). If we allow old versions without protections, all
security measures are useless.
And just a new idea: what about making the rotation of IPs dynamic, based on
these prop305 values, plus time-based rotation?
One of the goals of rotation was defending against correlation attacks:
if we set a lower limit we have a potential DoS (right now); if we set
it high we have a potential correlation attack, a bigger attack surface.
What about combining time-based rotation (e.g. 24 hours) with rotation once a
limit based on the prop305 values is reached?
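To make the combined rotation idea concrete, here is a minimal sketch in Python. It is only an illustration of "rotate on whichever comes first", not anything tor implements: the function name, the parameters, and the example threshold of 16384 introductions are all hypothetical, and how such a threshold would be derived from the prop305 values is left open.

```python
DAY_SECONDS = 24 * 3600  # the 24-hour example from the message above


def should_rotate(age_seconds, intro_count, max_intros, max_age_seconds=DAY_SECONDS):
    """Return True if an intro point should be rotated.

    Hypothetical policy: rotate when the intro point is older than
    max_age_seconds OR it has relayed at least max_intros introductions,
    whichever happens first. max_intros is an assumed threshold that
    would be derived from the prop305 rate/burst values.
    """
    too_old = age_seconds >= max_age_seconds
    too_busy = intro_count >= max_intros
    return too_old or too_busy


# Example with an assumed threshold of 16384 introductions:
# a quiet intro point rotates after 24 hours, a flooded one rotates earlier.
print(should_rotate(age_seconds=3600, intro_count=100, max_intros=16384))
print(should_rotate(age_seconds=3600, intro_count=20000, max_intros=16384))
```

The time bound caps how long a single intro point is exposed, while the count bound still forces rotation under heavy use.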
On 3/7/19 20:37, David Goulet wrote:
On 30 May (09:49:26), David Goulet wrote:
Greetings!
[snip]
Hi everyone,
I'm writing here to give an update on where we are with the feature that rate
limits introductions at the intro point.
The branch for #15516 (https://trac.torproject.org/15516) is ready to be merged
upstream. It implements a simple rate/burst combo for controlling the number
of INTRODUCE2 cells that are relayed to the service.
As previously detailed in this thread, the default values are a rate of 25
introductions per second and a burst of 200 per second. These values can be
controlled by consensus parameters, meaning they can be changed network-wide.
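For readers unfamiliar with rate/burst limiting, a token bucket is one common way to realize it. The Python sketch below is only an illustration using the default values quoted above (rate 25, burst 200); it is not taken from the #15516 branch, and the class and method names are hypothetical.

```python
class TokenBucket:
    """Illustrative token bucket; not tor's actual implementation."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the last refill

    def allow(self, now):
        """Return True if one introduction may be relayed at time `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the intro point would NACK here


bucket = TokenBucket(rate=25, burst=200)
# A flood of 250 requests arriving at the same instant:
relayed = sum(bucket.allow(now=0.0) for _ in range(250))
print(relayed)
```

In this model a flood is absorbed up to the burst size, after which only `rate` introductions per second get through to the service.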
We first asked big service operators. I'm not going to detail the values
they provided us in private, but from what they gave us, those defaults look
large enough to sustain heavy traffic.
The second thing we did was experimental testing to see how CPU usage and
availability are affected. We've tested this with 3 _fast_ introduction points
and then 3 rate limited introduction points.
The good news is that once the attack stops, the rain of introduction requests
to the service stops very quickly.
With the default rate/burst values, on an Intel(R) Xeon(R) CPU E5-2650 v4 @
2.20GHz (8 cores), the tor service CPU doesn't go above ~60% (on one single
core), and it drops to almost 0 as soon as the attack ends.
The bad news is that availability is _not_ improved. One of the big reasons
for that is that the rate limit defenses, once engaged at the intro point,
will send back a NACK to the client. A vanilla tor client will stop using that
introduction point for 120 seconds if it gets 3 NACKs from it. This leads
to tor quickly giving up on trying to connect and thus telling the client that
a connection to the .onion is impossible.
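The client-side behavior described above can be modeled in a few lines. The Python sketch below is only an illustration of the "3 NACKs, then avoid for 120 seconds" rule as stated in this message; it is not tor's code, and the class and method names are hypothetical.

```python
NACK_LIMIT = 3         # NACKs from one intro point before it is avoided
AVOID_SECONDS = 120    # how long the intro point is then avoided


class IntroPointTracker:
    """Illustrative model of a client's per-intro-point NACK bookkeeping."""

    def __init__(self):
        self.nacks = {}          # intro point -> NACKs seen so far
        self.avoid_until = {}    # intro point -> time it becomes usable again

    def record_nack(self, intro_point, now):
        count = self.nacks.get(intro_point, 0) + 1
        if count >= NACK_LIMIT:
            # Too many NACKs: stop using this intro point for a while.
            self.avoid_until[intro_point] = now + AVOID_SECONDS
            count = 0
        self.nacks[intro_point] = count

    def usable(self, intro_points, now):
        return [ip for ip in intro_points
                if self.avoid_until.get(ip, 0) <= now]


tracker = IntroPointTracker()
intro_points = ["ip-a", "ip-b", "ip-c"]
# During an attack every rate limited intro point NACKs the client 3 times.
for ip in intro_points:
    for _ in range(NACK_LIMIT):
        tracker.record_nack(ip, now=0)
print(tracker.usable(intro_points, now=1))
```

With only three intro points, nine NACKs are enough to leave the client with nothing to try, which matches the quick give-up described above.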
We've hacked a tor client to play along and stop ignoring the NACKs to see how
much time it would take to reach the service. On average, a client needed
roughly 70 seconds and more than 40 NACKs.
However, it varied a _lot_ during our experiments with many outliers from 8
seconds with 1 NACK up to 160 seconds with 88 NACKs. (For this, the
SocksTimeout had to be bumped quite a bit).
There is an avenue of improvement here: make the intro point send a
specific NACK reason (like "Under heavy load" or ...) which would make the
client treat it as "I should retry soon-ish", thus making the client
possibly able to connect after many seconds (or until the SocksTimeout).
More bad news there! We can't do that anytime soon because of this bug that
basically crashes clients if an unknown status code is sent back (that is, a new
NACK value): https://trac.torproject.org/30454. So yeah... quite unfortunate
there but also a superb reason for everyone out there to upgrade :).
One piece of good news is that having fast intro points instead of slow
IPs doesn't seem to change the overall load on the service much, so for now
our experiment shows it doesn't matter.
Overall, this rate limit feature does two things:
1. Reduce the overall network load.
Soaking the introduction requests at the intro point helps avoid the
service creating pointless rendezvous circuits which makes it "less" of an
amplification attack.
2. Keep the service usable.
The tor daemon doesn't go into massive CPU load and thus can actually be used
properly during the attack.
The