[tor-relays] DoS attacks -- status update

2022-10-28 Thread Georg Koppen

Hello everyone!

It's been a while since we last provided an update on this mailing
list about our ongoing work fighting several DoS attacks.


We can use the attached graph about detected overload over the last 
couple of months to show what is going on and what we do/plan to do 
about it.


The first noteworthy incident on that graph is the sharp rise in
overloaded non-exit nodes since the middle of July. The overload shows
up as dropped onionskins[1]: relays receive more of them than their
available CPU/memory can handle. We are currently working on two ideas
to cope with such a flood of onionskins:


1. Developing a Proof of Work (PoW) system that provides a
rate-limitation knob for rejecting the excess load of onionskins while
letting legitimate ones through.[2] We still need to solve some design
issues (feel free to help!) but hope to have that feature integrated
into Tor soon.


2. Relay operators have started to experiment with iptables/nftables
rules, and having the right ones available might be a good stopgap
approach against the onionskin-related DoS. We are coordinating that
effort[3] so that we have something to propose to the wider community
that is kept up-to-date and limits the risk of overblocking traffic.
Feel free to help with that effort as well.
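
To make the PoW idea in item 1 more concrete, here is a minimal,
generic hashcash-style sketch. It is not the design being discussed in
[2]; the function names, the use of SHA-256, and the nonce encoding are
all assumptions made for illustration. The "difficulty" parameter plays
the role of the rate-limitation knob: raising it makes each request more
expensive for the sender, while verification stays a single hash.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a non-zero byte, 8 - bit_length() is its number of leading zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Receiver side: one hash decides whether the request is accepted."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Sender side: brute-force a nonce; cost grows roughly as 2**difficulty."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical challenge value; a low difficulty keeps the demo fast.
nonce = solve(b"example-challenge", 8)
print(verify(b"example-challenge", nonce, 8))
```

The asymmetry is the point of the sketch: the sender needs many hash
attempts on average, the receiver needs exactly one to check the result.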


The other noteworthy incident started around September 13, when exit
nodes began to get overloaded (while the other DoS was and is still
ongoing). Unfortunately, that exit-related DoS is heavily impacting our
users' experience, as can be seen in our OnionPerf data[4]. While we are
still investigating the nature of that DoS attack, blocking particular
IP addresses with ExitPolicy rules seems to help on the exit nodes where
this has been tested. The Artikel10 exit node operators even provided a
script recently[5] to help with that (much appreciated, thanks!). This
approach is highly experimental at this point, and it might at least
help us come up with an actual design idea to counter that particular
exit DoS.
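
As a rough illustration of the ExitPolicy approach (this is not the
Artikel10 script from [5]), the sketch below turns a list of IPv4
addresses into torrc lines of the form "ExitPolicy reject ADDR:*". The
function name is made up for this example, and the addresses are
placeholders from the documentation ranges, not addresses involved in
the attack. IPv6 entries are skipped to keep the sketch small.

```python
import ipaddress


def exit_policy_lines(addresses):
    """Return torrc ExitPolicy lines rejecting each IPv4 address on all ports."""
    lines = []
    for text in addresses:
        # ip_address() raises ValueError on malformed input.
        addr = ipaddress.ip_address(text.strip())
        if addr.version != 4:
            continue  # IPv6 is out of scope for this sketch
        lines.append(f"ExitPolicy reject {addr}:*")
    return lines


# Placeholder addresses (documentation ranges), purely illustrative.
for line in exit_policy_lines(["192.0.2.10", "198.51.100.7"]):
    print(line)
```

Tor evaluates exit policy entries first to last and the first match
wins, so such reject lines need to come before any broader accept rules
in the torrc.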


Thanks,
Georg

[1] For information about overload in general and what "drop of
onionskins" means, see:
https://support.torproject.org/relay-operators/#relay-bridge-overloaded.
It also contains a guide on how to enable MetricsPort monitoring
yourself so you can see the actual metrics of your own relay.

[2] https://gitlab.torproject.org/tpo/core/tor/-/issues/40634
[3] https://gitlab.torproject.org/tpo/community/support/-/issues/40093
[4] https://metrics.torproject.org/torperf.html
[5] 
https://lists.torproject.org/pipermail/tor-relays/2022-October/020848.html


___
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays


Re: [tor-relays] MetricsPort: tor_relay_connections_total type confusion

2022-10-28 Thread nat--- via tor-relays
To remind ourselves about the Prometheus metric types, their
differences, and when to use which:
https://prometheus.io/docs/concepts/metric_types/

> The patch I got in makes all the tor_relay_connections_total{} metrics
> "gauges" because in effect, some can go up and down and some might only go
> up but I figured even the one that only accumulate can also be gauges.

You are right that Prometheus will scrape it anyway even if the type is
incorrectly defined, but the declared type tells people what the metric
is and how to make sense of it, so declaring a counter as a gauge (or
vice versa) misleads them. For example, the type implies "do (not) use
rate() on this".

By looking at

# TYPE tor_relay_connections_total gauge
[...]

Anyone familiar with Prometheus would expect it to be a gauge, and so,
for example, would not use rate() on it, but that is not correct for
all labels.
For example, creating graphs for state="created" without using rate()
would produce boring, ever-increasing graphs because it is a counter.
They might also wonder why a gauge ends with .._total, because that
suffix is used for accumulating counts [1].

> Is that a problem to your knowledge from a Prometheus standpoint?

It is a problem because Prometheus users and tools consume that TYPE
line to learn what type a metric is.
They will get confused if they treat it according to the current type
definition.
Graphing counters as if they were gauges will result in questions
like "oh why is my x always increasing?"

before:
# TYPE tor_relay_connections counter
tor_relay_connections{type="OR",direction="initiated",state="opened"}

but that value is not a counter, it is a gauge. It can go down.

after yesterday's change:

# TYPE tor_relay_connections_total gauge
tor_relay_connections_total{type="OR",direction="received",state="created"}

but that is a counter, it can never go down.

To prevent this and to follow the usual Prometheus practices, it is
best to split the current metric into one for counters and another one
for gauges.
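
To illustrate why the declared type matters to consumers, here is a
simplified sketch of how a counter and a gauge are read differently. The
helper below is a stand-in for the idea behind rate(), not Prometheus'
actual implementation (which, among other things, extrapolates to the
window boundaries). The sample values are made up.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (e.g. a restart), so the
    value seen after the drop counts as the increase since the reset.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Made-up samples taken 60 seconds apart.
created = [(0, 1000), (60, 1600), (120, 2200)]  # counter: only goes up
opened = [(0, 40), (60, 55), (120, 35)]         # gauge: goes up and down

print(per_second_rate(created))  # counter: graph the rate
print(opened[-1][1])             # gauge: graph the current value directly
```

Feeding the gauge samples into the same helper would misread the drop
from 55 to 35 as a counter reset, which is the kind of confusion a wrong
TYPE line invites.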

thanks for working on this!

best regards,
nat

[1] https://prometheus.io/docs/practices/naming/



[tor-relays] Upcoming OpenSSL 3 security bugfix release

2022-10-28 Thread Georg Koppen

Hello relay operators!

If you are running OpenSSL 3, please be aware that you might need to
upgrade it to 3.0.7 as soon as possible next Tuesday (Nov 1). According
to the announcement[1], 3.0.7 fixes a CRITICAL security issue.


As far as we know so far, operators running the 1.1 series are fine.

Georg

[1] https://www.openwall.com/lists/oss-security/2022/10/25/4




Re: [tor-relays] MetricsPort: tor_relay_connections_total type confusion

2022-10-28 Thread David Goulet
On 28 Oct (11:04:09), n...@danwin1210.de wrote:
> Hello David,
> 
> again, thanks for your work on adding more metrics to tor's MetricsPort!
> Many relay operators will love this and documentation will be useful [1].
> 
> I reported
> https://gitlab.torproject.org/tpo/core/tor/-/issues/40699
> which got closed yesterday, but there was likely a misunderstanding and
> the changes did not solve the underlying issue.
> 
> The core issue is: The metric called
> tor_relay_connections(_total) [2][3]
> contains a mix of different types (gauge, counter).

The patch I got in makes all the tor_relay_connections_total{} metrics
"gauges" because, in effect, some can go up and down and some might only
go up, but I figured even the ones that only accumulate can also be
gauges.

Is that a problem to your knowledge from a Prometheus standpoint?

Cheers!
David

> 
> When mixing types in a single metric, the TYPE definition will always be
> wrong for one or the other value.
> For example grafana will show this if you use a counter metric without
> rate():
> "Selected metric is a counter. Consider calculating rate of counter by
> adding rate()."
> 
> It is a best practice to avoid mixing different types in a single metric.
> From the prometheus best practices [4]:
> "As a rule of thumb, either the sum() or the avg() over all dimensions of
> a given metric should be meaningful (though not necessarily useful). If it
> is not meaningful, split the data up into multiple metrics. For example,
> having the capacity of various queues in one metric is good, while mixing
> the capacity of a queue with the current number of elements in the queue
> is not."
> 
> An idea to address the underlying issue:
> One connection metric for counter and one for gauge:
> 
> - tor_relay_connections_total for counters, like the current label
> state="created"
> - tor_relay_connections for gauge metrics, like the current label
> state="opened". "rejected" also appears to be a gauge metric.
> 
> Another nice feature of these metrics would be to have a label for what
> type of system is connecting (src="relay", src="non-relay") - more on that
> in yesterday's email.
> A tool by toralf [4] also shows these and uses the source IP but tor
> itself does not need to look at the source IP to determine the type,
> something discussed in last week's relay operator meetup.
> 
> best regards,
> nat
> 
> [1] https://gitlab.torproject.org/tpo/web/support/-/issues/312
> [2]
> https://gitlab.torproject.org/tpo/core/tor/-/commit/06a26f18727d3831339c138ccec07ea2f7935014
> [3]
> https://gitlab.torproject.org/tpo/core/tor/-/commit/6d40e980fb149549bbef5d9e80dbdf886d87d207
> [4] https://prometheus.io/docs/practices/naming/
> 
> 

-- 
RRmWZi+kxUk/ehwUda6Z6UE/zsCYNl2ts0zzPswJAPI=




[tor-relays] MetricsPort: tor_relay_connections_total type confusion

2022-10-28 Thread nat--- via tor-relays
Hello David,

again, thanks for your work on adding more metrics to tor's MetricsPort!
Many relay operators will love this and documentation will be useful [1].

I reported
https://gitlab.torproject.org/tpo/core/tor/-/issues/40699
which got closed yesterday, but there was likely a misunderstanding and
the changes did not solve the underlying issue.

The core issue is: The metric called
tor_relay_connections(_total) [2][3]
contains a mix of different types (gauge, counter).

When mixing types in a single metric, the TYPE definition will always be
wrong for one or the other value.
For example, Grafana shows this hint if you use a counter metric
without rate():
"Selected metric is a counter. Consider calculating rate of counter by
adding rate()."

It is a best practice to avoid mixing different types in a single metric.
From the Prometheus best practices [4]:
"As a rule of thumb, either the sum() or the avg() over all dimensions of
a given metric should be meaningful (though not necessarily useful). If it
is not meaningful, split the data up into multiple metrics. For example,
having the capacity of various queues in one metric is good, while mixing
the capacity of a queue with the current number of elements in the queue
is not."

An idea to address the underlying issue:
One connection metric for counters and one for gauges:

- tor_relay_connections_total for counters, like the current label
state="created"
- tor_relay_connections for gauge metrics, like the current label
state="opened". "rejected" also appears to be a gauge metric.
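
A minimal sketch of what such a split could look like in the text
exposition format, assuming the existing labels are kept as they are.
The render() helper, the HELP strings, and the sample values are made up
for illustration; this is not tor's actual output or code.

```python
def render(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_text = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines)


# Counter family: values that only ever accumulate.
print(render(
    "tor_relay_connections_total", "counter",
    "Connections created (hypothetical help text)",
    [({"type": "OR", "direction": "received", "state": "created"}, 1234)],
))

# Gauge family: values that can go up and down.
print(render(
    "tor_relay_connections", "gauge",
    "Connections currently open (hypothetical help text)",
    [({"type": "OR", "direction": "initiated", "state": "opened"}, 56)],
))
```

With two families, each TYPE line is correct for every sample below it,
so tools can decide on their own whether rate() applies.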

Another nice feature of these metrics would be a label for the type of
system that is connecting (src="relay", src="non-relay"); more on that
in yesterday's email.
A tool by toralf [4] also shows these by using the source IP, but tor
itself does not need to look at the source IP to determine the type,
something discussed at last week's relay operator meetup.

best regards,
nat

[1] https://gitlab.torproject.org/tpo/web/support/-/issues/312
[2]
https://gitlab.torproject.org/tpo/core/tor/-/commit/06a26f18727d3831339c138ccec07ea2f7935014
[3]
https://gitlab.torproject.org/tpo/core/tor/-/commit/6d40e980fb149549bbef5d9e80dbdf886d87d207
[4] https://prometheus.io/docs/practices/naming/

