[tor-relays] DoS attacks -- status update
Hello everyone!

It's been a while since we last provided an update on this mailing list about our ongoing work fighting several DoS attacks. We can use the attached graph of detected overload over the last couple of months to show what is going on and what we do/plan to do about it.

The first noteworthy incident on that graph is the sharp rise in overloaded non-exit nodes since the middle of July, caused by relays dropping onionskins[1] that their available CPU/memory can't handle anymore. There are currently two ideas we are working on to cope with such a flood of onionskins:

1. Developing a Proof of Work (PoW) system that provides a rate-limitation knob, rejecting the flood of onionskins while letting legitimate ones through.[2] We still need to solve some design issues (feel free to help!) but hope to have that feature integrated into Tor soon.

2. Relay operators started to experiment with iptables/nftables rules, and having the right ones available might be a good stopgap approach against the onionskin-related DoS. We are coordinating that effort[3] so that we have something to propose to the wider community which is kept up-to-date and limits the risk of overblocking traffic. Feel free to help with that effort as well.

The other noteworthy incident started around September 13, when exit nodes began to get overloaded (while the other DoS was and is still ongoing). Unfortunately, that exit-related DoS is heavily impacting our users' experience, as can be seen in our OnionPerf data[4]. While we are still investigating the nature of that DoS attack, it turns out that blocking particular IP addresses with ExitPolicy rules seems to help on the exit nodes where this has been tested. The Artikel10 exit node operators even provided a script recently[5] to help with that (much appreciated, thanks!). This approach is highly experimental at this point, but it might at least help us come up with an actual design idea to counter that particular exit DoS.
Thanks,
Georg

[1] For information about overload in general and what "drop of onionskins" means, see: https://support.torproject.org/relay-operators/#relay-bridge-overloaded. It also contains a guide on how to enable MetricsPort monitoring yourself so you can see the actual metrics of your own relay.
[2] https://gitlab.torproject.org/tpo/core/tor/-/issues/40634
[3] https://gitlab.torproject.org/tpo/community/support/-/issues/40093
[4] https://metrics.torproject.org/torperf.html
[5] https://lists.torproject.org/pipermail/tor-relays/2022-October/020848.html

___
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
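To illustrate the ExitPolicy approach mentioned in the message above, here is a minimal sketch that turns a list of IPv4 addresses or CIDR blocks into torrc `ExitPolicy reject` lines. It is not the Artikel10 script; the function name and the sample addresses (taken from the documentation ranges) are hypothetical, and IPv6 is deliberately left out.

```python
import ipaddress


def exit_policy_reject_lines(addresses):
    """Return one 'ExitPolicy reject ADDR:*' line per unique IPv4 address or block."""
    lines = []
    seen = set()
    for raw in addresses:
        # IPv4Network accepts both single addresses and CIDR blocks;
        # strict=False tolerates host bits set in a CIDR block.
        net = ipaddress.IPv4Network(raw.strip(), strict=False)
        if net in seen:
            continue  # skip duplicates so the torrc stays short
        seen.add(net)
        target = str(net.network_address)
        if net.prefixlen != 32:
            target += f"/{net.prefixlen}"
        lines.append(f"ExitPolicy reject {target}:*")
    return lines


if __name__ == "__main__":
    # Hypothetical addresses from the RFC 5737 documentation ranges.
    for line in exit_policy_reject_lines(["198.51.100.7", "192.0.2.0/24"]):
        print(line)
```

Exit policies are evaluated first match wins, so lines like these would have to appear before any broader accept rules in the torrc to take effect.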
Re: [tor-relays] MetricsPort: tor_relay_connections_total type confusion
To remind ourselves about the Prometheus types, their differences, and when to use which:
https://prometheus.io/docs/concepts/metric_types/

> The patch I got in makes all the tor_relay_connections_total{} metrics
> "gauges" because in effect, some can go up and down and some might only go
> up but I figured even the one that only accumulate can also be gauges.

You are right that Prometheus will scrape it anyway even if the type is incorrectly defined, but by declaring a counter as a gauge (or vice versa) you tell people what it is and how to make sense of it. For example, the type implies "do (not) use rate() on this".

Looking at

# TYPE tor_relay_connections_total gauge
[...]

anyone familiar with Prometheus would expect it to be a gauge, and so, for example, would not use rate() on it, but that is not correct for all labels. For example, creating graphs for state="created" without using rate() would produce boring graphs, because it is a counter. They might also wonder why a gauge ends with .._total, because that suffix is used for accumulating counts [1].

> Is that a problem to your knowledge from a Prometheus standpoint?

It is a problem because Prometheus users and tools consume that TYPE line to learn what type a metric is. They will get confused if they treat it according to the current type definition. Graphing counters as if they were gauges will result in questions like "oh why is my rate of x always increasing?"

Before:

# TYPE tor_relay_connections counter
tor_relay_connections{type="OR",direction="initiated",state="opened"}

but that value is not a counter, it is a gauge. It can go down.

After yesterday's change:

# TYPE tor_relay_connections_total gauge
tor_relay_connections_total{type="OR",direction="received",state="created"}

but that is a counter, it can never go down.

To prevent this and to follow the usual Prometheus practices, it is best to split the current metric into one for counters and another one for gauges.

Thanks for working on this!
best regards,
nat

[1] https://prometheus.io/docs/practices/naming/
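To make the point about the TYPE line concrete, here is a minimal sketch of how a consumer might read `# TYPE` declarations from a scrape and decide whether to wrap a metric in rate(). The helper names, the 5m window and the sample value are assumptions for illustration only; real tools such as Grafana implement this differently.

```python
def declared_types(exposition_text):
    """Map metric name -> declared type from '# TYPE <name> <type>' lines."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def suggested_query(metric, types):
    """Suggest rate() for declared counters, the raw series for anything else."""
    if types.get(metric, "untyped") == "counter":
        return f"rate({metric}[5m])"
    return metric


# Hypothetical scrape output; the value 120 is made up for the example.
SAMPLE = (
    "# TYPE tor_relay_connections_total gauge\n"
    'tor_relay_connections_total{type="OR",direction="received",state="created"} 120\n'
)

if __name__ == "__main__":
    print(suggested_query("tor_relay_connections_total", declared_types(SAMPLE)))
```

Because the metric is declared as a gauge, this consumer suggests the raw series even for state="created", which only ever accumulates. That is the mismatch described in the message above.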
[tor-relays] Upcoming OpenSSL 3 security bugfix release
Hello relay operators!

If you are running OpenSSL 3, please be aware that you might need to upgrade it to 3.0.7 as soon as possible next Tuesday (Nov 1). The announcement[1] says that 3.0.7 fixes a CRITICAL security issue. Operators running the 1.1 series are fine as far as we know so far.

Georg

[1] https://www.openwall.com/lists/oss-security/2022/10/25/4
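As a rough aid for the check described above, the following sketch decides whether an OpenSSL version string falls in the 3.0.x range below 3.0.7. The function name is hypothetical, and the input is assumed to be a string such as the one printed by `openssl version`; note that tor may be linked against a different OpenSSL build than the one on your PATH.

```python
import re


def needs_openssl_upgrade(version_string):
    """Return True for OpenSSL 3.0.0 through 3.0.6, False otherwise."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"no version number found in {version_string!r}")
    version = tuple(int(part) for part in match.groups())
    # Per the announcement, 3.0.7 carries the fix; the 1.1 series is believed unaffected.
    return (3, 0, 0) <= version < (3, 0, 7)


if __name__ == "__main__":
    for candidate in ("OpenSSL 3.0.5", "OpenSSL 1.1.1q", "OpenSSL 3.0.7"):
        print(candidate, needs_openssl_upgrade(candidate))
```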
Re: [tor-relays] MetricsPort: tor_relay_connections_total type confusion
On 28 Oct (11:04:09), n...@danwin1210.de wrote:
> Hello David,
>
> again, thanks for your work on adding more metrics to tor's MetricsPort!
> Many relay operators will love this and documentation will be useful [1].
>
> I reported
> https://gitlab.torproject.org/tpo/core/tor/-/issues/40699
> which got closed yesterday, but there was likely a misunderstanding and
> the changes did not solve the underlying issue.
>
> The core issue is: The metric called
> tor_relay_connections(_total) [2][3]
> contains a mix of different types (gauge, counter).

The patch I got in makes all the tor_relay_connections_total{} metrics "gauges" because, in effect, some can go up and down and some might only go up, but I figured even the ones that only accumulate can also be gauges.

Is that a problem to your knowledge from a Prometheus standpoint?

Cheers!
David

>
> When mixing types in a single metric, the TYPE definition will always be
> wrong for one or the other value.
> For example grafana will show this if you use a counter metric without
> rate():
> "Selected metric is a counter. Consider calculating rate of counter by
> adding rate()."
>
> It is a best practice to avoid mixing different types in a single metric.
> From the prometheus best practices [4]:
> "As a rule of thumb, either the sum() or the avg() over all dimensions of
> a given metric should be meaningful (though not necessarily useful). If it
> is not meaningful, split the data up into multiple metrics. For example,
> having the capacity of various queues in one metric is good, while mixing
> the capacity of a queue with the current number of elements in the queue
> is not."
>
> An idea to address the underlying issue:
> One connection metric for counter and one for gauge:
>
> - tor_relay_connections_total for counters, like the current label
> state="created"
> - tor_relay_connections for gauge metrics, like the current label
> state="opened". "rejected" also appears to be a gauge metric.
>
> Another nice feature of these metrics would be to have a label for what
> type of system is connecting (src="relay", src="non-relay") - more on that
> in yesterday's email.
> A tool by toralf [4] also shows these and uses the source IP but tor
> itself does not need to look at the source IP to determine the type,
> something discussed in last week's relay operator meetup.
>
> best regards,
> nat
>
> [1] https://gitlab.torproject.org/tpo/web/support/-/issues/312
> [2]
> https://gitlab.torproject.org/tpo/core/tor/-/commit/06a26f18727d3831339c138ccec07ea2f7935014
> [3]
> https://gitlab.torproject.org/tpo/core/tor/-/commit/6d40e980fb149549bbef5d9e80dbdf886d87d207
> [4] https://prometheus.io/docs/practices/naming/

--
RRmWZi+kxUk/ehwUda6Z6UE/zsCYNl2ts0zzPswJAPI=
[tor-relays] MetricsPort: tor_relay_connections_total type confusion
Hello David,

again, thanks for your work on adding more metrics to tor's MetricsPort! Many relay operators will love this and documentation will be useful [1].

I reported
https://gitlab.torproject.org/tpo/core/tor/-/issues/40699
which got closed yesterday, but there was likely a misunderstanding and the changes did not solve the underlying issue.

The core issue is: the metric called tor_relay_connections(_total) [2][3] contains a mix of different types (gauge, counter).

When mixing types in a single metric, the TYPE definition will always be wrong for one or the other value. For example, grafana will show this if you use a counter metric without rate():
"Selected metric is a counter. Consider calculating rate of counter by adding rate()."

It is a best practice to avoid mixing different types in a single metric. From the Prometheus best practices [4]:
"As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful). If it is not meaningful, split the data up into multiple metrics. For example, having the capacity of various queues in one metric is good, while mixing the capacity of a queue with the current number of elements in the queue is not."

An idea to address the underlying issue: one connection metric for counters and one for gauges:

- tor_relay_connections_total for counters, like the current label state="created"
- tor_relay_connections for gauge metrics, like the current label state="opened". "rejected" also appears to be a gauge metric.

Another nice feature of these metrics would be to have a label for what type of system is connecting (src="relay", src="non-relay") - more on that in yesterday's email. A tool by toralf [4] also shows these and uses the source IP, but tor itself does not need to look at the source IP to determine the type, something discussed in last week's relay operator meetup.
best regards,
nat

[1] https://gitlab.torproject.org/tpo/web/support/-/issues/312
[2] https://gitlab.torproject.org/tpo/core/tor/-/commit/06a26f18727d3831339c138ccec07ea2f7935014
[3] https://gitlab.torproject.org/tpo/core/tor/-/commit/6d40e980fb149549bbef5d9e80dbdf886d87d207
[4] https://prometheus.io/docs/practices/naming/
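The proposed split can be sketched as follows. This is only an illustration of the idea in the message above, not tor's implementation: the function names and sample values are hypothetical, and the assignment of states to types follows the proposal ("created" as a counter, "opened" and "rejected" as gauges).

```python
# Assumption from the proposal: only state="created" purely accumulates.
COUNTER_STATES = {"created"}


def format_sample(name, labels, value):
    """Render one sample line in the Prometheus text exposition format."""
    label_text = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{label_text}}} {value}"


def split_connection_metrics(samples):
    """Emit counters and gauges as two metrics, each with its own TYPE line."""
    counters, gauges = [], []
    for labels, value in samples:
        if labels["state"] in COUNTER_STATES:
            counters.append(format_sample("tor_relay_connections_total", labels, value))
        else:
            gauges.append(format_sample("tor_relay_connections", labels, value))
    lines = ["# TYPE tor_relay_connections_total counter", *counters,
             "# TYPE tor_relay_connections gauge", *gauges]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values for illustration.
    print(split_connection_metrics([
        ({"type": "OR", "direction": "received", "state": "created"}, 120),
        ({"type": "OR", "direction": "initiated", "state": "opened"}, 7),
    ]))
```

With this layout each TYPE line is true for every sample under it, so sum() over either metric stays meaningful in the sense of the rule of thumb quoted above.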