Hi Sergei,
On 2024/08/02 11:30, Sergei Golubchik via discuss wrote:
Hi, Marc,
On Aug 02, Marc wrote:
If the server will delay enforcing of max_connections (that is,
the server will not reject connections above max_connections at
once), then this user in the above scenario will open all possible
connections your OS can handle and the computer will become
completely inaccessible.
The idea about this change is to have a more useful and expected
implementation of max_user_connections and max_connections.
Currently I am using max_connections not for what it is supposed to
be used for, simply because max_user_connections is not doing as much
as it 'should'.
Hi Sergei, is this something you are going to look into? I am also
curious about this delay between the first packet and the packet with
the username. I can't imagine that being such a problem; to me this
looks feasible currently.
I'm afraid, I don't understand your use case.
There are, basically, three limits now: max_user_connections,
max_connections, OS limit.
An ordinary user would connect many times, hit max_user_connections
and stop. Or will keep connecting and get disconnects because of
max_user_connections.
A malicious user would connect and wouldn't authenticate, this will
exhaust max_connections and nobody will be able to connect to the server
anymore. max_user_connections won't help here.
Let me explain this a different way. It doesn't speak directly to what
I understand Marc's use-case to be, but it relates, and I reckon it's
not a bad compromise, because I get where Marc is coming from. We've
recently had some interesting experiences with a remote party
effectively DOS'ing themselves out of connecting to one of our haproxy
instances (https), so not directly related.
TCP connection establishment is phase one. This is limited by operating
system receive queues (yes, two of them: one for SYN_RECV, and one for
ESTABLISHED but not yet accept()ed), but on Linux the SYN_RECV queue
can be exceeded if the system is configured to use SYN cookies. Bad
idea? Possibly, as it prevents TCP options from being used, but it does
allow a connection to be established at all in the case of a SYN flood,
so IMHO switching to SYN cookies once the SYN_RECV queue is full is a
good idea: a degraded but working connection is significantly better
than no connection at all. This isn't MariaDB specific, nor does it
relate to Marc's request, but it does give some level of background.
It's the same underlying issue, just at a different layer.
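For reference, the two Linux queues and the SYN-cookie fallback
described above are controlled by sysctls roughly like these (values
purely illustrative, not recommendations):

```shell
sysctl -w net.ipv4.tcp_max_syn_backlog=4096  # SYN_RECV (half-open) queue depth
sysctl -w net.core.somaxconn=1024            # cap on the ESTABLISHED-but-not-accept()ed backlog
sysctl -w net.ipv4.tcp_syncookies=1          # fall back to SYN cookies when the SYN queue fills
```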
Once MariaDB accept()s the connection I understand MariaDB counts it
against max_connections. If max_connections is then exceeded the new
connection is dropped. This can trivially deny service to legitimate
well-behaved clients.
This provides for a very, very simple DOS. Simply open a connection
from a remote side and never send anything. Eventually MariaDB will
close this connection (I'm not sure how long that takes; presumably
it's governed by connect_timeout), dropping the connection count again,
and only then can legitimate users connect again. As I understand it,
this is what Marc is experiencing.
There are many reasons why this could happen under *normal operations*
but you're right, this is a "badly behaving client". You're also right
that not limiting this pre-auth would just move the problem to operating
system limits.
Our use-cases are mostly controlled, but we have one case where MariaDB
unfortunately needs to be world-exposed, we've got no way around that,
and this would apply to us there as well. Fortunately we have other
mechanisms in place to rate-limit how fast untrusted sources can
connect, which helps to mitigate this. One could also front this with a
tool like haproxy, which can be configured such that if the client side
doesn't send something within the first X ms of a connection, the
connection is closed; that could be protection layer two.
That said, I agree with Marc that the situation can be improved on the
MariaDB side. He's worried about a mix of good and bad actors from the
same IP address; our use-case is different IPs, but it's the same
underlying problem. MariaDB can (in my opinion) help in both cases.
I would suggest having a separate max_unauthenticated_connections
counter and an authenticate_timeout variable (no more than 2s for most
use-cases; I can't imagine this needing to be higher than 5s in any
situation).
I would probably run with something like:
max_connections = 5000
max_user_connections = 250
max_unauthenticated_connections = 500
Combining this with firewall rate limits on untrusted sources one can
get a fairly protected setup. We normally do burst 100, max 1/s new
connections by default; with a 5s timeout on the MariaDB side this
permits a "bad player" at most 100 connections initially, but over time
at most 5 concurrent connections with which to DOS, per source IP. So
even with my suggestion one can run into trouble, but at least it makes
it harder. One could tighten the rate limits to something like 1/min
with a burst of 500, or have small + large buckets where connections
over time have to pass through both, but this gets complicated, and if
someone wants to DOS you that desperately, there honestly isn't much
you're going to do, but see below.
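The per-source rate limit described above can be expressed with the
iptables hashlimit match, for example (the port and the burst/rate
values are assumptions taken from the defaults mentioned above):

```shell
# Drop NEW connections to MariaDB from any single source IP exceeding
# 1 connection/second sustained, after an initial burst of 100.
iptables -A INPUT -p tcp --dport 3306 --syn \
  -m hashlimit --hashlimit-name mariadb_conn \
  --hashlimit-mode srcip \
  --hashlimit-above 1/second --hashlimit-burst 100 \
  -j DROP
```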
Once max_unauthenticated_connections is reached, I can think of two
possible strategies:
1. Drop the new connection.
2. Drop the connection we've been waiting for the longest to auth.
Each has pros and cons. A possible third option would be to drop the
connection we've been waiting on longest from the same source, falling
back to 1 or 2 otherwise.
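As a toy sketch of the two main strategies (names like
UnauthenticatedPool and on_connect are illustrative only, not MariaDB
internals):

```python
from collections import deque

class UnauthenticatedPool:
    """Toy model of the proposed max_unauthenticated_connections limit."""

    def __init__(self, limit, strategy="drop_new"):
        self.limit = limit
        self.strategy = strategy   # "drop_new" or "drop_oldest"
        self.pending = deque()     # FIFO: longest-waiting connection first

    def on_connect(self, conn_id):
        """Returns (accepted, evicted_conn_or_None)."""
        evicted = None
        if len(self.pending) >= self.limit:
            if self.strategy == "drop_new":
                return False, None            # strategy 1: reject the newcomer
            evicted = self.pending.popleft()  # strategy 2: drop longest-waiting
        self.pending.append(conn_id)
        return True, evicted

    def on_authenticated(self, conn_id):
        # Successful auth moves the connection out of the pre-auth pool.
        self.pending.remove(conn_id)
```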
In this scenario I don't think it matters significantly whether
unauthenticated connections count towards max_connections or not, but
my gut says not.
To further mitigate the multiple-sources case it would be great if we
could get logs specifically for authentication results, i.e., for each
incoming connection log exactly one line indicating the source IP and
the auth result, e.g.:
Connection auth result: user@a.b.c.d accepted.
Connection auth result: a.b.c.d timed out.
Connection auth result: user@a.b.c.d auth failed.
Of course a.b.c.d could also be IPv6 dead::beef, as the case may be.
One can then feed this into fail2ban or similar to mitigate further.
There might be a way to log this already that I just haven't found yet,
I've only spent a very superficial amount of time looking for this.
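To illustrate that such a one-line-per-connection format would be
trivially machine-parseable for fail2ban-style tooling, a small Python
sketch (the log format here is the proposal above, not an existing
MariaDB format):

```python
import re

# Matches the proposed "Connection auth result: ..." lines; the user@
# part is optional since a timed-out connection never sent a username.
LOG_RE = re.compile(
    r"Connection auth result: "
    r"(?:(?P<user>[^@\s]+)@)?(?P<ip>\S+) "
    r"(?P<result>accepted|timed out|auth failed)\.$"
)

def parse_auth_line(line):
    """Return (user_or_None, ip, result) or None if the line doesn't match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return m.group("user"), m.group("ip"), m.group("result")
```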
Given this, we can raise the rate limits once a successfully
authenticated connection happens (say, to our current defaults) and run
with even lower defaults than now (say burst 10, 1/min or something).
Too many auth failures or timeouts in the absence of successful auth
can be used to outright ban source IPs for some time.
After your suggestion of a delayed max_connections check - an ordinary
user would still connect max_user_connections times, nothing would
change for him. A malicious user, not stopped by max_connections
anymore, would completely exhaust the OS capability for opening new
connections, making the whole OS inaccessible.
Bingo. You're spot on. But the current mechanism does allow for a very
effective and trivial denial of service on any remote server.
That's what I mean - I don't understand your use case. It doesn't
change much if all users behave, and it makes the situation much worse
if a user is malicious. So, in what use case would your change be an
improvement?
I hope the above helped.
Kind regards,
Jaco
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]