Hi Sergei,
On 2024/08/02 11:30, Sergei Golubchik via discuss wrote:
Hi, Marc,
On Aug 02, Marc wrote:
If the server will delay enforcing of max_connections (that is,
the server will not reject connections above max_connections at
once), then this user in the above scenario will open all possible
connections your OS can handle and the computer will become
completely inaccessible.
The idea about this change is to have a more useful and expected
implementation of max_user_connections and max_connections.
Currently I am using max_connections not for what it is supposed to
be used for, simply because max_user_connections is not doing as much
as it 'should'.
Hi Sergei, is this something you are going to look into? I am also
curious about this delay between the first packet and the packet with
the username. I can't imagine that being such a problem; to me this
looks feasible currently.
I'm afraid, I don't understand your use case.
There are, basically, three limits now: max_user_connections,
max_connections, OS limit.
An ordinary user would connect many times, hit max_user_connections
and stop. Or will keep connecting and get disconnects because of
max_user_connections.
A malicious user would connect and wouldn't authenticate, this will
exhaust max_connections and nobody will be able to connect to the server
anymore. max_user_connections won't help here.
Let me explain this a different way. It doesn't speak directly to what
I understand Marc's use-case to be, but it relates, and I reckon it's
not a bad compromise, because I get where Marc is coming from. We've
recently had some interesting experiences with a remote party
effectively DOS'ing themselves out of connecting to one of our haproxy
instances (https), so not directly related.
TCP connection establishment is phase one. This is limited by operating
system receive queues (yes, two of them: one for SYN_RECV, and one for
ESTABLISHED but not yet accept()ed), but on Linux the SYN_RECV queue
can be exceeded if the system is configured to use SYN cookies. Bad
idea? Possibly, as it prevents TCP options from being used, but it does
allow a connection to be established at all in the case of a SYN flood,
so IMHO switching to SYN cookies once the SYN_RECV queue is full is a
good idea: a degraded but working connection is significantly better
than no connection at all. This isn't MariaDB specific, nor does it
relate to Marc's request, but it does give some level of background.
It's the same underlying issue, just at a different layer.
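For reference, the two Linux queues and the SYN-cookie fallback
described above are controlled by sysctls roughly like these (values
purely illustrative, not recommendations):

```shell
sysctl -w net.ipv4.tcp_max_syn_backlog=4096  # SYN_RECV (half-open) queue depth
sysctl -w net.core.somaxconn=1024            # cap on the ESTABLISHED-but-not-accept()ed backlog
sysctl -w net.ipv4.tcp_syncookies=1          # fall back to SYN cookies when the SYN queue fills
```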
Once MariaDB accept()s the connection I understand MariaDB counts it
against max_connections. If max_connections is then exceeded the new
connection is dropped. This can trivially deny service to legitimate
well-behaved clients.
This provides for a very, very simple DOS. Simply open a connection
from a remote side and never send anything. Eventually MariaDB will
close this connection (I'm not sure how long that takes; presumably
it's governed by connect_timeout), dropping the connection count again,
and only then can legitimate users connect again. As I understand it,
this is what Marc is experiencing.
There are many reasons why this could happen under *normal operations*
but you're right, this is a "badly behaving client". You're also right
that not limiting this pre-auth would just move the problem to operating
system limits.
Our use-cases are mostly controlled, but we have one case where MariaDB
unfortunately needs to be world-exposed, we've got no way around that,
and this would apply to us there as well. Fortunately we have other
mechanisms in place to rate-limit how fast untrusted sources can
connect, which helps to mitigate this. One could also front this with a
tool like haproxy, which can be configured such that if the client side
doesn't send something within the first X ms of a connection, the
connection is closed; that could be protection layer two.
That said, I agree with Marc that the situation can be improved on the
MariaDB side. He's worried about a mix of good and bad actors from the
same IP address; our use-case is different IPs, but it's the same
underlying problem. MariaDB can (in my opinion) help in both cases.
I would suggest having a separate max_unauthenticated_connections
counter and an authenticate_timeout variable (no more than 2s for most
use-cases; I can't imagine this needing to be higher than 5s in any
situation).
I would probably run with something like:
max_connections = 5000
max_user_connections = 250
max_unauthenticated_connections = 500
Combining this with firewall rate limits on untrusted sources one can
get a fairly protected setup. We normally do burst 100, max 1/s new
connections by default; with a 5s timeout on the MariaDB side this
permits a "bad player" at most 100 connections initially, but over time
at most 5 concurrent connections with which to DOS, per source IP. So
even with my suggestion one can run into trouble, but at least it makes
it harder. One could tighten the rate limits to something like 1/min
with a burst of 500, or have small + large buckets where connections
over time have to pass through both, but this gets complicated, and if
someone wants to DOS you that desperately, there honestly isn't much
you're going to do, but see below.
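The per-source rate limit described above can be expressed with the
iptables hashlimit match, for example (the port and the burst/rate
values are assumptions taken from the defaults mentioned above):

```shell
# Drop NEW connections to MariaDB from any single source IP exceeding
# 1 connection/second sustained, after an initial burst of 100.
iptables -A INPUT -p tcp --dport 3306 --syn \
  -m hashlimit --hashlimit-name mariadb_conn \
  --hashlimit-mode srcip \
  --hashlimit-above 1/second --hashlimit-burst 100 \
  -j DROP
```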
Once max_unauthenticated_connections is reached, I can think of two
possible strategies:
1. Drop the new connection.
2. Drop the connection we've been waiting for the longest to auth.
Each has pros and cons. A possible third option would be to drop the
connection we've been waiting on longest from the same source, falling
back to 1 or 2 otherwise.
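As a toy sketch of the two main strategies (names like
UnauthenticatedPool and on_connect are illustrative only, not MariaDB
internals):

```python
from collections import deque

class UnauthenticatedPool:
    """Toy model of the proposed max_unauthenticated_connections limit."""

    def __init__(self, limit, strategy="drop_new"):
        self.limit = limit
        self.strategy = strategy   # "drop_new" or "drop_oldest"
        self.pending = deque()     # FIFO: longest-waiting connection first

    def on_connect(self, conn_id):
        """Returns (accepted, evicted_conn_or_None)."""
        evicted = None
        if len(self.pending) >= self.limit:
            if self.strategy == "drop_new":
                return False, None            # strategy 1: reject the newcomer
            evicted = self.pending.popleft()  # strategy 2: drop longest-waiting
        self.pending.append(conn_id)
        return True, evicted

    def on_authenticated(self, conn_id):
        # Successful auth moves the connection out of the pre-auth pool.
        self.pending.remove(conn_id)
```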
In this scenario I don't think it matters significantly whether
unauthenticated connections count towards max_connections or not, but
my gut says not.
To further mitigate the multiple-sources case it would be great if we
could get logs specifically for authentication results, i.e., for each
incoming connection log exactly one line indicating the source IP and
the auth result, e.g.:
Connection auth result: user@a.b.c.d accepted.
Connection auth result: a.b.c.d timed out.
Connection auth result: user@a.b.c.d auth failed.
Of course a.b.c.d could also be IPv6 dead::beef, as the case may be.
One can then feed this into fail2ban or similar to mitigate further.
There might be a way to log this already that I just haven't found yet,
I've only spent a very superficial amount of time looking for this.
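To illustrate that such a one-line-per-connection format would be
trivially machine-parseable for fail2ban-style tooling, a small Python
sketch (the log format here is the proposal above, not an existing
MariaDB format):

```python
import re

# Matches the proposed "Connection auth result: ..." lines; the user@
# part is optional since a timed-out connection never sent a username.
LOG_RE = re.compile(
    r"Connection auth result: "
    r"(?:(?P<user>[^@\s]+)@)?(?P<ip>\S+) "
    r"(?P<result>accepted|timed out|auth failed)\.$"
)

def parse_auth_line(line):
    """Return (user_or_None, ip, result) or None if the line doesn't match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return m.group("user"), m.group("ip"), m.group("result")
```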
Given this, we can raise the rate limits once a successfully
authenticated connection happens (say, to our current defaults) and run
with even lower defaults than now (say burst 10, 1/min or something).
Too many auth failures or timeouts in the absence of successful auth
can be used to outright ban source IPs for some time.
After your suggestion of a delayed max_connections check - an ordinary
user would still connect max_user_connections times, nothing would
change for him. A malicious user, not stopped by max_connections
anymore, would completely exhaust the OS capability for opening new
connections, making the whole OS inaccessible.
Bingo. You're spot on. But the current mechanism does allow for a very
effective and trivial denial of service on any remote server.
That's what I mean - I don't understand your use case. It doesn't
change much if all users behave, and it makes the situation much worse
if a user is malicious. So, in what use case would your change be an
improvement?
I hope the above helped.
Kind regards,
Jaco
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]