I'm seeking help from the collective wisdom of the Cyrus world.

In the past two days we have seen first a doubling, and then more than a quadrupling, of badlogins to Cyrus. These appear to be coming from a botnet, in that the IPs are spread around in a way that evades fail2ban. It got so bad Friday afternoon that we took the extraordinary step of blocking off-campus connections to IMAP (email can still be read via Webmail and the VPN).
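For anyone who wants to look at the same pattern in their own logs, here is a minimal sketch that counts badlogin lines per IPv4 prefix, which shows how thinly the attempts are spread across addresses. The log line shape and the sample addresses are assumptions for illustration, not copied from our servers, and badlogins_by_prefix is a made-up helper name.

```python
import re
from collections import Counter

# Assumed log shape (illustrative only):
#   ... imaps[100]: badlogin: host.example.net [198.51.100.7] plaintext user SASL(-13): ...
BADLOGIN_RE = re.compile(r"badlogin: \S* ?\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def badlogins_by_prefix(lines, octets=3):
    """Count badlogin lines per IPv4 prefix (the first `octets` octets)."""
    counts = Counter()
    for line in lines:
        match = BADLOGIN_RE.search(line)
        if match:
            prefix = ".".join(match.group(1).split(".")[:octets])
            counts[prefix] += 1
    return counts


# Hypothetical sample lines using documentation address ranges.
SAMPLE_LOG = [
    "Jan  1 00:00:01 fe1 imaps[100]: badlogin: a.example.net [198.51.100.7] plaintext alice SASL(-13): authentication failed",
    "Jan  1 00:00:02 fe1 imaps[101]: badlogin: b.example.net [198.51.100.9] plaintext bob SASL(-13): authentication failed",
    "Jan  1 00:00:03 fe1 imaps[102]: badlogin: [203.0.113.5] plaintext carol SASL(-13): authentication failed",
    "Jan  1 00:00:04 fe1 imaps[103]: login: c.example.net [192.0.2.44] dave plaintext User logged in",
]

for prefix, n in badlogins_by_prefix(SAMPLE_LOG).most_common():
    print(prefix, n)
```

Many prefixes with only a handful of attempts each is the pattern that a per-IP threshold such as fail2ban's will not catch.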

The symptoms are that connections grow and grow until authentication slows, holding connections open longer and longer. It takes about 15 minutes for the number of connections to reach the point at which service is interrupted. Friday night an attempt was made to re-enable off-campus IMAP, but the bots were still at it and service was again disrupted.

But the number of connections does not appear to be close to the maximum permitted by Cyrus.

We have a Murder cluster: three front-end servers, two back-end servers, and two replication servers. The front-end servers are Ubuntu 14.04, Cyrus 2.4.17. The back-end and replication servers are Ubuntu 16.04, Cyrus 2.4.18. (Upgrading the front-ends is on the short list.)

Authentication is via saslauthd, configured to use PAM, which is using krb5. Kerberos is running on three different Kerberos servers. Load on the Kerberos servers is light, and the kerb-admin says they are nowhere close to saturated. In fact, Kerberos handled much higher numbers of authentications before imapproxy was deployed on the Webmail service. (That was years ago, on the previous kerb servers, so there is still the possibility the Kerberos servers are somehow slowed....)

Each front-end server is configured for 5000 imapd processes on port 143, and 5000 on port 993. Netstat shows about 4,000-5,000 IMAP connections per front-end server when authentication slows. There are well under 5000 imapd processes of either type. And after the Friday evening test re-allowing off-campus IMAP, the network admin reported about 1600 connections to port 993 in total while IMAP authentication was slowed to a crawl.
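Since a raw netstat total mixes established connections with ones that are closing, a breakdown by port and TCP state may make the numbers easier to compare against the process limits. Below is a rough sketch that tallies `netstat -tn` style output; the sample text and the helper name count_imap_connections are made up for illustration.

```python
from collections import Counter

IMAP_PORTS = {"143", "993"}


def count_imap_connections(netstat_text, ports=IMAP_PORTS):
    """Tally TCP connections by (local port, state) from `netstat -tn` style text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port in ports:
            counts[(local_port, fields[5])] += 1
    return counts


# Hypothetical sample in the usual netstat column layout.
SAMPLE_NETSTAT = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.1:993           198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.1:993           198.51.100.9:40000      TIME_WAIT
tcp        0      0 192.0.2.1:143           203.0.113.5:41000       ESTABLISHED
tcp        0      0 192.0.2.1:22            203.0.113.6:42000       ESTABLISHED
"""

for (port, state), n in sorted(count_imap_connections(SAMPLE_NETSTAT).items()):
    print(port, state, n)
```

On a live server the text would come from running netstat (or a similar tool) and passing its output to the function; only ESTABLISHED rows correspond to connections that still occupy an imapd process.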

We are not close to file-max on any of the servers.
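For reference, the system-wide figure is not the only file limit on Linux: each process also has its own "Max open files" limit. A minimal sketch for reading both follows, assuming the standard layouts of /proc/sys/fs/file-nr and /proc/<pid>/limits; the sample values and function names are invented for illustration.

```python
def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated, unused, and maximum file handles."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return allocated, unused, maximum


def parse_nofile_limit(limits_text):
    """Return (soft, hard) 'Max open files' from /proc/<pid>/limits text.

    A value of 'unlimited' is returned as None.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


# Hypothetical sample contents, not taken from the servers described here.
SAMPLE_FILE_NR = "9344\t0\t1609530\n"
SAMPLE_LIMITS = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max processes             63704                63704                processes\n"
    "Max open files            1024                 4096                 files\n"
)

allocated, _unused, maximum = parse_file_nr(SAMPLE_FILE_NR)
print("system-wide handles in use: %.2f%%" % (100.0 * allocated / maximum))
print("per-process soft/hard limit:", parse_nofile_limit(SAMPLE_LIMITS))
```

On a real host the inputs would be read from /proc/sys/fs/file-nr and from /proc/<pid>/limits for the master, saslauthd, and mupdate processes.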

imapd.conf has a 10 second delay for a badlogin.
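A back-of-the-envelope way to see how that delay interacts with the connection counts: if each failed login holds its connection for at least the pause, then in steady state the number of connections tied up is roughly the bad-login rate times the hold time (Little's law). The sketch below uses hypothetical rates; only the 10-second pause and the 5000-process limit come from our configuration.

```python
def held_connections(bad_logins_per_sec, hold_seconds):
    """Steady-state connections tied up by failed logins (rate x hold time)."""
    return bad_logins_per_sec * hold_seconds


def rate_to_fill(max_connections, hold_seconds):
    """Bad-login rate that would keep max_connections busy at steady state."""
    return max_connections / hold_seconds


PAUSE_SECONDS = 10      # badlogin delay from imapd.conf
MAX_PER_PORT = 5000     # imapd limit per port on each front-end

# 100 bad logins per second is a hypothetical figure, not a measurement.
print(held_connections(100, PAUSE_SECONDS))
print(rate_to_fill(MAX_PER_PORT, PAUSE_SECONDS))
```

The hold time is a lower bound: once authentication itself slows, each attempt holds its connection for longer than the pause, so the same attack rate ties up more connections.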

There are some mupdate log entries:

   Thread timed out waiting for listener_lock
   Worker thread finished, for a total of 3 (2 spare)

These were logged around the time of the Friday afternoon problems, when I was restarting front-end servers to recover. There have been no mupdate log entries since. What do they mean? There are also entries in syslog when mupdate is restarted, stating that it could not reset the file limit to 5k. mupdate_connections_max is 1024, so the failure to reset should have no effect, unless that is the limitation. But I see no log entries indicating that.

Any other resources or limits in either Cyrus or Linux (Debian) that I should look at?

Thank you in advance for any help.

Mike


--
Michael D. Sofka               sof...@rpi.edu
C&MT Sr. Systems Programmer,   Email, TeX, Epistemology
Rensselaer Polytechnic Institute, Troy, NY.  http://www.rpi.edu/~sofkam/

