We are running dovecot to provide authentication for postfix, using two mysql servers in a multi-master replication set as the password source:

----------------------------------------
# 2.0.13: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.37-gentoo-r4 x86_64 Gentoo Base System release 2.0.2
auth_mechanisms = plain login digest-md5 cram-md5
auth_verbose = yes
passdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}
protocols = none
service auth-worker {
  unix_listener auth-worker {
    user = postfix
  }
  user = $default_internal_user
}
service auth {
  unix_listener /var/spool/postfix/private/auth {
    group = postfix
    mode = 0660
    user = postfix
  }
  user = postfix
}
ssl = no
userdb {
  driver = passwd
}
---------------------------------------

With an sql config of:

-------------------------
driver = mysql
connect = host=mysql-1.unx.csupomona.edu host=mysql-2.unx.csupomona.edu dbname=idmgmt user=postfix password=XXXXXXX
default_pass_scheme = PLAIN
password_query = XXXXXXXXX
-------------------------

According to the sample SQL configuration file, "HA / round-robin load-balancing is supported by giving multiple host settings, like: host=sql1.host.org host=sql2.host.org".

However, as far as I can tell, dovecot only connects to the first listed host and processes all queries through it; there does not appear to be any load-balancing going on.

That's not necessarily a dealbreaker; however, high-availability does not appear to be working either.

If I shut down the first mysql server, dovecot starts to log connection failures:

Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry

Sep 9 15:47:39 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 25 seconds before retry

And postfix starts to fail authentications:

Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed: Connection lost to authentication server

Now and again the authentication process dies:

Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: line 697 (auth_request_handler_flush_failures): assertion failed: (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

Sep 9 15:47:39 tweak dovecot: auth: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x3f71a) [0x7f25822ca71a] -> /usr/lib64/dovecot/libdovecot.so.0(+0x3f766) [0x7f25822ca766] -> /usr/lib64/dovecot/libdovecot.so.0(+0x198ca) [0x7f25822a48ca] -> dovecot/auth() [0x4137f4] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handle_timeouts+0xd4) [0x7f25822d5fe4] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0x5b) [0x7f25822d6bcb] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f25822d5c48] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f25822c3de3] -> dovecot/auth(main+0x2be) [0x4179de] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f2581898bbd] -> dovecot/auth() [0x40bdc9]

Sep 9 15:47:39 tweak dovecot: master: Error: service(auth): child 4154 killed with signal 6 (core dumps disabled)

Requests start to pile up:

Sep 9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request was queued for 25 seconds, 45 left in queue

Lookups time out:

Sep 9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted request: Lookup timed out

This occasionally pops up:

Sep 9 15:58:38 tweak dovecot: auth: Fatal: net_connect_unix(auth-worker) failed: Resource temporarily unavailable

And sometimes the auth process gets temporarily disabled:

Sep 9 15:58:57 tweak dovecot: master: Error: service(auth): command startup failed, throttling

Resulting in more postfix authentication failures:

Sep 9 15:58:57 tweak postfix/smtpd[6531]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed:

Sep 9 15:59:08 tweak postfix/smtpd[6551]: fatal: no SASL authentication mechanisms

To the point where postfix also temporarily throttles smtpd:

Sep 9 15:59:21 tweak postfix/master[6526]: warning: /usr/lib64/postfix/smtpd: bad command startup -- throttling

The end result is complete unavailability of smtp service, not just unavailability of authenticated services.


I don't think all authentications fail during this scenario, but I think the majority do. Based on the network traffic, dovecot is almost continuously trying to connect to the first listed server. It sometimes connects to the second listed server, but when it does, the connection does not persist; it goes away almost immediately.


Ideally, I would like no authentications to fail if one of the MySQL servers is unavailable. If a few fail just when the server dies, that would be undesirable but acceptable as long as they do not continuously fail while the server is down.

Am I doing something wrong? Does the example sql config have incorrect information?

We were previously running dovecot 1.2.11 and just recently upgraded to 2.0.13. In the previous version, we actually had two different passdbs configured, each one listing only one of the mysql servers. I seem to recall that was the recommendation at the time for high-availability. When that configuration did not seem to work under version 2, I found an updated recommendation to list both servers in the same passdb, which also does not appear to work correctly. I went back and tested the older version: it seemed to work okay when the server was up but the mysql service was down and connections were refused, but it also failed a large number of authentication attempts when the server was completely down and connections were timing out.
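
For reference, the two-passdb arrangement would look roughly like this in version 2 syntax. This is a sketch, not a copy of our actual files: the two file names are made up for illustration, and I am assuming each file holds the same settings as the sql config above except that its connect line lists a single host.

----------------------------------------
# Sketch only; file names below are hypothetical.
# First passdb: assumed to point at a config whose connect line has only
# host=mysql-1.unx.csupomona.edu
passdb {
  args = /etc/dovecot/dovecot-sql-1.conf
  driver = sql
}
# Second passdb: assumed to point at a config whose connect line has only
# host=mysql-2.unx.csupomona.edu
passdb {
  args = /etc/dovecot/dovecot-sql-2.conf
  driver = sql
}
----------------------------------------

With this layout the second passdb is only consulted after the first one fails to authenticate the request, so it is sequential fallback rather than load-balancing.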

Thanks much...

--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
