Re: [Dovecot] mysql auth failover failing
On Sat, 2012-07-28 at 19:53 +0300, Timo Sirainen wrote:
> It's in my TODO, but I don't know when I'll get around to implementing it. So many things to do right now...

Thanks
Re: [Dovecot] mysql auth failover failing
As per this discussion from almost a year ago, was there any attempt to introduce the planned failover mode, Timo?

On Mon, 2011-09-12 at 13:26 -0700, Paul B. Henson wrote:
> Perhaps it could be an option, either load balancing between all available servers, or only using later listed servers when the earlier listed ones are failing. For my purposes, either way is fine, as long as authentications don't fail :). The other contributor to this thread, who has a local mysql replica listed first and the central master listed second, probably wouldn't want the load balanced between them.
Re: [Dovecot] mysql auth failover failing
On 9/16/2011 2:21 AM, Timo Sirainen wrote:
>> As far as I can tell, it still doesn't do load balancing.
> Oh. http://hg.dovecot.org/dovecot-2.0/rev/327698228158 should finally fix it. :)

I installed the new 2.0.15 release including this change, and can confirm it does now successfully load balance across my two servers. Not only that, but with this change, there are no failed authentications at all when one of the servers goes away :). I have it running on one of my three production mail servers now, and barring any unexpected issues will deploy it on the other two next week, and then we'll be sitting pretty ;).

Thanks again...

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] mysql auth failover failing
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
> Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry

I did several fixes related to this in v2.0 hg.

> And postfix starts to fail authentications:
> Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed: Connection lost to authentication server

The reason why it kept failing with Postfix was that Dovecot had a 10 second timeout for SQL connecting, and Postfix had a 10 second timeout before failing authentication. So Postfix never waited long enough for Dovecot to attempt a second connection to the second MySQL server. I dropped Dovecot's SQL connect timeout to 5 seconds.

> Now and again the authentication process dies:
> Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: line 697 (auth_request_handler_flush_failures): assertion failed: (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

This happened only with non-plaintext authentication (e.g. DIGEST-MD5). Fixed also.
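Timo's diagnosis is a timeout budget: the backup host only helps if the client is still waiting when Dovecot gets around to it. A minimal sketch of that worst case, assuming each unreachable host consumes the full SQL connect timeout before the next host is tried (a host that refuses connections outright fails much faster). `failover_fits` is a hypothetical helper, not a Dovecot or Postfix API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of Dovecot or Postfix).
 * Returns true if a client that gives up after client_timeout_secs is
 * still waiting when the auth server reaches a working SQL host.
 * Assumes every dead host listed before it burns the full connect
 * timeout, and ignores the time the query itself takes. */
static bool failover_fits(unsigned int sql_connect_timeout_secs,
                          unsigned int dead_hosts_before_good,
                          unsigned int client_timeout_secs)
{
    assert(client_timeout_secs > 0);
    return sql_connect_timeout_secs * dead_hosts_before_good <
           client_timeout_secs;
}
```

With the numbers from the thread, a 10 second SQL connect timeout against Postfix's 10 second authentication timeout leaves no room for even one dead host (`failover_fits(10, 1, 10)` is false), while the reduced 5 second timeout does (`failover_fits(5, 1, 10)` is true).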
Re: [Dovecot] mysql auth failover failing
On 15.09.2011 12:53, Timo Sirainen wrote:
> On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
>> Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry
>
> I did several fixes related to this in v2.0 hg.
>
>> And postfix starts to fail authentications:
>> Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed: Connection lost to authentication server
>
> The reason why it kept failing with Postfix was that Dovecot had a 10 second timeout for SQL connecting, and Postfix had a 10 second timeout before failing authentication. So Postfix never waited long enough for Dovecot to attempt a second connection to the second MySQL server. I dropped Dovecot's SQL connect timeout to 5 seconds.
>
>> Now and again the authentication process dies:
>> Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: line 697 (auth_request_handler_flush_failures): assertion failed: (auth_request->state == AUTH_REQUEST_STATE_FINISHED)
>
> This happened only with non-plaintext authentication (e.g. DIGEST-MD5). Fixed also.

Hi Timo,

silly question: is there really native MySQL failover in Dovecot? I can't remember this; I only remember it as part of Dovecot proxying.

-- 
Best Regards
Robert Schetterer
Germany/Munich/Bavaria
Re: [Dovecot] mysql auth failover failing
On Thu, 2011-09-15 at 13:39 +0200, Robert Schetterer wrote:
> is there really native MySQL failover in Dovecot? I can't remember this; I only remember it as part of Dovecot proxying.

For SQL authentication it can use multiple SQL server hosts (with both MySQL and PostgreSQL) and do HA/load balancing.
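Per the sample configuration quoted elsewhere in this thread, multiple servers are given as repeated `host=` settings in a single connect string (e.g. `host=sql1.host.org host=sql2.host.org`). A minimal sketch of how such a string could be split into one connect string per host, each keeping the shared parameters. `split_hosts`, the size limits and the `dbname=`/`user=` values used below are hypothetical; Dovecot's own parser in lib-sql may work differently, and this sketch does not handle values containing spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 8          /* hypothetical limit */
#define MAX_CONNECT_LEN 256  /* hypothetical limit */

/* Split "host=a host=b dbname=x user=y" into one connect string per
 * host= entry, each followed by all of the non-host parameters.
 * Returns the number of hosts, or -1 if a limit is exceeded. */
static int split_hosts(const char *connect, char out[][MAX_CONNECT_LEN],
                       int max_out)
{
    char buf[MAX_CONNECT_LEN];
    char hosts[MAX_HOSTS][MAX_CONNECT_LEN];
    char common[MAX_CONNECT_LEN] = "";
    int host_count = 0;

    assert(connect != NULL && out != NULL);
    if (strlen(connect) >= sizeof(buf))
        return -1;
    strcpy(buf, connect);  /* strtok modifies its input */

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (strncmp(tok, "host=", 5) == 0) {
            if (host_count == MAX_HOSTS)
                return -1;
            snprintf(hosts[host_count++], MAX_CONNECT_LEN, "%s", tok);
        } else {
            /* Shared parameter: append it to the common suffix. */
            size_t len = strlen(common);
            int n = snprintf(common + len, sizeof(common) - len, "%s%s",
                             len > 0 ? " " : "", tok);
            if (n < 0 || (size_t)n >= sizeof(common) - len)
                return -1;
        }
    }
    if (host_count > max_out)
        return -1;

    for (int i = 0; i < host_count; i++) {
        /* "host=X" plus the shared parameters, if there are any. */
        int n = snprintf(out[i], MAX_CONNECT_LEN, "%s%s%s", hosts[i],
                         common[0] != '\0' ? " " : "", common);
        if (n < 0 || n >= MAX_CONNECT_LEN)
            return -1;
    }
    return host_count;
}
```

For `host=sql1.host.org host=sql2.host.org dbname=mail user=dovecot` this yields `host=sql1.host.org dbname=mail user=dovecot` and `host=sql2.host.org dbname=mail user=dovecot`, one per backend connection.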
Re: [Dovecot] mysql auth failover failing
On 15.09.2011 13:43, Timo Sirainen wrote:
> On Thu, 2011-09-15 at 13:39 +0200, Robert Schetterer wrote:
>> is there really native MySQL failover in Dovecot? I can't remember this; I only remember it as part of Dovecot proxying.
> For SQL authentication it can use multiple SQL server hosts (with both MySQL and PostgreSQL) and do HA/load balancing.

OK, I see, but I have nearly all possible parameters in MySQL (I use a MySQL cluster). Thanks anyway for the answer.

-- 
Best Regards
Robert Schetterer
Germany/Munich/Bavaria
Re: [Dovecot] mysql auth failover failing
On 9/15/2011 3:53 AM, Timo Sirainen wrote:
> I did several fixes related to this in v2.0 hg.

I patched version 2.0.13 with these fixes and tested it out. As far as I can tell, it still doesn't do load balancing. When started, it only connects to the primary server, and as long as that server is available it never seems to try to connect to the other one.

However, the failover is much better. There are a few failed authentications when the primary server first becomes unavailable (this seems to depend on load: under a light load only a couple fail, and the heavier the load, the more fail). After that blip, though, authentications work fine.

Thanks much for your help resolving this issue, I greatly appreciate it.

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] mysql auth failover failing
On Mon, 2011-09-12 at 13:26 -0700, Paul B. Henson wrote:
>> I'll try to debug this soon.
> Thanks; let me know if there's anything I could do to help, or if there are any potential fixes you would like tested.

I can't seem to reproduce this. It always connects to the second MySQL server without any user-visible errors. What does it log with the attached debug patch?

diff -r d635bcf35df7 src/auth/main.c
--- a/src/auth/main.c	Tue Sep 13 02:09:02 2011 +0300
+++ b/src/auth/main.c	Tue Sep 13 12:43:16 2011 +0300
@@ -9,6 +9,7 @@
 #include "child-wait.h"
 #include "sql-api.h"
 #include "module-dir.h"
+#include "hostpid.h"
 #include "randgen.h"
 #include "process-title.h"
 #include "settings-parser.h"
@@ -282,6 +283,8 @@
 	while ((c = master_getopt(master_service)) > 0) {
 		switch (c) {
 		case 'w':
+			master_service_init_log(master_service,
+				t_strdup_printf("auth-worker(%s): ", my_pid));
 			worker = TRUE;
 			break;
 		default:
diff -r d635bcf35df7 src/lib-sql/driver-sqlpool.c
--- a/src/lib-sql/driver-sqlpool.c	Tue Sep 13 02:09:02 2011 +0300
+++ b/src/lib-sql/driver-sqlpool.c	Tue Sep 13 12:43:16 2011 +0300
@@ -172,6 +172,7 @@
 static void sqlpool_reconnect(struct sql_db *conndb)
 {
 	timeout_remove(&conndb->to_reconnect);
+	i_debug("reconnecting from timeout");
 	(void)sql_connect(conndb);
 }
 
@@ -194,6 +195,7 @@
 			*host_idx_r = i;
 		}
 	}
+	i_debug("%u has least connections (%u)", *host_idx_r, min->connection_count);
 	return min;
 }
 
@@ -231,10 +233,15 @@
 
 	/* if we have zero successful hosts and there still are hosts
 	   without connections, connect to one of them. */
+	i_debug("connection failed");
 	if (!sqlpool_have_successful_connections(db)) {
 		host = sqlpool_find_host_with_least_connections(db, &host_idx);
-		if (host->connection_count == 0)
+		if (host->connection_count == 0) {
+			i_debug(" - none successful, adding %u", host_idx);
 			(void)sqlpool_add_connection(db, host, host_idx);
+		} else {
+			i_debug(" - none successful, already added all");
+		}
 	}
 }
 
@@ -264,6 +271,7 @@
 	struct sqlpool_connection *conn;
 
 	host->connection_count++;
+	i_debug("%u adding", host_idx);
 	conndb = db->driver->v.init(host->connect_string);
 	i_array_init(&conndb->module_contexts, 5);
 
@@ -311,11 +319,13 @@
 
 		if (!SQL_DB_IS_READY(conndb)) {
 			/* see if we could reconnect to it immediately */
+			i_debug("%u trying to connect", conns[idx].host_idx);
 			(void)sql_connect(conndb);
 		}
 		if (SQL_DB_IS_READY(conndb)) {
 			db->last_query_conn_idx = idx;
 			*all_disconnected_r = FALSE;
+			i_debug("%u is ready", conns[idx].host_idx);
 			return &conns[idx];
 		}
 		if (conndb->state != SQL_DB_STATE_DISCONNECTED)
@@ -333,6 +343,7 @@
 	unsigned int i, count;
 	bool all_disconnected;
 
+	i_debug("sql pool getting connection");
 	conn = sqlpool_find_available_connection(db, unwanted_host_idx,
 						 &all_disconnected);
 	if (conn == NULL && unwanted_host_idx != -1U) {
@@ -355,11 +366,16 @@
 	}
 	if (conn == NULL) {
 		/* still nothing. try creating new connections */
+		i_debug("nothing, adding more");
 		conn = sqlpool_add_new_connection(db);
-		if (conn != NULL)
+		if (conn != NULL) {
+			i_debug(" - and connecting");
 			(void)sql_connect(conn->db);
-		if (conn == NULL || !SQL_DB_IS_READY(conn->db))
+		}
+		if (conn == NULL || !SQL_DB_IS_READY(conn->db)) {
+			i_debug(" - not ready");
 			return FALSE;
+		}
 	}
 	*conn_r = conn;
 	return TRUE;
@@ -509,10 +525,13 @@
 	const struct sqlpool_connection *conn;
 	int ret = -1, ret2;
 
+	i_debug("connecting to first available connection");
 	array_foreach(&db->all_connections, conn) {
 		ret2 = sql_connect(conn->db);
-		if (ret2 > 0)
+		if (ret2 > 0) {
+			i_debug("%u connected", conn->host_idx);
 			return 1;
+		}
 		if (ret2 == 0)
 			ret = 0;
 	}
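The failure path instrumented by the debug patch boils down to two decisions: find the host with the fewest connections, and, if no host has a working connection, open one to a host that has none yet. A simplified standalone sketch of that logic; `struct pool_host` and both functions are hypothetical stand-ins (only the `connection_count` field name is borrowed from the patch), and in this sketch ties go to the earliest-listed host.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-host state; only connection_count mirrors the patch. */
struct pool_host {
    const char *name;
    unsigned int connection_count;
};

/* Return the host with the fewest connections and store its index in
 * *host_idx_r. Strict < means the earliest-listed host wins ties. */
static struct pool_host *
find_host_with_least_connections(struct pool_host *hosts, unsigned int count,
                                 unsigned int *host_idx_r)
{
    struct pool_host *min;

    assert(count > 0);
    min = &hosts[0];
    *host_idx_r = 0;
    for (unsigned int i = 1; i < count; i++) {
        if (hosts[i].connection_count < min->connection_count) {
            min = &hosts[i];
            *host_idx_r = i;
        }
    }
    return min;
}

/* After a connection failure: if nothing works, pick a host that has no
 * connection yet. Returns its index, or -1 when there is nothing to add. */
static int pick_host_after_failure(struct pool_host *hosts, unsigned int count,
                                   bool have_successful_connections)
{
    unsigned int idx;
    struct pool_host *host;

    if (have_successful_connections)
        return -1;
    host = find_host_with_least_connections(hosts, count, &idx);
    return host->connection_count == 0 ? (int)idx : -1;
}
```

With `mysql-1` holding one failed connection and `mysql-2` holding none, the sketch picks index 1; once both hosts have a connection, it reports that everything has already been added, matching the patch's " - none successful, already added all" branch.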
Re: [Dovecot] mysql auth failover failing
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
> According to the sample SQL configuration file, HA / round-robin load-balancing is supported by giving multiple host settings, like: host=sql1.host.org host=sql2.host.org. However, as far as I can tell dovecot only connects to the first listed host, and processes all queries through it; there does not appear to be any load-balancing going on.

The current code creates a connection to the second server only when the first connection is already busy with an SQL query, or when it's not working. Once there are more connections, it starts doing round robin lookups. This works okay enough with PostgreSQL because it does asynchronous lookups, so two simultaneous lookups create a second connection. MySQL does synchronous lookups though, so the second connection is normally never created. I suppose the fix to this would be to always connect to all SQL servers at startup.

> That's not necessarily a dealbreaker; however, high-availability does not appear to be working either. If I shut down the first mysql server, dovecot starts to log connection failures:
> Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry
> Sep 9 15:47:39 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 25 seconds before retry

Those are intentional.

> And postfix starts to fail authentications:
> Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed: Connection lost to authentication server

It should have created the second connection here and not failed.
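A sketch of why, under the policy described above, a second MySQL connection is rarely created: a connection to another host is opened only when every existing connection is busy or broken. `struct pool_model`, `should_open_another` and `connect_all_at_startup` are hypothetical names, and the model assumes at most one connection per configured host.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one process's SQL pool (one connection per host). */
struct pool_model {
    unsigned int hosts;   /* configured host= entries */
    unsigned int open;    /* connections created so far */
    unsigned int busy;    /* open connections with a query in flight */
    unsigned int broken;  /* open connections that are not working */
};

/* Pre-fix policy as described: add a connection to another host only when
 * no open connection is both idle and working. */
static bool should_open_another(const struct pool_model *m)
{
    assert(m->busy + m->broken <= m->open);
    return m->open < m->hosts && m->busy + m->broken == m->open;
}

/* The proposed fix: connect to every configured host up front. */
static void connect_all_at_startup(struct pool_model *m)
{
    m->open = m->hosts;
}
```

Within one process, a synchronous driver has normally finished its previous query before the next one is dispatched, so `busy` stays 0 and `should_open_another` returns false; overlapping asynchronous queries leave `busy` at 1 and make it true. Connecting to every host at startup sidesteps the question entirely.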
> Now and again the authentication process dies:
> Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: line 697 (auth_request_handler_flush_failures): assertion failed: (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

And this of course shouldn't happen either.

> Requests start to pile up:
> Sep 9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request was queued for 25 seconds, 45 left in queue
> Lookups time out:
> Sep 9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted request: Lookup timed out

These are the result of the previous failures.

> This occasionally pops up:
> Sep 9 15:58:38 tweak dovecot: auth: Fatal: net_connect_unix(auth-worker) failed: Resource temporarily unavailable

Probably this too.

> And sometimes the auth process gets temporarily disabled:
> Sep 9 15:58:57 tweak dovecot: master: Error: service(auth): command startup failed, throttling

Most likely related to the crash, although I think this still shouldn't have happened.

> I don't think all authentications fail during the scenario, but I think the majority do. Based on the network traffic, dovecot is almost continuously trying to connect to the first listed server. It sometimes connects to the second listed server, but when it does, the connection does not persist; it goes away almost immediately.

There are multiple auth-worker processes, each one having its own internal MySQL connections with separate retry counters. I'll try to debug this soon.
Re: [Dovecot] mysql auth failover failing
On 9/12/2011 5:30 AM, Timo Sirainen wrote:
> This works okay enough with PostgreSQL because it does asynchronous lookups, so two simultaneous lookups create a second connection. MySQL does synchronous lookups though, so the second connection is normally never created.

If I could, I think I'd rather run postgres; but so many things only support mysql that you can't really get away with running only postgres, and it's not worth the effort to run two separate SQL services <sigh>.

> I suppose the fix to this would be to always connect to all SQL servers at startup.

Perhaps it could be an option, either load balancing between all available servers, or only using later listed servers when the earlier listed ones are failing. For my purposes, either way is fine, as long as authentications don't fail :). The other contributor to this thread, who has a local mysql replica listed first and the central master listed second, probably wouldn't want the load balanced between them.

> It should have created the second connection here and not failed.

Based on the network traffic, it is really pounding the primary trying to connect, and occasionally connecting to the secondary only to disconnect immediately after just one or very few queries.

> I'll try to debug this soon.

Thanks; let me know if there's anything I could do to help, or if there are any potential fixes you would like tested.

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
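The option suggested above amounts to two selection policies over the same ordered host list. A sketch of both, assuming a simple up/down flag per host; `enum host_mode`, `struct host_set` and `choose_host` are hypothetical and do not correspond to any Dovecot setting. Failover mode always prefers the earliest working host, which suits a local replica listed before a central master; load-balance mode rotates over the working hosts.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SQL_HOSTS 8  /* hypothetical limit */

/* Hypothetical policies for the suggested option. */
enum host_mode {
    HOST_MODE_FAILOVER,      /* always prefer the earliest listed working host */
    HOST_MODE_LOAD_BALANCE   /* rotate over all working hosts */
};

struct host_set {
    unsigned int count;          /* hosts, in listed order */
    bool up[MAX_SQL_HOSTS];      /* current health of each host */
    unsigned int next_rr;        /* rotation cursor for load balancing */
};

/* Return the index of the host to query, or -1 if none is up. */
static int choose_host(struct host_set *set, enum host_mode mode)
{
    assert(set->count > 0 && set->count <= MAX_SQL_HOSTS);
    for (unsigned int i = 0; i < set->count; i++) {
        unsigned int idx = mode == HOST_MODE_FAILOVER
            ? i : (set->next_rr + i) % set->count;
        if (!set->up[idx])
            continue;
        if (mode == HOST_MODE_LOAD_BALANCE)
            set->next_rr = (idx + 1) % set->count;
        return (int)idx;
    }
    return -1;
}
```

In either mode a host that is down is simply skipped, so authentications keep working as long as one host is up; the modes differ only in how queries are spread while all hosts are healthy.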
Re: [Dovecot] mysql auth failover failing
On Sat, Sep 10, 2011 at 01:49:59AM -0700, Noel Butler wrote:
> Sounds like you have bigger issues, maybe relating as to why the primary fails?

For testing purposes, it fails because I stick a firewall rule in place preventing access to it ;). In production, it came to our attention because a hardware failure required downtime on one of the mysql servers to replace parts, and we received complaints of failed authentications while it was down. In general, both are up, but things using them need to be able to survive when one is down.

> primary (local slave copy) has gone away unless I'm deliberately upgrading mysql ) when doing so (tested) it hits the master server (as in secondary host=) right away, no auth failures.

Hmm, what version of dovecot are you using? In version 1, failover seems to work if the primary returns connection refused (which your scenario would). In version 2, it seems flaky for both connection refused and connection timed out. Unless I've got something misconfigured, but there doesn't seem to be that much to it...

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] mysql auth failover failing
On Fri, 2011-09-09 at 20:16 -0700, Paul B. Henson wrote:
> On Fri, Sep 09, 2011 at 08:02:57PM -0700, Noel Butler wrote:
>> suggest, having just one master server; after all, dovecot and postfix just need to read, not alter/update/insert etc.
> True; but the pieces that are altering/updating/inserting the data that postfix/dovecot need to read need redundancy as well :).

Yep, depends on your network design I suppose. I'd rather leave the front ends to be just that, with all interactions with the master DB server and the NAS done via a second interface on a dedicated private LAN, so those nasty bored teenagers out there can't get near it :)

>> yep, that's correct because it has gone away, but it still uses the second host immediately; that's just dovecot trying to re-establish its link with the primary
> Based on my testing, it doesn't use the second host immediately, but only sporadically, with most of the authentications failing.

Sounds like you have bigger issues, maybe relating as to why the primary fails?

>> err, postfix is not dovecot; you need to also add failover in postfix's sql lookup commands
> postfix relies on dovecot for authentication; this postfix error message is the result of dovecot not successfully processing an authentication request. postfix itself handles mysql failure well: it both load balances queries across both servers and also continues to function when one isn't available.

My bad, I did see that, and it is how I do it (I'm not all there at present, had the flu for a week <g>), but I never had a situation where primary (local slave copy) has gone away unless I'm deliberately upgrading mysql ) when doing so (tested) it hits the master server (as in secondary host=) right away, no auth failures.

Cheers
Re: [Dovecot] mysql auth failover failing
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
> default_pass_scheme = PLAIN

Ugh, I'll pretend I didn't see that :)

> According to the sample SQL configuration file, HA / round-robin load-balancing is supported by giving multiple host settings, like: host=sql1.host.org host=sql2.host.org. However, as far as I can tell dovecot only connects to the first listed host, and processes all queries through it; there does not appear to be any load-balancing going on.

I suspect the wording here is incorrect; it's just failover AFAIK: it only hits the first entry, failing over to the second if there is no response. HA would be like running a MySQL slave on all the front ends, failing over to the master on your CRM server etc., which is what I do and suggest, having just one master server; after all, dovecot and postfix just need to read, not alter/update/insert etc.

> That's not necessarily a dealbreaker; however, high-availability does not appear to be working either. If I shut down the first mysql server, dovecot starts to log connection failures:
> Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry
> Sep 9 15:47:39 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 25 seconds before retry

yep, that's correct because it has gone away, but it still uses the second host immediately; that's just dovecot trying to re-establish its link with the primary

> And postfix starts to fail authentications:

err, postfix is not dovecot; you need to also add failover in postfix's sql lookup commands

hosts = unix:/var/run/mysql/mysql.sock 10.10.10.2

(assuming .2 is your master sql server)

> Resulting in a complete unavailability of smtp service, not just unavailability of authenticated services.
You could have a secondary MX SMTP box (with a higher MX preference value) that uses postfix for virtual transport in case dovecot is unavailable. This of course means storing partial paths in your mail DB, for use only by that one separate secondary MX that isn't behind the load balancer. Of course this won't solve users' problems with sending unless you have multiple SMTP servers behind a load balancer, but it still allows inbound mail; it depends on how big your setup (and budget) is or can be :) (Note: by load balancer I mean a real hardware device, not pretend LBs in software.)

> Does the example sql config have incorrect information?

I suspect so.