Re: [Dovecot] mysql auth failover failing

2012-07-28 Thread Noel Butler
On Sat, 2012-07-28 at 19:53 +0300, Timo Sirainen wrote:

 It's in my TODO, but I don't know when I'll get around to implementing it. So 
 many things to do right now..
 


Thanks






Re: [Dovecot] mysql auth failover failing

2012-07-25 Thread Noel Butler
As per this discussion almost a year ago, was there any plan to
introduce a failover mode, Timo?


On Mon, 2011-09-12 at 13:26 -0700, Paul B. Henson wrote:


 
 Perhaps it could be an option, either load balancing between all
 available servers, or only using later listed servers when the earlier
 listed ones are failing. For my purposes, either way is fine, as long as
 authentications don't fail :). The other contributor to this thread, who
 has a local mysql replica listed first and the central master listed
 second probably wouldn't want the load balanced between them.






Re: [Dovecot] mysql auth failover failing

2011-09-16 Thread Paul B. Henson

On 9/16/2011 2:21 AM, Timo Sirainen wrote:


>> As far as I can tell, it still doesn't do load balancing.
>
> Oh. http://hg.dovecot.org/dovecot-2.0/rev/327698228158 should finally
> fix it. :)


I installed the new 2.0.15 release including this change, and can 
confirm it does now successfully load balance across my two servers. Not 
only that, but with this change, there are no failed authentications at 
all when one of the servers goes away :). I have it running on one of my 
three production mail servers now, and barring any unexpected issues 
will deploy it on the other two next week, and then we'll be sitting 
pretty ;).


Thanks again...


--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] mysql auth failover failing

2011-09-15 Thread Timo Sirainen
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:

 Sep  9 15:47:34 tweak dovecot: auth: Error: 
 mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
 Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
 waiting for 1 seconds before retry

I did several fixes related to this in v2.0 hg.

 And postfix starts to fail authentications:
 
 Sep  9 15:47:35 tweak postfix/smtpd[5119]: warning: 
 bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 
 authentication failed: Connection lost to authentication server

It kept failing with Postfix because Dovecot had a 10-second timeout for
SQL connecting, and Postfix had a 10-second timeout before failing
authentication. So Postfix never waited long enough for Dovecot to
attempt a second connection to the second MySQL server. I dropped
Dovecot's SQL connect timeout to 5 seconds.

 Now and again the authentication process dies:
 
 Sep  9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: 
 line 697 (auth_request_handler_flush_failures): assertion failed: 
 (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

This happened only with non-plaintext authentication (e.g. DIGEST-MD5).
Fixed also.



Re: [Dovecot] mysql auth failover failing

2011-09-15 Thread Robert Schetterer
On 15.09.2011 12:53, Timo Sirainen wrote:
 On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
 
 Sep  9 15:47:34 tweak dovecot: auth: Error: 
 mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
 Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
 waiting for 1 seconds before retry
 
 I did several fixes related to this in v2.0 hg.
 
 And postfix starts to fail authentications:

 Sep  9 15:47:35 tweak postfix/smtpd[5119]: warning: 
 bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 
 authentication failed: Connection lost to authentication server
 
 The reason why it kept failing with Postfix was because Dovecot had 10
 second timeout for SQL connecting, and Postfix had 10 second timeout
 before failing authentication. So Postfix never waited long enough for
 Dovecot to attempt a second connection to the second MySQL server. I
 dropped Dovecot's SQL connect timeout to 5 seconds.
 
 Now and again the authentication process dies:

 Sep  9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: 
 line 697 (auth_request_handler_flush_failures): assertion failed: 
 (auth_request->state == AUTH_REQUEST_STATE_FINISHED)
 
 This happened only with non-plaintext authentication (e.g. DIGEST-MD5).
 Fixed also.
 

Hi Timo,
silly question:
is there really native MySQL failover in Dovecot?
I can't remember this; I only remember it as part of Dovecot proxying.

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] mysql auth failover failing

2011-09-15 Thread Timo Sirainen
On Thu, 2011-09-15 at 13:39 +0200, Robert Schetterer wrote:
 
 is there really native MySQL failover in Dovecot?
 I can't remember this; I only remember it as part of Dovecot proxying.

For SQL authentication it can use multiple SQL server hosts (with both
MySQL and PostgreSQL) and do HA/load balancing.




Re: [Dovecot] mysql auth failover failing

2011-09-15 Thread Robert Schetterer
On 15.09.2011 13:43, Timo Sirainen wrote:
 On Thu, 2011-09-15 at 13:39 +0200, Robert Schetterer wrote:

 is there really native MySQL failover in Dovecot?
 I can't remember this; I only remember it as part of Dovecot proxying.
 
 For SQL authentication it can use multiple SQL server hosts (with both
 MySQL and PostgreSQL) and do HA/load balancing.
 
 
OK, I see, but I have nearly all possible parameters
in MySQL (I use a MySQL cluster). Thanks anyway for the answer.



Re: [Dovecot] mysql auth failover failing

2011-09-15 Thread Paul B. Henson

On 9/15/2011 3:53 AM, Timo Sirainen wrote:


> I did several fixes related to this in v2.0 hg.


I patched version 2.0.13 with these fixes and tested it out.

As far as I can tell, it still doesn't do load balancing. When started, 
it only connects to the primary server, and as long as that server is 
available never seems to try and connect to the other one. However, the 
failover is much better. There are a few failed authentications when the 
primary server first becomes unavailable (seems to depend on load; under 
a light load, only a couple fail, the heavier the load, the more fail). 
After that blip though, authentications work fine.


Thanks much for your help resolving this issue, I greatly appreciate it.




Re: [Dovecot] mysql auth failover failing

2011-09-13 Thread Timo Sirainen
On Mon, 2011-09-12 at 13:26 -0700, Paul B. Henson wrote:

  I'll try to debug this soon.
 
 Thanks; let me know if there's anything I could do to help, or if there
 are any potential fixes you would like tested.

I can't seem to reproduce this. It always connects to the second MySQL
server without any user-visible errors. What does it log with the
attached debug patch?

diff -r d635bcf35df7 src/auth/main.c
--- a/src/auth/main.c	Tue Sep 13 02:09:02 2011 +0300
+++ b/src/auth/main.c	Tue Sep 13 12:43:16 2011 +0300
@@ -9,6 +9,7 @@
 #include "child-wait.h"
 #include "sql-api.h"
 #include "module-dir.h"
+#include "hostpid.h"
 #include "randgen.h"
 #include "process-title.h"
 #include "settings-parser.h"
@@ -282,6 +283,8 @@
 	while ((c = master_getopt(master_service)) > 0) {
 		switch (c) {
 		case 'w':
+			master_service_init_log(master_service,
+				t_strdup_printf("auth-worker(%s): ", my_pid));
 			worker = TRUE;
 			break;
 		default:
diff -r d635bcf35df7 src/lib-sql/driver-sqlpool.c
--- a/src/lib-sql/driver-sqlpool.c	Tue Sep 13 02:09:02 2011 +0300
+++ b/src/lib-sql/driver-sqlpool.c	Tue Sep 13 12:43:16 2011 +0300
@@ -172,6 +172,7 @@
 static void sqlpool_reconnect(struct sql_db *conndb)
 {
 	timeout_remove(&conndb->to_reconnect);
+	i_debug("reconnecting from timeout");
 	(void)sql_connect(conndb);
 }
 
@@ -194,6 +195,7 @@
 			*host_idx_r = i;
 		}
 	}
+	i_debug("%u has least connections (%u)", *host_idx_r, min->connection_count);
 	return min;
 }
 
@@ -231,10 +233,15 @@
 
 	/* if we have zero successful hosts and there still are hosts
 	   without connections, connect to one of them. */
+	i_debug("connection failed");
 	if (!sqlpool_have_successful_connections(db)) {
 		host = sqlpool_find_host_with_least_connections(db, &host_idx);
-		if (host->connection_count == 0)
+		if (host->connection_count == 0) {
+			i_debug(" - none successful, adding %u", host_idx);
 			(void)sqlpool_add_connection(db, host, host_idx);
+		} else {
+			i_debug(" - none successful, already added all");
+		}
 	}
 }
 
@@ -264,6 +271,7 @@
 	struct sqlpool_connection *conn;
 
 	host->connection_count++;
+	i_debug("%u adding", host_idx);
 
 	conndb = db->driver->v.init(host->connect_string);
 	i_array_init(&conndb->module_contexts, 5);
@@ -311,11 +319,13 @@
 
 		if (!SQL_DB_IS_READY(conndb)) {
 			/* see if we could reconnect to it immediately */
+			i_debug("%u trying to connect", conns[idx].host_idx);
 			(void)sql_connect(conndb);
 		}
 		if (SQL_DB_IS_READY(conndb)) {
 			db->last_query_conn_idx = idx;
 			*all_disconnected_r = FALSE;
+			i_debug("%u is ready", conns[idx].host_idx);
 			return &conns[idx];
 		}
 		if (conndb->state != SQL_DB_STATE_DISCONNECTED)
@@ -333,6 +343,7 @@
 	unsigned int i, count;
 	bool all_disconnected;
 
+	i_debug("sql pool getting connection");
 	conn = sqlpool_find_available_connection(db, unwanted_host_idx,
 		 &all_disconnected);
 	if (conn == NULL && unwanted_host_idx != -1U) {
@@ -355,11 +366,16 @@
 	}
 	if (conn == NULL) {
 		/* still nothing. try creating new connections */
+		i_debug("nothing, adding more");
 		conn = sqlpool_add_new_connection(db);
-		if (conn != NULL)
+		if (conn != NULL) {
+			i_debug(" - and connecting");
 			(void)sql_connect(conn->db);
-		if (conn == NULL || !SQL_DB_IS_READY(conn->db))
+		}
+		if (conn == NULL || !SQL_DB_IS_READY(conn->db)) {
+			i_debug(" - not ready");
 			return FALSE;
+		}
 	}
 	*conn_r = conn;
 	return TRUE;
@@ -509,10 +525,13 @@
 	const struct sqlpool_connection *conn;
 	int ret = -1, ret2;
 
+	i_debug("connecting to first available connection");
 	array_foreach(&db->all_connections, conn) {
 		ret2 = sql_connect(conn->db);
-		if (ret2 > 0)
+		if (ret2 > 0) {
+			i_debug("%u connected", conn->host_idx);
 			return 1;
+		}
 		if (ret2 == 0)
 			ret = 0;
 	}


Re: [Dovecot] mysql auth failover failing

2011-09-12 Thread Timo Sirainen
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:

 According to the sample SQL configuration file HA / round-robin 
 load-balancing is supported by giving multiple host settings, like: 
 host=sql1.host.org host=sql2.host.org.
 
 However, as far as I can tell dovecot only connects to the first listed 
 host, and processes all queries through it, there does not appear to be 
 any load-balancing going on.

The current code creates a connection to the second server only when the
first connection is already busy with an SQL query, or when it's not
working. Once there are more connections, it starts doing round-robin
lookups.

This works okay enough with PostgreSQL because it does asynchronous
lookups, so two simultaneous lookups create a second connection. MySQL
does synchronous lookups though, so the second connection is normally
never created.

I suppose the fix to this would be to always connect to all SQL servers
at startup.

 That's not necessarily a dealbreaker; however, high-availability does 
 not appear to be working either.
 
 If I shutdown the first mysql server, dovecot starts to log connection 
 failures:
 
 Sep  9 15:47:34 tweak dovecot: auth: Error: 
 mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
 Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
 waiting for 1 seconds before retry
 
 Sep  9 15:47:39 tweak dovecot: auth: Error: 
 mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
 Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
 waiting for 25 seconds before retry

Those are intentional.

 And postfix starts to fail authentications:
 
 Sep  9 15:47:35 tweak postfix/smtpd[5119]: warning: 
 bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 
 authentication failed: Connection lost to authentication server

It should have created the second connection here and not fail..

 Now and again the authentication process dies:
 
 Sep  9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: 
 line 697 (auth_request_handler_flush_failures): assertion failed: 
 (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

And this of course shouldn't happen either.

 Requests start to pile up:
 
 Sep  9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request 
 was queued for 25 seconds, 45 left in queue
 
 Lookups time out:
 
 Sep  9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted 
 request: Lookup timed out

These are the result of the previous failures.

 This occasionally pops up:
 
 Sep  9 15:58:38 tweak dovecot: auth: Fatal: 
 net_connect_unix(auth-worker) failed: Resource temporarily unavailable

Probably this too.

 And sometimes the auth process gets temporarily disabled:
 
 Sep  9 15:58:57 tweak dovecot: master: Error: service(auth): command 
 startup failed, throttling

Most likely related to the crash, although I think this still shouldn't
have happened.

 I don't think all authentications fail during the scenario, but I think 
 the majority do. Based on the network traffic, dovecot is almost 
 continuously trying to connect to the first listed server. It sometimes 
 connects to the second listed server, but when it does, the connection 
 does not persist, it goes away almost immediately.

There are multiple auth-worker processes, each one having their own
internal MySQL connections with separate retry counters.

I'll try to debug this soon.



Re: [Dovecot] mysql auth failover failing

2011-09-12 Thread Paul B. Henson

On 9/12/2011 5:30 AM, Timo Sirainen wrote:


> This works okay enough with PostgreSQL because it does asynchronous
> lookups, so two simultaneous lookups create a second connection.
> MySQL does synchronous lookups though, so the second connection is
> normally never created.


If I could, I think I'd rather run Postgres; but so many things only
support MySQL that you can't really get away with running only Postgres,
and it's not worth the effort to run two separate SQL services <sigh>.


> I suppose the fix to this would be to always connect to all SQL
> servers at startup.


Perhaps it could be an option, either load balancing between all
available servers, or only using later listed servers when the earlier
listed ones are failing. For my purposes, either way is fine, as long as
authentications don't fail :). The other contributor to this thread, who
has a local mysql replica listed first and the central master listed
second probably wouldn't want the load balanced between them.


> It should have created the second connection here and not fail..


Based on the network traffic, it is really pounding the primary trying
to connect, and occasionally connecting to the secondary only to
immediately disconnect after either only one or very few queries.


> I'll try to debug this soon.


Thanks; let me know if there's anything I could do to help, or if there
are any potential fixes you would like tested.




Re: [Dovecot] mysql auth failover failing

2011-09-11 Thread Paul B. Henson
On Sat, Sep 10, 2011 at 01:49:59AM -0700, Noel Butler wrote:
 
 Sounds like you have bigger issues, maybe related to why the primary
 fails?

For testing purposes, it fails because I stick a firewall rule in place
preventing access to it ;). In production, it came to our attention
because a hardware failure required downtime on one of the mysql servers
to replace parts, and we received complaints of failed authentications
while it was down. In general, both are up, but things using them need
to be able to survive when one is down.

 primary (local slave copy) has gone away (unless I'm deliberately
 upgrading MySQL). When doing so (tested), it hits the master server (as
 in the secondary host=) right away, no auth failures.

Hmm, what version of dovecot are you using? In version 1 failover
seems to work if the primary returns connection refused (which your
scenario would). In version 2, it seems flaky for both connection
refused and connection timed out. Unless I've got something
misconfigured, but there doesn't seem to be that much to it...




Re: [Dovecot] mysql auth failover failing

2011-09-10 Thread Noel Butler
On Fri, 2011-09-09 at 20:16 -0700, Paul B. Henson wrote:

 On Fri, Sep 09, 2011 at 08:02:57PM -0700, Noel Butler wrote:



  suggest, having just one master server, after all, dovecot and postfix
  just need to read, not alter/update/insert etc.
 
 True; but the pieces that are altering/updating/inserting the data that
 postfix/dovecot need to read need redundancy as well :).
  


Yep, it depends on your network design I suppose. I'd rather leave the
front ends to be just that, with all interactions with the master DB
server and the NAS done via a second interface on a dedicated private LAN,
so those nasty bored teenagers out there can't get near it :)


  Yep, that's correct because it has gone away, but it still uses the
  second host immediately; that's just Dovecot trying to re-establish its
  link with the primary.
 
 Based on my testing, it doesn't use the second host immediately, but
 only sporadically, with most of the authentications failing.


Sounds like you have bigger issues, maybe related to why the primary
fails?


  
  err postfix is not dovecot, you need to also add failover in postfix's
  sql lookup commands
 
 postfix relies on dovecot for authentication, this postfix error message
 is the result of dovecot not successfully processing an authentication
 request. postfix itself handles mysql failure well, it both load
 balances queries across both servers and also continues to function when
 one isn't available.
 


My bad, I did see that, and it is how I do it (I'm not all there at
present, had the flu for a week <g>), but I've never had a situation
where the primary (local slave copy) has gone away (unless I'm
deliberately upgrading MySQL). When doing so (tested), it hits the master
server (as in the secondary host=) right away, no auth failures.


Cheers



Re: [Dovecot] mysql auth failover failing

2011-09-09 Thread Noel Butler
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:



 default_pass_scheme = PLAIN

Ugh, I'll pretend I didn't see that :)



 
 According to the sample SQL configuration file HA / round-robin 
 load-balancing is supported by giving multiple host settings, like: 
 host=sql1.host.org host=sql2.host.org.
 
 However, as far as I can tell dovecot only connects to the first listed 
 host, and processes all queries through it, there does not appear to be 
 any load-balancing going on.
 



I suspect the wording here is incorrect; it's just a failover AFAIK: it
only hits the first entry, failing over to the second if there's no
response. HA would be like running a MySQL slave on all the front ends,
failing over to the master on your CRM server etc., which is what I do
and suggest: having just one master server. After all, Dovecot and
Postfix just need to read, not alter/update/insert etc.


 That's not necessarily a dealbreaker; however, high-availability does 
 not appear to be working either.
 
 If I shutdown the first mysql server, dovecot starts to log connection 
 failures:
 
 Sep  9 15:47:34 tweak dovecot: auth: Error: 
 mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
 Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
 waiting for 1 seconds before retry
 
 Sep  9 15:47:39 tweak dovecot: auth: Error: 
 mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
 Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
 waiting for 25 seconds before retry
 


Yep, that's correct because it has gone away, but it still uses the
second host immediately; that's just Dovecot trying to re-establish its
link with the primary.

 And postfix starts to fail authentications:
 


Err, Postfix is not Dovecot; you need to also add failover in Postfix's
SQL lookup commands, e.g.:

hosts = unix:/var/run/mysql/mysql.sock 10.10.10.2

(assuming .2 is your master SQL server)



 
 Resulting in a complete unavailability of smtp service, not just 
 unavailability of authenticated services.
 


You could have a secondary MX SMTP box (with a higher MX preference
value) that uses Postfix for the virtual transport for cases where
Dovecot is unavailable. This of course means storing partial paths in
your mail DB, for use only by that one separate secondary MX that is not
behind the load balancer. Of course this won't solve users' problems with
sending unless you have multiple SMTP servers behind a load balancer, but
it still allows inbound mail; it depends on how big your setup (and
budget) is or can be :)

(Note: by load balancer I mean a real hardware device, not pretend LBs
in software.)


 Does the example sql config have incorrect 
 information?
 


I suspect so.

