Re: [Dovecot] MySQL server has gone away
On 1/20/2012 1:06 PM, Timo Sirainen wrote:

> Hmh. Still doesn't work 100%:
>
> auth-worker(28788): Error: mysql: Query failed, retrying: MySQL server has gone away (idled for 181 secs)
> auth-worker(7413): Error: mysql: Query failed, retrying: MySQL server has gone away (idled for 298 secs)
>
> I'm not really sure why it's not killing itself after 60 seconds of idling. Probably related to how mysql code tracks idle time and how idle_kill tracks it.. Anyway, those errors are much more rare now.

The mysql server starts tracking idle time from the last network communication with the client. So presumably, if the auth worker gets marked as not idle by anything that doesn't involve interaction with the mysql server, the two could get out of sync.

Before you posted a potential fix to the idle timeout, I was looking at other possible ways to resolve the issue. Currently, an authentication request is tried exactly twice: one initial try and one retry. Looking at driver-sqlpool.c:

    if (result->failed_try_retry && !request->retried) {

Currently, retried is a boolean. What if retried were an integer instead, and a new configuration variable allowed you to specify how many times an authentication attempt should be retried? The default could be 2, which would result in exactly the same behavior. But then you could set it to 3 or 4 to prevent a request from hitting a timed out connection twice and failing completely.

Ideally, a better fix would be for the client not to consider a "MySQL server has gone away" return as a failure, but instead to immediately reconnect and try again without marking it as a retry. However, from reviewing the code, that would be a much more difficult and invasive change. Changing the existing retried variable to an integer count rather than a boolean is pretty simple.

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
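To make the retry-count idea concrete, here is a rough Python model of it. This is not Dovecot code: `Connection`, `run_query`, and `max_tries` are hypothetical names, and the assumptions that each try goes to the next pooled connection and that a failed query leaves the connection freshly reconnected are simplifications for illustration.

```python
from dataclasses import dataclass

GONE_AWAY = "MySQL server has gone away"


@dataclass
class Connection:
    """Stand-in for one pooled MySQL connection (hypothetical)."""
    name: str
    stale: bool = False  # True once the server has timed the link out

    def query(self, sql):
        if self.stale:
            # Assumption: the failed query triggers a reconnect, so the
            # next use of this connection works.
            self.stale = False
            raise ConnectionError(GONE_AWAY)
        return f"{self.name}: ok"


def run_query(pool, sql, max_tries=2):
    """Try the query up to max_tries times, rotating through the pool.

    max_tries=2 models the current boolean flag: one initial try
    plus one retry.
    """
    tries = 0
    last_error = None
    while tries < max_tries:
        conn = pool[tries % len(pool)]
        tries += 1
        try:
            return conn.query(sql), tries
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"Password query failed: {last_error}")
```

With two connections that have both timed out, `max_tries=2` fails the request outright, while `max_tries=3` succeeds on the third try because the first connection has reconnected by then.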
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 20, 2012 at 09:16:57AM -0800, Timo Sirainen wrote:

> This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to have gotten rid of the MySQL server has gone away errors completely. So I guess the problem was that during some peak times a ton of auth worker processes were created, but afterwards they weren't used until the next peak happened, and then they failed.
>
> http://hg.dovecot.org/dovecot-2.1/rev/3963862a4086
> http://hg.dovecot.org/dovecot-2.1/rev/58556a90259f

Hmm, I tried to apply this to 2.0.17, and that didn't really work out. Before I spend too much time trying to hand port the changes, do you know offhand if they simply won't apply to 2.0.17 due to other changes made since then? It looks like 2.1 might be out soon, so maybe I should just wait for that.

Thanks...
Re: [Dovecot] MySQL server has gone away
On Sat, Jan 14, 2012 at 12:01:12AM -0800, Robert Schetterer wrote:

>> Hmm, hadn't tried that, but flipped it on to see how it might work out. The only tradeoff is a potential delay between when an account is disabled and when it can stop authenticating. I set the timeout to 10 minutes for now, with an hour timeout for negative caching.
>
> don't know if I understand you right

Before I turned on auth caching, every attempted authentication hit our mysql database, which in addition to the password itself contains a flag indicating whether or not the account is enabled. So if somebody was abusing smtp authentication, our helpdesk could disable their account, and it would *immediately* stop working. With authentication caching enabled, there is a window the size of the ttl during which an account that has been disabled can continue to authenticate successfully.

>> That page says you can send a USR2 signal to the auth process for cache stats? That doesn't seem to work. OTOH, that page is for version 1, not 2; is there some other way to generate cache stats in version 2?
>
> auth cache works with dove 2, no idea about dove 1, didn't test, but I guess it does

I'm using dovecot 2. My question was that the documentation for dovecot 1 describes a way to make dovecot dump the authentication cache statistics that doesn't seem to work for dovecot 2, and whether there is some other way to get the cache statistics in dovecot 2.

Thanks...
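The window described above can be shown with a toy TTL cache. This is only a sketch: `AuthCache` and its `lookup` hook are made-up names, the 600 and 3600 second values mirror the 10 minute and one hour timeouts mentioned in the message, and Dovecot's real cache (configured through settings such as auth_cache_ttl and auth_cache_negative_ttl) has details this model ignores.

```python
class AuthCache:
    """Toy positive/negative auth cache with separate TTLs in seconds."""

    def __init__(self, lookup, ttl=600, negative_ttl=3600):
        self.lookup = lookup            # backend check: user -> bool
        self.ttl = ttl                  # lifetime of successful results
        self.negative_ttl = negative_ttl  # lifetime of failed results
        self.entries = {}               # user -> (result, stored_at)

    def authenticate(self, user, now):
        cached = self.entries.get(user)
        if cached is not None:
            result, stored_at = cached
            limit = self.ttl if result else self.negative_ttl
            if now - stored_at < limit:
                # Served from cache; the database is not consulted.
                return result
        result = self.lookup(user)
        self.entries[user] = (result, now)
        return result


# The account is disabled right after a successful login at t=0.
enabled = {"alice": True}
cache = AuthCache(lambda user: enabled.get(user, False))
cache.authenticate("alice", now=0)
enabled["alice"] = False
still_ok = cache.authenticate("alice", now=300)   # inside the ttl window
after_ttl = cache.authenticate("alice", now=600)  # cache entry has expired
```

In this model `still_ok` is true even though the account is already disabled, and only the lookup at t=600 goes back to the database and returns false.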
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 13, 2012 at 01:36:38AM -0800, Timo Sirainen wrote:

> Also another idea to avoid them in the first place:
>
> service auth-worker {
>   idle_kill = 20
> }

Ah, set the auth-worker timeout to less than the mysql timeout to prevent a stale mysql connection from ever being used. I'll try that, thanks.
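A small timeline model may help show why idle_kill has to be shorter than the server-side timeout, and how the two clocks can still drift apart. Everything here is an assumption for illustration: `stale_hits` is a made-up function, the 180 second server timeout is a guess, and real auth workers are of course more complicated.

```python
def stale_hits(events, idle_kill, server_timeout):
    """Return the times at which a query hits a connection the server closed.

    events: ordered (time, uses_mysql) tuples of worker activity.
    The worker is replaced, with a fresh connection, once it has been
    idle for idle_kill seconds. The server drops a connection after
    server_timeout seconds without traffic on it.
    """
    hits = []
    last_activity = 0  # worker's view: any activity resets it
    last_mysql = 0     # server's view: only MySQL traffic resets it
    for t, uses_mysql in events:
        if t - last_activity >= idle_kill:
            # Worker was killed while idle; its replacement connects at t.
            last_mysql = t
        if uses_mysql:
            if t - last_mysql >= server_timeout:
                hits.append(t)
            last_mysql = t  # the query (or the reconnect) refreshes the link
        last_activity = t
    return hits


# Worker idle for 200s: with idle_kill=20 it is replaced before the query.
quiet = [(0, True), (200, True)]
# Worker kept "busy" every 50s by activity that never touches MySQL.
busy = [(0, True), (50, False), (100, False), (150, False), (200, True)]

no_hits = stale_hits(quiet, idle_kill=20, server_timeout=180)
desync = stale_hits(busy, idle_kill=60, server_timeout=180)
```

In the second timeline the worker never looks idle for 60 seconds, so it is never killed, yet from the server's side the connection has been silent for 200 seconds. That is one way a "gone away" error could still show up with idle_kill set.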
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote:

> by the way, if you use sql for auth have you tried auth caching?
>
> http://wiki.dovecot.org/Authentication/Caching

Hmm, hadn't tried that, but flipped it on to see how it might work out. The only tradeoff is a potential delay between when an account is disabled and when it can stop authenticating. I set the timeout to 10 minutes for now, with an hour timeout for negative caching.

That page says you can send a USR2 signal to the auth process for cache stats? That doesn't seem to work. OTOH, that page is for version 1, not 2; is there some other way to generate cache stats in version 2?

Thanks...
Re: [Dovecot] MySQL server has gone away
On 1/13/2012 10:29 AM, Mark Moseley wrote:

> connection prior to this, is the auth worker not recognizing its connection is already half-closed (in which case, it probably shouldn't even consider it a legitimate connection and just automatically reconnect, i.e. try #1, not the retry, which would happen after another failure).

I don't think there's any way to tell from the mysql api that the server has closed the connection short of trying to use it and getting that specific error. I suppose that specific error could be special cased as an immediate try again with no penalty rather than considered a failure.
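The special-casing idea could look roughly like the following. It is a sketch, not the actual Dovecot or MySQL client logic: `QueryError`, `query_with_free_reconnect`, and the `run`/`reconnect` hooks are hypothetical, and the cap on reconnects is an added safeguard against looping forever. The only borrowed fact is that the MySQL client library reports "MySQL server has gone away" as error 2006 (CR_SERVER_GONE_ERROR).

```python
CR_SERVER_GONE_ERROR = 2006  # client error code for "MySQL server has gone away"


class QueryError(Exception):
    """Hypothetical error type carrying a MySQL-style error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def query_with_free_reconnect(run, reconnect, max_retries=1, max_reconnects=3):
    """Call run(); treat "gone away" as a free reconnect, not a retry.

    run and reconnect are caller-supplied callables. Any other failure
    consumes one of max_retries; when none are left the error propagates.
    """
    retries = 0
    reconnects = 0
    while True:
        try:
            return run()
        except QueryError as exc:
            if exc.code == CR_SERVER_GONE_ERROR and reconnects < max_reconnects:
                reconnects += 1
                reconnect()  # stale link: reconnect without penalty
                continue
            if retries < max_retries:
                retries += 1
                continue
            raise
```

The point of the design is that a stale connection never uses up the retry budget, so a request can only fail on errors that say something about the server actually being unavailable.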
Re: [Dovecot] MySQL server has gone away
On 1/12/2012 6:00 PM, Mark Moseley wrote:

> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away

I've actually been meaning to send a similar message for the last couple of months :). We run dovecot solely as a sasl authentication provider to postfix for smtp authentication. We're currently running 2.0.15 with a handful of patches from a few months ago when Timo fixed mysql failover. We also see sporadic messages like that in the logs:

    Jan 11 01:00:57 sparky dovecot: auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away

We do have a timeout on the mysql servers, so I don't necessarily mind this message, except we also see some number of these:

    Jan 11 01:00:57 sparky dovecot: auth-worker: Error: sql(clgeurts,108.38.64.98): Password query failed: MySQL server has gone away

The mysql servers have never been down or unresponsive; if it retries, it should succeed. I'm not sure what's happening here. Perhaps it tries the query on one mysql server connection (we have two configured) which has timed out, then tries the other one, and if the other one has also timed out it just fails?

I also see some auth timeouts:

    Jan 11 22:06:02 sparky dovecot: auth: CRAM-MD5(?,200.37.175.14): Request 10232.28 timeouted after 150 secs, state=2

I'm not sure if they're related to the mysql timeouts. There are also some postfix auth errors:

    Jan 11 23:55:41 sparky postfix/smtpd[20994]: warning: unknown[200.37.175.14]: SASL CRAM-MD5 authentication failed: Connection lost to authentication server

I think those happen when dovecot takes too long to respond. I haven't had time to dig into it or get any debugging info, but just thought I'd pipe up when I saw your similar question :).
Re: [Dovecot] dovecot2 auth-worker socket perms ignoring assigned ownership settings in conf.d/10-master.conf?
On Tue, Oct 11, 2011 at 08:20:13PM -0700, mephistophe...@operamail.com wrote:

> Maybe being too literal, or misunderstanding your 'extra', I changed to,

Hmm, I just cut-and-pasted my config :). The missing piece was the unix_listener subconfig user; the user/group part in the service config didn't need to match mine exactly, although I think $default_internal_user is dovecot anyway.

> chown doveauth:dovecot /var/run/dovecot/auth-worker

Hmm, perhaps I misunderstood you? I thought you were trying to get SASL auth working with postfix, but you're demonstrating an imap connection. Ah, yes, I see in your original email you showed an imap connection too. I just saw the /var/spool/postfix/private/auth and user/group postfix parts of the config and made an assumption. My config was for using Dovecot *just* to provide SASL authentication services to postfix for smtp auth; I'm not using any of its other features/services. Sorry for any confusion.

I'm curious though, why are you setting the auth stuff up to be owned by postfix if you're trying to authenticate dovecot imap processes? It seems you're mixing two different configs.
Re: [Dovecot] mysql auth failover failing
On 9/16/2011 2:21 AM, Timo Sirainen wrote:

>> As far as I can tell, it still doesn't do load balancing.
>
> Oh. http://hg.dovecot.org/dovecot-2.0/rev/327698228158 should finally fix it. :)

I installed the new 2.0.15 release including this change, and can confirm it does now successfully load balance across my two servers. Not only that, but with this change, there are no failed authentications at all when one of the servers goes away :). I have it running on one of my three production mail servers now, and barring any unexpected issues will deploy it on the other two next week, and then we'll be sitting pretty ;).

Thanks again...
Re: [Dovecot] mysql auth failover failing
On 9/15/2011 3:53 AM, Timo Sirainen wrote:

> I did several fixes related to this in v2.0 hg.

I patched version 2.0.13 with these fixes and tested it out. As far as I can tell, it still doesn't do load balancing. When started, it only connects to the primary server, and as long as that server is available it never seems to try to connect to the other one.

However, the failover is much better. There are a few failed authentications when the primary server first becomes unavailable (it seems to depend on load: under a light load only a couple fail, and the heavier the load, the more fail). After that blip though, authentications work fine.

Thanks much for your help resolving this issue, I greatly appreciate it.
Re: [Dovecot] mysql auth failover failing
On 9/12/2011 5:30 AM, Timo Sirainen wrote:

> This works okay enough with PostgreSQL because it does asynchronous lookups, so two simultaneous lookups create a second connection. MySQL does synchronous lookups though, so the second connection is normally never created.

If I could, I think I'd rather run postgres; but so many things only support mysql that you can't really get away with running only postgres, and it's not worth the effort to run two separate sql services *sigh*.

> I suppose the fix to this would be to always connect to all SQL servers at startup.

Perhaps it could be an option: either load balancing between all available servers, or only using later listed servers when the earlier listed ones are failing. For my purposes, either way is fine, as long as authentications don't fail :). The other contributor to this thread, who has a local mysql replica listed first and the central master listed second, probably wouldn't want the load balanced between them.

> It should have created the second connection here and not fail..

Based on the network traffic, it is really pounding the primary trying to connect, and occasionally connecting to the secondary only to disconnect immediately after one or very few queries.

> I'll try to debug this soon.

Thanks; let me know if there's anything I could do to help, or if there are any potential fixes you would like tested.
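The two behaviours suggested as an option above can be sketched as a host picker. This is purely illustrative: `make_picker`, the `mode` names, and the `is_up` health check are invented here and do not correspond to any Dovecot setting.

```python
import itertools


def make_picker(hosts, is_up, mode="failover"):
    """Return a function that picks a host for the next query.

    mode="failover": always prefer the earliest listed host that is up.
    mode="balance":  rotate round-robin over the hosts that are up.
    is_up is a caller-supplied health check (hypothetical).
    """
    counter = itertools.count()

    def pick():
        available = [h for h in hosts if is_up(h)]
        if not available:
            raise RuntimeError("no SQL server available")
        if mode == "failover":
            return available[0]
        return available[next(counter) % len(available)]

    return pick


# A local replica listed first, the central master listed second.
up = {"replica": True, "master": True}
pick = make_picker(["replica", "master"], lambda h: up[h], mode="failover")
first_choice = pick()      # the replica, as long as it is up
up["replica"] = False
fallback_choice = pick()   # the master, only while the replica is down
```

In failover mode the master sees traffic only while the replica is unavailable, which suits the replica-first setup; in balance mode queries alternate between whichever hosts are currently up.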
Re: [Dovecot] mysql auth failover failing
On Sat, Sep 10, 2011 at 01:49:59AM -0700, Noel Butler wrote:

> Sounds like you have bigger issues, maybe relating as to why the primary fails?

For testing purposes, it fails because I stick a firewall rule in place preventing access to it ;). In production, it came to our attention because a hardware failure required downtime on one of the mysql servers to replace parts, and we received complaints of failed authentications while it was down. In general, both are up, but things using them need to be able to survive when one is down.

> primary (local slave copy) has gone away unless I'm deliberately upgrading mysql ) when doing so (tested) it hits the master server (as in secondary host=) right away, no auth failures.

Hmm, what version of dovecot are you using? In version 1, failover seems to work if the primary returns connection refused (which your scenario would). In version 2, it seems flaky for both connection refused and connection timed out. Unless I've got something misconfigured, but there doesn't seem to be that much to it...
[Dovecot] mysql auth failover failing
as long as they do not continuously fail while the server is down. Am I doing something wrong? Does the example sql config have incorrect information?

We were previously running dovecot 1.2.11 and just recently upgraded to 2. In the previous version, we actually had two different passdbs configured, each one listing only one of the mysql servers. I seem to recall that was the recommendation at the time for high availability. When that configuration did not seem to work under version 2, I found an updated recommendation to list both servers in the same passdb, which also does not appear to work correctly.

I actually went back and tested the older version. It seemed to work okay in the case where the server was up but the service was down and connections were refused, but it also failed a large number of authentication attempts when the server was completely down and connections were timing out.

Thanks much...