Re: [Dovecot] MySQL server has gone away / dict ? tests done
Am 15.02.2012 20:10, schrieb Robert Schetterer: > Am 15.02.2012 19:43, schrieb Timo Sirainen: >> On 15.2.2012, at 20.22, Robert Schetterer wrote: >> >>> sorry for some more question >>> what is the default behave if dict quota over sql cant be reached >>> i.e with lmtp >>> >>> something like log warning and deliver anyway ? >> >> I think it tempfails. Try. > > hm, test servers arent up recently > cant do it on production > > but i will test before i go try dove 2.1 > and report, @list perhaps anyone other having it tested before? >> > ok the test server was good enough for small tests with 2.0.13 dove private/dovecot-lmtp] said: 451 4.2.0 <@com> Internal error occurred. Refer to server log for more information. dict: Error: mysql(localhost2): Connect failed to database (postfix): Unknown MySQL server host '...' (1) - waiting for 125 seconds before retry youre right looks like default is tempfail 451 so it stays in postfix queue and is permanent retry after sql comming back , its getting delivered i.e force with Flush Queue so thats good enough for me i think -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away / dict ?
Am 15.02.2012 19:43, schrieb Timo Sirainen: > On 15.2.2012, at 20.22, Robert Schetterer wrote: > >> sorry for some more question >> what is the default behave if dict quota over sql cant be reached >> i.e with lmtp >> >> something like log warning and deliver anyway ? > > I think it tempfails. Try. hm, test servers arent up recently cant do it on production but i will test before i go try dove 2.1 and report, @list perhaps anyone other having it tested before? > -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away / dict ?
On 15.2.2012, at 20.22, Robert Schetterer wrote: > sorry for some more question > what is the default behave if dict quota over sql cant be reached > i.e with lmtp > > something like log warning and deliver anyway ? I think it tempfails. Try.
Re: [Dovecot] MySQL server has gone away / dict ?
Am 15.02.2012 17:10, schrieb Robert Schetterer: > Am 15.02.2012 14:07, schrieb Timo Sirainen: >> On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote: >>> Hi Timo,just to make sure >>> i have an extra >>> dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 ) >>> >>> connect = host=192.168.123.150 dbname=.. user=... password=... >> .. >>> is it possible to have i.e >>> >>> connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=... >>> password=... >>> >>> there too? >> >> Assuming your SQL database is configured to do synchronous multi-master >> replication, sure. >> >> > jep ,thx Timo > Hi Timo, sorry for some more question what is the default behave if dict quota over sql cant be reached i.e with lmtp something like log warning and deliver anyway ? -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away / dict ?
Am 15.02.2012 14:07, schrieb Timo Sirainen: > On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote: >> Hi Timo,just to make sure >> i have an extra >> dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 ) >> >> connect = host=192.168.123.150 dbname=.. user=... password=... > .. >> is it possible to have i.e >> >> connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=... >> password=... >> >> there too? > > Assuming your SQL database is configured to do synchronous multi-master > replication, sure. > > jep ,thx Timo -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away / dict ?
On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote: > Hi Timo,just to make sure > i have an extra > dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 ) > > connect = host=192.168.123.150 dbname=.. user=... password=... .. > is it possible to have i.e > > connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=... > password=... > > there too? Assuming your SQL database is configured to do synchronous multi-master replication, sure.
Re: [Dovecot] MySQL server has gone away / dict ?
Am 28.01.2012 22:06, schrieb Robert Schetterer: > Am 28.01.2012 21:07, schrieb Timo Sirainen: >> On 13.1.2012, at 20.29, Mark Moseley wrote: >> >>> If there are multiple hosts, it seems like the most robust thing to do >>> would be to exhaust the existing connections and if none of those >>> succeed, then start a new connection to one of them. It will probably >>> result in much more convoluted logic but it'd probably match better >>> what people expect from a retry. >> >> Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1 >> > > Hi Timo > > doc/example-config/dovecot-sql.conf.ext > from hg > has something like > > # Database connection string. This is driver-specific setting. > # HA / round-robin load-balancing is supported by giving multiple host > # settings, like: host=sql1.host.org host=sql2.host.org > > but i dont find it in > http://wiki2.dovecot.org/AuthDatabase/SQL > > Hi Timo,just to make sure i have an extra dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 ) connect = host=192.168.123.150 dbname=.. user=... password=... map { pattern = priv/quota/storage table = quota2 username_field = username value_field = bytes } map { pattern = priv/quota/messages table = quota2 username_field = username value_field = messages } is it possible to have i.e connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=... password=... there too? -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away
On Sat, Jan 28, 2012 at 12:07 PM, Timo Sirainen wrote: > On 13.1.2012, at 20.29, Mark Moseley wrote: > >> If there are multiple hosts, it seems like the most robust thing to do >> would be to exhaust the existing connections and if none of those >> succeed, then start a new connection to one of them. It will probably >> result in much more convoluted logic but it'd probably match better >> what people expect from a retry. > > Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1 > Excellent, thanks!
Re: [Dovecot] MySQL server has gone away
Am 28.01.2012 23:17, schrieb Timo Sirainen: > On 28.1.2012, at 23.06, Robert Schetterer wrote: > >> doc/example-config/dovecot-sql.conf.ext >> from hg >> has something like >> >> # Database connection string. This is driver-specific setting. >> # HA / round-robin load-balancing is supported by giving multiple host >> # settings, like: host=sql1.host.org host=sql2.host.org >> >> but i dont find it in >> http://wiki2.dovecot.org/AuthDatabase/SQL > > I added something about it there. > cool thanks ! -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away
On 28.1.2012, at 23.06, Robert Schetterer wrote: > doc/example-config/dovecot-sql.conf.ext > from hg > has something like > > # Database connection string. This is driver-specific setting. > # HA / round-robin load-balancing is supported by giving multiple host > # settings, like: host=sql1.host.org host=sql2.host.org > > but i dont find it in > http://wiki2.dovecot.org/AuthDatabase/SQL I added something about it there.
Re: [Dovecot] MySQL server has gone away
Am 28.01.2012 21:07, schrieb Timo Sirainen: > On 13.1.2012, at 20.29, Mark Moseley wrote: > >> If there are multiple hosts, it seems like the most robust thing to do >> would be to exhaust the existing connections and if none of those >> succeed, then start a new connection to one of them. It will probably >> result in much more convoluted logic but it'd probably match better >> what people expect from a retry. > > Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1 > Hi Timo doc/example-config/dovecot-sql.conf.ext from hg has something like # Database connection string. This is driver-specific setting. # HA / round-robin load-balancing is supported by giving multiple host # settings, like: host=sql1.host.org host=sql2.host.org but i dont find it in http://wiki2.dovecot.org/AuthDatabase/SQL -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away
On 13.1.2012, at 20.29, Mark Moseley wrote: > If there are multiple hosts, it seems like the most robust thing to do > would be to exhaust the existing connections and if none of those > succeed, then start a new connection to one of them. It will probably > result in much more convoluted logic but it'd probably match better > what people expect from a retry. Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 20, 2012 at 09:16:57AM -0800, Timo Sirainen wrote: > This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to > have gotten rid of the "MySQL server has gone away" errors completely. > So I guess the problem was that during some peak times a ton of auth > worker processes were created, but afterwards they weren't used until > the next peak happened, and then they failed. > > http://hg.dovecot.org/dovecot-2.1/rev/3963862a4086 > http://hg.dovecot.org/dovecot-2.1/rev/58556a90259f Hmm, I tried to apply this to 2.0.17, and that didn't really work out. Before I spend too much time trying to hand port the changes, do you know off hand if they simply won't apply to 2.0.17 due to other changes made since then? It looks like 2.1 might be out soon, I guess maybe I should just wait for that. Thanks... -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] MySQL server has gone away
On 1/20/2012 9:17 AM, Timo Sirainen wrote: Works for me. Are you maybe sending it to wrong auth process (auth worker instead of master)? I had tried sending it to both; but the underlying problem turned out to be that the updated config hadn't actually been deployed yet 8-/ oops. Once I fixed that, sending the signal did generate the log output. Evidently nothing is printed out in the case where the authentication caching isn't enabled; maybe you should make it print out something like "Hey idiot, caching isn't turned on" ;). Thanks... -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] MySQL server has gone away
On 1/20/2012 1:06 PM, Timo Sirainen wrote: Hmh. Still doesn't work 100%: auth-worker(28788): Error: mysql: Query failed, retrying: MySQL server has gone away (idled for 181 secs) auth-worker(7413): Error: mysql: Query failed, retrying: MySQL server has gone away (idled for 298 secs) I'm not really sure why it's not killing itself after 60 seconds of idling. Probably related to how mysql code tracks idle time and how idle_kill tracks it.. Anyway, those errors are much more rare now. The mysql server starts tracking idle time as beginning after the last network communication with the client. So presumably if the auth worker gets marked as not idle by anything not involving interaction with the mysql server, they could get out of sync. Before you posted a potential fix to the idle timeout, I was looking at other possible ways to resolve the issue. Currently, an authentication request is tried exactly twice -- one initial try, and one retry. Looking at driver-sqlpool.c: if (result->failed_try_retry && !request->retried) { Currently, retried is a boolean. What if retried was an integer instead, and a new configuration variable allowed you to specify how many times an authentication attempt should be retried? The default could be 2, which would result in exactly the same behavior. But then you could set it to 3 or 4 to prevent a request from hitting a timed out connection twice and failing completely. Ideally, a better fix would be for the client not to consider a "MySQL server has gone away" return as a failure, but instead immediately reconnect and try again without marking it as a retry. However, from reviewing the code, that would be a much more difficult and invasive change. Changing the existing retried variable to an integer count rather than a boolean is pretty simple. -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] MySQL server has gone away
On 20.1.2012, at 19.16, Timo Sirainen wrote: > This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to have > gotten rid of the "MySQL server has gone away" errors completely. So I guess > the problem was that during some peak times a ton of auth worker processes > were created, but afterwards they weren't used until the next peak happened, > and then they failed. Hmh. Still doesn't work 100%: auth-worker(28788): Error: mysql: Query failed, retrying: MySQL server has gone away (idled for 181 secs) auth-worker(7413): Error: mysql: Query failed, retrying: MySQL server has gone away (idled for 298 secs) I'm not really sure why it's not killing itself after 60 seconds of idling. Probably related to how mysql code tracks idle time and how idle_kill tracks it.. Anyway, those errors are much more rare now.
Re: [Dovecot] MySQL server has gone away
On 14.1.2012, at 2.19, Paul B. Henson wrote: > On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote: > >> by the way , if you use sql for auth have you tried auth caching ? >> >> http://wiki.dovecot.org/Authentication/Caching > > That page says you can send a USR2 signal to the auth process for cache > stats? That doesn't seem to work. OTOH, that page is for version 1, not > 2; is there some other way to generate cache stats in version 2? Works for me. Are you maybe sending it to wrong auth process (auth worker instead of master)?
Re: [Dovecot] MySQL server has gone away
On 14.1.2012, at 1.19, Mark Moseley wrote: >>> Also another idea to avoid them in the first place: >>> >>> service auth-worker { >>> idle_kill = 20 >>> } >> >> Ah, set the auth-worker timeout to less than the mysql timeout to >> prevent a stale mysql connection from ever being used. I'll try that, >> thanks. > > I gave that a try. Sometimes it seems to kill off the auth-worker but > not till after a minute or so (with idle_kill = 20). Other times, the > worker stays around for more like 5 minutes (I gave up watching), > despite being idle -- and I'm the only person connecting to it, so > it's definitely idle. Does auth-worker perhaps only wake up every so > often to check its idle status? This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to have gotten rid of the "MySQL server has gone away" errors completely. So I guess the problem was that during some peak times a ton of auth worker processes were created, but afterwards they weren't used until the next peak happened, and then they failed. http://hg.dovecot.org/dovecot-2.1/rev/3963862a4086 http://hg.dovecot.org/dovecot-2.1/rev/58556a90259f
Re: [Dovecot] MySQL server has gone away
On Sat, Jan 14, 2012 at 12:01:12AM -0800, Robert Schetterer wrote: > > Hmm, hadn't tried that, but flipped it on to see how it might work out. > > The only tradeoff is a potential delay between when an account is > > disabled and when it can stop authenticating. I set the timeout to 10 > > minutes for now, with an hour timeout for negative caching. > > dont know if i unserstand you right Before I turned on auth caching, every attempted authentication hit our mysql database, which in addition to the password itself contains a flag indicating whether or not the account is enabled. So if somebody was abusing smtp authentication, our helpdesk could disable their account, and it would *immediately* stop working. Whereas with authentication caching enabled, there is a window the size of the ttl where an account that has been disabled can continue to successfully authenticate. > > That page says you can send a USR2 signal to the auth process for cache > > stats? That doesn't seem to work. OTOH, that page is for version 1, not > > 2; is there some other way to generate cache stats in version 2? > > auth cache works with dove 2, no idea about dove 1 ,didnt test, but i > guess it does I'm using dovecot 2; my question was that the documentation for dovecot 1 described a way to make dovecot dump the authentication cache statistics that doesn't seem to work for dovecot 2, and if there was some other way to get the cache statistics in dovecot 2. Thanks...
Re: [Dovecot] MySQL server has gone away
Am 14.01.2012 01:19, schrieb Paul B. Henson: > On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote: > >> by the way , if you use sql for auth have you tried auth caching ? >> >> http://wiki.dovecot.org/Authentication/Caching > > Hmm, hadn't tried that, but flipped it on to see how it might work out. > The only tradeoff is a potential delay between when an account is > disabled and when it can stop authenticating. I set the timeout to 10 > minutes for now, with an hour timeout for negative caching. dont know if i unserstand you right perhaps this is what you mean, i use this with/cause fail2ban # TTL for negative hits (user not found, password mismatch). # 0 disables caching them completely. auth_cache_negative_ttl = 0 > > That page says you can send a USR2 signal to the auth process for cache > stats? That doesn't seem to work. OTOH, that page is for version 1, not > 2; is there some other way to generate cache stats in version 2? auth cache works with dove 2, no idea about dove 1 ,didnt test, but i guess it does > > Thanks... > -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away
On 1/13/2012 10:29 AM, Mark Moseley wrote: connection prior to this, is the auth worker not recognizing its connection is already half-closed (in which case, it probably shouldn't even consider it a legitimate connection and just automatically reconnect, i.e. try #1, not the retry, which would happen after another failure). I don't think there's any way to tell from the mysql api that the server has closed the connection short of trying to use it and getting that specific error. I suppose that specific error could be special cased as an immediate "try again with no penalty" rather than considered a failure. -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote: > by the way , if you use sql for auth have you tried auth caching ? > > http://wiki.dovecot.org/Authentication/Caching Hmm, hadn't tried that, but flipped it on to see how it might work out. The only tradeoff is a potential delay between when an account is disabled and when it can stop authenticating. I set the timeout to 10 minutes for now, with an hour timeout for negative caching. That page says you can send a USR2 signal to the auth process for cache stats? That doesn't seem to work. OTOH, that page is for version 1, not 2; is there some other way to generate cache stats in version 2? Thanks... -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 13, 2012 at 2:46 PM, Paul B. Henson wrote: > On Fri, Jan 13, 2012 at 01:36:38AM -0800, Timo Sirainen wrote: > >> Also another idea to avoid them in the first place: >> >> service auth-worker { >> idle_kill = 20 >> } > > Ah, set the auth-worker timeout to less than the mysql timeout to > prevent a stale mysql connection from ever being used. I'll try that, > thanks. I gave that a try. Sometimes it seems to kill off the auth-worker but not till after a minute or so (with idle_kill = 20). Other times, the worker stays around for more like 5 minutes (I gave up watching), despite being idle -- and I'm the only person connecting to it, so it's definitely idle. Does auth-worker perhaps only wake up every so often to check its idle status? To test, I kicked off a dsync, then grabbed a netstat: tcp0 0 10.1.15.129:40070 10.1.52.47:3306 ESTABLISHED 29146/auth worker [ tcp0 0 10.1.15.129:33369 10.1.52.48:3306 ESTABLISHED 29146/auth worker [ tcp0 0 10.1.15.129:54083 10.1.52.49:3306 ESTABLISHED 29146/auth worker [ then kicked off this loop: # while true; do date; ps p 29146 |tail -n1; sleep 1; done Fri Jan 13 18:05:14 EST 2012 29146 ?S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb] Fri Jan 13 18:05:15 EST 2012 29146 ?S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb] More lines of the loop ... Fri Jan 13 18:05:35 EST 2012 29146 ?S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb] 18:05:36.252976 IP 10.1.52.48.3306 > 10.1.15.129.33369: F 77:77(0) ack 92 win 91 18:05:36.288549 IP 10.1.15.129.33369 > 10.1.52.48.3306: . ack 78 win 913 Fri Jan 13 18:05:36 EST 2012 29146 ?S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb] 18:05:37.196204 IP 10.1.52.49.3306 > 10.1.15.129.54083: F 806:806(0) ack 1126 win 123 18:05:37.228594 IP 10.1.15.129.54083 > 10.1.52.49.3306: . ack 807 win 1004 18:05:37.411955 IP 10.1.52.47.3306 > 10.1.15.129.40070: F 806:806(0) ack 1126 win 123 18:05:37.448573 IP 10.1.15.129.40070 > 10.1.52.47.3306: . ack 807 win 1004 Fri Jan 13 18:05:37 EST 2012 29146 ?S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb] ... more lines of the loop ... Fri Jan 13 18:10:13 EST 2012 29146 ?S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb] Fri Jan 13 18:10:14 EST 2012 29146 ?S 0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb] ^C at which point I bailed out. Looking again a couple of minutes later, it was gone. Nothing else was going on and the logs don't show any activity between 18:05:07 and 18:10:44.
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 13, 2012 at 01:36:38AM -0800, Timo Sirainen wrote: > Also another idea to avoid them in the first place: > > service auth-worker { > idle_kill = 20 > } Ah, set the auth-worker timeout to less than the mysql timeout to prevent a stale mysql connection from ever being used. I'll try that, thanks. -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 13, 2012 at 11:38 AM, Robert Schetterer wrote: > Am 13.01.2012 19:29, schrieb Mark Moseley: >> On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen wrote: >>> On 13.1.2012, at 4.00, Mark Moseley wrote: >>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL server has gone away" errors, despite having multiple hosts defined in my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing the same thing with 2.0.16 on Debian Squeeze 64-bit. E.g.: Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away Our mail mysql servers are busy enough that wait_timeout is set to a whopping 30 seconds. On my regular boxes, I see a good deal of these in the logs. I've been doing a lot of mucking with doveadm/dsync (working on maildir->mdbox migration finally, yay!) on test boxes (same dovecot package & version) and when I get this error, despite the log saying it's retrying, it doesn't seem to be. Instead I get: dsync(root): Error: user ...: Auth USER lookup failed >>> >>> Try with only one host in the "connect" string? My guess: Both the >>> connections have timed out, and the retrying fails as well (there is only >>> one retry). Although if the retrying lookup fails, there should be an error >>> logged about it also (you don't see one?) >>> >>> Also another idea to avoid them in the first place: >>> >>> service auth-worker { >>> idle_kill = 20 >>> } >>> >> >> With just one 'connect' host, it seems to reconnect just fine (using >> the same tests as above) and I'm not seeing the same error. It worked >> every time that I tried, with no complaints of "MySQL server has gone >> away". >> >> If there are multiple hosts, it seems like the most robust thing to do >> would be to exhaust the existing connections and if none of those >> succeed, then start a new connection to one of them. It will probably >> result in much more convoluted logic but it'd probably match better >> what people expect from a retry. >> >> Alternatively, since in all my tests, the mysql server has closed the >> connection prior to this, is the auth worker not recognizing its >> connection is already half-closed (in which case, it probably >> shouldn't even consider it a legitimate connection and just >> automatically reconnect, i.e. try #1, not the retry, which would >> happen after another failure). >> >> I'll give the idle_kill a try too. I kind of like the idea of >> idle_kill for auth processes anyway, just to free up some connections >> on the mysql server. > > by the way , if you use sql for auth have you tried auth caching ? > > http://wiki.dovecot.org/Authentication/Caching > > i.e. > > # Authentication cache size (e.g. 10M). 0 means it's disabled. Note that > # bsdauth, PAM and vpopmail require cache_key to be set for caching to > be used. > > auth_cache_size = 10M > > # Time to live for cached data. After TTL expires the cached record is no > # longer used, *except* if the main database lookup returns internal > failure. > # We also try to handle password changes automatically: If user's previous > # authentication was successful, but this one wasn't, the cache isn't used. > # For now this works only with plaintext authentication. > > auth_cache_ttl = 1 hour > > # TTL for negative hits (user not found, password mismatch). > # 0 disables caching them completely. > > auth_cache_negative_ttl = 0 Yup, we have caching turned on for our production boxes. On this particular box, I'd just shut off caching so that I could work on a script for converting from maildir->mdbox and run it repeatedly on the same mailbox. I got tired of restarting dovecot between each test :)
Re: [Dovecot] MySQL server has gone away
Am 13.01.2012 19:29, schrieb Mark Moseley: > On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen wrote: >> On 13.1.2012, at 4.00, Mark Moseley wrote: >> >>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL >>> server has gone away" errors, despite having multiple hosts defined in >>> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing >>> the same thing with 2.0.16 on Debian Squeeze 64-bit. >>> >>> E.g.: >>> >>> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: >>> MySQL server has gone away >>> >>> Our mail mysql servers are busy enough that wait_timeout is set to a >>> whopping 30 seconds. On my regular boxes, I see a good deal of these >>> in the logs. I've been doing a lot of mucking with doveadm/dsync >>> (working on maildir->mdbox migration finally, yay!) on test boxes >>> (same dovecot package & version) and when I get this error, despite >>> the log saying it's retrying, it doesn't seem to be. Instead I get: >>> >>> dsync(root): Error: user ...: Auth USER lookup failed >> >> Try with only one host in the "connect" string? My guess: Both the >> connections have timed out, and the retrying fails as well (there is only >> one retry). Although if the retrying lookup fails, there should be an error >> logged about it also (you don't see one?) >> >> Also another idea to avoid them in the first place: >> >> service auth-worker { >> idle_kill = 20 >> } >> > > With just one 'connect' host, it seems to reconnect just fine (using > the same tests as above) and I'm not seeing the same error. It worked > every time that I tried, with no complaints of "MySQL server has gone > away". > > If there are multiple hosts, it seems like the most robust thing to do > would be to exhaust the existing connections and if none of those > succeed, then start a new connection to one of them. It will probably > result in much more convoluted logic but it'd probably match better > what people expect from a retry. > > Alternatively, since in all my tests, the mysql server has closed the > connection prior to this, is the auth worker not recognizing its > connection is already half-closed (in which case, it probably > shouldn't even consider it a legitimate connection and just > automatically reconnect, i.e. try #1, not the retry, which would > happen after another failure). > > I'll give the idle_kill a try too. I kind of like the idea of > idle_kill for auth processes anyway, just to free up some connections > on the mysql server. by the way , if you use sql for auth have you tried auth caching ? http://wiki.dovecot.org/Authentication/Caching i.e. # Authentication cache size (e.g. 10M). 0 means it's disabled. Note that # bsdauth, PAM and vpopmail require cache_key to be set for caching to be used. auth_cache_size = 10M # Time to live for cached data. After TTL expires the cached record is no # longer used, *except* if the main database lookup returns internal failure. # We also try to handle password changes automatically: If user's previous # authentication was successful, but this one wasn't, the cache isn't used. # For now this works only with plaintext authentication. auth_cache_ttl = 1 hour # TTL for negative hits (user not found, password mismatch). # 0 disables caching them completely. auth_cache_negative_ttl = 0 -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Re: [Dovecot] MySQL server has gone away
On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen wrote: > On 13.1.2012, at 4.00, Mark Moseley wrote: > >> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL >> server has gone away" errors, despite having multiple hosts defined in >> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing >> the same thing with 2.0.16 on Debian Squeeze 64-bit. >> >> E.g.: >> >> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: >> MySQL server has gone away >> >> Our mail mysql servers are busy enough that wait_timeout is set to a >> whopping 30 seconds. On my regular boxes, I see a good deal of these >> in the logs. I've been doing a lot of mucking with doveadm/dsync >> (working on maildir->mdbox migration finally, yay!) on test boxes >> (same dovecot package & version) and when I get this error, despite >> the log saying it's retrying, it doesn't seem to be. Instead I get: >> >> dsync(root): Error: user ...: Auth USER lookup failed > > Try with only one host in the "connect" string? My guess: Both the > connections have timed out, and the retrying fails as well (there is only one > retry). Although if the retrying lookup fails, there should be an error > logged about it also (you don't see one?) > > Also another idea to avoid them in the first place: > > service auth-worker { > idle_kill = 20 > } > With just one 'connect' host, it seems to reconnect just fine (using the same tests as above) and I'm not seeing the same error. It worked every time that I tried, with no complaints of "MySQL server has gone away". If there are multiple hosts, it seems like the most robust thing to do would be to exhaust the existing connections and if none of those succeed, then start a new connection to one of them. It will probably result in much more convoluted logic but it'd probably match better what people expect from a retry. Alternatively, since in all my tests, the mysql server has closed the connection prior to this, is the auth worker not recognizing its connection is already half-closed (in which case, it probably shouldn't even consider it a legitimate connection and just automatically reconnect, i.e. try #1, not the retry, which would happen after another failure). I'll give the idle_kill a try too. I kind of like the idea of idle_kill for auth processes anyway, just to free up some connections on the mysql server.
Re: [Dovecot] MySQL server has gone away
On 13.1.2012, at 4.00, Mark Moseley wrote: > I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL > server has gone away" errors, despite having multiple hosts defined in > my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing > the same thing with 2.0.16 on Debian Squeeze 64-bit. > > E.g.: > > Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: > MySQL server has gone away > > Our mail mysql servers are busy enough that wait_timeout is set to a > whopping 30 seconds. On my regular boxes, I see a good deal of these > in the logs. I've been doing a lot of mucking with doveadm/dsync > (working on maildir->mdbox migration finally, yay!) on test boxes > (same dovecot package & version) and when I get this error, despite > the log saying it's retrying, it doesn't seem to be. Instead I get: > > dsync(root): Error: user ...: Auth USER lookup failed Try with only one host in the "connect" string? My guess: Both the connections have timed out, and the retrying fails as well (there is only one retry). Although if the retrying lookup fails, there should be an error logged about it also (you don't see one?) Also another idea to avoid them in the first place: service auth-worker { idle_kill = 20 }
Re: [Dovecot] MySQL server has gone away
On 1/12/2012 6:00 PM, Mark Moseley wrote: Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away I've actually been meaning to send a similar message for the last couple of months :). We run dovecot solely as a sasl authentication provider to postfix for smtp authentication. We're currently running 2.0.15 with a handful of patches from a few months ago when Timo fixed mysql failover. We also see sporadic messages like that in the logs: Jan 11 01:00:57 sparky dovecot: auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away We do have a timeout on the mysql servers, so I don't necessarily mind this message, except we also see some number of these: Jan 11 01:00:57 sparky dovecot: auth-worker: Error: sql(clgeurts,108.38.64.98): Password query failed: MySQL server has gone away The mysql servers have never been down or unresponsive, if it retries, it should succeed. I'm not sure what's happening here, perhaps it tries the query on one mysql server connection (we have two configured) which has timed out, and then tries the other one, and if the other one has also timed out just fails? I also see some auth timeouts: Jan 11 22:06:02 sparky dovecot: auth: CRAM-MD5(?,200.37.175.14): Request 10232.28 timeouted after 150 secs, state=2 I'm not sure if they're related to the mysql timeouts. There are also some postfix auth errors: Jan 11 23:55:41 sparky postfix/smtpd[20994]: warning: unknown[200.37.175.14]: SASL CRAM-MD5 authentication failed: Connection lost to authentication server Which I think happen when dovecot takes too long to respond. I haven't had time to dig into it or get any debugging info, but just thought I'd pipe up when I saw your similar question :). -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768