Re: [Dovecot] MySQL server has gone away / dict ? tests done

2012-02-15 Thread Robert Schetterer
Am 15.02.2012 20:10, schrieb Robert Schetterer:
> Am 15.02.2012 19:43, schrieb Timo Sirainen:
>> On 15.2.2012, at 20.22, Robert Schetterer wrote:
>>
>>> sorry for some more question
>>> what is the default behave if dict quota over sql cant be reached
>>> i.e with lmtp
>>>
>>> something like log warning and deliver anyway ?
>>
>> I think it tempfails. Try.
> 
> hm, test servers arent up recently
> cant do it on production
> 
> but i will test before i go try dove 2.1
> and report, @list perhaps anyone other having it tested before?
>>
> 

ok the test server was good enough for small tests
with 2.0.13 dove


private/dovecot-lmtp] said: 451 4.2.0 <@com> Internal error
occurred. Refer to server log for more information.

dict: Error: mysql(localhost2): Connect failed to database (postfix):
Unknown MySQL server host '...' (1) - waiting for 125 seconds before retry

youre right looks like default is tempfail 451

so it stays in postfix  queue
and is permanent retry

after sql comming back , its getting delivered
i.e force with Flush Queue

so thats good enough for me i think
-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away / dict ?

2012-02-15 Thread Robert Schetterer
Am 15.02.2012 19:43, schrieb Timo Sirainen:
> On 15.2.2012, at 20.22, Robert Schetterer wrote:
> 
>> sorry for some more question
>> what is the default behave if dict quota over sql cant be reached
>> i.e with lmtp
>>
>> something like log warning and deliver anyway ?
> 
> I think it tempfails. Try.

hm, test servers arent up recently
cant do it on production

but i will test before i go try dove 2.1
and report, @list perhaps anyone other having it tested before?
> 

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away / dict ?

2012-02-15 Thread Timo Sirainen
On 15.2.2012, at 20.22, Robert Schetterer wrote:

> sorry for some more question
> what is the default behave if dict quota over sql cant be reached
> i.e with lmtp
> 
> something like log warning and deliver anyway ?

I think it tempfails. Try.



Re: [Dovecot] MySQL server has gone away / dict ?

2012-02-15 Thread Robert Schetterer
Am 15.02.2012 17:10, schrieb Robert Schetterer:
> Am 15.02.2012 14:07, schrieb Timo Sirainen:
>> On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote:
>>> Hi Timo,just to make sure
>>> i have an extra
>>> dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 )
>>>
>>> connect = host=192.168.123.150 dbname=.. user=... password=...
>> ..
>>> is it possible to have i.e
>>>
>>> connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=...
>>> password=...
>>>
>>> there too?
>>
>> Assuming your SQL database is configured to do synchronous multi-master
>> replication, sure.
>>
>>
> jep ,thx Timo
> 
Hi Timo,
sorry for some more question
what is the default behave if dict quota over sql cant be reached
i.e with lmtp

something like log warning and deliver anyway ?

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away / dict ?

2012-02-15 Thread Robert Schetterer
Am 15.02.2012 14:07, schrieb Timo Sirainen:
> On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote:
>> Hi Timo,just to make sure
>> i have an extra
>> dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 )
>>
>> connect = host=192.168.123.150 dbname=.. user=... password=...
> ..
>> is it possible to have i.e
>>
>> connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=...
>> password=...
>>
>> there too?
> 
> Assuming your SQL database is configured to do synchronous multi-master
> replication, sure.
> 
> 
jep ,thx Timo

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away / dict ?

2012-02-15 Thread Timo Sirainen
On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote:
> Hi Timo,just to make sure
> i have an extra
> dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 )
> 
> connect = host=192.168.123.150 dbname=.. user=... password=...
..
> is it possible to have i.e
> 
> connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=...
> password=...
> 
> there too?

Assuming your SQL database is configured to do synchronous multi-master
replication, sure.




Re: [Dovecot] MySQL server has gone away / dict ?

2012-02-15 Thread Robert Schetterer
Am 28.01.2012 22:06, schrieb Robert Schetterer:
> Am 28.01.2012 21:07, schrieb Timo Sirainen:
>> On 13.1.2012, at 20.29, Mark Moseley wrote:
>>
>>> If there are multiple hosts, it seems like the most robust thing to do
>>> would be to exhaust the existing connections and if none of those
>>> succeed, then start a new connection to one of them. It will probably
>>> result in much more convoluted logic but it'd probably match better
>>> what people expect from a retry.
>>
>> Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1
>>
> 
> Hi Timo
> 
> doc/example-config/dovecot-sql.conf.ext
> from hg
> has something like
> 
> # Database connection string. This is driver-specific setting.
> # HA / round-robin load-balancing is supported by giving multiple host
> # settings, like: host=sql1.host.org host=sql2.host.org
> 
> but i dont find it in
> http://wiki2.dovecot.org/AuthDatabase/SQL
> 
> 

Hi Timo,just to make sure
i have an extra
dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 )

connect = host=192.168.123.150 dbname=.. user=... password=...


map {
  pattern = priv/quota/storage
  table = quota2
  username_field = username
  value_field = bytes
}
map {
  pattern = priv/quota/messages
  table = quota2
  username_field = username
  value_field = messages
}

is it possible to have i.e

connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=...
password=...

there too?

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away

2012-01-28 Thread Mark Moseley
On Sat, Jan 28, 2012 at 12:07 PM, Timo Sirainen  wrote:
> On 13.1.2012, at 20.29, Mark Moseley wrote:
>
>> If there are multiple hosts, it seems like the most robust thing to do
>> would be to exhaust the existing connections and if none of those
>> succeed, then start a new connection to one of them. It will probably
>> result in much more convoluted logic but it'd probably match better
>> what people expect from a retry.
>
> Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1
>

Excellent, thanks!


Re: [Dovecot] MySQL server has gone away

2012-01-28 Thread Robert Schetterer
Am 28.01.2012 23:17, schrieb Timo Sirainen:
> On 28.1.2012, at 23.06, Robert Schetterer wrote:
> 
>> doc/example-config/dovecot-sql.conf.ext
>> from hg
>> has something like
>>
>> # Database connection string. This is driver-specific setting.
>> # HA / round-robin load-balancing is supported by giving multiple host
>> # settings, like: host=sql1.host.org host=sql2.host.org
>>
>> but i dont find it in
>> http://wiki2.dovecot.org/AuthDatabase/SQL
> 
> I added something about it there.
> 

cool thanks !

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away

2012-01-28 Thread Timo Sirainen
On 28.1.2012, at 23.06, Robert Schetterer wrote:

> doc/example-config/dovecot-sql.conf.ext
> from hg
> has something like
> 
> # Database connection string. This is driver-specific setting.
> # HA / round-robin load-balancing is supported by giving multiple host
> # settings, like: host=sql1.host.org host=sql2.host.org
> 
> but i dont find it in
> http://wiki2.dovecot.org/AuthDatabase/SQL

I added something about it there.



Re: [Dovecot] MySQL server has gone away

2012-01-28 Thread Robert Schetterer
Am 28.01.2012 21:07, schrieb Timo Sirainen:
> On 13.1.2012, at 20.29, Mark Moseley wrote:
> 
>> If there are multiple hosts, it seems like the most robust thing to do
>> would be to exhaust the existing connections and if none of those
>> succeed, then start a new connection to one of them. It will probably
>> result in much more convoluted logic but it'd probably match better
>> what people expect from a retry.
> 
> Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1
> 

Hi Timo

doc/example-config/dovecot-sql.conf.ext
from hg
has something like

# Database connection string. This is driver-specific setting.
# HA / round-robin load-balancing is supported by giving multiple host
# settings, like: host=sql1.host.org host=sql2.host.org

but i dont find it in
http://wiki2.dovecot.org/AuthDatabase/SQL


-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away

2012-01-28 Thread Timo Sirainen
On 13.1.2012, at 20.29, Mark Moseley wrote:

> If there are multiple hosts, it seems like the most robust thing to do
> would be to exhaust the existing connections and if none of those
> succeed, then start a new connection to one of them. It will probably
> result in much more convoluted logic but it'd probably match better
> what people expect from a retry.

Done: http://hg.dovecot.org/dovecot-2.0/rev/4e7676b890f1



Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Paul B. Henson
On Fri, Jan 20, 2012 at 09:16:57AM -0800, Timo Sirainen wrote:

> This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to
> have gotten rid of the "MySQL server has gone away" errors completely.
> So I guess the problem was that during some peak times a ton of auth
> worker processes were created, but afterwards they weren't used until
> the next peak happened, and then they failed.
> 
> http://hg.dovecot.org/dovecot-2.1/rev/3963862a4086
> http://hg.dovecot.org/dovecot-2.1/rev/58556a90259f

Hmm, I tried to apply this to 2.0.17, and that didn't really work out.
Before I spend too much time trying to hand port the changes, do you
know off hand if they simply won't apply to 2.0.17 due to other changes
made since then? It looks like 2.1 might be out soon, I guess maybe I
should just wait for that.

Thanks...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Paul B. Henson

On 1/20/2012 9:17 AM, Timo Sirainen wrote:


Works for me. Are you maybe sending it to wrong auth process (auth worker 
instead of master)?


I had tried sending it to both; but the underlying problem turned out to 
be that the updated config hadn't actually been deployed yet 8-/ oops. 
Once I fixed that, sending the signal did generate the log output. 
Evidently nothing is printed out in the case where the authentication 
caching isn't enabled; maybe you should make it print out something like 
"Hey idiot, caching isn't turned on" ;).


Thanks...


--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Paul B. Henson

On 1/20/2012 1:06 PM, Timo Sirainen wrote:


Hmh. Still doesn't work 100%:

auth-worker(28788): Error: mysql: Query failed, retrying: MySQL
server has gone away (idled for 181 secs) auth-worker(7413): Error:
mysql: Query failed, retrying: MySQL server has gone away (idled for
298 secs)

I'm not really sure why it's not killing itself after 60 seconds of
idling. Probably related to how mysql code tracks idle time and how
idle_kill tracks it.. Anyway, those errors are much more rare now.


The mysql server starts tracking idle time as beginning after the last 
network communication with the client. So presumably if the auth worker 
gets marked as not idle by anything not involving interaction with the 
mysql server, they could get out of sync.


Before you posted a potential fix to the idle timeout, I was looking at 
other possible ways to resolve the issue. Currently, an authentication 
request is tried exactly twice -- one initial try, and one retry.


Looking at driver-sqlpool.c:

if (result->failed_try_retry && !request->retried) {

Currently, retried is a boolean. What if retried was an integer instead, 
and a new configuration variable allowed you to specify how many times 
an authentication attempt should be retried? The default could be 2, 
which would result in exactly the same behavior. But then you could set 
it to 3 or 4 to prevent a request from hitting a timed out connection 
twice and failing completely.


Ideally, a better fix would be for the client not to consider a "MySQL 
server has gone away" return as a failure, but instead immediately 
reconnect and try again without marking it as a retry. However, from 
reviewing the code, that would be a much more difficult and invasive 
change. Changing the existing retried variable to an integer count 
rather than a boolean is pretty simple.



--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Timo Sirainen
On 20.1.2012, at 19.16, Timo Sirainen wrote:

> This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to have 
> gotten rid of the "MySQL server has gone away" errors completely. So I guess 
> the problem was that during some peak times a ton of auth worker processes 
> were created, but afterwards they weren't used until the next peak happened, 
> and then they failed.

Hmh. Still doesn't work 100%:

auth-worker(28788): Error: mysql: Query failed, retrying: MySQL server has gone 
away (idled for 181 secs)
auth-worker(7413): Error: mysql: Query failed, retrying: MySQL server has gone 
away (idled for 298 secs)

I'm not really sure why it's not killing itself after 60 seconds of idling. 
Probably related to how mysql code tracks idle time and how idle_kill tracks 
it.. Anyway, those errors are much more rare now.



Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Timo Sirainen
On 14.1.2012, at 2.19, Paul B. Henson wrote:

> On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote:
> 
>> by the way , if you use sql for auth have you tried auth caching ?
>> 
>> http://wiki.dovecot.org/Authentication/Caching
> 
> That page says you can send a USR2 signal to the auth process for cache
> stats? That doesn't seem to work. OTOH, that page is for version 1, not
> 2; is there some other way to generate cache stats in version 2?

Works for me. Are you maybe sending it to wrong auth process (auth worker 
instead of master)?



Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Timo Sirainen
On 14.1.2012, at 1.19, Mark Moseley wrote:

>>> Also another idea to avoid them in the first place:
>>> 
>>> service auth-worker {
>>>   idle_kill = 20
>>> }
>> 
>> Ah, set the auth-worker timeout to less than the mysql timeout to
>> prevent a stale mysql connection from ever being used. I'll try that,
>> thanks.
> 
> I gave that a try. Sometimes it seems to kill off the auth-worker but
> not till after a minute or so (with idle_kill = 20). Other times, the
> worker stays around for more like 5 minutes (I gave up watching),
> despite being idle -- and I'm the only person connecting to it, so
> it's definitely idle. Does auth-worker perhaps only wake up every so
> often to check its idle status?

This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to have 
gotten rid of the "MySQL server has gone away" errors completely. So I guess 
the problem was that during some peak times a ton of auth worker processes were 
created, but afterwards they weren't used until the next peak happened, and 
then they failed.

http://hg.dovecot.org/dovecot-2.1/rev/3963862a4086
http://hg.dovecot.org/dovecot-2.1/rev/58556a90259f



Re: [Dovecot] MySQL server has gone away

2012-01-15 Thread Paul B. Henson
On Sat, Jan 14, 2012 at 12:01:12AM -0800, Robert Schetterer wrote:

> > Hmm, hadn't tried that, but flipped it on to see how it might work out.
> > The only tradeoff is a potential delay between when an account is
> > disabled and when it can stop authenticating. I set the timeout to 10
> > minutes for now, with an hour timeout for negative caching.
> 
> dont know if i unserstand you right

Before I turned on auth caching, every attempted authentication hit our
mysql database, which in addition to the password itself contains a flag
indicating whether or not the account is enabled. So if somebody was
abusing smtp authentication, our helpdesk could disable their account,
and it would *immediately* stop working. Whereas with authentication
caching enabled, there is a window the size of the ttl where an account
that has been disabled can continue to successfully authenticate.

> > That page says you can send a USR2 signal to the auth process for cache
> > stats? That doesn't seem to work. OTOH, that page is for version 1, not
> > 2; is there some other way to generate cache stats in version 2?
> 
> auth cache works with dove 2, no idea about dove 1 ,didnt test, but i
> guess it does

I'm using dovecot 2; my question was that the documentation for dovecot
1 described a way to make dovecot dump the authentication cache
statistics that doesn't seem to work for dovecot 2, and if there was
some other way to get the cache statistics in dovecot 2.

Thanks...


Re: [Dovecot] MySQL server has gone away

2012-01-14 Thread Robert Schetterer
Am 14.01.2012 01:19, schrieb Paul B. Henson:
> On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote:
> 
>> by the way , if you use sql for auth have you tried auth caching ?
>>
>> http://wiki.dovecot.org/Authentication/Caching
> 
> Hmm, hadn't tried that, but flipped it on to see how it might work out.
> The only tradeoff is a potential delay between when an account is
> disabled and when it can stop authenticating. I set the timeout to 10
> minutes for now, with an hour timeout for negative caching.

dont know if i unserstand you right

perhaps this is what you mean, i use this with/cause fail2ban

# TTL for negative hits (user not found, password mismatch).
# 0 disables caching them completely.

auth_cache_negative_ttl = 0


> 
> That page says you can send a USR2 signal to the auth process for cache
> stats? That doesn't seem to work. OTOH, that page is for version 1, not
> 2; is there some other way to generate cache stats in version 2?

auth cache works with dove 2, no idea about dove 1 ,didnt test, but i
guess it does

> 
> Thanks...
> 


-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Paul B. Henson

On 1/13/2012 10:29 AM, Mark Moseley wrote:


connection prior to this, is the auth worker not recognizing its
connection is already half-closed (in which case, it probably
shouldn't even consider it a legitimate connection and just
automatically reconnect, i.e. try #1, not the retry, which would
happen after another failure).


I don't think there's any way to tell from the mysql api that the server 
has closed the connection short of trying to use it and getting that 
specific error. I suppose that specific error could be special cased as 
an immediate "try again with no penalty" rather than considered a failure.



--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Paul B. Henson
On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote:

> by the way , if you use sql for auth have you tried auth caching ?
> 
> http://wiki.dovecot.org/Authentication/Caching

Hmm, hadn't tried that, but flipped it on to see how it might work out.
The only tradeoff is a potential delay between when an account is
disabled and when it can stop authenticating. I set the timeout to 10
minutes for now, with an hour timeout for negative caching.

That page says you can send a USR2 signal to the auth process for cache
stats? That doesn't seem to work. OTOH, that page is for version 1, not
2; is there some other way to generate cache stats in version 2?

Thanks...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Mark Moseley
On Fri, Jan 13, 2012 at 2:46 PM, Paul B. Henson  wrote:
> On Fri, Jan 13, 2012 at 01:36:38AM -0800, Timo Sirainen wrote:
>
>> Also another idea to avoid them in the first place:
>>
>> service auth-worker {
>>   idle_kill = 20
>> }
>
> Ah, set the auth-worker timeout to less than the mysql timeout to
> prevent a stale mysql connection from ever being used. I'll try that,
> thanks.

I gave that a try. Sometimes it seems to kill off the auth-worker but
not till after a minute or so (with idle_kill = 20). Other times, the
worker stays around for more like 5 minutes (I gave up watching),
despite being idle -- and I'm the only person connecting to it, so
it's definitely idle. Does auth-worker perhaps only wake up every so
often to check its idle status?

To test, I kicked off a dsync, then grabbed a netstat:

tcp0  0 10.1.15.129:40070   10.1.52.47:3306
ESTABLISHED 29146/auth worker [
tcp0  0 10.1.15.129:33369   10.1.52.48:3306
ESTABLISHED 29146/auth worker [
tcp0  0 10.1.15.129:54083   10.1.52.49:3306
ESTABLISHED 29146/auth worker [

then kicked off this loop:

# while true; do date; ps p 29146 |tail -n1; sleep 1; done

Fri Jan 13 18:05:14 EST 2012
29146 ?S  0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
Fri Jan 13 18:05:15 EST 2012
29146 ?S  0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]

  More lines of the loop ...

Fri Jan 13 18:05:35 EST 2012
29146 ?S  0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
18:05:36.252976 IP 10.1.52.48.3306 > 10.1.15.129.33369: F 77:77(0) ack
92 win 91 
18:05:36.288549 IP 10.1.15.129.33369 > 10.1.52.48.3306: . ack 78 win
913 
Fri Jan 13 18:05:36 EST 2012
29146 ?S  0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
18:05:37.196204 IP 10.1.52.49.3306 > 10.1.15.129.54083: F 806:806(0)
ack 1126 win 123 
18:05:37.228594 IP 10.1.15.129.54083 > 10.1.52.49.3306: . ack 807 win
1004 
18:05:37.411955 IP 10.1.52.47.3306 > 10.1.15.129.40070: F 806:806(0)
ack 1126 win 123 
18:05:37.448573 IP 10.1.15.129.40070 > 10.1.52.47.3306: . ack 807 win
1004 
Fri Jan 13 18:05:37 EST 2012
29146 ?S  0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]

... more lines of the loop ...

Fri Jan 13 18:10:13 EST 2012
29146 ?S  0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
Fri Jan 13 18:10:14 EST 2012
29146 ?S  0:00 dovecot/auth worker [0 wait, 0 passdb, 0 userdb]
^C

at which point I bailed out. Looking again a couple of minutes later,
it was gone. Nothing else was going on and the logs don't show any
activity between 18:05:07 and 18:10:44.


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Paul B. Henson
On Fri, Jan 13, 2012 at 01:36:38AM -0800, Timo Sirainen wrote:

> Also another idea to avoid them in the first place:
> 
> service auth-worker {
>   idle_kill = 20
> }

Ah, set the auth-worker timeout to less than the mysql timeout to
prevent a stale mysql connection from ever being used. I'll try that,
thanks.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Mark Moseley
On Fri, Jan 13, 2012 at 11:38 AM, Robert Schetterer
 wrote:
> Am 13.01.2012 19:29, schrieb Mark Moseley:
>> On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen  wrote:
>>> On 13.1.2012, at 4.00, Mark Moseley wrote:
>>>
 I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
 server has gone away" errors, despite having multiple hosts defined in
 my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
 the same thing with 2.0.16 on Debian Squeeze 64-bit.

 E.g.:

 Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
 MySQL server has gone away

 Our mail mysql servers are busy enough that wait_timeout is set to a
 whopping 30 seconds. On my regular boxes, I see a good deal of these
 in the logs. I've been doing a lot of mucking with doveadm/dsync
 (working on maildir->mdbox migration finally, yay!) on test boxes
 (same dovecot package & version) and when I get this error, despite
 the log saying it's retrying, it doesn't seem to be. Instead I get:

 dsync(root): Error: user ...: Auth USER lookup failed
>>>
>>> Try with only one host in the "connect" string? My guess: Both the 
>>> connections have timed out, and the retrying fails as well (there is only 
>>> one retry). Although if the retrying lookup fails, there should be an error 
>>> logged about it also (you don't see one?)
>>>
>>> Also another idea to avoid them in the first place:
>>>
>>> service auth-worker {
>>>  idle_kill = 20
>>> }
>>>
>>
>> With just one 'connect' host, it seems to reconnect just fine (using
>> the same tests as above) and I'm not seeing the same error. It worked
>> every time that I tried, with no complaints of "MySQL server has gone
>> away".
>>
>> If there are multiple hosts, it seems like the most robust thing to do
>> would be to exhaust the existing connections and if none of those
>> succeed, then start a new connection to one of them. It will probably
>> result in much more convoluted logic but it'd probably match better
>> what people expect from a retry.
>>
>> Alternatively, since in all my tests, the mysql server has closed the
>> connection prior to this, is the auth worker not recognizing its
>> connection is already half-closed (in which case, it probably
>> shouldn't even consider it a legitimate connection and just
>> automatically reconnect, i.e. try #1, not the retry, which would
>> happen after another failure).
>>
>> I'll give the idle_kill a try too. I kind of like the idea of
>> idle_kill for auth processes anyway, just to free up some connections
>> on the mysql server.
>
> by the way , if you use sql for auth have you tried auth caching ?
>
> http://wiki.dovecot.org/Authentication/Caching
>
> i.e.
>
> # Authentication cache size (e.g. 10M). 0 means it's disabled. Note that
> # bsdauth, PAM and vpopmail require cache_key to be set for caching to
> be used.
>
> auth_cache_size = 10M
>
> # Time to live for cached data. After TTL expires the cached record is no
> # longer used, *except* if the main database lookup returns internal
> failure.
> # We also try to handle password changes automatically: If user's previous
> # authentication was successful, but this one wasn't, the cache isn't used.
> # For now this works only with plaintext authentication.
>
> auth_cache_ttl = 1 hour
>
> # TTL for negative hits (user not found, password mismatch).
> # 0 disables caching them completely.
>
> auth_cache_negative_ttl = 0


Yup, we have caching turned on for our production boxes. On this
particular box, I'd just shut off caching so that I could work on a
script for converting from maildir->mdbox and run it repeatedly on the
same mailbox. I got tired of restarting dovecot between each test :)


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Robert Schetterer
Am 13.01.2012 19:29, schrieb Mark Moseley:
> On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen  wrote:
>> On 13.1.2012, at 4.00, Mark Moseley wrote:
>>
>>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
>>> server has gone away" errors, despite having multiple hosts defined in
>>> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
>>> the same thing with 2.0.16 on Debian Squeeze 64-bit.
>>>
>>> E.g.:
>>>
>>> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
>>> MySQL server has gone away
>>>
>>> Our mail mysql servers are busy enough that wait_timeout is set to a
>>> whopping 30 seconds. On my regular boxes, I see a good deal of these
>>> in the logs. I've been doing a lot of mucking with doveadm/dsync
>>> (working on maildir->mdbox migration finally, yay!) on test boxes
>>> (same dovecot package & version) and when I get this error, despite
>>> the log saying it's retrying, it doesn't seem to be. Instead I get:
>>>
>>> dsync(root): Error: user ...: Auth USER lookup failed
>>
>> Try with only one host in the "connect" string? My guess: Both the 
>> connections have timed out, and the retrying fails as well (there is only 
>> one retry). Although if the retrying lookup fails, there should be an error 
>> logged about it also (you don't see one?)
>>
>> Also another idea to avoid them in the first place:
>>
>> service auth-worker {
>>  idle_kill = 20
>> }
>>
> 
> With just one 'connect' host, it seems to reconnect just fine (using
> the same tests as above) and I'm not seeing the same error. It worked
> every time that I tried, with no complaints of "MySQL server has gone
> away".
> 
> If there are multiple hosts, it seems like the most robust thing to do
> would be to exhaust the existing connections and if none of those
> succeed, then start a new connection to one of them. It will probably
> result in much more convoluted logic but it'd probably match better
> what people expect from a retry.
> 
> Alternatively, since in all my tests, the mysql server has closed the
> connection prior to this, is the auth worker not recognizing its
> connection is already half-closed (in which case, it probably
> shouldn't even consider it a legitimate connection and just
> automatically reconnect, i.e. try #1, not the retry, which would
> happen after another failure).
> 
> I'll give the idle_kill a try too. I kind of like the idea of
> idle_kill for auth processes anyway, just to free up some connections
> on the mysql server.

by the way , if you use sql for auth have you tried auth caching ?

http://wiki.dovecot.org/Authentication/Caching

i.e.

# Authentication cache size (e.g. 10M). 0 means it's disabled. Note that
# bsdauth, PAM and vpopmail require cache_key to be set for caching to
be used.

auth_cache_size = 10M

# Time to live for cached data. After TTL expires the cached record is no
# longer used, *except* if the main database lookup returns internal
failure.
# We also try to handle password changes automatically: If user's previous
# authentication was successful, but this one wasn't, the cache isn't used.
# For now this works only with plaintext authentication.

auth_cache_ttl = 1 hour

# TTL for negative hits (user not found, password mismatch).
# 0 disables caching them completely.

auth_cache_negative_ttl = 0


-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Mark Moseley
On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen  wrote:
> On 13.1.2012, at 4.00, Mark Moseley wrote:
>
>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
>> server has gone away" errors, despite having multiple hosts defined in
>> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
>> the same thing with 2.0.16 on Debian Squeeze 64-bit.
>>
>> E.g.:
>>
>> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
>> MySQL server has gone away
>>
>> Our mail mysql servers are busy enough that wait_timeout is set to a
>> whopping 30 seconds. On my regular boxes, I see a good deal of these
>> in the logs. I've been doing a lot of mucking with doveadm/dsync
>> (working on maildir->mdbox migration finally, yay!) on test boxes
>> (same dovecot package & version) and when I get this error, despite
>> the log saying it's retrying, it doesn't seem to be. Instead I get:
>>
>> dsync(root): Error: user ...: Auth USER lookup failed
>
> Try with only one host in the "connect" string? My guess: Both the 
> connections have timed out, and the retrying fails as well (there is only one 
> retry). Although if the retrying lookup fails, there should be an error 
> logged about it also (you don't see one?)
>
> Also another idea to avoid them in the first place:
>
> service auth-worker {
>  idle_kill = 20
> }
>

With just one 'connect' host, it seems to reconnect just fine (using
the same tests as above) and I'm not seeing the same error. It worked
every time that I tried, with no complaints of "MySQL server has gone
away".

If there are multiple hosts, it seems like the most robust thing to do
would be to exhaust the existing connections and if none of those
succeed, then start a new connection to one of them. It will probably
result in much more convoluted logic but it'd probably match better
what people expect from a retry.

Alternatively, since in all my tests, the mysql server has closed the
connection prior to this, is the auth worker not recognizing its
connection is already half-closed (in which case, it probably
shouldn't even consider it a legitimate connection and just
automatically reconnect, i.e. try #1, not the retry, which would
happen after another failure).

I'll give the idle_kill a try too. I kind of like the idea of
idle_kill for auth processes anyway, just to free up some connections
on the mysql server.


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Timo Sirainen
On 13.1.2012, at 4.00, Mark Moseley wrote:

> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
> server has gone away" errors, despite having multiple hosts defined in
> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
> the same thing with 2.0.16 on Debian Squeeze 64-bit.
> 
> E.g.:
> 
> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
> MySQL server has gone away
> 
> Our mail mysql servers are busy enough that wait_timeout is set to a
> whopping 30 seconds. On my regular boxes, I see a good deal of these
> in the logs. I've been doing a lot of mucking with doveadm/dsync
> (working on maildir->mdbox migration finally, yay!) on test boxes
> (same dovecot package & version) and when I get this error, despite
> the log saying it's retrying, it doesn't seem to be. Instead I get:
> 
> dsync(root): Error: user ...: Auth USER lookup failed

Try with only one host in the "connect" string? My guess: Both the connections 
have timed out, and the retrying fails as well (there is only one retry). 
Although if the retrying lookup fails, there should be an error logged about it 
also (you don't see one?)

Also another idea to avoid them in the first place:

service auth-worker {
  idle_kill = 20
}



Re: [Dovecot] MySQL server has gone away

2012-01-12 Thread Paul B. Henson

On 1/12/2012 6:00 PM, Mark Moseley wrote:


Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
MySQL server has gone away


I've actually been meaning to send a similar message for the last couple 
of months :).


We run dovecot solely as a sasl authentication provider to postfix for 
smtp authentication. We're currently running 2.0.15 with a handful of 
patches from a few months ago when Timo fixed mysql failover.


We also see sporadic messages like that in the logs:

Jan 11 01:00:57 sparky dovecot: auth-worker: Error: mysql: Query failed, 
retrying: MySQL server has gone away


We do have a timeout on the mysql servers, so I don't necessarily mind 
this message, except we also see some number of these:


Jan 11 01:00:57 sparky dovecot: auth-worker: Error: 
sql(clgeurts,108.38.64.98): Password query failed: MySQL server has gone 
away


The mysql servers have never been down or unresponsive, if it retries, 
it should succeed. I'm not sure what's happening here, perhaps it tries 
the query on one mysql server connection (we have two configured) which 
has timed out, and then tries the other one, and if the other one has 
also timed out just fails?


I also see some auth timeouts:

Jan 11 22:06:02 sparky dovecot: auth: CRAM-MD5(?,200.37.175.14): Request 
10232.28 timeouted after 150 secs, state=2


I'm not sure if they're related to the mysql timeouts.

There are also some postfix auth errors:

Jan 11 23:55:41 sparky postfix/smtpd[20994]: warning: 
unknown[200.37.175.14]: SASL CRAM-MD5 authentication failed: Connection 
lost to authentication server


Which I think happen when dovecot takes too long to respond.


I haven't had time to dig into it or get any debugging info, but just 
thought I'd pipe up when I saw your similar question :).



--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


[Dovecot] MySQL server has gone away

2012-01-12 Thread Mark Moseley
I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
server has gone away" errors, despite having multiple hosts defined in
my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
the same thing with 2.0.16 on Debian Squeeze 64-bit.

E.g.:

Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
MySQL server has gone away

Our mail mysql servers are busy enough that wait_timeout is set to a
whopping 30 seconds. On my regular boxes, I see a good deal of these
in the logs. I've been doing a lot of mucking with doveadm/dsync
(working on maildir->mdbox migration finally, yay!) on test boxes
(same dovecot package & version) and when I get this error, despite
the log saying it's retrying, it doesn't seem to be. Instead I get:

dsync(root): Error: user ...: Auth USER lookup failed
dsync(root): Fatal: User lookup failed: Internal error occurred. Refer
to server log for more information.

Watching tcpdump at the same time, it looks like it's going through
some of the mysql servers, but all of them have by now disconnected
and are in CLOSE_WAIT.

Here's an (edited) example after doing a dsync that completes without
errors, with tcpdump running in the background:

# sleep 30; netstat -ant | grep 3306; dsync -C^ -u mail...@test.com
backup mdbox:~/mdbox

tcp1  0 10.1.15.129:57436   10.1.52.48:3306 CLOSE_WAIT
tcp1  0 10.1.15.129:49917   10.1.52.49:3306 CLOSE_WAIT
tcp1  0 10.1.15.129:35904   10.1.52.47:3306 CLOSE_WAIT

20:49:59.725005 IP 10.1.15.129.35904 > 10.1.52.47.3306: F 1126:1126(0)
ack 807 win 1004 
20:49:59.725459 IP 10.1.52.47.3306 > 10.1.15.129.35904: . ack 1127 win
123 
20:49:59.725568 IP 10.1.15.129.57436 > 10.1.52.48.3306: F 1126:1126(0)
ack 807 win 1004 
20:49:59.725779 IP 10.1.52.48.3306 > 10.1.15.129.57436: . ack 1127 win
123 

dsync(root): Error: user mail...@test.com: Auth USER lookup failed
dsync(root): Fatal: User lookup failed: Internal error occurred. Refer
to server log for more information.


10.1.15.129 in this case is the dovecot server, and the 10.1.52.0/24
boxes are mysql servers. That's the same pattern I've seen almost
every time. Just a FIN packet to two of the servers (ack'd by the
mysql server) and then it fails.

Is the retry mechanism supposed to transparently start a new
connection, or is this how it works? In connecting remotely to these
same servers (which aren't getting production traffic, so I'm the only
person connecting to them), I get seemingly random disconnects via
IMAP, always coinciding with a "MySQL server has gone away" error in
the logs.

This is non-production, so I'm happy to turn on whatever debugging
would be useful.

Here's doveconf -n from the box the tcpdump was on. This box is just
configured for lmtp (but have seen the same thing on one configured
for IMAP/POP as well), so it's pretty small, config-wise:

# 2.0.17: /etc/dovecot/dovecot/dovecot.conf
# OS: Linux 3.0.9-nx i686 Debian 5.0.9
auth_cache_negative_ttl = 0
auth_cache_ttl = 0
auth_debug = yes
auth_failure_delay = 0
base_dir = /var/run/dovecot/
debug_log_path = /var/log/dovecot/debug.log
default_client_limit = 3005
default_internal_user = doveauth
default_process_limit = 1500
deliver_log_format = M=%m, F=%f, S="%s" => %$
disable_plaintext_auth = no
first_valid_uid = 199
last_valid_uid = 201
lda_mailbox_autocreate = yes
listen = *
log_path = /var/log/dovecot/mail.log
mail_debug = yes
mail_fsync = always
mail_location = maildir:~/Maildir:INDEX=/var/cache/dovecot/%2Mu/%2.2Mu/%u
mail_nfs_index = yes
mail_nfs_storage = yes
mail_plugins = zlib quota
mail_privileged_group = mail
mail_uid = 200
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope
encoded-character vacation subaddress comparator-i;ascii-numeric
relational regex imap4flags copy include variables body enotify
environment mailbox date ihave
mdbox_rotate_interval = 1 days
mmap_disable = yes
namespace {
  hidden = no
  inbox = yes
  list = yes
  location =
  prefix = INBOX.
  separator = .
  subscriptions = yes
  type = private
}
passdb {
  args = /opt/dovecot/etc/lmtp/sql.conf
  driver = sql
}
plugin {
  info_log_path = /var/log/dovecot/dovecot-deliver.log
  log_path = /var/log/dovecot/dovecot-deliver.log
  quota = maildir:User quota
  quota_rule = *:bytes=25M
  quota_rule2 = INBOX.Trash:bytes=+10%%
  quota_rule3 = *:messages=3000
  sieve = ~/sieve/dovecot.sieve
  sieve_before = /etc/dovecot/scripts/spam.sieve
  sieve_dir = ~/sieve/
  zlib_save = gz
  zlib_save_level = 3
}
protocols = lmtp sieve
service auth-worker {
  unix_listener auth-worker {
mode = 0666
  }
  user = doveauth
}
service auth {
  client_limit = 8000
  unix_listener login/auth {
mode = 0666
  }
  user = doveauth
}
service lmtp {
  executable = lmtp -L
  process_min_avail = 10
  unix_listener lmtp {
mode = 0666
  }
}
ssl = no
userdb {
  driver = prefetch
}
userdb {
  args = /opt/dovecot/etc/lmtp/sql.conf
  driver = sql
}
verbo