Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Paul B. Henson

On 1/20/2012 1:06 PM, Timo Sirainen wrote:


> Hmh. Still doesn't work 100%:
>
> auth-worker(28788): Error: mysql: Query failed, retrying: MySQL
> server has gone away (idled for 181 secs)
> auth-worker(7413): Error: mysql: Query failed, retrying: MySQL
> server has gone away (idled for 298 secs)
>
> I'm not really sure why it's not killing itself after 60 seconds of
> idling. Probably related to how mysql code tracks idle time and how
> idle_kill tracks it.. Anyway, those errors are much more rare now.


The MySQL server measures a connection's idle time from its last
network communication with the client. So presumably, if the auth worker
gets marked as not idle by anything that does not involve the MySQL
server, the worker's idle timer and the server's could get out of sync.
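
As a rough illustration of how the two idle clocks can drift apart, here is
a minimal C sketch. The struct, the function names, and the 180-second
server timeout used in the note below are all assumptions made up for
illustration; none of this is Dovecot's actual code, and the real MySQL
timeout is not stated in this thread.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping for one auth worker (not Dovecot's real struct). */
struct sketch_worker {
	time_t last_activity; /* reset by any work the worker does */
	time_t last_db_io;    /* reset only by traffic to the MySQL server */
};

/* What an idle_kill-style check sees: time since any activity at all. */
static bool sketch_idle_kill_due(const struct sketch_worker *w, time_t now,
				 unsigned int idle_kill_secs)
{
	return difftime(now, w->last_activity) >= (double)idle_kill_secs;
}

/* What the MySQL server sees: time since the last packet on the connection. */
static bool sketch_db_conn_stale(const struct sketch_worker *w, time_t now,
				 unsigned int server_timeout_secs)
{
	return difftime(now, w->last_db_io) >= (double)server_timeout_secs;
}
```

If a worker did something unrelated to MySQL 11 seconds ago but last talked
to the server 181 seconds ago, the first check (with idle_kill at 60) says
"not idle" while the second (with an assumed 180-second server timeout) says
the connection is already dead.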


Before you posted a potential fix to the idle timeout, I was looking at 
other possible ways to resolve the issue. Currently, an authentication 
request is tried exactly twice -- one initial try, and one retry.


Looking at driver-sqlpool.c:

if (result->failed_try_retry && !request->retried) {

Currently, retried is a boolean. What if retried were an integer counter
instead, and a new configuration variable let you specify how many times
an authentication request should be tried in total? The default could be
2 (one initial try plus one retry), which would result in exactly the
same behavior. But then you could set it to 3 or 4 to prevent a request
from hitting a timed-out connection twice and failing completely.


Ideally, a better fix would be for the client not to treat a "MySQL
server has gone away" return as a failure, but instead to reconnect
immediately and try again without counting it as a retry. However, from
reviewing the code, that would be a much more difficult and invasive
change. Changing the existing retried variable from a boolean to an
integer count is pretty simple.
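
To make the counter idea concrete, here is a minimal sketch in C. The
structs, the function, and the max_tries parameter are hypothetical
stand-ins for illustration only; they are not the real definitions from
driver-sqlpool.c, and no such configuration setting exists as far as this
thread shows.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields discussed above; these are NOT
   Dovecot's real structures. */
struct sketch_request {
	unsigned int tries; /* attempts made so far (replaces the boolean) */
};

struct sketch_result {
	bool failed_try_retry; /* the failure looks transient */
};

/* Call after an attempt has failed. Returns true if the request should be
   sent again. max_tries would come from a new (hypothetical) setting;
   2 reproduces the current behavior of one initial try plus one retry. */
static bool sketch_should_retry(const struct sketch_result *result,
				struct sketch_request *request,
				unsigned int max_tries)
{
	request->tries++;
	return result->failed_try_retry && request->tries < max_tries;
}
```

With max_tries set to 2 the first failure is retried and the second is
final; with 3, a request can survive two stale connections in a row.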



--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [Dovecot] MySQL server has gone away

2012-01-20 Thread Paul B. Henson
On Fri, Jan 20, 2012 at 09:16:57AM -0800, Timo Sirainen wrote:

> This is fixed in v2.1 hg. The default idle_kill of 60 seconds seems to
> have gotten rid of the MySQL server has gone away errors completely.
> So I guess the problem was that during some peak times a ton of auth
> worker processes were created, but afterwards they weren't used until
> the next peak happened, and then they failed.
>
> http://hg.dovecot.org/dovecot-2.1/rev/3963862a4086
> http://hg.dovecot.org/dovecot-2.1/rev/58556a90259f

Hmm, I tried to apply this to 2.0.17, and that didn't really work out.
Before I spend too much time trying to hand port the changes, do you
know off hand if they simply won't apply to 2.0.17 due to other changes
made since then? It looks like 2.1 might be out soon, I guess maybe I
should just wait for that.

Thanks...




Re: [Dovecot] MySQL server has gone away

2012-01-15 Thread Paul B. Henson
On Sat, Jan 14, 2012 at 12:01:12AM -0800, Robert Schetterer wrote:

> > Hmm, hadn't tried that, but flipped it on to see how it might work out.
> > The only tradeoff is a potential delay between when an account is
> > disabled and when it actually stops authenticating. I set the timeout
> > to 10 minutes for now, with an hour timeout for negative caching.
>
> Don't know if I understand you right.

Before I turned on auth caching, every attempted authentication hit our
mysql database, which in addition to the password itself contains a flag
indicating whether or not the account is enabled. So if somebody was
abusing smtp authentication, our helpdesk could disable their account,
and it would *immediately* stop working. Whereas with authentication
caching enabled, there is a window the size of the ttl where an account
that has been disabled can continue to successfully authenticate.
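
The window can be sketched in a few lines of C. The struct and functions
below are hypothetical and only model the enabled flag (password checking
is left out); they are not how Dovecot's auth cache is actually
implemented.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache entry; not Dovecot's real auth cache layout. */
struct sketch_cache_entry {
	time_t cached_at;     /* when the database was last consulted */
	bool account_enabled; /* the flag as it was at that moment */
};

/* True if the cached answer is still served without asking the database. */
static bool sketch_cache_fresh(const struct sketch_cache_entry *e, time_t now,
			       unsigned int ttl_secs)
{
	return difftime(now, e->cached_at) < (double)ttl_secs;
}

/* Outcome of an auth attempt: a fresh cache entry wins over the live flag. */
static bool sketch_auth_allowed(const struct sketch_cache_entry *e, time_t now,
				unsigned int ttl_secs, bool enabled_in_db_now)
{
	if (sketch_cache_fresh(e, now, ttl_secs))
		return e->account_enabled;
	return enabled_in_db_now;
}
```

With a 600-second ttl, an account cached as enabled at time 0 and disabled
in the database shortly afterwards still authenticates at second 599 and is
only rejected from second 600 on.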

> > That page says you can send a USR2 signal to the auth process for cache
> > stats? That doesn't seem to work. OTOH, that page is for version 1, not
> > 2; is there some other way to generate cache stats in version 2?
>
> Auth cache works with Dovecot 2; no idea about Dovecot 1, didn't test,
> but I guess it does.

I'm using dovecot 2. My point was that the documentation for dovecot 1
describes a way to make dovecot dump the authentication cache
statistics, and that method doesn't seem to work in dovecot 2; my
question is whether there is some other way to get the cache statistics
in dovecot 2.

Thanks...


Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Paul B. Henson
On Fri, Jan 13, 2012 at 01:36:38AM -0800, Timo Sirainen wrote:

> Also another idea to avoid them in the first place:
>
> service auth-worker {
>   idle_kill = 20
> }

Ah, set the auth-worker timeout to less than the mysql timeout to
prevent a stale mysql connection from ever being used. I'll try that,
thanks.




Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Paul B. Henson
On Fri, Jan 13, 2012 at 11:38:28AM -0800, Robert Schetterer wrote:

> By the way, if you use sql for auth, have you tried auth caching?
>
> http://wiki.dovecot.org/Authentication/Caching

Hmm, hadn't tried that, but flipped it on to see how it might work out.
The only tradeoff is a potential delay between when an account is
disabled and when it actually stops authenticating. I set the timeout
to 10 minutes for now, with an hour timeout for negative caching.

That page says you can send a USR2 signal to the auth process for cache
stats? That doesn't seem to work. OTOH, that page is for version 1, not
2; is there some other way to generate cache stats in version 2?

Thanks...



Re: [Dovecot] MySQL server has gone away

2012-01-13 Thread Paul B. Henson

On 1/13/2012 10:29 AM, Mark Moseley wrote:


> connection prior to this, is the auth worker not recognizing its
> connection is already half-closed (in which case, it probably
> shouldn't even consider it a legitimate connection and just
> automatically reconnect, i.e. try #1, not the retry, which would
> happen after another failure).


I don't think there's any way to tell from the mysql api that the server 
has closed the connection short of trying to use it and getting that 
specific error. I suppose that specific error could be special cased as 
an immediate try again with no penalty rather than considered a failure.
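
A minimal sketch of that special case, in C. The error number 2006 is
CR_SERVER_GONE_ERROR from the MySQL client library's errmsg.h (in real code
it would come from mysql_errno()); it is copied into a local macro so the
sketch compiles without the library. Everything else here, including the
"only one free reconnect per request" guard, is an assumption added for
illustration and not existing Dovecot behavior.

```c
#include <stdbool.h>

/* Value of CR_SERVER_GONE_ERROR in MySQL's errmsg.h, copied locally so this
   sketch does not need the client library headers. */
#define SKETCH_CR_SERVER_GONE_ERROR 2006

enum sketch_action {
	SKETCH_OK,             /* query succeeded */
	SKETCH_RECONNECT_FREE, /* reconnect and resend; not counted as a retry */
	SKETCH_FAIL            /* normal failure/retry accounting applies */
};

/* Decide what to do with the error number of a finished query.
   free_reconnect_used is per-request state (hypothetical) that keeps a
   flapping server from causing an endless reconnect loop. */
static enum sketch_action sketch_classify(unsigned int mysql_err,
					  bool *free_reconnect_used)
{
	if (mysql_err == 0)
		return SKETCH_OK;
	if (mysql_err == SKETCH_CR_SERVER_GONE_ERROR && !*free_reconnect_used) {
		*free_reconnect_used = true;
		return SKETCH_RECONNECT_FREE;
	}
	return SKETCH_FAIL;
}
```

The first "gone away" on a request triggers a penalty-free reconnect; a
second one on the same request, or any other error, falls through to the
normal failure handling.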





Re: [Dovecot] MySQL server has gone away

2012-01-12 Thread Paul B. Henson

On 1/12/2012 6:00 PM, Mark Moseley wrote:


> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
> MySQL server has gone away


I've actually been meaning to send a similar message for the last couple 
of months :).


We run dovecot solely as a sasl authentication provider to postfix for 
smtp authentication. We're currently running 2.0.15 with a handful of 
patches from a few months ago when Timo fixed mysql failover.


We also see sporadic messages like that in the logs:

Jan 11 01:00:57 sparky dovecot: auth-worker: Error: mysql: Query failed, 
retrying: MySQL server has gone away


We do have a timeout on the mysql servers, so I don't necessarily mind 
this message, except we also see some number of these:


Jan 11 01:00:57 sparky dovecot: auth-worker: Error: 
sql(clgeurts,108.38.64.98): Password query failed: MySQL server has gone 
away


The mysql servers have never been down or unresponsive, so if it
retries, it should succeed. I'm not sure what's happening here; perhaps
it tries the query on one mysql server connection (we have two
configured) which has timed out, then tries the other one, and if that
one has also timed out, it just fails?


I also see some auth timeouts:

Jan 11 22:06:02 sparky dovecot: auth: CRAM-MD5(?,200.37.175.14): Request 
10232.28 timeouted after 150 secs, state=2


I'm not sure if they're related to the mysql timeouts.

There are also some postfix auth errors:

Jan 11 23:55:41 sparky postfix/smtpd[20994]: warning: 
unknown[200.37.175.14]: SASL CRAM-MD5 authentication failed: Connection 
lost to authentication server


Which I think happen when dovecot takes too long to respond.


I haven't had time to dig into it or get any debugging info, but just 
thought I'd pipe up when I saw your similar question :).





Re: [Dovecot] dovecot2 auth-worker socket perms ignoring assigned ownership settings in conf.d/10-master.conf?

2011-10-11 Thread Paul B. Henson
On Tue, Oct 11, 2011 at 08:20:13PM -0700, mephistophe...@operamail.com wrote:

> Maybe being too literal, or misunderstanding your 'extra', I changed to,

Hmm, I just cut-and-pasted my config :), the missing piece was the
unix_listener subconfig user, the user/group part in the service config
didn't need to match mine exactly, although I think
$default_internal_user is dovecot anyway.

>   chown doveauth:dovecot /var/run/dovecot/auth-worker

Hmm, perhaps I misunderstood you? I thought you were trying to get SASL
auth working with postfix? But you're demonstrating an imap connection.

Ah, yes, I see in your original email you showed an imap connection too.
I just saw the /var/spool/postfix/private/auth and user/group postfix
parts of the config and made an assumption.

My config was for using Dovecot *just* to provide SASL authentication
services to postfix for smtp auth, I'm not using any of its other
features/services.

Sorry for any confusion.

I'm curious though, why are you setting the auth stuff up to be owned by
postfix if you're trying to authenticate dovecot imap processes? It
seems you're mixing two different configs.



Re: [Dovecot] mysql auth failover failing

2011-09-16 Thread Paul B. Henson

On 9/16/2011 2:21 AM, Timo Sirainen wrote:


> > As far as I can tell, it still doesn't do load balancing.
>
> Oh. http://hg.dovecot.org/dovecot-2.0/rev/327698228158 should finally
> fix it. :)


I installed the new 2.0.15 release including this change, and can 
confirm it does now successfully load balance across my two servers. Not 
only that, but with this change, there are no failed authentications at 
all when one of the servers goes away :). I have it running on one of my 
three production mail servers now, and barring any unexpected issues 
will deploy it on the other two next week, and then we'll be sitting 
pretty ;).


Thanks again...




Re: [Dovecot] mysql auth failover failing

2011-09-15 Thread Paul B. Henson

On 9/15/2011 3:53 AM, Timo Sirainen wrote:


> I did several fixes related to this in v2.0 hg.


I patched version 2.0.13 with these fixes and tested it out.

As far as I can tell, it still doesn't do load balancing. When started, 
it only connects to the primary server, and as long as that server is 
available never seems to try and connect to the other one. However, the 
failover is much better. There are a few failed authentications when the 
primary server first becomes unavailable (seems to depend on load; under 
a light load, only a couple fail, the heavier the load, the more fail). 
After that blip though, authentications work fine.


Thanks much for your help resolving this issue, I greatly appreciate it.




Re: [Dovecot] mysql auth failover failing

2011-09-12 Thread Paul B. Henson

On 9/12/2011 5:30 AM, Timo Sirainen wrote:


> This works okay enough with PostgreSQL because it does asynchronous
> lookups, so two simultaneous lookups create a second connection.
> MySQL does synchronous lookups though, so the second connection is
> normally never created.


If I could, I think I'd rather run postgres; but so many things only
support mysql that you can't really get away with running only postgres,
and it's not worth the effort to run two separate sql services *sigh*.


> I suppose the fix to this would be to always connect to all SQL
> servers at startup.


Perhaps it could be an option, either load balancing between all
available servers, or only using later listed servers when the earlier
listed ones are failing. For my purposes, either way is fine, as long as
authentications don't fail :). The other contributor to this thread, who
has a local mysql replica listed first and the central master listed
second probably wouldn't want the load balanced between them.
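
The two behaviors could hang off a single switch. Below is a rough C sketch
of what such an option might look like; the policy enum, the function, and
the round-robin cursor are invented for illustration and do not correspond
to any existing Dovecot setting or code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy switch; no such Dovecot option is implied. */
enum sketch_policy {
	SKETCH_BALANCE, /* rotate across all hosts that are up */
	SKETCH_ORDERED  /* always prefer the earliest listed host that is up */
};

/* Returns the index of the host to use, or -1 if none is available.
   rr_cursor holds the round-robin position between calls and is only
   advanced in SKETCH_BALANCE mode. */
static int sketch_pick_host(const bool *host_up, size_t n,
			    enum sketch_policy policy, size_t *rr_cursor)
{
	if (n == 0)
		return -1;
	size_t start = (policy == SKETCH_BALANCE) ? *rr_cursor % n : 0;
	for (size_t i = 0; i < n; i++) {
		size_t idx = (start + i) % n;
		if (host_up[idx]) {
			if (policy == SKETCH_BALANCE)
				*rr_cursor = idx + 1;
			return (int)idx;
		}
	}
	return -1;
}
```

In ordered mode a setup with a local replica listed first never touches the
central master unless the replica is down; in balance mode queries alternate
between the servers, and either mode skips a host that is marked down.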


> It should have created the second connection here and not fail..


Based on the network traffic, it is really pounding the primary trying
to connect, and occasionally connecting to the secondary only to
immediately disconnect after either only one or very few queries.


> I'll try to debug this soon.


Thanks; let me know if there's anything I could do to help, or if there
are any potential fixes you would like tested.




Re: [Dovecot] mysql auth failover failing

2011-09-11 Thread Paul B. Henson
On Sat, Sep 10, 2011 at 01:49:59AM -0700, Noel Butler wrote:
 
> Sounds like you have bigger issues, maybe relating as to why the primary
> fails?

For testing purposes, it fails because I stick a firewall rule in place
preventing access to it ;). In production, it came to our attention
because a hardware failure required downtime on one of the mysql servers
to replace parts, and we received complaints of failed authentications
while it was down. In general, both are up, but things using them need
to be able to survive when one is down.

> primary (local slave copy) has gone away unless I'm deliberately
> upgrading mysql ) when doing so (tested) it hits the master server (as
> in secondary host=) right away, no auth failures.

Hmm, what version of dovecot are you using? In version 1 failover
seems to work if the primary returns connection refused (which your
scenario would). In version 2, it seems flaky for both connection
refused and connection timed out. Unless I've got something
misconfigured, but there doesn't seem to be that much to it...




[Dovecot] mysql auth failover failing

2011-09-09 Thread Paul B. Henson
 as long as they do not continuously 
fail while the server is down.


Am I doing something wrong? Does the example sql config have incorrect 
information?


We were previously running dovecot 1.2.11, we just recently upgraded to 
2. In the previous version, we actually had two different passdb's 
configured, each one listing only one of the mysql servers. I seem to 
recall that was the recommendation at the time for high-availability. 
When that configuration did not seem to work under version 2, I found an 
updated recommendation to list both servers in the same passdb, which 
also does not appear to work correctly. I actually went back and tested 
the older version, and determined it seemed to work okay in the case 
where the server was up but the service was down, and connections were 
refused, but also failed a large number of authentication attempts when 
the server was completely down and connections were timing out.


Thanks much...
