Re: [Dovecot] (new) director issues in 2.1.10

2012-10-22 Thread Kelsey Cummings
On Mon, Oct 22, 2012 at 03:39:34PM +0300, Timo Sirainen wrote:
> On 26.9.2012, at 21.06, Kelsey Cummings wrote:
> 
> > 09:25:21 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
> > synced for 5032 secs)
> > 09:25:55 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
> > synced for 5066 secs, weak user, user refreshed 64 secs ago)
> > 09:26:28 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
> > synced for 5099 secs, weak user, user refreshed 97 secs ago)
> 
> Looks like I had broken this in v2.1.8. 
> http://hg.dovecot.org/dovecot-2.1/rev/e4c337f38ed6 fixes this. I also added a 
> bunch of other things to give better error messages and to try to fix any 
> unexpected problems.

Thanks Timo!

-- 
Kelsey Cummings - k...@corp.sonic.net  sonic.net, inc.
System Architect  2260 Apollo Way
707.522.1000  Santa Rosa, CA 95407


Re: [Dovecot] (new) director issues in 2.1.10

2012-10-22 Thread Timo Sirainen
On 26.9.2012, at 21.06, Kelsey Cummings wrote:

> 09:25:21 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
> synced for 5032 secs)
> 09:25:55 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
> synced for 5066 secs, weak user, user refreshed 64 secs ago)
> 09:26:28 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
> synced for 5099 secs, weak user, user refreshed 97 secs ago)

Looks like I had broken this in v2.1.8. 
http://hg.dovecot.org/dovecot-2.1/rev/e4c337f38ed6 fixes this. I also added a 
bunch of other things to give better error messages and to try to fix any 
unexpected problems.



Re: [Dovecot] (new) director issues in 2.1.10

2012-09-26 Thread Kelsey Cummings

On 09/26/12 11:06, Kelsey Cummings wrote:

> No, there continued to be a mix of both.  The pattern seems to look like
> this.  I'll run some stats later but it looks like a pretty significant
> number of users were affected.


Timo, it looks like the total number of affected users was only about 
250, and most of their failed connections were surrounded by 
successful sessions.
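
A count like the one above could be produced with something along these lines. This is only a sketch: it assumes each error sits on a single line in the director log (the wrapping in this thread comes from the mail client), and the function and variable names are made up for illustration. The sample lines are the ones quoted in this thread, joined back onto one line each.

```python
import re
from collections import Counter

# Assumption: the real log has one error per line; only the user name is needed here.
FAILED_RE = re.compile(r"User (\S+) host lookup failed: Timeout")


def failures_per_user(lines):
    """Return a Counter mapping user name -> number of failed host lookups."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Lines taken from this thread, unwrapped; the last entry is a placeholder
# standing in for any unrelated log line and is skipped by the regex.
sample = [
    "director: User bb host lookup failed: Timeout - queued for 30 secs (Ring synced for 36 secs)",
    "director: User cc host lookup failed: Timeout - queued for 48 secs (Ring synced for 66 secs, user refreshed 12 secs ago)",
    "09:25:21 .. User X host lookup failed: Timeout - queued for 30 secs (Ring synced for 5032 secs)",
    "09:25:55 .. User X host lookup failed: Timeout - queued for 30 secs (Ring synced for 5066 secs, weak user, user refreshed 64 secs ago)",
    "09:26:28 .. User X host lookup failed: Timeout - queued for 30 secs (Ring synced for 5099 secs, weak user, user refreshed 97 secs ago)",
    "some unrelated log line",
]

counts = failures_per_user(sample)
print(len(counts), "affected users")
for user, n in counts.most_common():
    print(user, n)
```

Feeding it the full log (for example by iterating over an open file) would give the number of distinct affected users as `len(counts)`.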


-K





Re: [Dovecot] (new) director issues in 2.1.10

2012-09-26 Thread Kelsey Cummings
On Wed, Sep 26, 2012 at 08:57:58PM +0300, Timo Sirainen wrote:
> On 26.9.2012, at 20.34, Kelsey Cummings wrote:
> 
> > The following errors on the directors that started after this went 
> > unnoticed until this AM.
> > 
> > director: User bb host lookup failed: Timeout - queued for 30 secs (Ring 
> > synced for 36 secs)
> > director: User cc host lookup failed: Timeout - queued for 48 secs (Ring 
> > synced for 66 secs, user refreshed 12 secs ago)
> > director: User dd host lookup failed: Timeout - queued for 124 secs (Ring 
> > synced for 119 secs, weak user, user refreshed 155 secs ago)
> > director: User ee host lookup failed: Timeout - queued for 79 secs (Ring 
> > synced for 119 secs, weak user, user refreshed 113 secs ago)
> > ...
> > User ff host lookup failed: Timeout - queued for 30 secs (Ring synced for 
> > 7427 secs, weak user, user refreshed 620 secs ago)
> > 
> > This continued, combined with occasional login timeouts (as reported by 
> > some internal imap clients.)  The login delays/timeouts got bad enough that 
> > our load balancers dropped both the servers while I was investigating. They 
> > seem to be okay after being restarted.
> 
> After the first few minutes, did all the rest of the error messages contain 
> the "weak user" string? Did this happen to a lot of different users 
> (few/some/most)? Is the director_user_expire setting the default 15 minutes?

No, there continued to be a mix of both.  The pattern seems to look like
this.  I'll run some stats later but it looks like a pretty significant
number of users were affected.

09:25:21 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
synced for 5032 secs)
09:25:55 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
synced for 5066 secs, weak user, user refreshed 64 secs ago)
09:26:28 .. User X host lookup failed: Timeout - queued for 30 secs (Ring 
synced for 5099 secs, weak user, user refreshed 97 secs ago)
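
To put a number on the "mix of both", the errors could be split by whether they carry the "weak user" marker. A rough sketch, assuming the message layout shown in this thread with each error on one log line; the regex and the function name are illustrative and are not taken from Dovecot itself.

```python
import re

# Assumption: message layout as quoted in this thread, one error per line.
LINE_RE = re.compile(
    r"User (?P<user>\S+) host lookup failed: Timeout - "
    r"queued for (?P<queued>\d+) secs "
    r"\(Ring synced for (?P<synced>\d+) secs"
    r"(?P<weak>, weak user)?"
    r"(?:, user refreshed (?P<refreshed>\d+) secs ago)?\)"
)


def tally_weak(lines):
    """Count timeout errors with and without the 'weak user' marker."""
    tally = {"weak": 0, "not_weak": 0}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a host lookup timeout line
        key = "weak" if match.group("weak") else "not_weak"
        tally[key] += 1
    return tally


# The three lines for user X from this message, unwrapped.
sample = [
    "09:25:21 .. User X host lookup failed: Timeout - queued for 30 secs (Ring synced for 5032 secs)",
    "09:25:55 .. User X host lookup failed: Timeout - queued for 30 secs (Ring synced for 5066 secs, weak user, user refreshed 64 secs ago)",
    "09:26:28 .. User X host lookup failed: Timeout - queued for 30 secs (Ring synced for 5099 secs, weak user, user refreshed 97 secs ago)",
]

print(tally_weak(sample))
```

The named groups (`queued`, `synced`, `refreshed`) are captured as well, so the same pattern could be reused to look at how long lookups stayed queued.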


-- 
Kelsey Cummings - k...@corp.sonic.net  sonic.net, inc.
System Architect  2260 Apollo Way
707.522.1000  Santa Rosa, CA 95407


Re: [Dovecot] (new) director issues in 2.1.10

2012-09-26 Thread Timo Sirainen
On 26.9.2012, at 20.34, Kelsey Cummings wrote:

> The following errors on the directors that started after this went unnoticed 
> until this AM.
> 
> director: User bb host lookup failed: Timeout - queued for 30 secs (Ring 
> synced for 36 secs)
> director: User cc host lookup failed: Timeout - queued for 48 secs (Ring 
> synced for 66 secs, user refreshed 12 secs ago)
> director: User dd host lookup failed: Timeout - queued for 124 secs (Ring 
> synced for 119 secs, weak user, user refreshed 155 secs ago)
> director: User ee host lookup failed: Timeout - queued for 79 secs (Ring 
> synced for 119 secs, weak user, user refreshed 113 secs ago)
> ...
> User ff host lookup failed: Timeout - queued for 30 secs (Ring synced for 
> 7427 secs, weak user, user refreshed 620 secs ago)
> 
> This continued, combined with occasional login timeouts (as reported by some 
> internal imap clients.)  The login delays/timeouts got bad enough that our 
> load balancers dropped both the servers while I was investigating. They seem 
> to be okay after being restarted.

After the first few minutes, did all the rest of the error messages contain 
the "weak user" string? Did this happen to a lot of different users 
(few/some/most)? Is the director_user_expire setting the default 15 minutes?



[Dovecot] (new) director issues in 2.1.10

2012-09-26 Thread Kelsey Cummings
Timo - I upgraded to 2.1.10 on our director servers two nights ago and, 
apart from errors associated with the director processes restarting, 
everything looked great for ~24 hours until I failed out the real 
servers last night to update the NFS mount options for the spools.


I followed the suggested procedure for each backend server, running it on 
just one of the directors, and it seemed to work as expected.


doveadm director add x.x.x.x 0
doveadm director flush x.x.x.x

The following errors on the directors that started after this went 
unnoticed until this AM.


director: User bb host lookup failed: Timeout - queued for 30 secs (Ring 
synced for 36 secs)
director: User cc host lookup failed: Timeout - queued for 48 secs (Ring 
synced for 66 secs, user refreshed 12 secs ago)
director: User dd host lookup failed: Timeout - queued for 124 secs 
(Ring synced for 119 secs, weak user, user refreshed 155 secs ago)
director: User ee host lookup failed: Timeout - queued for 79 secs (Ring 
synced for 119 secs, weak user, user refreshed 113 secs ago)

...
User ff host lookup failed: Timeout - queued for 30 secs (Ring synced 
for 7427 secs, weak user, user refreshed 620 secs ago)



This continued, combined with occasional login timeouts (as reported by 
some internal imap clients.)  The login delays/timeouts got bad enough 
that our load balancers dropped both the servers while I was 
investigating. They seem to be okay after being restarted.



-K