Re: [Dovecot] restarting director

2011-04-08 Thread Timo Sirainen
On Mon, 2011-02-07 at 17:18 -0800, Kelsey Cummings wrote:
> On Fri, Jan 21, 2011 at 08:00:08PM +0200, Timo Sirainen wrote:
> > On Fri, 2011-01-21 at 19:59 +0200, Timo Sirainen wrote:
> > 
> > > I can take a look at it, but it would help if you were able to reproduce
> > > the problem.
> > 
> > More clearly: Reliably reproduce this in a test setup :)
> 
> Timo & Cor, did you guys ever nail this down?  We're looking at
> migration to a director config soon but I'd like to see this resolved
> first.  Anything we can do to help?

If you're still interested and you can (at least sometimes) reproduce
this error, see if this helps at all:

http://hg.dovecot.org/dovecot-2.0/rev/2b7af3a16521

If not, apply http://hg.dovecot.org/dovecot-2.0/rev/e9139f74c451 and
set:

service director {
  executable = director -D
}

Then gather the error/warning/debug logs from all directors around the
time when it's not working correctly. Be sure that the error and debug
messages go to the same log file so that the message ordering is
preserved.
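
As a rough illustration of the log-gathering step above, here is a minimal
Python sketch that pulls the director messages out of each server's log
for a window around the failure and merges them by time. It assumes
syslog-style lines like the one quoted in this thread ("Jan 20 22:49:55
imapdirector3 dovecot: director: ..."). The function names, the window
size, and the year argument are hypothetical and not part of Dovecot;
syslog stamps carry no year, so the caller has to supply one.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def director_lines_near(lines, center, window, year):
    """Keep director messages whose stamp lies within +/- window of center.

    Returns (timestamp, line) pairs in their original file order.
    """
    selected = []
    for line in lines:
        # Assumption: director messages carry this syslog tag, as in the
        # error quoted in this thread.
        if "dovecot: director:" not in line:
            continue
        stamp = parse_syslog_time(line, year)
        if stamp is not None and abs(stamp - center) <= window:
            selected.append((stamp, line.rstrip("\n")))
    return selected


def merge_by_time(per_host_pairs):
    """Merge per-host (timestamp, line) lists into one timeline.

    list.sort is stable, so lines sharing the same second keep the order
    they had within their own file.
    """
    merged = [pair for pairs in per_host_pairs for pair in pairs]
    merged.sort(key=lambda pair: pair[0])
    return merged
```

Because the stamps only have one-second resolution, the merge cannot
recover the true ordering between servers within the same second. It only
preserves the ordering inside each file, which is why the error and debug
messages should go to the same log file in the first place.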



Re: [Dovecot] restarting director

2011-02-07 Thread Kelsey Cummings
On Fri, Jan 21, 2011 at 08:00:08PM +0200, Timo Sirainen wrote:
> On Fri, 2011-01-21 at 19:59 +0200, Timo Sirainen wrote:
> 
> > I can take a look at it, but it would help if you were able to reproduce
> > the problem.
> 
> More clearly: Reliably reproduce this in a test setup :)

Timo & Cor, did you guys ever nail this down?  We're looking at
migration to a director config soon but I'd like to see this resolved
first.  Anything we can do to help?

-K



Re: [Dovecot] restarting director

2011-01-21 Thread Timo Sirainen
On Fri, 2011-01-21 at 13:42 -0400, Cor Bosman wrote:
> Hi all, anyone having any problems with restarting the director? Every
> time I bring down 1 of the director servers, reboot it, or just
> restart it for whatever reason, I'm seeing all kinds of problems.
> Dovecot generally always gives me this error:
> 
> Jan 20 22:49:55 imapdirector3 dovecot: director: Error: Director
> 194.109.26.173:444/right disconnected before handshake finished

I'm not sure if that itself is a problem..

> It seems the directors can't agree on forming a ring anymore, and this
> may be leading to problems with clients. I mostly have to resort to
> bringing down all directors, and restarting them all at once. Not
> really a workable solution.  As an example, last night for a few hours
> we were getting complaints from customers about being disconnected,
> and the only obvious error in the log was the one above, after one of
> my colleagues had to restart a director because of some changes in the
> syslog daemon. After I restarted all directors within a few seconds
> of each other, all complaints disappeared.

I can take a look at it, but it would help if you were able to reproduce
the problem. I'm still lagging a lot behind in emails (=bugfixes)..





Re: [Dovecot] restarting director

2011-01-21 Thread Timo Sirainen
On Fri, 2011-01-21 at 19:59 +0200, Timo Sirainen wrote:

> I can take a look at it, but it would help if you were able to reproduce
> the problem.

More clearly: Reliably reproduce this in a test setup :)





[Dovecot] restarting director

2011-01-21 Thread Cor Bosman
Hi all, anyone having any problems with restarting the director? Every time I 
bring down 1 of the director servers, reboot it, or just restart it for 
whatever reason, I'm seeing all kinds of problems. Dovecot generally always 
gives me this error:

Jan 20 22:49:55 imapdirector3 dovecot: director: Error: Director 
194.109.26.173:444/right disconnected before handshake finished

It seems the directors can't agree on forming a ring anymore, and this may be 
leading to problems with clients. I mostly have to resort to bringing down all 
directors, and restarting them all at once. Not really a workable solution.  As 
an example, last night for a few hours we were getting complaints from 
customers about being disconnected, and the only obvious error in the log was 
the one above, after one of my colleagues had to restart a director because of 
some changes in the syslog daemon. After I restarted all directors within a 
few seconds of each other, all complaints disappeared.

Timo, I know I've asked similar questions before, but the answer just eludes me. 

If I have 3 director servers, and need to take one down and restart it, what is 
the proper method to reconnect the ring? In practice, I can't seem to work it 
out and I mostly end up with the above error until I just restart them all. Not 
fun with 20.000 clients connected.

Cor