Re: Replication: sync_client -r dies

2007-11-13 Thread David Carter
On Mon, 12 Nov 2007, Bron Gondwana wrote:

 It seems to me that the replication code ought to be a bit more robust
 than this when a replica goes down or loses network connectivity.  Is
 the 2.3.10 code any better than 2.3.9 in the way this kind of situation
 is handled?

 I believe David Carter has been working on some stuff for this which is
 lined up to go in soon.

The autorestart stuff is already in 2.3.10.

It was Ken's work, based on a suggestion on my part.

-- 
David Carter                        Email: [EMAIL PROTECTED]
University Computing Service,       Phone: (01223) 334502
New Museums Site, Pembroke Street,  Fax:   (01223) 334679
Cambridge UK. CB2 3QH.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Replication: sync_client -r dies

2007-11-11 Thread Bron Gondwana
On Sat, Nov 10, 2007 at 07:09:53PM -0800, Rich Wales wrote:
 After about a week of having synchronization running perfectly in my
 2.3.9 system, I finally got another bailout incident with sync_client
 on my master server.
 
 This happened just after I shut down my replica server (to move it to
 a different location).  About two minutes after the replica went down,
 sync_client on the master said "Error in do_sync(): bailing out!" with
 no other messages of any kind.
 
 It seems to me that the replication code ought to be a bit more robust
 than this when a replica goes down or loses network connectivity.  Is
 the 2.3.10 code any better than 2.3.9 in the way this kind of situation
 is handled?

I believe David Carter has been working on some stuff for this which is
lined up to go in soon.

We just have a monitor_sync script that runs every 10 minutes from cron
and can recover from this and a variety of other interesting situations.
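A cron-driven check of this kind could be sketched roughly as follows. This is not the monitor_sync script mentioned above, which is not shown in the thread; it is a minimal illustration that assumes the command line reported by `ps` begins with the sync_client binary path and that re-running `sync_client -r` as the Cyrus user is acceptable on the system in question. The function names are made up for the example.

```python
import subprocess


def sync_client_running(ps_output):
    """Return True if any line of `ps` output is a sync_client command line."""
    for line in ps_output.splitlines():
        fields = line.split()
        # The command may be given as a full path; compare only the basename.
        if fields and fields[0].rsplit("/", 1)[-1] == "sync_client":
            return True
    return False


def check_and_restart(restart_cmd, run=subprocess.run):
    """Start sync_client if none is running; return True if a restart was attempted.

    `run` is injectable so the logic can be exercised without touching
    real processes.
    """
    # `ps -A -o args=` prints one command line per process with no header.
    ps = run(["ps", "-A", "-o", "args="], capture_output=True, text=True, check=True)
    if sync_client_running(ps.stdout):
        return False
    run(restart_cmd, check=True)
    return True
```

A cron job could call `check_and_restart(["/usr/local/cyrus/bin/sync_client", "-r"])` every few minutes. A blind restart does not replay whatever sync log the dead process left behind, so it only covers part of the recovery.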

Yeah - it would be nice to have a way to tell the master "going down
now, be back later".

Bron.



Re: Replication: sync_client -r dies

2007-11-10 Thread Rich Wales
After about a week of having synchronization running perfectly in my
2.3.9 system, I finally got another bailout incident with sync_client
on my master server.

This happened just after I shut down my replica server (to move it to
a different location).  About two minutes after the replica went down,
sync_client on the master said "Error in do_sync(): bailing out!" with
no other messages of any kind.

It seems to me that the replication code ought to be a bit more robust
than this when a replica goes down or loses network connectivity.  Is
the 2.3.10 code any better than 2.3.9 in the way this kind of situation
is handled?

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.



Re: Replication: sync_client -r dies

2007-11-05 Thread Kenneth Marshall
On Sun, Nov 04, 2007 at 08:42:21PM -0800, Rich Wales wrote:
 Wesley Craig wrote:
 
  But most sync_server errors that will cause sync_client to bail out
  ought to cause sync_server to give a reasonably unique log message
  for the failure.
 
 As I explained earlier this evening, I didn't see ANYTHING AT ALL in
 the replica server's logs that resembled any sort of error indication
 at the times when sync_client bailed out.
 
 Is it possible that something I should be seeing is being filtered out
 by syslog.conf?  What syslog facility name is used by sync_server?
 
 -- 
 Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
 http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
 The difference between theory and practice is that, in theory,
 theory and practice are identical -- whereas in practice, they aren't.
 
 
Rich and Wesley,

We are seeing similar behavior here where everything looks fine but
we are not replicating messages in folders and the process bails out
with no errors. Just another data point. Any help would be appreciated.

Ken Marshall
--
Mgr./Middleware, Infrastructure  Development
[EMAIL PROTECTED] / 713-348-5294



Re: Replication: sync_client -r dies

2007-11-04 Thread Wesley Craig
On 03 Nov 2007, at 23:20, Rich Wales wrote:
 Wesley Craig wrote:
 It usually dies for a reason, i.e., some discrepancy that either
 sync_client or sync_server couldn't handle.  The typical way to
 handle it is to contact someone.

 What sort of debugging output am I going to need to generate in order
 for anyone to have a chance of tracking the problem down?

 I added the -l and -v flags to sync_client, but right now, the only
 clue I have is that /var/log/messages includes a couple of errors
 saying "Error in do_sync(): bailing out!".

Both sync_client and sync_server typically log problems.  Those logs
are probably immediately helpful.  Further information would depend
on the reason for the bail out.

 No core dump files anywhere in sight.

Personally, I see more cases of unresolvable discrepancies than core
dumps, but both exist.

:wes



Re: Replication: sync_client -r dies

2007-11-04 Thread Rich Wales
Wesley Craig wrote:

 Both sync_client and sync_server typically log problems.  Those
 logs are probably immediately helpful.  Further information would
 depend on the reason for the bail out.

Where would I find this log info?  As I said earlier, the only info I've
found so far are the "Error in do_sync(): bailing out!" notices in the
/var/log/messages file.  Are there some other log files saved somewhere
else?  I do have -l and -v specified for the sync_client command.
Should I add any additional options in order to get debugging info?

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.



Re: Replication: sync_client -r dies

2007-11-04 Thread Wesley Craig
On 04 Nov 2007, at 13:11, Rich Wales wrote:
 Where would I find this log info?  As I said earlier, the only info I've
 found so far are the "Error in do_sync(): bailing out!" notices in the
 /var/log/messages file.  Are there some other log files saved somewhere
 else?  I do have -l and -v specified for the sync_client command.
 Should I add any additional options in order to get debugging info?

The replica sync_server will also log to syslog.

:wes



Re: Replication: sync_client -r dies

2007-11-04 Thread Rich Wales
Wesley Craig wrote:

 The replica sync_server will also log to syslog.

No, sorry, as best I can tell, there isn't anything non-routine in any
of the log files on my replica server.

Do I need to specify any command-line flags to sync_server?

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.



Re: Replication: sync_client -r dies

2007-11-04 Thread Wesley Craig
On 04 Nov 2007, at 14:27, Rich Wales wrote:
 No, sorry, as best I can tell, there isn't anything non-routine in any
 of the log files on my replica server.
 Do I need to specify any command-line flags to sync_server?

No, sync_server doesn't take much in the way of command line
options.  If the problem appears to be reproducible, you can enable
telemetry logging or examine the mailboxes that are causing problems.

:wes



Re: Replication: sync_client -r dies

2007-11-04 Thread Rich Wales
Wesley Craig wrote:

 No, sync_server doesn't take much in the way of command line options.

Hmmm.  OK, thanks.

 If the problem appears to be reproducible, you can enable telemetry
 logging or examine the mailboxes that are causing problems.

I currently have absolutely no idea as to what is causing sync_client
to bail out, or which mailbox(es) or other factors may be causing it.

The only other possible piece of evidence I'm aware of might be the
log files left over in my /cyrus/config/sync directory.  If I were to
post these or send them to you, is there a chance this might reveal
anything?

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.



Re: Replication: sync_client -r dies

2007-11-04 Thread Rich Wales
Wesley Craig wrote:

 The log files are pretty obvious in what they say, e.g., they just
 list mailboxes or users to check.  So I suspect they would reveal
 to you which mailboxes are problematic.  I sort of assume that
 you're running sync_client with -l, otherwise it doesn't log much.
 If it's run with -l, it should mention the action that preceded
 the problem.

Here's one of the leftover log files, time-stamped right at the time
of one of these bailout crashes.

APPEND user.marie
SEEN marie user.marie
MAILBOX user.marie
MAILBOX user.marie.Junk
SEEN marie user.marie.Junk
SEEN marie user.marie
MAILBOX user.marie

Other messages have been delivered since that time to both user.marie
and user.marie.Junk without crashing sync_client.  So if this log is
telling me there's a problem with either of these mailboxes, I don't
understand what the log is telling me.
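To see at a glance which mailboxes and users a leftover log touches, the entries can be grouped by action. The sketch below assumes one action per line with whitespace-separated arguments, as in the excerpt above; mailbox names containing spaces would need different handling, and the function name is made up for the example.

```python
def summarize_sync_log(text):
    """Group the distinct targets in a sync log by action, in first-seen order."""
    summary = {}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        action, args = parts[0], tuple(parts[1:])
        targets = summary.setdefault(action, [])
        if args not in targets:
            targets.append(args)
    return summary


# The leftover log quoted in this message.
sample = """APPEND user.marie
SEEN marie user.marie
MAILBOX user.marie
MAILBOX user.marie.Junk
SEEN marie user.marie.Junk
SEEN marie user.marie
MAILBOX user.marie
"""

for action, targets in summarize_sync_log(sample).items():
    print(action, [" ".join(t) for t in targets])
```

For this log the summary reduces to two mailboxes (user.marie and user.marie.Junk) and one user (marie), which narrows down what to compare between master and replica.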

Yes, I am using the -l flag (sync_client -r -l -v), but I'll have
to say once again that neither /var/log/messages nor any other log
file on my master server shows any other sync-related error message
of any sort whatsoever, except for the "Error in do_sync(): bailing
out!" message I mentioned previously.  In /var/log/debug.log, I see
things such as these:

Nov  3 18:55:44 whodunit sync_client[6875]: seen_db: user richw opened /var/imap/user/r/richw.seen
Nov  3 19:03:29 whodunit sync_client[7008]: DIGEST-MD5 client step 1
Nov  3 19:03:29 whodunit sync_client[7008]: DIGEST-MD5 client step 2
Nov  3 19:03:29 whodunit sync_client[7008]: DIGEST-MD5 client step 3
Nov  3 19:13:44 whodunit sync_client[7127]: DIGEST-MD5 client step 1
Nov  3 19:13:44 whodunit sync_client[7127]: DIGEST-MD5 client step 2
Nov  3 19:13:44 whodunit sync_client[7127]: DIGEST-MD5 client step 3

(the bailout message itself occurred at 19:14:37)

and on the replica server, I see the following:

Nov  3 19:03:29 flipflop syncserver[15942]: accepted connection
Nov  3 19:03:29 flipflop syncserver[15942]: cmdloop(): startup
Nov  3 19:13:45 flipflop master[16110]: about to exec /usr/cyrus/bin/sync_server
Nov  3 19:13:45 flipflop syncserver[15942]: accepted connection
Nov  3 19:13:45 flipflop syncserver[15942]: cmdloop(): startup
Nov  3 19:13:45 flipflop syncserver[16110]: executed

but that's all.  Should I be seeing additional debugging output?  If
so, where should I be looking for it?

I'm very sorry, but at the moment, all I can see is that the sync
software mysteriously dies every so often, with no intelligible clue
as to why.  I understand you're saying that there should be additional
info, but either it's NOT there or I don't know where to look for it.
If anyone can help me figure out what I'm doing wrong here, I'll be
grateful.

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.



Re: Replication: sync_client -r dies

2007-11-04 Thread Wesley Craig
If you're running with -r -l (-v is for interactive use -- it causes  
printf output), you should be getting messages like:

APPEND user.marie

in syslog at level INFO.  If you're not seeing those, then you have  
syslog configured to filter them.  See the man page for syslog.conf.
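As a rough illustration of how a selector can hide these messages, the sketch below checks a single "facility.level" selector against a message. It is a simplification, not the real syslogd matcher: it ignores comma and semicolon lists and the "=" and "!" operators. The facility name local6 is an assumption here; it is commonly the Cyrus default, but it depends on how Cyrus was built.

```python
# Syslog severities from least to most severe.
LEVELS = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]


def selector_matches(selector, facility, level):
    """Simplified test of one 'facility.level' selector against a message.

    A plain selector matches messages at the named level or anything
    more severe.
    """
    sel_facility, _, sel_level = selector.partition(".")
    if sel_facility not in ("*", facility):
        return False
    if sel_level == "none":
        return False
    if sel_level == "*":
        return True
    return LEVELS.index(level) >= LEVELS.index(sel_level)


# An INFO message is dropped by a notice-and-above selector...
print(selector_matches("*.notice", "local6", "info"))
# ...but kept by a selector that starts at info.
print(selector_matches("local6.info", "local6", "info"))
```

If the file that receives Cyrus messages is selected at notice or higher, the INFO-level lines written by sync_client -l never reach it.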

:wes

On 04 Nov 2007, at 22:13, Rich Wales wrote:
 Should I be seeing additional debugging output?




Re: Replication: sync_client -r dies

2007-11-04 Thread Wesley Craig
The log files are pretty obvious in what they say, e.g., they just  
list mailboxes or users to check.  So I suspect they would reveal to  
you which mailboxes are problematic.  I sort of assume that you're  
running sync_client with -l, otherwise it doesn't log much.  If it's  
run with -l, it should mention the action that preceded the problem.

:wes

On 04 Nov 2007, at 20:36, Rich Wales wrote:
 I currently have absolutely no idea as to what is causing sync_client
 to bail out, or which mailbox(es) or other factors may be causing it.

 The only other possible piece of evidence I'm aware of might be the
 log files left over in my /cyrus/config/sync directory.  If I were to
 post these or send them to you, is there a chance this might reveal
 anything?



Re: Replication: sync_client -r dies

2007-11-04 Thread Rich Wales
Wesley Craig wrote:

 If you're running with -r -l . . . , you should be getting messages
 like "APPEND user.marie" in syslog at level INFO.  If you're
 not seeing those, then you have syslog configured to filter them.

Thanks.  It looks like that's what was happening.  I modified my
syslog.conf and kicked syslogd, and now I'm seeing those entries in
/var/log/messages.

I'll let the list know if I experience any more sync_client crashes.

Should I be looking for similar syslog messages on my replica server
too (and checking syslog.conf on that system if necessary)?

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.



Re: Replication: sync_client -r dies

2007-11-04 Thread Wesley Craig
On 04 Nov 2007, at 22:57, Rich Wales wrote:
 Should I be looking for similar syslog messages on my replica server
 too (and checking syslog.conf on that system if necessary)?

No, not similar.  But most sync_server errors that will cause  
sync_client to bail out ought to cause sync_server to give a  
reasonably unique log message for the failure.

:wes



Re: Replication: sync_client -r dies

2007-11-04 Thread Rich Wales
Wesley Craig wrote:

 But most sync_server errors that will cause sync_client to bail out
 ought to cause sync_server to give a reasonably unique log message
 for the failure.

As I explained earlier this evening, I didn't see ANYTHING AT ALL in
the replica server's logs that resembled any sort of error indication
at the times when sync_client bailed out.

Is it possible that something I should be seeing is being filtered out
by syslog.conf?  What syslog facility name is used by sync_server?

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.



Re: Replication: sync_client -r dies

2007-10-31 Thread Ken Murchison
sync_client in 2.3.10 should be much more resilient.


Rich Wales wrote:
 I'm running 2.3.9 on a FreeBSD 6.2 system.
 
 Recently, I installed 2.3.9 on an Ubuntu 7.10 system and set it up as
 a replica of my original server.
 
 Everything seems to be running well, except that the sync_client -r
 processes on the master tend to die after a while -- at which point no
 more sync activity happens, ever, until I restart Cyrus on the master.
 
 Here are the sync-related lines in my configuration files.
 
 On the master:
 
 cyrus.conf:
 syncclient cmd="/usr/local/cyrus/bin/sync_client -r"
 
 imapd.conf:
 sync_host: flipflop.richw.org
 sync_authname: admin
 sync_password: 
 sync_machineid: 1
 sync_log: true
 sync_repeat_interval: 15
 
 On the replica:
 
 cyrus.conf:
 syncserver cmd="/usr/cyrus/bin/sync_server" listen="csync"
 
 imapd.conf:
 sync_machineid: 1
 
 Can anyone suggest what I might be doing wrong, and/or what I can do
 to make sure a sync_client stays running on my master server?
 


-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University



Re: Replication: sync_client -r dies

2007-10-31 Thread Wesley Craig
On 31 Oct 2007, at 22:42, Rich Wales wrote:
 Can you (or anyone else) suggest anything I can do in the meantime
 (while I'm still running 2.3.9) to ensure that a sync_client stays
 running on my master server, or to start a new one as needed if it
 dies?

It usually dies for a reason, i.e., some discrepancy that either  
sync_client or sync_server couldn't handle.  The typical way to  
handle it is to contact someone.  It's reasonably safe to  
automatically start it back up, but you'll be left with a missed sync  
log as well.  Whatever problem that caused sync_client to exit will  
need to be corrected and the missed sync log will need to be run.
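Finding the missed logs can be sketched as below. Two things are assumptions rather than facts from this thread: that a dead sync_client leaves its work file in the sync directory under a name of the form log-<pid>, and that `sync_client -r -f <file>` processes one such file and then exits. Check the sync_client man page for your version before relying on either. The function only builds the command lines; it does not run them.

```python
import os
import re

# Assumed naming of leftover work files: "log-" followed by a process id.
LEFTOVER = re.compile(r"^log-\d+$")


def replay_commands(sync_dir, sync_client="/usr/local/cyrus/bin/sync_client"):
    """Return one sync_client command line per leftover log file, oldest first."""
    names = [n for n in os.listdir(sync_dir) if LEFTOVER.match(n)]
    paths = [os.path.join(sync_dir, n) for n in names]
    # Replay in the order the files were last written.
    paths.sort(key=os.path.getmtime)
    return [[sync_client, "-r", "-f", p] for p in paths]
```

Each returned list can be inspected, and then run as the Cyrus user once the underlying problem has been fixed; the live `log` file is deliberately left alone.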

:wes



Re: Replication: sync_client -r dies

2007-10-31 Thread Rich Wales
Ken Murchison wrote:

 sync_client in 2.3.10 should be much more resilient.

That's good to know.  However, I'm reluctant to upgrade quite yet,
given that 2.3.10 has only been out for a week and (judging from the
"Cyrus IMAPd 2.3.10 Released" thread) seems to have a few problems.

Can you (or anyone else) suggest anything I can do in the meantime
(while I'm still running 2.3.9) to ensure that a sync_client stays
running on my master server, or to start a new one as needed if it
dies?

-- 
Rich Wales  ===  Palo Alto, CA, USA  === [EMAIL PROTECTED]
http://www.richw.org   ===   http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory,
theory and practice are identical -- whereas in practice, they aren't.
