Re: Replication: sync_client -r dies
On Mon, 12 Nov 2007, Bron Gondwana wrote:

It seems to me that the replication code ought to be a bit more robust than this when a replica goes down or loses network connectivity. Is the 2.3.10 code any better than 2.3.9 in the way this kind of situation is handled?

I believe David Carter has been working on some stuff for this which is lined up to go in soon.

The autorestart stuff is already in 2.3.10. It was Ken's work, based on a suggestion on my part.

-- David Carter Email: [EMAIL PROTECTED]
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: Replication: sync_client -r dies
On Sat, Nov 10, 2007 at 07:09:53PM -0800, Rich Wales wrote:

After about a week of having synchronization running perfectly in my 2.3.9 system, I finally got another bailout incident with sync_client on my master server. This happened just after I shut down my replica server (to move it to a different location). About two minutes after the replica went down, sync_client on the master said "Error in do_sync(): bailing out!" with no other messages of any kind.

It seems to me that the replication code ought to be a bit more robust than this when a replica goes down or loses network connectivity. Is the 2.3.10 code any better than 2.3.9 in the way this kind of situation is handled?

I believe David Carter has been working on some stuff for this which is lined up to go in soon. We just have a monitor_sync script that runs every 10 minutes from cron and can recover from this and a variety of other interesting situations.

Yeah - it would be nice to have a way to tell the master "going down now, be back later".

Bron.
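The monitor_sync script itself is not shown in this thread, so the following is only a rough sketch of one check such a cron job might perform: deciding, from a list of process command lines, whether a rolling sync_client (one started with -r) is still alive. The function name and the idea of feeding it ps output are assumptions for illustration, not a description of Bron's script.

```python
import os

def rolling_client_running(command_lines):
    """Return True if any command line looks like `sync_client ... -r ...`."""
    for line in command_lines:
        tokens = line.split()
        if not tokens:
            continue
        # Compare only the program name, so any install prefix matches.
        program = os.path.basename(tokens[0])
        if program == "sync_client" and "-r" in tokens[1:]:
            return True
    return False

# Hypothetical process list, e.g. collected from ps on the master.
sample = [
    "/usr/local/cyrus/bin/imapd -s",
    "/usr/local/cyrus/bin/sync_client -r -l",
]
print(rolling_client_running(sample))
```

A cron job could call this with the current process list and raise an alert, or restart the client, when it returns False.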
Re: Replication: sync_client -r dies
After about a week of having synchronization running perfectly in my 2.3.9 system, I finally got another bailout incident with sync_client on my master server. This happened just after I shut down my replica server (to move it to a different location). About two minutes after the replica went down, sync_client on the master said "Error in do_sync(): bailing out!" with no other messages of any kind.

It seems to me that the replication code ought to be a bit more robust than this when a replica goes down or loses network connectivity. Is the 2.3.10 code any better than 2.3.9 in the way this kind of situation is handled?

-- Rich Wales === Palo Alto, CA, USA === [EMAIL PROTECTED]
http://www.richw.org === http://en.wikipedia.org/wiki/User:Richwales
The difference between theory and practice is that, in theory, theory and practice are identical -- whereas in practice, they aren't.
Re: Replication: sync_client -r dies
On Sun, Nov 04, 2007 at 08:42:21PM -0800, Rich Wales wrote:

Wesley Craig wrote:

But most sync_server errors that will cause sync_client to bail out ought to cause sync_server to give a reasonably unique log message for the failure.

As I explained earlier this evening, I didn't see ANYTHING AT ALL in the replica server's logs that resembled any sort of error indication at the times when sync_client bailed out. Is it possible that something I should be seeing is being filtered out by syslog.conf? What syslog facility name is used by sync_server?

-- Rich Wales

Rich and Wesley,

We are seeing similar behavior here where everything looks fine but we are not replicating messages in folders and the process bails out with no errors. Just another data point. Any help would be appreciated.

Ken Marshall
-- Mgr./Middleware, Infrastructure Development
[EMAIL PROTECTED] / 713-348-5294
Re: Replication: sync_client -r dies
On 03 Nov 2007, at 23:20, Rich Wales wrote:

Wesley Craig wrote:

It usually dies for a reason, i.e., some discrepancy that either sync_client or sync_server couldn't handle. The typical way to handle it is to contact someone.

What sort of debugging output am I going to need to generate in order for anyone to have a chance of tracking the problem down? I added the -l and -v flags to sync_client, but right now, the only clue I have is that /var/log/messages includes a couple of errors saying "Error in do_sync(): bailing out!".

Both sync_client and sync_server typically log problems. Those logs are probably immediately helpful. Further information would depend on the reason for the bail out.

No core dump files anywhere in sight.

Personally, I see more cases of unresolvable discrepancies than core dumps, but both exist.

:wes
Re: Replication: sync_client -r dies
Wesley Craig wrote:

Both sync_client and sync_server typically log problems. Those logs are probably immediately helpful. Further information would depend on the reason for the bail out.

Where would I find this log info? As I said earlier, the only info I've found so far are the "Error in do_sync(): bailing out!" notices in the /var/log/messages file. Are there some other log files saved somewhere else?

I do have -l and -v specified for the sync_client command. Should I add any additional options in order to get debugging info?

-- Rich Wales
Re: Replication: sync_client -r dies
On 04 Nov 2007, at 13:11, Rich Wales wrote:

Where would I find this log info? As I said earlier, the only info I've found so far are the "Error in do_sync(): bailing out!" notices in the /var/log/messages file. Are there some other log files saved somewhere else?

I do have -l and -v specified for the sync_client command. Should I add any additional options in order to get debugging info?

The replica sync_server will also log to syslog.

:wes
Re: Replication: sync_client -r dies
Wesley Craig wrote:

The replica sync_server will also log to syslog.

No, sorry, as best I can tell, there isn't anything non-routine in any of the log files on my replica server. Do I need to specify any command-line flags to sync_server?

-- Rich Wales
Re: Replication: sync_client -r dies
On 04 Nov 2007, at 14:27, Rich Wales wrote:

No, sorry, as best I can tell, there isn't anything non-routine in any of the log files on my replica server. Do I need to specify any command-line flags to sync_server?

No, sync_server doesn't take much in the way of command line options. If the problem appears to be reproducible, you can enable telemetry logging or examine the mailboxes that are causing problems.

:wes
Re: Replication: sync_client -r dies
Wesley Craig wrote:

No, sync_server doesn't take much in the way of command line options.

Hmmm. OK, thanks.

If the problem appears to be reproducible, you can enable telemetry logging or examine the mailboxes that are causing problems.

I currently have absolutely no idea as to what is causing sync_client to bail out, or which mailbox(es) or other factors may be causing it.

The only other possible piece of evidence I'm aware of might be the log files left over in my /cyrus/config/sync directory. If I were to post these or send them to you, is there a chance this might reveal anything?

-- Rich Wales
Re: Replication: sync_client -r dies
Wesley Craig wrote:

The log files are pretty obvious in what they say, e.g., they just list mailboxes or users to check. So I suspect they would reveal to you which mailboxes are problematic. I sort of assume that you're running sync_client with -l, otherwise it doesn't log much. If it's run with -l, it should mention the action that preceded the problem.

Here's one of the leftover log files, time-stamped right at the time of one of these bailout crashes.

    APPEND user.marie
    SEEN marie user.marie
    MAILBOX user.marie
    MAILBOX user.marie.Junk
    SEEN marie user.marie.Junk
    SEEN marie user.marie
    MAILBOX user.marie

Other messages have been delivered since that time to both user.marie and user.marie.Junk without crashing sync_client. So if this log is telling me there's a problem with either of these mailboxes, I don't understand what the log is telling me.

Yes, I am using the -l flag (sync_client -r -l -v), but I'll have to say once again that neither /var/log/messages nor any other log file on my master server shows any other sync-related error message of any sort whatsoever, except for the "Error in do_sync(): bailing out!" message I mentioned previously.

In /var/log/debug.log, I see things such as these:

    Nov 3 18:55:44 whodunit sync_client[6875]: seen_db: user richw opened /var/imap/user/r/richw.seen
    Nov 3 19:03:29 whodunit sync_client[7008]: DIGEST-MD5 client step 1
    Nov 3 19:03:29 whodunit sync_client[7008]: DIGEST-MD5 client step 2
    Nov 3 19:03:29 whodunit sync_client[7008]: DIGEST-MD5 client step 3
    Nov 3 19:13:44 whodunit sync_client[7127]: DIGEST-MD5 client step 1
    Nov 3 19:13:44 whodunit sync_client[7127]: DIGEST-MD5 client step 2
    Nov 3 19:13:44 whodunit sync_client[7127]: DIGEST-MD5 client step 3

(the bailout message itself occurred at 19:14:37) and on the replica server, I see the following:

    Nov 3 19:03:29 flipflop syncserver[15942]: accepted connection
    Nov 3 19:03:29 flipflop syncserver[15942]: cmdloop(): startup
    Nov 3 19:13:45 flipflop master[16110]: about to exec /usr/cyrus/bin/sync_server
    Nov 3 19:13:45 flipflop syncserver[15942]: accepted connection
    Nov 3 19:13:45 flipflop syncserver[15942]: cmdloop(): startup
    Nov 3 19:13:45 flipflop syncserver[16110]: executed

but that's all.

Should I be seeing additional debugging output? If so, where should I be looking for it?

I'm very sorry, but at the moment, all I can see is that the sync software mysteriously dies every so often, with no intelligible clue as to why. I understand you're saying that there should be additional info, but either it's NOT there or I don't know where to look for it. If anyone can help me figure out what I'm doing wrong here, I'll be grateful.

-- Rich Wales
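Since the leftover sync log quoted in this message is just a list of actions, one quick way to see which mailboxes it touches is to tally them. A minimal sketch, assuming each entry is an action keyword followed by space-separated arguments and that the last argument is the mailbox name, as in the APPEND, SEEN and MAILBOX entries quoted here; the function name is made up for illustration, and other action types may not follow this layout.

```python
from collections import Counter

def summarize_sync_log(text):
    """Count how often each (action, mailbox) pair appears in a sync log."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        action, args = parts[0], parts[1:]
        # Assumption: the mailbox is the last argument on the line.
        target = args[-1] if args else ""
        counts[(action, target)] += 1
    return counts

# The entries from the leftover log file quoted in this message.
sample = """APPEND user.marie
SEEN marie user.marie
MAILBOX user.marie
MAILBOX user.marie.Junk
SEEN marie user.marie.Junk
SEEN marie user.marie
MAILBOX user.marie
"""

for (action, mailbox), n in sorted(summarize_sync_log(sample).items()):
    print(action, mailbox, n)
```

This only narrows down which mailboxes were being processed around the bailout; it does not say which entry, if any, triggered it.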
Re: Replication: sync_client -r dies
If you're running with -r -l (-v is for interactive use -- it causes printf output), you should be getting messages like:

    APPEND user.marie

in syslog at level INFO. If you're not seeing those, then you have syslog configured to filter them. See the man page for syslog.conf.

:wes

On 04 Nov 2007, at 22:13, Rich Wales wrote:

Should I be seeing additional debugging output?
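One way to check whether syslog is filtering INFO-level messages is to send a probe at that level and then look for it in the log files. A minimal sketch using Python's standard syslog module; the local6 facility is an assumption here (it is the usual Cyrus build default, but a given installation may differ), and the function name and ident string are invented for illustration.

```python
import syslog

def send_probe(facility=syslog.LOG_LOCAL6, ident="sync_probe"):
    """Log one INFO-level test message and return the priority value used."""
    # Assumption: Cyrus logs on local6; pass another facility if yours differs.
    syslog.openlog(ident, syslog.LOG_PID, facility)
    syslog.syslog(syslog.LOG_INFO, "probe: INFO-level test message")
    syslog.closelog()
    return facility | syslog.LOG_INFO

if __name__ == "__main__":
    send_probe()
```

If the probe line does not show up in any log file, the syslog.conf selectors for that facility are probably set above INFO.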
Re: Replication: sync_client -r dies
The log files are pretty obvious in what they say, e.g., they just list mailboxes or users to check. So I suspect they would reveal to you which mailboxes are problematic. I sort of assume that you're running sync_client with -l, otherwise it doesn't log much. If it's run with -l, it should mention the action that preceded the problem.

:wes

On 04 Nov 2007, at 20:36, Rich Wales wrote:

I currently have absolutely no idea as to what is causing sync_client to bail out, or which mailbox(es) or other factors may be causing it.

The only other possible piece of evidence I'm aware of might be the log files left over in my /cyrus/config/sync directory. If I were to post these or send them to you, is there a chance this might reveal anything?
Re: Replication: sync_client -r dies
Wesley Craig wrote:

If you're running with -r -l . . . , you should be getting messages like:

    APPEND user.marie

in syslog at level INFO. If you're not seeing those, then you have syslog configured to filter them.

Thanks. It looks like that's what was happening. I modified my syslog.conf and kicked syslogd, and now I'm seeing those entries in /var/log/messages. I'll let the list know if I experience any more sync_client crashes.

Should I be looking for similar syslog messages on my replica server too (and checking syslog.conf on that system if necessary)?

-- Rich Wales
Re: Replication: sync_client -r dies
On 04 Nov 2007, at 22:57, Rich Wales wrote:

Should I be looking for similar syslog messages on my replica server too (and checking syslog.conf on that system if necessary)?

No, not similar. But most sync_server errors that will cause sync_client to bail out ought to cause sync_server to give a reasonably unique log message for the failure.

:wes
Re: Replication: sync_client -r dies
Wesley Craig wrote:

But most sync_server errors that will cause sync_client to bail out ought to cause sync_server to give a reasonably unique log message for the failure.

As I explained earlier this evening, I didn't see ANYTHING AT ALL in the replica server's logs that resembled any sort of error indication at the times when sync_client bailed out.

Is it possible that something I should be seeing is being filtered out by syslog.conf? What syslog facility name is used by sync_server?

-- Rich Wales
Re: Replication: sync_client -r dies
sync_client in 2.3.10 should be much more resilient.

Rich Wales wrote:

I'm running 2.3.9 on a FreeBSD 6.2 system. Recently, I installed 2.3.9 on an Ubuntu 7.10 system and set it up as a replica of my original server.

Everything seems to be running well, except that the sync_client -r processes on the master tend to die after a while -- at which point no more sync activity happens, ever, until I restart Cyrus on the master.

Here are the sync-related lines in my configuration files.

On the master:

cyrus.conf:
    syncclient cmd="/usr/local/cyrus/bin/sync_client -r"

imapd.conf:
    sync_host: flipflop.richw.org
    sync_authname: admin
    sync_password:
    sync_machineid: 1
    sync_log: true
    sync_repeat_interval: 15

On the replica:

cyrus.conf:
    syncserver cmd="/usr/cyrus/bin/sync_server" listen="csync"

imapd.conf:
    sync_machineid: 1

Can anyone suggest what I might be doing wrong, and/or what I can do to make sure a sync_client stays running on my master server?

-- Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University
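When comparing a master's settings against the ones quoted in this message, it can help to pull the sync_* options out of imapd.conf and flag any that are absent or empty. A minimal sketch, assuming the file uses one "name: value" pair per line with "#" comments; the helper names and the list of options to check are illustrative choices taken from the configuration quoted here, not a statement of what Cyrus requires.

```python
def read_sync_options(text):
    """Return a dict of sync_* options from imapd.conf-style text."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep and key.strip().startswith("sync_"):
            options[key.strip()] = value.strip()
    return options

# Options the master used in the configuration quoted in this message.
MASTER_OPTIONS = ["sync_host", "sync_authname", "sync_password", "sync_log"]

def missing_or_empty(options, wanted=MASTER_OPTIONS):
    """List the wanted options that are absent or have an empty value."""
    return [name for name in wanted if not options.get(name)]

sample = """\
sync_host: flipflop.richw.org
sync_authname: admin
sync_password:
sync_machineid: 1
sync_log: true
sync_repeat_interval: 15
"""

opts = read_sync_options(sample)
print(missing_or_empty(opts))
```

Run against the settings as posted, this flags only sync_password, whose value was left blank in the message.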
Re: Replication: sync_client -r dies
On 31 Oct 2007, at 22:42, Rich Wales wrote:

Can you (or anyone else) suggest anything I can do in the meantime (while I'm still running 2.3.9) to ensure that a sync_client stays running on my master server, or to start a new one as needed if it dies?

It usually dies for a reason, i.e., some discrepancy that either sync_client or sync_server couldn't handle. The typical way to handle it is to contact someone. It's reasonably safe to automatically start it back up, but you'll be left with a missed sync log as well. Whatever problem that caused sync_client to exit will need to be corrected and the missed sync log will need to be run.

:wes
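Because an automatic restart leaves the missed sync log behind, a restart script would at least want to report which log files are still sitting in the sync directory. A minimal sketch, assuming the active log is a file named "log" in that directory and that anything else there is a leftover; both the file name and the directory path used below are assumptions to adjust for a given installation, and the function name is invented for illustration.

```python
import os

def leftover_sync_logs(sync_dir, active_name="log"):
    """List regular files in sync_dir other than the active log, oldest first."""
    entries = []
    for name in os.listdir(sync_dir):
        path = os.path.join(sync_dir, name)
        # Assumption: the file called active_name is still being written to.
        if name == active_name or not os.path.isfile(path):
            continue
        entries.append((os.path.getmtime(path), path))
    return [path for _, path in sorted(entries)]

if __name__ == "__main__":
    sync_dir = "/cyrus/config/sync"  # example path; adjust to your configdirectory
    if os.path.isdir(sync_dir):
        for path in leftover_sync_logs(sync_dir):
            print(path)
```

The listing is only a starting point: each leftover log still has to be looked at and rerun once the underlying problem is fixed.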
Re: Replication: sync_client -r dies
Ken Murchison wrote:

sync_client in 2.3.10 should be much more resilient.

That's good to know. However, I'm reluctant to upgrade quite yet, given that 2.3.10 has only been out for a week and (judging from the "Cyrus IMAPd 2.3.10 Released" thread) seems to have a few problems.

Can you (or anyone else) suggest anything I can do in the meantime (while I'm still running 2.3.9) to ensure that a sync_client stays running on my master server, or to start a new one as needed if it dies?

-- Rich Wales