Re: Making Replication Robust

2007-10-19 Thread David Carter

On Tue, 9 Oct 2007, David Carter wrote:

I've never faced a split brain situation which involved more than two or
three messages (the outstanding log on an old master system).


I suppose that it was predictable that a week after writing this I faced my
first serious split brain (3000 messages lost after a hardware fault).


My solution was to write a little script which, given a list of mailboxes
(the sync_log file on the old master), scanned over the cyrus.index files
looking for messages with an internaldate greater than a given cutoff.
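For anyone wanting to knock up the same sort of one-off tool, here is a
minimal sketch of the scan in Python (not the original script). It assumes
the usual Cyrus spool conventions: message files are named "<uid>." in the
mailbox directory, and a message file's mtime matches its internaldate.

```python
import os
import time

def recent_messages(mailbox_dirs, cutoff_epoch):
    """Collect message files whose mtime (standing in for internaldate,
    which Cyrus conventionally keeps equal to the spool file's mtime)
    is newer than the cutoff. Returns a sorted list of paths."""
    found = []
    for d in mailbox_dirs:
        for name in os.listdir(d):
            # Cyrus message files are named "<uid>." in the spool;
            # skip cyrus.index, cyrus.header, cyrus.cache etc.
            if not name.endswith('.') or not name[:-1].isdigit():
                continue
            path = os.path.join(d, name)
            if os.stat(path).st_mtime > cutoff_epoch:
                found.append(path)
    return sorted(found)
```

The collected files can then be fed to a reinjection step; the real script
read the mailbox list out of the old master's sync_log rather than taking
directories directly.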


These messages were then transferred across to the new master to be
reinjected. Replication from the new master to the old master then
resolved the split brain situation (master wins in case of ambiguity),
which is the way it was designed to work. From memory, Fastmail did
something similar when they faced a split brain situation.


The procedure works well, but I think it would be useful to have some
tools in the Cyrus distribution rather than having to knock up one-off
tools. I'm happy to work on this if we can beat out some requirements.


I'm not keen on trying to fix split brain situations within the
replication protocol itself: at the moment sync_client doesn't try to mess
with the data on the master, which is a property I like.


There are also certain situations that replication just can't fix.

Envision a hypothetical replication engine which can cope with GUID 
mismatches, adding messages to both master and replica. Then imagine:


1) Replication dies because of hardware or software fault.

2) Master continues to limp along for a bit before dying. Split brain.

3) Message delivered to a non-user mailbox (Sieve or + addressing).

4) Master dies entirely: failover to replica with missing messages.

5) User logs in and deletes the mailbox in question on the new master,
   unaware that they are actually missing a message from that mailbox.

6) sysadmin starts replication from new master to old master. They hope
   that this will automatically resolve all conflicts without losing
   anything because we promise that replication is magic.

7) Replication engine deletes the entire mailbox (including the message
   that we want to recover), as it doesn't exist on the new master.

/* == */

Just for everyone's amusement: what happened to us on Tuesday evening
======================================================================

This isn't good:

  Oct 16 20:56:21 cyrus-24 kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0.
  Oct 16 20:56:21 cyrus-24 kernel: Dazed and confused, but trying to continue
  Oct 16 20:56:21 cyrus-24 kernel: Do you have a strange power saving mode enabled?

But it is nowhere near as bad as:

  Oct 16 20:56:31 cyrus-24 sync_client[11985]: Unknown system flag: \snswered
                                                                     ^ Oops (should be \Answered)

You know that a machine is unhappy when sync_client -u on a given
account randomly:

  1) Works without problems
  2) segfaults
  3) Attempts to reserve every message on the account on the server,
     presumably as a prelude to a mass UPLOAD.

I infer that the machine has a motherboard fault which caused kernel 
memory corruption in some small lump of buffer cache. I am amazed that the 
filesystems passed fsck when I attached the disks to a new machine. The 
original machine refused to reboot cleanly because umount segfaulted. It 
also failed two DIMMs on each POST until the machine ran out of memory.


--
David Carter Email: [EMAIL PROTECTED]
University Computing Service,Phone: (01223) 334502
New Museums Site, Pembroke Street,   Fax:   (01223) 334679
Cambridge UK. CB2 3QH.


Re: Making Replication Robust

2007-10-13 Thread David Carter

On Sat, 13 Oct 2007, Bron Gondwana wrote:


Apart from a couple of short-lived command line utilities it looks like
the only use of signal() is a bunch of 'signal(SIGPIPE, SIG_IGN);'
scattered through just about everything.

Most of the interesting signal handling is done with sigaction
already.


And indeed it uses an explicit SA_RESTART flag, so it looks like I've
been worrying about nothing all along. By all means signal away!
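For illustration only (Cyrus itself does this in C with sigaction()): the
same restart-vs-EINTR choice is exposed in Python via signal.siginterrupt(),
where passing False corresponds to setting SA_RESTART. A minimal sketch:

```python
import signal

def install_restarting_handler(signum, handler):
    """Install a handler and ask the kernel to restart interrupted
    system calls, i.e. the SA_RESTART behaviour that sigaction()
    gives Cyrus, rather than having calls fail with EINTR."""
    signal.signal(signum, handler)
    # False => restart syscalls (SA_RESTART); True => return EINTR
    signal.siginterrupt(signum, False)

# Example: record receipt of SIGUSR1 without interrupting slow syscalls.
received = []
install_restarting_handler(signal.SIGUSR1,
                           lambda signum, frame: received.append(signum))
```

With the flag set this way, a handler firing during a long read() or
write() no longer surfaces as an EINTR error in the worker's main loop.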

--
David Carter Email: [EMAIL PROTECTED]
University Computing Service,Phone: (01223) 334502
New Museums Site, Pembroke Street,   Fax:   (01223) 334679
Cambridge UK. CB2 3QH.


Re: Making Replication Robust

2007-10-12 Thread Rob Mueller



This would seem to be a significant advantage of running sync_client
outside master.

When I shut down master, sync_client continues to process the outstanding 
log. I can then use sync_shutdown_file when it has finished and is idle.


We do something similar.

But it means you have to develop a bunch of your own infrastructure to make
cyrus replication robust. It's not currently a "start it and it just works
until you shut it down" solution, which means either people have to
duplicate the same extra infrastructure work everywhere separately, or
people are going to get burnt not realising that what they're doing isn't
safe.


That's why I'd really like this to be in cyrus itself. I think we should be 
able to say in the documentation something like:


"Shutting down a cyrus master with a SIGQUIT ensures that all actions have
been replicated to the replica side."


It makes writing init scripts and the like a lot easier.


There seems to be a split of opinion between BSD and SVR4: BSD tries to
retry while SVR4 throws EINTR. Linux of course can work either way:

http://www.gnu.org/software/libc/manual/html_node/Interrupted-Primitives.html


Aren't a lot of writes already wrapped up in some retry_write() function? I
admit I haven't looked closely.


Anyway, is this really a problem? Shouldn't you be able to kill cyrus at any
point and have files left in a consistent, restartable state? If so, if
something returns EINTR, won't it just move on and eventually exit? Or is
the problem that you have something like:


write to file 1
write to file 2

And if the first returns EINTR but is ignored, and then it writes the 
complete data to the second, things are in an inconsistent state?


Rob



Re: Making Replication Robust

2007-10-12 Thread David Carter

On Wed, 10 Oct 2007, Rob Mueller wrote:


I think the problem at the moment is that the process you really want is:

1. Stop new imap/pop/lmtp/sieve/etc connections
2. Finish and close existing connections cleanly but as quickly as possible
3. Finish running any sync log files
4. Fully shutdown

There's currently no clean way to do this. Basically you have to SIGTERM 
master which hard kills it and all children, then manually run 
sync_client -f on any remaining log files.


This would seem to be a significant advantage of running sync_client 
outside master.


When I shut down master, sync_client continues to process the outstanding 
log. I can then use sync_shutdown_file when it has finished and is idle.


sync_client could catch SIGQUIT to initiate some form of clean shutdown.

I'm still a little bothered about signal handling and EINTR. I did some 
experiments after our last chat about signals. In practice disk IO system 
calls seem to be reasonably safe against EINTR on both Linux and Solaris, 
but a trip to Google suggests that there are few guarantees:


http://archives.postgresql.org/pgsql-hackers/2005-12/msg01259.php

There seems to be a split of opinion between BSD and SVR4: BSD tries to
retry while SVR4 throws EINTR. Linux of course can work either way:

http://www.gnu.org/software/libc/manual/html_node/Interrupted-Primitives.html

--
David Carter Email: [EMAIL PROTECTED]
University Computing Service,Phone: (01223) 334502
New Museums Site, Pembroke Street,   Fax:   (01223) 334679
Cambridge UK. CB2 3QH.


Re: Making Replication Robust

2007-10-12 Thread Bron Gondwana
On Fri, Oct 12, 2007 at 10:29:53AM -0400, Carson Gaspar wrote:
 David Carter wrote:

 I'm still a little bothered about signal handling and EINTR. I did some 
 experiments after our last chat about signals. In practice disk IO system 
 calls seem to be reasonably safe against EINTR on both Linux and Solaris, 
 but a trip to Google suggests that there are few guarantees:
 http://archives.postgresql.org/pgsql-hackers/2005-12/msg01259.php
 There seems to be a split of opinion between BSD and SVR4: BSD tries to
 retry while SVR4 throws EINTR. Linux of course can work either way:
 http://www.gnu.org/software/libc/manual/html_node/Interrupted-Primitives.html

 I suggest reading the POSIX specs. Restart vs. EINTR is specified 
 explicitly on any POSIX platform (including all the *BSDs, Solaris, Linux, 
 ...). If cyrus is still using signal() and friends anywhere, they 
 desperately need to be replaced with sigaction().

Apart from a couple of short-lived command line utilities it looks like
the only use of signal() is a bunch of 'signal(SIGPIPE, SIG_IGN);'
scattered through just about everything.

Most of the interesting signal handling is done with sigaction
already.

Bron.


Re: Making Replication Robust

2007-10-12 Thread Rob Mueller



Or is the problem that you have something like:

write to file 1
write to file 2

And if the first returns EINTR but is ignored, and then it writes the 
complete data to the second, things are in an inconsistent state?


This is my concern.


Doing an ack 'write\(' reveals a scary mix of write, retry_write and
fwrite calls. My initial reaction was that binary files seem to use
open/retry_write and text files use fopen/fwrite, but that doesn't quite
seem to be the case...


mailbox.c
1242:    r = write(newheader_fd, MAILBOX_HEADER_MAGIC,
1359:    n = retry_write(mailbox->index_fd, buf, header_size);
1428:    n = retry_write(mailbox->index_fd, buf, INDEX_RECORD_SIZE);
1477:    n = retry_write(mailbox->index_fd, buf, len);
1642:    fwrite(buf, 1, INDEX_HEADER_SIZE, newindex);
1659:    fwrite(bufp, INDEX_RECORD_SIZE, 1, newindex);
1710:    fwrite(buf, INDEX_RECORD_SIZE, 1, newindex);
1721:    fwrite(buf+OFFSET_DELETED,
1952:    n = retry_write(expunge_fd, buf, mailbox->record_size);
1979:    if (newindex) fwrite(buf, 1, mailbox->record_size, newindex);
1999:    /* fwrite will automatically call write() in a sane way */
2000:    fwrite(cacheitembegin, 1, cache_record_size, newcache);
2004:    fwrite(buf, 1, mailbox->record_size, newindex);
2058:    fwrite(buf, 1, mailbox->start_offset, newindex);
2215:    fwrite(buf, 1, sizeof(bit32), newcache);
2219:    fwrite(buf, 1, mailbox->start_offset, newindex);
2263:    n = retry_write(expunge_fd, buf, mailbox->start_offset);
2342:    r = quota_write(&mailbox->quota, &tid);
2363:    fwrite(buf, 1, mailbox->start_offset, newexpungeindex);
2424:    n = retry_write(expunge_fd, buf, mailbox->start_offset);
2719:    n = retry_write(mailbox.cache_fd, (char *)&mailbox.generation_no, 4);

2823:    r = quota_write(&mailbox->quota, &tid);
3056:    r = quota_write(&(newmailbox->quota), &tid);
3309:    r = quota_write(&newmailbox.quota, &tid);
3319:    r2 = quota_write(&newmailbox.quota, &tid);
3398:    n = retry_write(destfd, src_base, src_size);

It seems to mix fd's and FILE * structs all over the place. *sigh*

Does fwrite() retry a write on EINTR? It looks like that's the whole point 
of retry_write() anyway.


If fwrite() does retry, then about the only other work would be changing any
naked write() calls to retry_write(), of which there actually don't seem to
be many.


Thoughts?

Rob



Re: Making Replication Robust

2007-10-09 Thread David Carter


On Mon, 8 Oct 2007, Rudy Gevaert wrote:

Note, we are running 2.3.7, I'm going to upgrade when 2.3.10 is out. 
We have replication in place, but daren't use it.  If I have a method to 
check if the replica is in sync then I'll dare to do a fail over.


I do this using -v -v to sync_client, which gives a running commentary
about just what is going on:

  cyrus-28[cyrus:~]$ replicate -s cyrus-27 -v -v -u dpc99
  USER dpc99
 USER_ALL dpc99
 SELECT user.dpc99
 UPLOAD [1 msgs]
 ENDUSER

A very high tech "grep -v USER /tmp/out" picks out the actual updates.

This is one of the things which got dropped when replication was merged
into 2.3 (my original implementation just didn't fit cleanly). I would
like to put something similar into 2.3, as this is a quick and easy way to
check for consistency while fixing up problems. A dry run mode which
suppresses updates would also be useful, although probably more work.


The kind of random sampling which Fastmail do probably wouldn't hurt as an 
extra sanity check.


--
David Carter Email: [EMAIL PROTECTED]
University Computing Service,Phone: (01223) 334502
New Museums Site, Pembroke Street,   Fax:   (01223) 334679
Cambridge UK. CB2 3QH.


Re: Making Replication Robust

2007-10-09 Thread David Carter

On Mon, 8 Oct 2007, Bron Gondwana wrote:

We already run a sync_server on our masters as well because we use it 
for user moves:


Generally takes about 15 seconds for the critical path bit, and
the initial sync doesn't matter how long it takes.


As do we. In fact when I first showed the replication system to Rob
Siemborski (a few years back now), he was thinking about using replication
to replace XFER in a murder environment.


I have a special -y flag to sync_client which disables fsync() on the
replica for fast seeding of replicas. We also use replication to dump data
from the live systems to a tape spooling array each night. Replication is
transaction-safe rsync for Cyrus, tailored around the cyrus.index files.


More later.

--
David Carter Email: [EMAIL PROTECTED]
University Computing Service,Phone: (01223) 334502
New Museums Site, Pembroke Street,   Fax:   (01223) 334679
Cambridge UK. CB2 3QH.


Re: Making Replication Robust

2007-10-09 Thread David Carter

On Thu, 4 Oct 2007, Bron Gondwana wrote:


a) MUST never lose a message that's been accepted for
  delivery except in the case of total drive failure.

b) MUST have a standard way to integrity check and
  repair a replica-pair after a system crash.


A replica system is automatically repaired to match its master, but this 
doesn't help with the split brain scenarios that you are worried about.


I've never faced a split brain situation which involved more than two or
three messages (the outstanding log on an old master system). I suspect
that this is simply because I've never had to run an unreliable
replication engine which bails out on my production systems.



c) MUST have a clean process to soft-failover to the
  replica machine, making sure that all replication
  events from the ex-master have been synchronised.


Something more than sync_shutdown_file plus automatic retries on
recent work files?


d) MUST have replication start/restart automatically when
  the replica is available rather than requiring it be
  online at master start time.


Work in progress from Ken.


e) SHOULD be able to copy back messages which only exist
  on the replica due to a hard-failover, handling UIDs
  gracefully (more on this later),


This is the hard one. I think that assigning a new UIDVALIDITY and new
UIDs for all the messages would be best, as messages can then be sorted in
the replacement mailbox based on their arrival time. Actually this would
look remarkably like the new sync_combine_commit() on the replica side.


What I don't know is how we then synchronise back to the master. Up to now 
the replication engine has been very careful about _not_ making changes on 
the master, so that it only has the potential to mess up the spare system.



  alternatively at least
  MUST (to satisfy point 'a') notify the administrator
  that the message has different GUIDs on the two copies
  and something will need to be done about it (to satisfy
  point 'd' this must be done without bailing out
  replication for the remaining messages in the folder)


At the moment we replace messages (on the "master knows best" principle).

It would be easy enough to leave messages in place and generate warnings
instead, although this would generate a lot of warnings: one for every bad
message, every time that a given mailbox is updated.



f) SHOULD keep replicating in the face of an error which
  affects a single mailbox, keeping track of that mailbox
  so that a sysadmin can fix the issue and then replicate
  that mailbox by hand.


You could try disabling the MAILBOX -> USER promotion to see what happens:
the 3 x MAILBOXES retry will fix most transient problems caused by
mailboxes moving around, leaving just the permanent errors.


The MAILBOX -> USER promotion was originally there on the principle that a
mailbox disappearing under our feet was likely to appear somewhere else in
the same account (without shared mailboxes to worry about).


My nightmare scenario is a replication engine which carries on running in 
the face of mboxlist corruption on the master: you could lose a lot of 
mailboxes on the replica that way.



g) MAY have a method to replicate to two different replicas
  concurrently (replay the same sync_log messages twice)
  allowing one replica to be taken out of service and
  a new one created while having no gaps in which there
  is no second copy alive (we use rsync, rsync again,
  stop replication, rsync a third time, start replication
  to the new site - but it's messy and gappy)


It would be easy enough to generate multiple replication log files.

MySQL keeps a single transaction log for multiple replicas, but that file 
contains quite a lot of information about each transaction. In contrast 
the Cyrus sync log is just a list of objects we need to pay attention to: 
the files have much less state, particularly without duplicates.


--
David Carter Email: [EMAIL PROTECTED]
University Computing Service,Phone: (01223) 334502
New Museums Site, Pembroke Street,   Fax:   (01223) 334679
Cambridge UK. CB2 3QH.


Re: Making Replication Robust

2007-10-09 Thread Rob Mueller



c) MUST have a clean process to soft-failover to the
  replica machine, making sure that all replication
  events from the ex-master have been synchronised.


Something more than sync_shutdown_file plus automatic retries on
recent work files?


I think the problem at the moment is that the process you really want is:

1. Stop new imap/pop/lmtp/sieve/etc connections
2. Finish and close existing connections cleanly but as quickly as possible
3. Finish running any sync log files
4. Fully shutdown

There's currently no clean way to do this. Basically you have to SIGTERM 
master which hard kills it and all children, then manually run 
sync_client -f on any remaining log files.


We've got a patch which makes master handle SIGQUIT much more nicely. It
appears there was some existing infrastructure designed to handle a
cleaner shutdown; look at all the places in the code that call
signals_poll(). The idea seems to have been that you could send child
processes SIGQUIT and they would continue their current action until their
main loop, check whether they'd been sent a QUIT, and then exit cleanly.
Unfortunately, if you sent SIGQUIT to master, it would just SIGTERM all
children, not SIGQUIT them.


This patch attempts to fix that: sending SIGQUIT to master now sends
SIGQUIT to all children and then waits for them all to exit cleanly.


http://cyrus.brong.fastmail.fm/#cyrus-clean-shutdown-2.3.8.diff

This solves steps 1 & 2 above, though it doesn't deal with the case of a
crazy child that doesn't respond to SIGQUIT. Our init script sends SIGQUIT,
and if the master process is still there after 10 seconds, it sends SIGTERM
to force an exit. In general we find that everything exits within a couple
of seconds of the SIGQUIT.


To do step 3, I think the best approach might be a new cyrus.conf section: a
SHUTDOWN section which gives some commands to run on shutdown. After all
children have accepted a SIGQUIT and exited, we run the SHUTDOWN section,
which would run a final sync_client -r on the sync dir to finish up any
remaining log files.


With all of that in place, you could send a SIGQUIT to the cyrus master
process on a master server, and it would cleanly shut down all children and
ensure that all replication events have been correctly played to the
replica. You could then do the same to the replica, reverse their roles,
and bring them both back up: a safe soft failover.



At the moment we replace messages (on the "master knows best" principle).

It would be easy enough to leave message in place and generate warnings 
instead, although this would generate a lot of warnings, one for every bad 
message every time that a given mailbox is updated.


That's what this patch does.

http://cyrus.brong.fastmail.fm/#cyrus-warnmismatcheduuids-2.3.8.diff

In theory, with clean soft failovers you should NEVER have UIDs with
mismatched UUIDs. After a hard failover you obviously might, but in those
cases just replacing the message means we're almost certainly overwriting a
delivered message and losing it, which is bad. At least making it an option
to overwrite or log seems like a sane idea.


My nightmare scenario is a replication engine which carries on running in 
the face of mboxlist corruption on the master: you could lose a lot of 
mailboxes on the replica that way.


That would be bad, though hard to detect and stop. I guess that's what 
backups are for...



It would be easy enough to generate multiple replication log files.

MySQL keeps a single transaction log for multiple replicas, but that file 
contains quite a lot of information about each transaction. In contrast 
the Cyrus sync log is just a list of objects we need to pay attention to: 
the files have much less state, particularly without duplicates.


The other option, rather than using the "rotate log, play it, delete it"
system, is to generate one log file but keep track of offsets within the
file to tell you where each replica is up to. That's what mysql does: you
can have multiple replicas because each replica plays off the same log
files, just up to a different offset at any point in time.
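A hypothetical sketch of that offset-tracking scheme (all names invented;
a real implementation would persist the offsets and handle log rotation):

```python
import os

class SyncLog:
    """One append-only log, many replicas: each replica remembers its
    own byte offset into the file, so the same log serves every
    consumer, mysql-style. Offsets here live in memory only."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}   # replica name -> bytes already consumed

    def append(self, entry):
        """Append one sync_log-style entry (e.g. 'MAILBOX user.dpc99')."""
        with open(self.path, "a") as f:
            f.write(entry + "\n")

    def pending(self, replica):
        """Return unplayed entries for this replica and advance its
        offset past them."""
        start = self.offsets.get(replica, 0)
        with open(self.path, "r") as f:
            f.seek(start)
            data = f.read()
        self.offsets[replica] = start + len(data)
        return data.splitlines()
```

A replica taken out of service simply stops advancing its offset; when it
comes back, pending() hands it everything it missed, with no gap where
only one live copy of the data exists.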


Rob



Re: Making Replication Robust

2007-10-08 Thread Rudy Gevaert

Hello,

I agree with Bron.  However I do think some parts are more important 
than others.  I'll try to explain my point of view.


Note, we are running 2.3.7, I'm going to upgrade when 2.3.10 is out.  We 
have replication in place, but daren't use it.  If I have a method to 
check if the replica is in sync then I'll dare to do a fail over.


For me points a, e and f are most important, but the others are also 
important.


Bron Gondwana wrote:


So I'd like to start a dialogue on the topic of making Cyrus
replication robust across failures with the following goals:

a) MUST never lose a message that's been accepted for 
   delivery except in the case of total drive failure.


b) MUST have a standard way to integrity check and 
   repair a replica-pair after a system crash.


Do you mean that if the replica crashes it should be able to catch up again?



c) MUST have a clean process to soft-failover to the 
   replica machine, making sure that all replication

   events from the ex-master have been synchronised.


Indeed this is nice, but it would still need a lot of site-specific
tools.  E.g. I know (I think I do) that Fastmail runs master/replica in
the same subnet.  We don't.  So soft-failover isn't that easy.


For us it's more important that all mail that isn't delivered gets 
queued at the MTA (it's not on the same machine as cyrus).  All 
delivered mails are replicated. We then still need to update the DNS or 
/etc/hosts file.



d) MUST have replication start/restart automatically when
   the replica is available rather than requiring it be 
   online at master start time.


This would be great if there are some tools available for doing 
automatic failover, recovery, ...



e) SHOULD be able to copy back messages which only exist
   on the replica due to a hard-failover, handling UIDs 
   gracefully (more on this later), alternatively at least

   MUST (to satisfy point 'a') notify the administrator
   that the message has different GUIDs on the two copies
   and something will need to be done about it (to satisfy
   point 'd' this must be done without bailing out 
   replication for the remaining messages in the folder)


f) SHOULD keep replicating in the face of an error which
   affects a single mailbox, keeping track of that mailbox
   so that a sysadmin can fix the issue and then replicate
   that mailbox by hand.

g) MAY have a method to replicate to two different replicas
   concurrently (replay the same sync_log messages twice)
   allowing one replica to be taken out of service and
   a new one created while having no gaps in which there
   is no second copy alive (we use rsync, rsync again,
   stop replication, rsync a third time, start replication
   to the new site - but it's messy and gappy)


This again is a good idea, and would be very usable.  But it depends on
what you will be doing with the second replica.  If it were possible to
take out the second replica, make it consistent and back it up, and then
bring it back up to date, it would be a neat way to have a consistent
backup.


Kind regards,

Rudy


--
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Rudy Gevaert  [EMAIL PROTECTED]  tel:+32 9 264 4734
Directie ICT, afd. Infrastructuur ICT Department, Infrastructure office
Groep SystemenSystems group
Universiteit Gent Ghent University
Krijgslaan 281, gebouw S9, 9000 Gent, Belgie   www.UGent.be
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --


Re: Making Replication Robust

2007-10-08 Thread Bron Gondwana
On Mon, Oct 08, 2007 at 10:03:31AM +0200, Rudy Gevaert wrote:
 For me points a, e and f are most important, but the others are also 
 important.

 Bron Gondwana wrote:

 So I'd like to start a dialogue on the topic of making Cyrus
 replication robust across failures with the following goals:
 a) MUST never lose a message that's been accepted for delivery except
 in the case of total drive failure.
 b) MUST have a standard way to integrity check and repair a
 replica-pair after a system crash.

 Do you mean that if the replica crashes it should be able to catch up 
 again?

No. When a master fails and replication wasn't 100% up to date, and you
decided to bring the replica online and later switched back to the
original master, you don't want to overwrite messages.

 c) MUST have a clean process to soft-failover to the replica machine,
    making sure that all replication events from the ex-master have
    been synchronised.

 Indeed this is nice, but it would still need a lot of site-specific tools.
 E.g. I know (I think I do) that Fastmail runs master/replica in the same
 subnet.  We don't.  So soft-failover isn't that easy.

True - it's easy for us because we have different configs that bind
to the same IP address and use arp broadcasts so nothing else needs
to change.

The bit I care more about is that you can shut a master down cleanly
and guarantee that all replication events finish sending as part of
the shutdown process.  We already do this with an external init script
(written in Perl) but would prefer that it's a general option available
to everyone and supported upstream.

 For us it's more important that all mail that isn't delivered gets queued 
 at the MTA (it's not on the same machine as cyrus).  All delivered mails 
 are replicated. We then still need to update the DNS or /etc/hosts file.

We have that too of course, it's more the ones that are delivered but
not yet replicated when we call shutdown that matter (see also APPEND)

 d) MUST have replication start/restart automatically when
    the replica is available rather than requiring it be online at
    master start time.

 This would be great if there are some tools available for doing automatic 
 failover, recovery, ...

Yeah, we get this with the '-o' option to sync_client, meaning it just
doesn't start replicating; monitorsync.pl then runs every 10 minutes from
cron, checks that there are running sync_client processes for each master,
and attempts to start them if the replica is marked as up in the database.
It also deals with old log files left lying around.
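The restart decision monitorsync.pl makes could be factored as a small
pure function. This is a hypothetical sketch with invented names, not the
actual script:

```python
def sync_actions(masters, running, replica_up):
    """Decide, monitorsync-style, which masters need a sync_client
    (re)started: those with no running sync_client whose replica is
    marked as up in the database.

    masters:    list of master names
    running:    set of masters that already have a sync_client process
    replica_up: dict of master -> bool, replica status from the database
    """
    return [m for m in masters
            if m not in running and replica_up.get(m, False)]
```

Keeping the decision separate from the process-spawning and database code
makes the cron job's behaviour easy to test in isolation.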

 e) SHOULD be able to copy back messages which only exist
    on the replica due to a hard-failover, handling UIDs gracefully
    (more on this later), alternatively at least
    MUST (to satisfy point 'a') notify the administrator
    that the message has different GUIDs on the two copies
    and something will need to be done about it (to satisfy
    point 'd' this must be done without bailing out replication for the
    remaining messages in the folder)
 f) SHOULD keep replicating in the face of an error which
    affects a single mailbox, keeping track of that mailbox
    so that a sysadmin can fix the issue and then replicate
    that mailbox by hand.
 g) MAY have a method to replicate to two different replicas
concurrently (replay the same sync_log messages twice)
allowing one replica to be taken out of service and
a new one created while having no gaps in which there
is no second copy alive (we use rsync, rsync again,
stop replication, rsync a third time, start replication
to the new site - but it's messy and gappy)

 This again is a good idea, and would be very usable.  But it depends on
 what you will be doing with the second replica.  If it were possible to
 take out the second replica, make it consistent and back it up, and then
 bring it back up to date, it would be a neat way to have a consistent
 backup.

Yeah, that's a point.  That would be very nice :)  We're generally
doing it because we want to take a drive unit out of service, or
even a whole machine, and we'd rather not have a gap where there's
only one live copy of data.

I've been thinking evil thoughts about writing a sync_server protocol
compatibility library and poking cyrus through it.  We already run a
sync_server on our masters as well because we use it for user moves:

*) create custom config file and mailboxes.db snippet
*) sync user to new store using custom config
*) lock the user against lmtp/pop/imap at the proxy level and
   kill off all current connections (scans $confdir/proc)
*) sync user again
*) run checkreplication.pl in paranoid mode to make sure
   everything actually matches
*) update database field for store name and broadcast a cache
   invalidation packet to all the apps that cache user data
   (again, one subnet makes broadcast cache management reasonable)
*) re-enable delivery and logins.

Generally takes about 15 seconds for the critical path bit, and
the initial sync doesn't matter how long it takes.