> c) MUST have a clean process to "soft-failover" to the
>    replica machine, making sure that all replication
>    events from the ex-master have been synchronised.
>
> Something more than sync_shutdown_file plus automatic retries on
> recent work files?

I think the problem at the moment is that the process you really want is:

1. Stop new imap/pop/lmtp/sieve/etc connections
2. Finish and close existing connections cleanly but as quickly as possible
3. Finish running any sync log files
4. Fully shutdown

There's currently no clean way to do this. Basically you have to SIGTERM master, which hard kills it and all its children, then manually run sync_client -f on any remaining log files.

We've got a patch which makes master handle SIGQUIT much more nicely. It appears there was already some infrastructure designed for a cleaner shutdown: look at all the places in the code that call signals_poll(). The idea seems to have been that you could send child processes SIGQUIT and they would finish their current action, notice at the top of their main loop that they'd been sent a QUIT, and exit cleanly. Unfortunately, if you sent SIGQUIT to master, it would just SIGTERM all its children, not SIGQUIT them.
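
To make the pattern concrete, here's a minimal toy sketch of the idea (my code, not Cyrus's; quit_seen and poll_for_quit are made-up stand-ins for the real signals_poll() machinery). The handler only records the signal, and the main loop polls for it between actions:

  #include <signal.h>
  #include <string.h>
  #include <unistd.h>

  static volatile sig_atomic_t quit_seen = 0;

  static void sigquit_handler(int sig)
  {
      quit_seen = sig;      /* just record it; do no real work in a handler */
  }

  /* the moral equivalent of signals_poll(), called at the top of the
   * service's main loop */
  static int poll_for_quit(void)
  {
      return quit_seen;
  }

  int main(void)
  {
      struct sigaction sa;

      memset(&sa, 0, sizeof(sa));
      sa.sa_handler = sigquit_handler;
      sigemptyset(&sa.sa_mask);
      sigaction(SIGQUIT, &sa, NULL);

      for (;;) {
          if (poll_for_quit())
              break;        /* exit between actions, never mid-action */
          /* ... handle one protocol command or connection event ... */
          sleep(1);         /* stand-in for the real work */
      }
      /* flush state, close mailboxes cleanly, then exit */
      return 0;
  }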

This patch attempts to fix that: sending SIGQUIT to master now sends SIGQUIT to all children, and then waits for them all to exit cleanly.
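
The guts of it look something like this (a rough sketch of the idea rather than the patch itself; the child_pid/nchildren table is hypothetical):

  #include <errno.h>
  #include <signal.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  extern pid_t child_pid[];   /* hypothetical table of service children */
  extern int nchildren;

  static void quit_all_children(void)
  {
      pid_t p;
      int i;

      /* forward SIGQUIT, not SIGTERM, so each child finishes its
       * current action before exiting */
      for (i = 0; i < nchildren; i++)
          kill(child_pid[i], SIGQUIT);

      /* reap every child before master itself exits */
      do {
          p = wait(NULL);
      } while (p > 0 || (p < 0 && errno == EINTR));
  }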

http://cyrus.brong.fastmail.fm/#cyrus-clean-shutdown-2.3.8.diff

This solves steps 1 & 2 above, though it doesn't deal with the case of a "crazy child" that doesn't respond to SIGQUIT. Personally, our init script sends SIGQUIT, and if the master process is still there after 10 seconds, it sends SIGTERM to force an exit. In general we find that everything exits within a couple of seconds of the SIGQUIT.

To do step 3, I think the best approach might be a new cyrus.conf section, a SHUTDOWN section, which lists commands to run on shutdown. After all children have accepted the SIGQUIT and exited, master would run the SHUTDOWN section, which for us would run a final sync_client -r on the sync dir to finish up any remaining log files.
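
Something like this, say (purely hypothetical: no SHUTDOWN section exists in 2.3.8, this is just what I imagine it looking like, modelled on the existing START/EVENTS syntax, with a made-up entry name):

  SHUTDOWN {
    # after all children have exited, play any remaining
    # replication log files to the replica
    syncflush    cmd="sync_client -r"
  }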

With all of that in place, you could send SIGQUIT to the cyrus master process on the master server, and it would cleanly shut down all children and ensure that all replication events have been correctly played to the replica. You could then do the same on the replica, reverse their roles, and bring them both back up, and you'd have a safe soft failover.

> At the moment we replace messages (on the "master knows best" principle).
>
> It would be easy enough to leave the message in place and generate
> warnings instead, although this would generate a lot of warnings: one
> for every bad message, every time a given mailbox is updated.

That's what this patch does.

http://cyrus.brong.fastmail.fm/#cyrus-warnmismatcheduuids-2.3.8.diff

In theory, with clean soft failovers you should NEVER have UIDs with mismatched UUIDs. After a hard failover you obviously might, but in those cases just replacing the message means we're almost certainly overwriting a delivered message and losing it, which is bad. Making it an option to either overwrite or log seems like a sane idea at least.

> My nightmare scenario is a replication engine which carries on running
> in the face of mboxlist corruption on the master: you could lose a lot
> of mailboxes on the replica that way.

That would be bad, though hard to detect and stop. I guess that's what backups are for...

> It would be easy enough to generate multiple replication log files.

MySQL keeps a single transaction log for multiple replicas, but that file contains quite a lot of information about each transaction. In contrast, the Cyrus sync log is just a list of objects that need attention: the log files carry much less state, particularly once duplicate entries are removed.
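
For illustration, a sync log is roughly just lines like these (made-up entries in the general shape of sync log lines, not copied from a real log):

  USER brong
  MAILBOX user.brong
  APPEND user.brong

sync_client then works out for each named object what actually needs transferring, which is why the log itself can stay so small.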

The other option is, rather than using the "rotate log, play it, delete it" system, to generate one log file but keep track of "offsets" within it that record how far each replica has got. That's what MySQL does: you can have multiple replicas because each replica is "playing" off the same log files, they're just at different offsets at any point in time.
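
As a toy sketch of the offset idea (nothing like this exists in Cyrus today, and all the names are mine):

  #include <stdio.h>

  /* Play every complete line from this replica's saved offset onwards,
   * then return the new offset to persist for next time (much like the
   * position MySQL records per replica in master.info). */
  long play_from_offset(const char *logfile, long offset)
  {
      FILE *f = fopen(logfile, "r");
      char line[1024];

      if (!f)
          return offset;
      if (fseek(f, offset, SEEK_SET) != 0) {
          fclose(f);
          return offset;
      }

      while (fgets(line, sizeof(line), f)) {
          /* ... replay this event to the replica ... */
          offset = ftell(f);   /* only advance past complete lines */
      }

      fclose(f);
      return offset;
  }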

Rob
