Re: Funding Cyrus High Availability

Paul Dekkers Mon, 20 Sep 2004 02:21:18 -0700

David Carter wrote:

5. Active/Active
designate one of the boxes as primary and identify all items in the datastore that absolutely must not be subject to race conditions between the two boxes (message UUID, for example). In addition to implementing the replication needed for #1, modify all functions that need to update these critical pieces of data to update them on the master and let the master update the other box.
We may be talking at cross purposes (and it's entirely likely that I've
got the wrong end of the stick!), but I consider active-active to be
the case where there is no primary: users can make changes to either
system, and if the two systems lose touch with each other they have
to resolve their differences when contact is reestablished.

I'd go for #5 as well. Since this is a setup with no primary at all, I suppose it is quite a different design from the #1-4 solutions. Because of that, I would think it is rather pointless to do those steps first in order to get #5 right, but I may well be wrong.

I would be most happy if work started on #5. Personally I don't care that much about #6 at the moment, but I can imagine that this is different for others. Still, if the design is that every machine tracks its changes and has them propagated (actively or passively) to n hosts, there is no risk of missing changes or of failing to recover, I guess. Keeping track of that is not so hard: once all hosts have a change, remove it from the list. (The only possibility is that a slave is out of sync for a very short time, and why would that be so wrong? If it is so wrong, then maybe fix that later, since this would make the work easier.)
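The bookkeeping described above ("all hosts had this change; remove it") could be sketched roughly as follows. This is only an illustration in Python, not Cyrus code: the `ChangeLog` class, its method names, and the string payloads are all hypothetical, and it assumes every host knows the fixed list of peers it replicates to.

```python
class ChangeLog:
    """Per-host log of local changes that peers have not yet confirmed.

    Hypothetical sketch: names and payload format are assumptions,
    not part of Cyrus.
    """

    def __init__(self, peers):
        self.peers = frozenset(peers)
        # change_id -> (payload, set of peers that still need this change)
        self.pending = {}
        self._next_id = 0

    def record(self, payload):
        """Register a local change and mark it as owed to every peer."""
        change_id = self._next_id
        self._next_id += 1
        if self.peers:
            self.pending[change_id] = (payload, set(self.peers))
        return change_id

    def unacked_for(self, peer):
        """Changes to send (or resend) to one peer, oldest first."""
        return [
            (change_id, payload)
            for change_id, (payload, waiting) in sorted(self.pending.items())
            if peer in waiting
        ]

    def ack(self, peer, change_id):
        """A peer confirmed a change; drop it once all peers have it."""
        entry = self.pending.get(change_id)
        if entry is None:
            return
        _payload, waiting = entry
        waiting.discard(peer)
        if not waiting:
            # "All hosts had this change; remove it."
            del self.pending[change_id]
```

A peer that was unreachable for a while simply finds its backlog in `unacked_for()` when it reconnects, which covers both active (push) and passive (pull) propagation. What the sketch does not address is conflicting changes made on two hosts while they were out of touch.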

This could be the task of the Cyrus daemon, but it could just as well be the work of murder, as Jure suggests. (Or both?) I'm not entirely sure that this is what we want, but it could be done if it fits nicely (and if it can be assured that there is always a murder to talk to).

If there is a problem with UID selection, I don't see a problem with making one of the servers responsible for that task. We don't even need an election system for that: you could define a preference order for the servers, and if the server with the highest preference is down, the next one takes over its job. It's just that to the users all the machines should appear active. (And in case of failover the remaining machines should stay active, not become read-only or active only after manual intervention.)
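A fixed preference order instead of an election could look something like this. Again a hypothetical Python sketch: the function name and host names are made up, and it assumes every server has the same preference list and the same view of which servers are up.

```python
def uid_allocator(preference, alive):
    """Return the host currently responsible for handing out UIDs.

    preference: host names in fixed order, most preferred first
                (the same list is configured on every server).
    alive:      set of host names currently believed to be up.
    Returns None when no server in the list is reachable.
    """
    for host in preference:
        if host in alive:
            return host
    return None


# Hypothetical host names, for illustration only.
order = ["imap1", "imap2", "imap3"]
print(uid_allocator(order, {"imap1", "imap2", "imap3"}))
print(uid_allocator(order, {"imap2", "imap3"}))
```

With all three hosts up the first call picks `imap1`; once `imap1` is down the second call picks `imap2`, with no manual intervention. The weak spot is the assumption of a shared view: if the servers merely lose touch with each other, each side could consider itself the allocator, so that case would need separate handling.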

Paul


---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
