David Carter wrote:

5. Active/Active

Designate one of the boxes as primary and identify all items in the datastore that absolutely must not be subject to race conditions between the two boxes (message UUID, for example). In addition to implementing the replication needed for #1, modify all functions that need to update these critical pieces of data to update them on the master and let the master update the other box.

We may be talking at cross purposes (and it's entirely likely that I've got the wrong end of the stick!), but I consider active-active to be the case where there is no primary: users can make changes to either system, and if the two systems lose touch with each other they have to resolve their differences when contact is reestablished.

I'd go for #5 as well:
Since this is a setup where there is no primary at all, I suppose it is quite a different design than the #1-4 solutions. Because of that, I would think it's rather useless to do those steps first in order to get #5 right, but I may well be wrong.


I would be most happy if work started on #5. Personally I don't care that much about #6 at the moment, but I can imagine that this is different for others. But well; if the design is that every machine tracks its changes and propagates them (actively or passively) to n hosts, there is no risk of missing things or not recovering, I guess. Keeping track of that is not so hard: "all hosts had this change; remove it". (The only possibility is that a slave is out of sync for a very short time, and well, why would that be so wrong? And if that is so wrong, then maybe fix that later, since this would make the work easier?)
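To make the "all hosts had this change; remove it" bookkeeping concrete, here is a rough sketch of what I have in mind. It is only an illustration: the names ChangeLog, record, pending_for and ack are made up for this mail and have nothing to do with the actual Cyrus code, and it assumes that each host already knows the full list of its peers.

```python
from dataclasses import dataclass


@dataclass
class Change:
    seq: int        # local sequence number of the change
    payload: str    # whatever describes the change (hypothetical format)
    pending: set    # peers that have not acknowledged it yet


class ChangeLog:
    """Per-host log of local changes that still have to reach some peer."""

    def __init__(self, peers):
        self.peers = set(peers)
        self.next_seq = 1
        self.changes = {}

    def record(self, payload):
        """Log a local change; it stays until every peer has acknowledged it."""
        seq = self.next_seq
        self.next_seq += 1
        if self.peers:  # with no peers there is nothing to wait for
            self.changes[seq] = Change(seq, payload, set(self.peers))
        return seq

    def pending_for(self, peer):
        """Changes this peer still needs, oldest first (push or pull)."""
        ordered = sorted(self.changes.values(), key=lambda c: c.seq)
        return [c for c in ordered if peer in c.pending]

    def ack(self, peer, seq):
        """A peer confirms it has applied change seq."""
        change = self.changes.get(seq)
        if change is None:
            return
        change.pending.discard(peer)
        if not change.pending:
            # all hosts had this change; remove it
            del self.changes[seq]
```

A host that was down for a while simply gets everything from pending_for() when it comes back, so active and passive propagation can share the same log.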

This could be the task of the cyrus daemon, but it could just as well be the work of murder, as Jure suggests. (Or both?) I'm not entirely sure that that is what we want, but it could be done if it fits nicely (and if it can be assured that there is always a murder to talk to).

If there is a problem with UID selection, I don't see a problem with making one of the servers responsible for that task. We don't even need an election system for that: you could define a preference order for the servers, and if the server with the highest preference is down, the next one in line takes over its job. It's just that to the users all the machines should appear active. (And in case of failover the remaining machines should stay active, not become read-only or only active after manual intervention.)
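The preference order for UID selection could be as simple as the sketch below. Again this is just an illustration: uid_allocator and the is_up check are hypothetical names, and how a server decides that another one is down is left open.

```python
def uid_allocator(preference, is_up):
    """Return the server that should hand out UIDs right now.

    preference: server names, most preferred first (fixed configuration).
    is_up: callable that says whether a server is currently reachable.
    """
    for server in preference:
        if is_up(server):
            return server
    return None  # nobody reachable


# Example: "a" is down, so "b" takes over its job.
status = {"a": False, "b": True, "c": True}
current = uid_allocator(["a", "b", "c"], lambda s: status.get(s, False))
```

Note that this does not cover the case where the servers lose touch with each other and each side concludes that the higher-preference servers are down; that would still have to be resolved when contact is reestablished.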

Paul


--- Cyrus Home Page: http://asg.web.cmu.edu/cyrus Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
