A few observations on this:

Andrew:
Synchronous (real-time) mirroring is best if it is practical. However, asynchronous acknowledgements can be acceptable if the primary and secondary mirrors do not both have to be accessible to users at the same time, which is the case if I understand the problem description correctly.

I definitely suggest taking a look at Sun Availability Suite, specifically the "Geographic Cluster Edition." Sun Cluster has provisions for synchronous mirroring, and the Geographic Edition, which is designed for use over large geographic areas, has provisions for asynchronous mirroring.

The main attraction of rsync-style solutions is their ability to work over high-latency links between sites. From a cursory examination, it appears that the asynchronous mirroring features in Sun Cluster Geographic Edition solve the same problem, and in a better way than rsync. Setup may be more complex for you, though.

It's been some time since I worked with Sun Cluster, and I haven't used the Geographic Edition. It would be nice to get some input from someone who has.

Regarding sync vs. async:

Updating a local/remote mirrored pair involves a write to each copy and an acknowledgement from each copy. In what we usually call "synchronous mirroring," the writes to both copies are part of the same operation and begin at the same time, and the transaction does not complete until both acknowledgements are received. Where latency delays the acknowledgement of the remote write, this can slow down the local host. The latency can come from slow communications, long distances, or both.

Where latency is a problem and both copies do not need to be writable by users at the same time, "asynchronous mirroring" can be used. In this case the local host does not wait for the acknowledgement of the remote write, although both writes still begin at once. Sun Cluster's "asynchronous mirroring" involves synchronous writes with asynchronous acknowledgements.
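A toy model may make the latency argument above concrete. The Python sketch below is not how Sun Cluster or Availability Suite is implemented. It only computes when the local host would consider a write finished under each acknowledgement policy. The names (`WriteTimes`, `local_ack_ms`, `remote_ack_ms`) are illustrative, and the timings are made-up numbers, not measurements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WriteTimes:
    """Hypothetical per-write timings in milliseconds (illustrative only)."""
    local_ack_ms: float   # time until the local copy acknowledges
    remote_ack_ms: float  # time until the remote copy acknowledges (includes link latency)


def completion_time_ms(w: WriteTimes, mode: str) -> float:
    """Time at which the local host considers one write complete.

    Both writes are assumed to be issued at t=0, as described above.
    - "sync":  wait for both acknowledgements.
    - "async": wait only for the local acknowledgement; the remote
               acknowledgement arrives later, in the background.
    """
    if mode == "sync":
        return max(w.local_ack_ms, w.remote_ack_ms)
    if mode == "async":
        return w.local_ack_ms
    raise ValueError(f"unknown mode: {mode!r}")


def total_time_ms(writes: list[WriteTimes], mode: str) -> float:
    """Total time for a series of writes issued one after another."""
    return sum(completion_time_ms(w, mode) for w in writes)


if __name__ == "__main__":
    # Hypothetical numbers: fast local disk, remote site about 40 ms further away.
    writes = [WriteTimes(local_ack_ms=5.0, remote_ack_ms=45.0)] * 100
    print("sync :", total_time_ms(writes, "sync"))   # 4500.0
    print("async:", total_time_ms(writes, "async"))  # 500.0
```

With these made-up numbers, synchronous mirroring makes every write wait for the slower remote acknowledgement, while the asynchronous policy lets the local host move on after the local one. The price is that the remote copy may briefly lag behind the local one, which is the trade-off Sun Cluster's asynchronous mirroring makes.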
The value of this in the current context is that it can be used over TCP/IP. So there is a Sun Cluster option that works without SAN hardware or Fibre Channel over optical fiber between buildings, if that stuff is not available.

rsync is a different animal in that it is not tied in with the file system. It is not triggered automatically by a write, and it does not know where the new writes are when it starts running; it has to find them. So not only are the acknowledgements asynchronous, the writes are (very) asynchronous as well.

AFAIK fully synchronous mirroring is always the most desirable solution when it can be implemented. It is not only better from a data-integrity standpoint, it can also be more convenient for both the administrator and the users. However, if constraints imposed by geography, budget, schedule, or whatever prevent this, an asynchronous solution can be a necessary evil, and it can be made to work in a variety of circumstances.

AFAIK there are two major issues with asynchronous mirroring:

- keeping only one copy of the data accessible to users at a time;
- making sure the copy that is about to become accessible to users is up to date.

When failovers are done deliberately, it's not terribly difficult to arrange these things. When failovers are automatic, things get a bit more complicated. Presumably Sun Cluster Geographic Edition is smart enough to manage this well. For example, it probably knows that if the heartbeat from the other host stops, updates that are pending or in progress will not complete, and it does something sensible about that.

One potential gotcha for a cluster: if the heartbeat and the disk data use different communications media following different physical routes, which can be interrupted independently, you can wind up in a situation where the heartbeat is still good but the disk data transmission is interrupted. In that case, open disk transactions will never complete.
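The two issues above, plus the heartbeat/data-link split, can be written down as checks that a home-grown failover script might run before promoting the secondary. This is a hypothetical Python sketch. `SiteState`, its field names, and the 30-second thresholds are assumptions for illustration, not anything taken from Sun Cluster. Gathering those values reliably (especially whether the primary is still serving users) is the hard part, and the sketch leaves it out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds, not Sun Cluster defaults.
HEARTBEAT_TIMEOUT_S = 30.0
DATA_TIMEOUT_S = 30.0


@dataclass
class SiteState:
    """Hypothetical snapshot of what a failover script might know about the pair."""
    primary_serving_users: bool  # is the primary still exporting the data to users?
    pending_updates: int         # writes sent to the secondary but not yet acknowledged
    heartbeat_age_s: float       # seconds since the last heartbeat from the primary
    data_ack_age_s: float        # seconds since the last acknowledged data write


def data_link_stalled(s: SiteState) -> bool:
    """Writes are outstanding, but nothing has been acknowledged for too long."""
    return s.pending_updates > 0 and s.data_ack_age_s >= DATA_TIMEOUT_S


def can_promote_secondary(s: SiteState) -> tuple[bool, str]:
    """Encode the two issues: one user-visible copy at a time, and it must be current."""
    if s.primary_serving_users:
        return False, "primary still serving users; would have two writable copies"
    if s.pending_updates > 0:
        return False, f"{s.pending_updates} update(s) not yet applied on secondary"
    return True, "ok"


def diagnose(s: SiteState) -> str:
    heartbeat_ok = s.heartbeat_age_s < HEARTBEAT_TIMEOUT_S
    if heartbeat_ok and data_link_stalled(s):
        # The gotcha: the heartbeat path is intact but the data path is cut,
        # so a heartbeat-only monitor would never notice the stuck writes.
        return "data link down, heartbeat up: alert an operator"
    if not heartbeat_ok:
        return "heartbeat lost: any pending updates will not complete"
    return "healthy"


if __name__ == "__main__":
    cut_cable = SiteState(primary_serving_users=True, pending_updates=12,
                          heartbeat_age_s=1.0, data_ack_age_s=120.0)
    print(diagnose(cut_cable))  # data link down, heartbeat up: alert an operator
    print(can_promote_secondary(cut_cable))
```

Checking `data_link_stalled` separately from the heartbeat age is what catches the gotcha described above: a heartbeat that is still good while the disk data path is cut.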
Dunno if Sun Cluster has a mechanism for dealing with this, but it's something to watch out for. (If there's construction going on, there's the possibility of someone taking a Sawzall to a wall and cutting some, but not all, of the network cables.)

If the periods of unusually high risk are known (the weekends were mentioned in the original message), the following scenario would work:

- Only the local file system is available to users during the week.
- The remote file system is updated incrementally during the week with rsync (or a derivative) to keep the Friday-night transfer manageable. (It could run every twenty minutes or so; note that the script it runs from should check whether an instance is already running. If this is too bandwidth-piggy for business hours, it could run once overnight.)
- The remote file system is updated on Friday night.
- Users are failed over deliberately after the update. Only the remote file system is available during the weekend.
- The local file system is updated via rsync (or a derivative) on Sunday night.
- Users are deliberately failed back after the update.

Granted, this is not nearly as elegant as a Sun Cluster solution, but it may be easier to implement in some cases, and if the problem is as circumscribed as I'm inferring, it will probably suffice. The big drawbacks to this kind of solution are:

- You have to invent your own failover mechanism. (This could just be a matter of including updates to a DNS entry or to client vfstabs in your script, but you still need to figure it out, write it, and test it.)
- It's a one-shot band-aid. A Sun Cluster solution has enduring value and can be built on in the future.

This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org