[osol-discuss] Re: Distributed File System for Solaris

Eric Fluger Sun, 14 May 2006 10:38:44 -0700

a few observations on this: 

-- andrew:


-- --  synchronous (real time) mirroring is best if it is practical.  however, 
asynchronous acknowledgements can be acceptable if the primary and secondary 
mirrors do not both have to be accessible to users at the same time, which is 
the case if i understand the problem description correctly.

-- -- i definitely suggest taking a look at sun availability suite, 
specifically the "geographic cluster edition."   sun-cluster has provisions for 
synchronous mirroring and the geographic edition, which is designed for use over 
large geographic areas, has provisions for asynchronous mirroring.   

-- -- the main attraction of rsync style solutions is their ability to work 
with high latency communications between sites.  from cursory examination, it 
appears that the asynchronous mirroring features in sun-cluster geographic 
edition solve the same problem, and in a better way than rsync.   setup may be 
more complex for you though.  

-- it's been some time since i worked with sun-cluster, and i haven't used the 
geographic edition.  it would be nice to get some input from someone who has.

-- regarding sync vs async:  

-- -- an update to a local/remote mirrored pair involves a write to each copy 
and an acknowledgement from each copy.  

in what we usually call "synchronous mirroring" writes to both copies are part 
of the same operation and begin at the same time.  the transaction does not 
complete until both acknowledgements are received.  in cases where latency 
tends to delay the acknowledgement of the remote write this can slow performance 
of the local host.  this can occur due to slow communications, long distances or 
both.  

In cases where latency is a problem and both copies do not need to be writable 
by users at the same time, "asynchronous mirroring" can be used.   In this case, 
the local host does not wait for the acknowledgement of the remote write.  
however, both writes still begin at once.   in other words, sun-cluster's 
"asynchronous mirroring" involves synchronous writes with asynchronous 
acknowledgements.  the value in the current context is that it can be used via 
tcp/ip.  so there is a sun-cluster option that can be used without san hardware 
or fc via optical fiber between buildings, if that stuff is not available.
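
The difference between the two modes can be illustrated with a toy model.  This 
is only a sketch, not how sun-cluster implements replication: the Mirror class, 
the dictionaries standing in for disks, and the simulated link delay are all 
made up for illustration.  In both modes the remote write is issued at the same 
moment as the local one; the only difference is whether the caller waits for 
the remote acknowledgement.

```python
import queue
import threading
import time


class Mirror:
    """Toy local/remote mirrored pair; dicts stand in for the two disks."""

    def __init__(self, remote_latency=0.05):
        self.local = {}
        self.remote = {}
        self.remote_latency = remote_latency  # simulated link delay, seconds
        self._pending = queue.Queue()
        threading.Thread(target=self._remote_worker, daemon=True).start()

    def _remote_worker(self):
        # Applies remote writes in the order they were issued.
        while True:
            key, value, acked = self._pending.get()
            time.sleep(self.remote_latency)
            self.remote[key] = value
            acked.set()  # stands in for the remote acknowledgement
            self._pending.task_done()

    def write(self, key, value, synchronous):
        acked = threading.Event()
        self._pending.put((key, value, acked))  # remote write is issued now
        self.local[key] = value                 # local write
        if synchronous:
            acked.wait()  # synchronous mode: block until the remote ack
        return acked      # asynchronous mode: caller may check this later

    def drain(self):
        """Block until every outstanding remote write is acknowledged."""
        self._pending.join()
```

With a slow link, write(..., synchronous=True) costs the caller at least one 
link delay per write, while write(..., synchronous=False) returns right away 
and leaves a window in which the remote copy is behind.  Calling drain() before 
a deliberate fail-over closes that window.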

-- -- rsync is a different animal in that it is not tied in with the file 
system.  it is not automatically triggered by a write and does not know where 
new writes are when it starts running.  it has to find them.   so not only are 
the acknowledgements asynchronous, the writes are (very) asynchronous as well.  

-- -- AFAIK fully synchronous mirroring is always the most desirable solution 
when it can be implemented.   it is not only better from a data-integrity 
standpoint, it can be more convenient for both the administrator and the users. 
 However, if constraints imposed by geography, budget, schedule, or whatever, 
prevent this, an asynchronous solution can be a necessary evil and can be made 
to work in a variety of circumstances.  

-- -- AFAIK there are two major issues with asynchronous mirroring: 

-- -- -- keeping only one copy of the data accessible to users at a time. 

-- -- -- making sure the copy that is about to become accessible to users is up 
to date.  

-- -- when fail-overs are done deliberately, it's not terribly difficult to 
arrange these things.  when fail-overs are automatic things get a bit more 
complicated.  presumably, sun-cluster geographic edition is smart enough to 
manage this stuff well.  for example, it probably knows that if the heartbeat 
from the other host stops, updates that are pending or in progress will not 
complete, and does something sensible about that. 

-- -- one potential gotcha for a cluster: if the heartbeat and the disk data 
use different communications media following different physical routes, which 
can be interrupted independently, it's possible to wind up with a situation 
where the heartbeat is still good but the disk data transmission is interrupted, 
in which case open disk transactions will never complete.   dunno if sun-cluster 
has a mechanism for dealing with this, but it's something to watch out for.  (if 
there's construction going on you've the possibility of someone taking a sawzall 
to a wall and cutting some but not all network cables.) 
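
One way to watch for the split described above is to track the heartbeat and 
the replication traffic separately, so that a live heartbeat is never taken as 
proof that disk data is flowing.  The following is a hypothetical sketch: the 
LinkStatus fields, the state names, and the timeout values are all assumptions 
for illustration, and it says nothing about what sun-cluster actually does.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    heartbeat_age: float   # seconds since the last heartbeat from the peer
    oldest_unacked: float  # age of the oldest unacknowledged remote write,
                           # in seconds; 0 if nothing is outstanding


def assess(status, heartbeat_timeout=10.0, ack_timeout=30.0):
    """Classify the peer using both paths; the timeouts are illustrative."""
    heartbeat_ok = status.heartbeat_age <= heartbeat_timeout
    data_ok = status.oldest_unacked <= ack_timeout
    if heartbeat_ok and data_ok:
        return "healthy"
    if heartbeat_ok:
        # The case from the text: the peer looks alive, but remote writes
        # are stuck and will not complete on their own.
        return "data-path-down"
    if data_ok:
        return "heartbeat-lost"
    return "peer-unreachable"
```

The point of the separate "data-path-down" state is that an administrator (or a 
script) gets told about stalled writes even though the heartbeat alone would 
report that everything is fine.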

-- if the periods of unusually high risk are known (the weekends were mentioned 
in the original message)  the following scenario would work: 

-- -- only the local file system is available to users during the week.

-- -- the remote file system is updated incrementally during the week with rsync 
(or a derivative) to keep the friday night transfer manageable.  (it could run 
every twenty minutes or so.  note that the script it runs from should check to 
see if there's already an instance running.  if this is too bandwidth piggy for 
biz hours, it could run once over night.)

-- -- the remote file system is updated on friday nights. 

-- -- users are failed over deliberately after the update.  only the remote 
file system is available during the weekend. 

-- -- the local file system is updated via rsync (or derivative) on sunday 
night.  

-- -- users are deliberately failed back after the update.  
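
The "check to see if there's already an instance running" step in the scenario 
above could be handled with a lock file.  This is a minimal sketch, assuming a 
unix-like host with python 3 and rsync installed; the lock path, the source and 
destination paths, and the host name "remotehost" are placeholders, and the 
rsync options shown (-a, --delete) are just one plausible choice.

```python
import errno
import fcntl
import subprocess

LOCK_PATH = "/var/tmp/mirror-rsync.lock"  # hypothetical lock file location
RSYNC_COMMAND = [
    "rsync", "-a", "--delete",
    "/export/data/",                      # hypothetical source directory
    "remotehost:/export/data/",           # hypothetical destination
]


def run_exclusive(command, lock_path=LOCK_PATH):
    """Run command unless another instance already holds the lock.

    Returns the command's exit status, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return None  # a previous run is still in progress
            raise
        # The lock is released when lock_file is closed on leaving "with".
        return subprocess.run(command).returncode
```

A cron entry every twenty minutes would call run_exclusive(RSYNC_COMMAND); a 
run that overlaps a slow previous transfer simply returns None and leaves the 
work to the next interval.  The same function could wrap the friday and sunday 
night updates, with the fail-over step allowed only after a zero exit status.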

granted, not nearly as elegant as a sun-cluster solution, but may be easier to 
implement in some cases, and if the problem is as circumscribed as i'm 
inferring, it will probably suffice.   the big drawbacks to this kind of 
solution are:  

-- -- you have to invent your own failover mechanism.  (which could just be a 
matter of including updates to a dns entry or client vfstabs in your script, 
but you still need to figure it out, write it and test it.)

-- -- it's a one shot bandaid.   a sun-cluster solution is something with 
enduring value that can be built on for the future.
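
As a rough idea of what the "client vfstabs" part of a home-grown failover 
script might involve, here is a sketch that repoints nfs entries from one 
server to another.  It assumes the usual seven-field solaris vfstab layout 
(device to mount, device to fsck, mount point, fs type, fsck pass, mount at 
boot, mount options); the function name and the host names in the usage note 
are invented.  It only transforms text: writing the file back and remounting 
on each client would still have to be scripted and tested separately.

```python
def repoint_nfs_entries(vfstab_text, old_server, new_server):
    """Return vfstab_text with nfs mounts moved from old_server to new_server."""
    result = []
    for line in vfstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) == 7 and not line.lstrip().startswith("#")
        if (is_entry and fields[3] == "nfs"
                and fields[0].startswith(old_server + ":")):
            # Swap only the host part of "host:/path"; keep the export path.
            fields[0] = new_server + fields[0][len(old_server):]
            line = "\t".join(fields)
        result.append(line)
    return "\n".join(result) + "\n"
```

For example, repoint_nfs_entries(text, "localsrv", "remotesrv") would turn an 
entry for localsrv:/export/home into one for remotesrv:/export/home and leave 
comments, ufs entries, and mounts from other servers untouched.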
 
 
This message posted from opensolaris.org