Hi!
Without a cluster manager, there are 3 main
issues you have to deal with:
1) heartbeat - verifying whether the primary node (or
other nodes) is alive
2) storage - making the failed node's storage
accessible to the backup node
3) connectivity - allowing clients to transparently
connect to another node (usually by IP address transfer)
1) heartbeat - if you can live with a switchover time of up to 15 min, it can
be done with a script on the secondary or a monitoring node, which verifies
whether your Oracle database is actually running (the easiest way is to
just connect and run select * from dual from a script every minute). If either
the connect or the select fails, you take the appropriate steps described below.
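A minimal sketch of such a heartbeat script in Python, just to show the idea. The function names, the connect string and the "3 failures in a row" threshold are my assumptions, not anything Oracle-specific; the probe simply runs select * from dual through sqlplus and treats anything other than a clean result as a failure.

```python
import subprocess
import time
from typing import Callable


def sqlplus_probe(connect_string: str, timeout_s: int = 30,
                  executable: str = "sqlplus") -> bool:
    """Return True if we can connect and 'select * from dual' returns its row.

    Note: the connect string is passed on the command line, so it is visible
    in the process list; fine for a sketch, not ideal for production.
    """
    script = "whenever sqlerror exit failure\nselect * from dual;\nexit\n"
    try:
        result = subprocess.run(
            [executable, "-S", "-L", connect_string],  # -S silent, -L one logon attempt
            input=script, capture_output=True, text=True, timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # sqlplus missing, or the database hangs: both count as a failed check
        return False
    # DUAL has a single row with the value 'X'
    got_row = any(line.strip() == "X" for line in result.stdout.splitlines())
    return result.returncode == 0 and got_row


def wait_for_failure(probe: Callable[[], bool], interval_s: int = 60,
                     max_failures: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until the probe has failed max_failures times in a row."""
    failures = 0
    while failures < max_failures:
        failures = 0 if probe() else failures + 1
        if failures < max_failures:
            sleep(interval_s)
    return failures


if __name__ == "__main__":
    # Hypothetical credentials and alias, replace with your own
    wait_for_failure(lambda: sqlplus_probe("monitor/secret@PROD"))
    print("primary database looks dead, starting failover steps")
```

Requiring a few consecutive failures is a design choice to avoid failing over on a single network hiccup; with a 1 minute interval and 3 failures you are still far inside a 15 min switchover window.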
2) storage - both of your servers have to see
the disks, of course. If you have a SAN over fibre, it's no problem to share
disks between several servers. With some external SCSI arrays, sharing
between 2 servers shouldn't be a problem either.
When your heartbeat mechanism detects that your
primary Oracle service isn't running, it first tries to kill Oracle and unmount
the file systems on the primary server - two nodes writing to the same data
without coordination will cause a mess. The kill & unmount can be done by the
secondary or monitoring server using rsh, ssh or whatever remote exec mechanism.
If remote exec doesn't work, we fall back on ping to see whether the primary
host is alive. If it isn't alive, we are free to mount the file systems on the
secondary node and start the instance (of course the primary node should not
have automatic instance startup scripts in its rc.d). The problematic case is
when ping shows the primary host as alive, but the remote exec to
shutdown & unmount fails: then you can't be sure the primary has released the
disks. This is where cluster managers should be
better than home-made high-availability solutions.
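The decision logic described above can be sketched roughly like this in Python. The names (Action, decide_failover) are made up for illustration, and the two callables stand in for whatever remote exec and ping commands you use on your platform; I'm deliberately not hardcoding any command line flags here.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    TAKE_OVER = "mount file systems on secondary and start the instance"
    ESCALATE = "do nothing automatically, call a human"


def decide_failover(stop_primary: Callable[[], bool],
                    primary_pings: Callable[[], bool]) -> Action:
    """Decide what the secondary/monitoring node should do.

    stop_primary: remotely kills Oracle and unmounts the file systems on the
                  primary (rsh, ssh, ...); returns True only if both worked.
    primary_pings: returns True if the primary host answers ping.
    """
    if stop_primary():
        # Primary released the disks cleanly, safe to take them over
        return Action.TAKE_OVER
    if not primary_pings():
        # Host is down, so it can't be writing to the shared disks
        return Action.TAKE_OVER
    # Host is alive but we couldn't stop Oracle or unmount: it may still be
    # writing, and mounting the disks on the secondary risks corrupting data
    return Action.ESCALATE
```

For example, decide_failover(lambda: False, lambda: True) gives Action.ESCALATE, which is exactly the case a home-made script can't resolve safely on its own.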
3) connectivity - to direct clients to the backup node
you can either transfer the IP in a script (I don't know the Solaris commands by
heart; you could just have two sets of network config files as well), or
play around with tnsnames.ora entries, which always have the
primary host IP first in the address list and the secondary host second. You
also have to set the failover parameter (FAILOVER=ON in Oracle Net) in tnsnames.ora.
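To show what such a tnsnames.ora entry could look like, here is a small Python helper that prints one. The alias, host names and service name are hypothetical, and I'm assuming connect-time failover with FAILOVER = ON and LOAD_BALANCE = OFF so that clients always try the first address before the second; check the Net documentation for your Oracle release (older setups may use SID instead of SERVICE_NAME).

```python
from typing import Sequence


def tnsnames_entry(alias: str, hosts: Sequence[str], service_name: str,
                   port: int = 1521) -> str:
    """Render a tnsnames.ora entry that tries the hosts in the given order."""
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"
        for host in hosts
    )
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (FAILOVER = ON)\n"        # try the next address if one fails
        "      (LOAD_BALANCE = OFF)\n"   # keep the order: primary first
        f"{addresses}\n"
        "    )\n"
        f"    (CONNECT_DATA = (SERVICE_NAME = {service_name}))\n"
        "  )\n"
    )


# Primary host first, secondary host second (hypothetical names)
print(tnsnames_entry("PROD", ["dbhost1", "dbhost2"], "prod"))
```

Note this only helps new connections find the surviving node; sessions that were connected to the failed node still have to reconnect.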
If you do have lots of storage space or do not
have sharable storage, then go with standby databases (this can be done with
Standard Edition too) and forget about the issues in point 2.
So, you have to do some planning and scripting for
this, but cheap HA for simple systems is very possible. I personally
prefer these simple solutions over expensive software packages, guards and agents,
which often add a layer of complexity to sysadmins' jobs and don't
always work as expected themselves. But of course, these home-made one-weekend
solutions aren't appropriate everywhere...
Tanel.
Title: Recommendation for "cheap" HA solution