Title: Recommendation for "cheap" HA solution
Hi!
 
Without a cluster manager, there are three main issues you have to deal with:
 
1) heartbeat - verifying whether the primary node (or other nodes) is alive
2) storage - making the failed node's storage accessible to the backup node
3) connectivity - allowing clients to transparently connect to another node (usually IP address transfer)
 
1) If you have up to 15 minutes for a switchover, this can be done with a script on the secondary or a monitoring node that verifies whether your Oracle database is actually running (the easiest way is to connect and run select * from dual from a script every minute). If either the connect or the select fails, you take the appropriate steps described below.
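A minimal sketch of such a probe, assuming Python and sqlplus are available on the monitoring node. The function name, the connect string and the 30-second timeout are made up for illustration, and scheduling it every minute (cron or a loop) is left out:

```python
import subprocess

def db_is_alive(connect_string, run=subprocess.run, timeout=30):
    """Return True only if we can connect AND 'select * from dual' succeeds.

    connect_string is a hypothetical 'user/password@alias' value.
    'run' is injectable so the logic can be tried without a database.
    """
    # Make sqlplus exit with a non-zero code if the query itself fails.
    sql = "whenever sqlerror exit 1\nselect * from dual;\nexit 0\n"
    try:
        result = run(
            ["sqlplus", "-S", "-L", connect_string],  # -S silent, -L single logon attempt
            input=sql,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # sqlplus is missing or the connection hangs: treat both as "not alive".
        return False
    # dual has exactly one row whose single column contains 'X'.
    got_row = any(line.strip() == "X" for line in result.stdout.splitlines())
    return result.returncode == 0 and got_row
```

Treating a hang as a failure is a design choice here: a database that does not answer within the timeout is of no more use to clients than one that is down.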
2) storage - both of your servers have to see the disks, of course. If you have a SAN over fibre, sharing disks between several servers is no problem. With some external SCSI arrays, sharing between 2 servers shouldn't be a problem either.
When your heartbeat mechanism detects that your primary Oracle service isn't running, it first tries to kill Oracle and unmount the file systems on the primary server, because two nodes writing to the same data without coordination will cause a mess. The kill & unmount could be done by the secondary or monitoring server using rsh, ssh or whatever remote exec mechanism. If remote exec doesn't work, we rely on ping to see whether the primary host is alive. If it isn't alive, we are free to mount the file systems on the secondary node and start the instance (of course the primary node should not have automatic instance startup scripts in its rc.d). The problematic case is when ping shows the primary host as alive but the remote exec to shutdown & unmount fails. This is where cluster managers should be better than home-made high-availability solutions.
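The decision sequence above can be sketched as a small function. All names are hypothetical, and the three checks are passed in as callables so the real rsh/ssh and ping commands can be plugged in later. Returning ALERT in the problematic case (host answers ping but remote exec fails) is my assumption; the point is only that the script must not take over on its own there:

```python
from enum import Enum

class Action(Enum):
    NOTHING = "primary database is healthy"
    TAKE_OVER = "mount file systems on secondary and start the instance"
    ALERT = "primary state unknown, do not touch the disks, call a human"

def decide(db_alive, stop_primary, primary_pings):
    """Pick the next step; each argument is a zero-argument callable returning bool.

    db_alive      - heartbeat check (connect + select from dual)
    stop_primary  - remote kill of Oracle + unmount on the primary (rsh/ssh)
    primary_pings - plain ping of the primary host
    """
    if db_alive():
        return Action.NOTHING
    # Database is down: first try to cleanly release the storage on the primary.
    if stop_primary():
        return Action.TAKE_OVER
    # Remote exec failed: only take over if the host itself is dead.
    if not primary_pings():
        return Action.TAKE_OVER
    # Host is up but we could not stop Oracle or unmount: taking over now
    # risks two nodes writing to the same disks.
    return Action.ALERT
```

The checks are evaluated lazily, so the remote kill is attempted only after the heartbeat has actually failed.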
3) To direct connections to the backup node, you can either transfer the IP address in a script (I don't know the Solaris commands by heart; you could also just keep two sets of network config files) or play around with tnsnames.ora entries, which always have the primary host IP first in the address list and the secondary host second. You also have to set the fail_over (or something like that) parameter in tnsnames.
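A sketch of what such a tnsnames.ora entry could look like. As far as I know, the parameter meant here is FAILOVER (Oracle Net connect-time failover). The alias ORCL, the host names, the port and the service name are placeholders, not values from this thread:

```
ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(HOST = primary-host)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = secondary-host)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = orcl)
    )
  )
```

With LOAD_BALANCE off, the addresses are tried in the order listed, so clients reach the secondary only when the primary does not accept the connection. This helps new connections only; sessions that were open at the time of the failure still have to reconnect.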
 
If you have lots of storage space or do not have shareable storage, then go with standby databases (this can be done with Standard Edition too) and forget about the issues in point 2.
 
So, you have to do some planning and scripting for that, but cheap HA for simple systems is very possible. I personally like these simple solutions over expensive software packages, guards and agents, which often bring an additional layer of complexity to sysadmins' jobs and don't always work as expected themselves. But of course, these home-made one-weekend solutions aren't appropriate everywhere...
 
Tanel.
----- Original Message -----
Sent: Thursday, July 24, 2003 5:49 PM
Subject: Recommendation for "cheap" HA solution

Hi!

We are looking into establishing some sort of high availability solution here. We are running 9.2.0 on Sun Fire 280 (2 processors).

Since we are on a tight budget, we are looking into various solutions for HA.

One option would be to use Sun Cluster Server or Veritas Cluster Server. If one box fails, the db just fails over to the other node. The problem is that we don't have a cluster guy here...

The other option would be to use RAC, but this is the most expensive solution, I guess...

Does anybody use any other HA solution that is affordable? Failover time should be less than 15 minutes, although "frequent" outages (i.e. once a month or so) are tolerable.

Don't blame me for these requirements; it was not my idea...

This is 9.2.0 on Sun Solaris.

Thanks,
Helmut
