First, settle on an acceptable outage rate. You're only going to get so many nines, and your budget depends on how many. The system will fail at some point no matter what, even if only for a few seconds; that's reality. Figure out which kinds of failures you can tolerate given the nines you're targeting, and which kinds you have to design around. From there you can work out a budget. 99.999% uptime allows 5 minutes and 15 seconds of total downtime per year; 99.99% allows 52.56 minutes, and so on. Something will happen eventually, and I've never seen anyone offer more than five nines; IBM charges a lot for that. Then enumerate everything that could cause an outage, figure out how to work around each one, and give each a budget. Watch how many nines come off that requirement.
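The downtime arithmetic above is just the unavailable fraction times the seconds in a year; a quick sketch of the calculation:

```python
def downtime_seconds_per_year(nines: int) -> float:
    # 10**-nines is the unavailable fraction (5 nines -> 0.00001);
    # 31,536,000 is the number of seconds in a 365-day year.
    return 10.0 ** -nines * 31_536_000

print(downtime_seconds_per_year(5))       # ~315.4 seconds: 5 minutes 15 seconds
print(downtime_seconds_per_year(4) / 60)  # ~52.56 minutes
```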

If you have to use MySQL, I'd ditch PC hardware and go with some nice Sun kit if you haven't already, or maybe an IBM mainframe. Sun's Ex8xx line should let you do just about anything without taking the machine down (such as changing memory while it's running). Then I'd get a bunch of them, recode the application to handle writes to multiple servers while keeping everything atomic, and test the hell out of it. There are a lot of issues to consider there, and you probably want someone with a graduate degree in computer science to look over the design for you. (For anything this critical, I get someone smarter than me to double-check my designs and implementations.) It may be best to build it into the driver so the applications stay consistent.
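The "write to multiple servers, keep it atomic" idea boils down to: apply the write everywhere, and undo it everywhere if any server fails. Here's a minimal sketch of that control flow; the `Replica` class is a stand-in for a real MySQL connection, with in-memory storage so the rollback logic is visible (all names are illustrative, not a real driver API):

```python
class Replica:
    """Stand-in for a database server connection; stores rows in a dict."""
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.data = name, healthy, {}

    def write(self, key, value):
        if not self.healthy:
            raise IOError(f"{self.name} is down")
        old = self.data.get(key)       # remember the previous value
        self.data[key] = value
        return old                     # caller keeps this for rollback

def atomic_multi_write(replicas, key, value):
    """Write to every replica; roll back all of them if any one fails."""
    done = []  # (replica, previous value) pairs, in write order
    try:
        for r in replicas:
            done.append((r, r.write(key, value)))
    except IOError:
        for r, old in done:            # undo the writes that did succeed
            if old is None:
                del r.data[key]
            else:
                r.data[key] = old
        return False
    return True
```

If one replica is down, the write is undone on the replicas that accepted it, so no server ends up ahead of the others. A production version would use real transactions and handle failures during rollback itself, which is exactly the kind of subtlety worth having someone double-check.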

On the other hand, if you have all this money, look at some of the commercial solutions. This is probably heresy on this list, but hey, it's about finding the best solution for the need, right? Sybase or DB2 would be my first choices, depending on the hardware platform (Sun or mainframe). Those systems are set up to handle failover of the master server. For Sun, you want to be looking at Sun Cluster technology, a nice SAN, and a couple of nice servers. You write to one server, but when it fails, the backup server starts accepting write operations as if it were the master. There's a general rule in software engineering: if you can buy 80% of what you want, you're better off doing that than trying to engineer 100%.
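From the client's point of view, that failover behavior looks like: try the master, and if the connection fails, retry the same operation against the backup as if it were the master. A hedged sketch of that pattern, with plain dicts standing in for servers (not a real Sun Cluster or SAN setup):

```python
def write_with_failover(endpoints, operation):
    """Try each server in order (master first, then backups);
    return the first successful result, or re-raise the last failure."""
    last_error = None
    for server in endpoints:
        try:
            return operation(server)
        except ConnectionError as e:
            last_error = e  # this server is down; fall through to the next
    raise last_error        # every endpoint failed
```

The cluster products do this transparently below the application, which is a big part of what you're paying for; done by hand, you also have to worry about the old master coming back and thinking it's still in charge.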

Think about the networking: two data paths everywhere there's currently one. Two switches, and two NICs for each interface, each going to a different switch.
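On Linux, the two-NICs-per-interface part can be done with the kernel bonding driver. A hypothetical sketch, assuming interfaces eth0 and eth1 are each cabled to a different switch (interface names and the address are illustrative):

```shell
# Load the bonding driver in active-backup mode: one NIC carries traffic,
# the other takes over if the link dies (miimon polls link state every 100ms).
modprobe bonding mode=active-backup miimon=100

# Enslave both physical NICs to the bonded interface.
ifenslave bond0 eth0 eth1

# Address and bring up the bonded interface, not the physical ones.
ip addr add 192.168.1.10/24 dev bond0
ip link set bond0 up
```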

Depending on where your "clients" are, you also need to look at your datacenter. Is your database server feeding data to clients outside your building? If so, you probably want servers in a few different datacenters: at least one on the east coast and one on the west coast of the US, or the equivalent in your country, each with a different uplink to the Internet. Get portable IP addresses and do your own BGP. That way, if a WAN link fails, the IP addresses will show up on the other WAN link even though it's from a different provider.

This is just a quick rundown of the immediate issues in a 24x7x365 system; it's not exhaustive. Think about every cable, every cord, every component, from a processor to a memory chip; ask what happens when you pull it out or unplug it, and then make it redundant.

--
Michael Conlen


Rick Franchuk wrote:


Hi guys,

I've been doing some digging around and found some information about master/slave database replication, but it always seems to focus on increasing query performance by spreading the database out.

My situation is that there's a database which must absolutely, guaranteed, be operational 24x7x365, always, forever. It must survive and remain operational through power failures, machine lockups, and any other manner of scheduled or unscheduled downtime short of a bomb dropping on the colo.

This would be relatively easy to do if the system were purely read-only: I'd simply duplicate my data across numerous machines and pull queries from them, perhaps behind a load balancer so no one machine gets hit too painfully.

However, this system is write-heavy (at least 50%, with periods reaching 80% or more). Therefore, I need to be able to do a store to one of the servers and have that store propagate to the other machines (with appropriate software design to compensate for propagation delays and insert-order neutrality).

Has anyone done this with two (or more, if possible!) machines? Is it possible to do at the present time?







-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]


