> I was considering having a floating IP for each of the machines, so that
> if one dies, the other takes over the other's IP address, thus making
> changes at the application level unnecessary really.
I am no expert on playing around with IP addresses, but I would think this
rather a dodgy option. Wouldn't connections which you appear to have open
still get through, and connect to something unexpected?

Dynamic DNS would probably work, but I cannot guarantee access to the DNS
system that I (or rather, my customers) use, so it is not an option for me.
I have therefore had to implement failover at the application level.

> > Reconstruction when the failed machine comes back is
> > more of a problem.

> I would imagine that taking a snapshot of the databases and restarting
> replication should solve that one tho?

The problem is more one of ensuring that things do *not* start up
unexpectedly. If the failed machine has suffered only a short outage and
then comes back up, it will try to restart replication. But it must not do
so, because it is no longer the master and its databases are now out of
date. I therefore have the following features:

- On failover, the surviving machine is told to stop replication from the
  deceased, even if it returns.
- Machines are not set to start slaving automatically at power-up. Instead,
  the application checks whether the two are in sync (using a special
  one-entry table that is incremented every time the system cold starts)
  and only starts the slaving process if both are at the same sync level.
- When the deceased machine does return, the application orders it to drop
  and reload its databases from the master. Once this has been done,
  slaving can resume.

I use linear rather than circular replication: A->B->C->... A is the write
master for all tables, but B, C and D can be used as read-only copies for
queries. Since I probably have about a 4:1 read-to-write ratio, this
balances quite well.

> Using circular replication, I imagine I could have N machines, with each
> machine having its own RW DB, and each machine having N-1 RO dbs?
> Obviously failover in this instance would be more of a problem to deal
> with, but manageable.
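Going back to the failover features and the linear A->B->C layout described
earlier, the start-up rules amount to a small state machine. Below is a
minimal Python sketch of that logic under stated assumptions; it is an
illustration, not the code described in this message. The `Node` class, the
function names and the in-memory `sync_level` counter are hypothetical
stand-ins for the real one-entry sync table, each node is assumed to bump its
own counter at power-up (one possible reading of "incremented every time the
system cold starts"), and no MySQL calls are made. A toy `Router` (also
hypothetical) sends every write to the head of the chain and rotates reads
over the read-only copies.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List


@dataclass
class Node:
    """Hypothetical model of one MySQL server in the pair/chain."""
    name: str
    sync_level: int = 0         # stands in for the one-entry sync table
    is_master: bool = False
    slaving: bool = False       # is replication from the master running?
    needs_reload: bool = False  # set on the machine that failed


def cold_start(node: Node) -> None:
    # Assumption: each node bumps its own counter at power-up, and
    # replication is never resumed automatically.
    node.sync_level += 1
    node.slaving = False


def fail_over(survivor: Node, deceased: Node) -> None:
    # Promote the survivor and stop it replicating from the deceased,
    # even if the deceased comes back later.
    survivor.is_master = True
    survivor.slaving = False
    deceased.is_master = False
    deceased.slaving = False
    deceased.needs_reload = True  # its databases are now out of date


def try_start_slaving(slave: Node, master: Node) -> bool:
    # Only start slaving if the slave is not marked stale and both
    # nodes report the same sync level.
    if slave.needs_reload or slave.sync_level != master.sync_level:
        return False
    slave.slaving = True
    return True


def reload_from_master(slave: Node, master: Node) -> None:
    # Placeholder for "drop and reload the databases from the master";
    # afterwards the two are in sync and slaving can resume.
    slave.sync_level = master.sync_level
    slave.needs_reload = False
    slave.slaving = True


class Router:
    """Linear chain A->B->C->...: writes go to the head, reads rotate."""

    def __init__(self, chain: List[str]) -> None:
        self.master = chain[0]
        # With no read-only copies, fall back to reading from the master.
        self._readers = cycle(chain[1:] or chain[:1])

    def pick(self, is_write: bool) -> str:
        return self.master if is_write else next(self._readers)
```

For example, if B restarts on its own, `cold_start(B)` leaves its counter one
ahead of A's, so `try_start_slaving(B, A)` refuses to resume replication. After
`fail_over(B, A)`, the returning A is marked for reload and only resumes once
`reload_from_master(A, B)` has run. In a real deployment that step would mean
dropping the databases, reloading them from a dump of the master and
re-pointing replication (for instance with MySQL's `CHANGE MASTER TO` and
`START SLAVE` statements), all of which the sketch leaves out.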
Yes.

Alec

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]