> I was considering having a floating IP for each of the machines, so that
> if one dies, the other takes over the other's IP address, thus making
> changes at the application level unnecessary really.

I am no expert on playing around with IP addresses, but I would think this
a rather dodgy option. Wouldn't connections which you appear to have open
still get through, and connect to something unexpected? Dynamic DNS would
probably work, but I cannot guarantee access to the DNS system I (or rather,
my customers) are using, so it is not an option for me. I have therefore had
to implement failover at the application level.
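
To illustrate what failover at the application level can look like, here is
a minimal sketch in Python. It is not my actual code: the function name, the
host names and the `connect` callable are all hypothetical, and a real
version would pass in whatever function your MySQL client library uses to
open a connection.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each host in order; return (host, connection) for the first success.

    `connect` is a caller-supplied function (an assumption of this sketch)
    that opens a connection to one host and raises OSError on network failure.
    """
    errors = []
    for host in hosts:
        try:
            return host, connect(host)
        except OSError as exc:
            # This host is unreachable: remember why and try the next one.
            errors.append(f"{host}: {exc}")
    raise ConnectionError("no database host reachable: " + "; ".join(errors))
```

The application asks for a connection with the preferred machine first and
the standby second, so nothing at the IP or DNS level has to change when
one of them dies.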

> > Reconstruction when the failed machine comes back is
> > more of a problem.
> I would imagine that taking a snapshot of the databases and restarting
> replication should solve that one tho?

The problem is more one of ensuring that things do *not* start up
unexpectedly. If the failed machine has suffered only a short outage and
then comes back up again, it will try to restart replication. But it must
not do so, because it is no longer the master and its databases are now out
of date. I therefore have the following features:
- On failover, the surviving machine is told to stop replicating from the
  deceased, even if it returns.
- Machines are not set to start slaving automatically at power-up. Instead,
  the application level checks whether the two are in sync (by means of a
  special one-entry table incremented every time the system cold starts)
  and only starts the slaving process if both are at the same sync level.
- When the deceased machine does return, the application orders it to drop
  and reload the databases from the master. Once this is done, slaving can
  resume.
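
The power-up decision described above can be sketched roughly as follows.
This is only an illustration: the names `Action` and `next_action` are
hypothetical, the two integers stand for the values read from the one-entry
table on each machine, and treating any mismatch as "reload from the master"
is a simplifying assumption of the sketch.

```python
from enum import Enum


class Action(Enum):
    START_SLAVING = "start slaving"
    RELOAD_FROM_MASTER = "drop and reload databases from master, then slave"


def next_action(local_sync_level: int, master_sync_level: int) -> Action:
    """Decide what a machine should do when it comes back up.

    Each argument is the counter held in that machine's one-entry sync
    table. Slaving may start only when both counters agree; otherwise the
    local copy is treated as stale and must be rebuilt from the master first.
    """
    if local_sync_level == master_sync_level:
        return Action.START_SLAVING
    return Action.RELOAD_FROM_MASTER
```

For example, `next_action(7, 7)` gives `Action.START_SLAVING`, while
`next_action(6, 7)` gives `Action.RELOAD_FROM_MASTER`.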

I use linear rather than circular replication: A->B->C->... A is the write
master for all tables, but B, C and D can be used as read-only copies for
queries. Since I have probably a 4:1 read-to-write ratio, this balances
quite well.
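
As a rough sketch of how an application might split the load in such a
chain: writes always go to A, and reads are handed out round-robin among the
read-only copies. The class name `ChainRouter` and the host labels are
hypothetical; this shows the idea only, not my actual implementation.

```python
import itertools
from typing import Sequence


class ChainRouter:
    """Route writes to the master and spread reads across the replicas."""

    def __init__(self, master: str, replicas: Sequence[str]) -> None:
        if not replicas:
            raise ValueError("at least one read-only replica is required")
        self.master = master
        # cycle() yields the replicas in order, forever: B, C, D, B, C, ...
        self._replicas = itertools.cycle(list(replicas))

    def host_for(self, is_write: bool) -> str:
        """Return the host that should serve the next statement."""
        return self.master if is_write else next(self._replicas)


router = ChainRouter("A", ["B", "C", "D"])
write_host = router.host_for(is_write=True)
read_hosts = [router.host_for(is_write=False) for _ in range(4)]
```

Bear in mind that the copies further down the chain may lag slightly behind
A, so a query that must see a write made a moment ago is safer sent to A.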

> Using circular replication, I imagine I could have N machines, with each
> machine having its own RW DB, and each machine having N-1 RO dbs?
> Obviously fail over in this instance would be more of a problem to deal
> with, but manageable.

Yes

      Alec




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
