Good issue - I had the same concerns, so we built our own system. As a side note, we run an admin server which we use to generate reports, run backups and so on - it takes load off the production servers, where speed is critical. Recovery from backup, however, is a whole other issue, and a corrupting error gets replicated very quickly. With the new system we have cut recovery from backup from 4 hours to 30 minutes, and the recovered data is newer.
Our approach is a little different, but basically the same. In an
ideal world I'd like to see MySQL add a feature to replication that
allowed me to set a variable controlling how quickly the SQL_THREAD
executes its queries... e.g. a variable such as
replication-delay=3600 could tell MySQL's replication thread to hold
off executing any command until the current time is 3600 seconds
beyond the timestamp in the binary log.
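In the meantime, the idea can be approximated from the outside with
START SLAVE UNTIL (available from MySQL 4.1.1). Here's a minimal
sketch, not something we actually run - the hostname and paths are
placeholders, and it assumes the IO_THREAD is left running all the
time so the relay logs stay current:

#!/bin/sh
#
# Rough userland approximation of the proposed replication-delay
# variable, run from cron once an hour on the slave. Each run
# replays events up to the master position recorded by the previous
# run, then records the master's current position for the next run,
# so the applied data lags the master by one to two hours.
#
master="insert.your.master.hostname.here"
posfile=/var/mysql/delaypos
#
# Replay up to the position we recorded an hour ago; the SQL
# thread stops by itself once it reaches that point
#
if [ -f $posfile ]
then
    read file pos < $posfile
    mysql --socket=/tmp/mysql.sock -e \
        "START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE='$file', MASTER_LOG_POS=$pos;"
fi
#
# Record where the master is right now, for the next run
#
mysql -h $master -e "SHOW MASTER STATUS\G" | \
    awk '/File:/ {f=$2} /Position:/ {p=$2} END {print f, p}' > $posfile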
Some of my questions:
1) What are the benefits to using relay_log_file and relay_log_pos
instead of master_log_file and master_log_pos? Is it that the slave's
relay logs would already exist locally? Perhaps that's good or bad?
Thoughts?
Relay logs are better to use for this for one major reason. Assuming
one of the reasons this server exists is to provide backup to the
primary, having the data copied from the master server to the slave
server puts a copy of the logs where you need them in the event of a
hardware failure on the master. In other words, if you manipulate the
slave's SQL_THREAD and keep the IO_THREAD running, you are copying
your data somewhere else pretty close to instantly. Most people would
put that in the "good" category.
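In practice that just means stopping only the SQL thread and leaving
the IO thread alone; for example:

# stop only the SQL thread; the IO thread keeps pulling binlog
# events from the master into the local relay logs
mysql --socket=/tmp/mysql.sock -e "STOP SLAVE SQL_THREAD;"
#
# confirm: Slave_IO_Running should still say Yes, and
# Slave_SQL_Running should now say No
mysql --socket=/tmp/mysql.sock -e "SHOW SLAVE STATUS\G" | \
    grep -E 'Slave_(IO|SQL)_Running'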
The solution I proposed above does this too, keeping the data copied
off the master all the time. Our home-built set of scripts doesn't do
this; we manage the process by controlling the IO_THREAD. It's easy,
and we have two primary servers running as a pair, so we have a live
"backup" on a separate system. We would, however, prefer to have the
data in the relay log current all the time and manipulate the
SQL_THREAD.
2) Has anyone done something like this?
Yes... our "like this" is a simple set of scripts, run by cron, that
start and stop the IO_THREAD and SQL_THREAD at certain times.
Specifically, the sequence (which repeats every two hours) goes like
this:
4:00pm stop SQL_THREAD
4:01pm flush logs, start IO_THREAD
4:04pm stop IO_THREAD
4:10pm start SQL_THREAD
The net impact of this is that the data on the admin server is, on
the low side, about 5 minutes behind live (right after the catch-up
that starts at 4:10pm), and on the high side a little over 2 hours
behind (just before the next catch-up, when the newest applied data
still dates from the previous fetch).
We also set all the "start" scripts to first check for the existence
of a file (/var/mysql/replicate); if the file doesn't exist, they
don't start anything. We then have a script that stops all
replication and nukes that flag file. It can be run on the machine,
and it is also tied to a TCP port (see the inetd sketch after the
scripts below), so all we have to do is hit that port with a telnet
connection or a web browser and replication is immediately stopped -
and it won't be started again by cron until the file gets put back.
That handles the emergency stop issue, which is why we felt
comfortable with a 5 minute low on this process.
By having the logs flush on the slave and the master every 2 hours as
part of this process, we get small chunks of binary log we can apply
to an overnight backup if we miss the replication stop before the
disaster hits this server. That makes recovery to a fairly recent
point in time simple - we could then spend some time munging through
the relevant binary log to eliminate the corrupting event before
applying the rest of them, while our service is back online.
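The replay itself is just mysqlbinlog piped into mysql, using
--stop-datetime (or --stop-position) to cut off just before the bad
statement; the file names and cutoff time below are made up for
illustration:

# restore last night's dump, then roll forward through the
# two-hour binlog chunks
mysql --socket=/tmp/mysql.sock < /var/backups/nightly-dump.sql
mysqlbinlog /var/mysql/logs/master-bin.041 | mysql --socket=/tmp/mysql.sock
#
# stop the last chunk just short of the corrupting event
mysqlbinlog --stop-datetime="2004-06-15 15:58:00" \
    /var/mysql/logs/master-bin.042 | mysql --socket=/tmp/mysql.sock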
3) If I made it robust and flexible would people be interested in it?
Because our production servers are a MASTER-MASTER pair, we are kind
of OK with the method we use of controlling the IO_THREAD in this
way, but I do acknowledge there is an attraction to having the relay
log (flushed regularly) hold the latest data, thus ensuring the admin
server has all the data in one form or another, even if it's not
actually executed on the database. So MAYBE we would be interested.
4) Is there a better way?
Yes - I still firmly believe the BEST solution here should come from
MySQL... give us a replication-delay type variable that allows us to
set an implementation delay on replication: the relay logs stay up to
date, and the queries are executed by the SQL_THREAD nnnn seconds
after the timestamp in the original binary log. There's no need to
change the log formats, and it's a fairly simple piece of code to add
at MySQL's end... clearly the default value would be 0 so it only
comes into play when someone wants it. I'd much rather set the value
to 30 minutes or an hour or something and know exactly how far behind
my backup server is, instead of the sliding 2 hour range I have now...
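Today the closest thing to knowing is spot-checking the slave while
its SQL thread is running - Seconds_Behind_Master in SHOW SLAVE
STATUS (available from MySQL 4.1.1) reports the lag, though it reads
NULL whenever the SQL thread is stopped, which is most of the time
under our scheme:

# check the current replication lag in seconds
# (NULL while the SQL thread is stopped)
mysql --socket=/tmp/mysql.sock -e "SHOW SLAVE STATUS\G" | \
    grep Seconds_Behind_Master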
Outside of a MySQL-based solution, our system works great. It's OK
for us to manage it without keeping the relay logs current, because
the production pair means there are two live copies of the data
protecting us from hardware failure. Something like what you describe
may or may not be worth doing for us.
Best Regards, Bruce
Here's a simplified version of our scripts (simplified because our
admin server actually manages multiple instances of mysql):
Crontab:
0 */2 * * * /usr/local/bin/sqlpause stop > /dev/null 2>&1
1 */2 * * * /usr/local/bin/iopause start > /dev/null 2>&1
4 */2 * * * /usr/local/bin/iopause stop > /dev/null 2>&1
10 */2 * * * /usr/local/bin/sqlpause start > /dev/null 2>&1
[mysql-admin:/usr/local/bin] root# more sqlpause
#!/bin/sh
#
# Bruce's MySQL Replication Management Scripts
#
# Suitable for use on LiveWorld's network only... There
# are other ways that may be more effective for other
# networks or setups; refer to the MySQL documentation.
#
# check to see if we have been told to "start"
#
if [ "$1" = "start" ]
then
    #
    # First flush local binlogs
    #
    mysqladmin --socket=/tmp/mysql.sock flush-logs
    #
    # Then start the SQL thread, but only if replication
    # is authorized (the flag file exists)
    #
    if [ -f /var/mysql/replicate ]
    then
        mysql --socket=/tmp/mysql.sock -e "START SLAVE SQL_THREAD;"
    fi
#
# If we are not told to start, then stop
#
else
    mysql --socket=/tmp/mysql.sock -e "STOP SLAVE SQL_THREAD;"
fi
[mysql-admin:/usr/local/bin] root# more iopause
#!/bin/sh
#
# Bruce's MySQL Replication Management Script
#
# Suitable for use on LiveWorld's network only... There
# are other ways that may be more effective for other
# networks or setups; refer to the MySQL documentation.
#
host="insert.your.master.hostname.here"
#
# check to see if we have been told to "start"
#
if [ "$1" = "start" ]
then
    #
    # First flush the remote (master) binlogs
    #
    mysqladmin -h $host flush-logs
    #
    # Then start the IO thread if replication is authorized
    #
    if [ -f /var/mysql/replicate ]
    then
        mysql --socket=/tmp/mysql.sock -e "START SLAVE IO_THREAD;"
    fi
#
# If we are not told to start, then stop
#
else
    mysql --socket=/tmp/mysql.sock -e "STOP SLAVE IO_THREAD;"
fi
And the emergency stop script:
[mysql-admin:/usr/local/bin] root# more noreplicate
#!/bin/sh
#
# Bruce's MySQL Replication Authorization Script
#
# This script stops replication and, by removing the file
# /var/mysql/replicate, prevents replication from starting
# again, since all the scripts used to start replication
# check for the existence of that file.
#
rm -f /var/mysql/replicate
/usr/local/bin/sqlpause stop
/usr/local/bin/iopause stop
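For the record, the "tied to a tcp port" part is just inetd doing the
work. A sketch of the hookup, assuming inetd is configured via
/etc/inetd.conf and port 9999 is free (both are arbitrary choices):

# /etc/services -- give the port a name:
noreplicate     9999/tcp
#
# /etc/inetd.conf -- run the script on any connection to the port,
# so "telnet admin-host 9999" (or a browser hit) kills replication:
noreplicate stream tcp nowait root /usr/local/bin/noreplicate noreplicate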