Good issue; I had exactly the same concerns, so we built our own system. As a side note, we run an admin server which we use to generate reports, run backups and so on. It takes load off the production servers, where speed is critical. Recovery from backup, however, is a whole other issue, and a corrupting error gets replicated very quickly. With the new system we have cut recovery from backup from 4 hours to 30 minutes, and the recovered data is newer.

Our approach differs in the details, but it's basically the same idea. In an ideal world I'd like to see MySQL add a replication feature that let me set a variable to control how quickly the SQL_THREAD executes its queries, e.g. a variable such as replication-delay=3600 could tell MySQL's replication thread to hold off executing any command until the time is 3600 seconds beyond the timestamp in the binary log.
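For illustration only - no such variable exists in MySQL today, and the name and syntax here are purely hypothetical - the idea in my.cnf terms would be something like:

    [mysqld]
    # hypothetical: hold each relay-log event until it is at least
    # this many seconds older than its binary log timestamp
    replication-delay = 3600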

Some of my questions:
1) What are the benefits to using relay_log_file and relay_log_pos instead of master_log_file and master_log_pos? That the slave binlogs would already exist locally? Perhaps that's good or bad? Thoughts?

Relay logs are better to use for this, for one major reason. Assuming one of the reasons this server exists is to provide backup to the primary, having the data copied from the master server to the slave server gives you a copy of the logs where you need them in the event of a hardware failure on the master. In other words, if you manipulate the slave's SQL_THREAD and keep the IO_THREAD running, you are copying your data somewhere else pretty close to instantly. Most people would put that in the "good" category. The solution I proposed above does this too, keeping the data copied off the master all the time. Our home-built set of scripts doesn't do this; we manage the process by controlling the IO_THREAD. It's easy, and we have two primary servers running as a pair, so we have a live "backup" on a separate system. We would, however, prefer to have the data in the relay log current all the time and manipulate the SQL_THREAD instead.
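For anyone who wants to try the SQL_THREAD approach by hand, the commands are just the following (a sketch - the socket path is our local convention, adjust to suit):

    mysql --socket=/tmp/mysql.sock -e "STOP SLAVE SQL_THREAD;"
    mysql --socket=/tmp/mysql.sock -e "START SLAVE IO_THREAD;"

    # verify: Slave_IO_Running should say Yes, Slave_SQL_Running No
    mysql --socket=/tmp/mysql.sock -e "SHOW SLAVE STATUS\G" | grep Running

With the threads in that state the relay log keeps filling while nothing gets executed on the slave.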


2) Has anyone done something like this?

Yes. Our "like this" is a set of simple scripts, run by cron, that start and stop the IO_THREAD and SQL_THREAD at certain times. Specifically, the sequence (which repeats every two hours) goes like this:

4:00pm stop SQL_THREAD
4:01pm flush logs, start IO_THREAD
4:05pm stop IO_THREAD
4:10pm start SQL_THREAD

The net impact of this is that the data on the admin server is on the low side about 5 minutes behind live (right after the SQL_THREAD catches up at 4:10 it has executed everything copied up to 4:05), and on the high side a little over 2 hours behind (just before the 6:10 catch-up it still only has data from 4:05).

We also set all the "start" scripts to first check for the existence of a file (/var/mysql/replicate); if the file doesn't exist, nothing starts. We then have a script that stops all replication and nukes that file. It can be run on the machine, and it is also tied to a TCP port, so all we have to do is hit a specific port with a telnet connection or a web browser and replication is immediately stopped, and it won't be started again by cron until the file gets put back (a sketch of the port wiring follows the noreplicate script below). That takes care of the emergency-stop issue, which is why we felt comfortable with the 5-minute low end on this process.

Because this process flushes the logs on the slave and the master every 2 hours, we have small chunks of binary log we can apply to an overnight backup if we miss the replication stop before the disaster hits this server. That makes recovery to a fairly recent point in time simple: we can bring our service back online, then spend some time munging through the relevant binary log to eliminate the corrupting event before applying the rest of the logs.
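To sketch what that recovery looks like (the file names and positions below are made up for illustration, and --start-position/--stop-position assume a reasonably recent mysqlbinlog):

    # Restore last night's dump, then replay the clean 2-hour chunks
    mysql < nightly_backup.sql
    mysqlbinlog admin-bin.001 admin-bin.002 | mysql

    # In the chunk containing the corrupting statement, stop just
    # before it, then resume just after it
    mysqlbinlog --stop-position=4660 admin-bin.003 | mysql
    mysqlbinlog --start-position=4913 admin-bin.003 | mysql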

3) If I made it robust and flexible, would people be interested in it?


Because our production servers are a MASTER-MASTER pair we are kind of OK with controlling the IO_THREAD in this way, but I do acknowledge there is an attraction to having the relay log (flushed regularly) hold the latest data, ensuring the admin server has all the data in one form or another even if it hasn't actually been executed on the database. So MAYBE we would be interested.

4) Is there a better way?

Yes. I still firmly believe the BEST solution here should come from MySQL: give us a replication-delay type variable that sets an implementation delay on replication. The relay logs stay up to date, and the queries are executed by the SQL_THREAD nnnn seconds after the timestamp in the original binary log. There's no need to change the log formats, and it's a fairly simple piece of code to add at MySQL's end. Clearly the default value would be 0, so it only comes into play when someone wants it. I'd much rather set the value to 30 minutes or an hour and know exactly how far behind my backup server is, instead of the sliding 2-hour range I have now.

Outside of a MySQL-based solution, our system works great. For us it's OK to manage it without keeping the relay logs current, because the production pair means there are two live copies of the data protecting us from hardware failure. Something like what you describe may or may not be worth doing for us.

Best Regards, Bruce

Here's a simplified version of our scripts (simplified because our admin server manages multiple instances of mysql):

Crontab:

0 */2 * * * /usr/local/bin/sqlpause stop > /dev/null 2>&1
1 */2 * * * /usr/local/bin/iopause start > /dev/null 2>&1
5 */2 * * * /usr/local/bin/iopause stop > /dev/null 2>&1
10 */2 * * * /usr/local/bin/sqlpause start > /dev/null 2>&1


[mysql-admin:/usr/local/bin] root# more sqlpause
#!/bin/sh
#
#
# Bruce's MySQL Replication Management Scripts
#
# Suitable for use on LiveWorld's network only... there
# are other ways that may be more effective for other
# networks or setups; refer to the MySQL documentation
#

#
# check to see if we have been told to "start"
#

if [ "$1" = "start" ]
then

#
# First flush local binlogs
#
      mysqladmin --socket=/tmp/mysql.sock flush-logs

#
# Then start the SQL thread, but only if replication is authorized
# (the /var/mysql/replicate file exists)
#

   if [ -f /var/mysql/replicate ]
   then
         mysql --socket=/tmp/mysql.sock -e "START SLAVE SQL_THREAD;"
   fi

#
# If we are not told to start, then stop
#

else
      mysql --socket=/tmp/mysql.sock -e "STOP SLAVE SQL_THREAD;"
fi


[mysql-admin:/usr/local/bin] root# more iopause
#!/bin/sh
#
#
# Bruce's MySQL Replication Management Script
#
# Suitable for use on LiveWorld's network only... there
# are other ways that may be more effective for other
# networks or setups; refer to the MySQL documentation
#

host="insert.your.master.hostname.here"

#
# check to see if we have been told to "start"
#

if [ "$1" = "start" ]
then

#
# First flush remote binlogs on the master
#

      mysqladmin -h "$host" flush-logs


#
# Then start the IO thread, but only if replication is authorized
# (the /var/mysql/replicate file exists)
#

   if [ -f /var/mysql/replicate ]
   then
         mysql --socket=/tmp/mysql.sock -e "START SLAVE IO_THREAD;"
   fi

#
# If we are not told to start, then stop
#

else
      mysql --socket=/tmp/mysql.sock -e "STOP SLAVE IO_THREAD;"
fi



And the emergency stop script:

[mysql-admin:/usr/local/bin] root# more noreplicate
#!/bin/sh
#
#
# Bruce's MySQL Replication Authorization Script
#

#
# The assumption here is that this script must stop replication
# and by removing the file /var/mysql/replicate prevent
# replication from starting again as all scripts used to start
# replication check for the existence of this file
#

rm -f /var/mysql/replicate

/usr/local/bin/sqlpause stop
/usr/local/bin/iopause stop
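The TCP trigger mentioned earlier is easy to wire up with inetd. This is only a sketch - it assumes classic inetd, and the service name and port number here are arbitrary choices of mine:

    # /etc/services
    noreplicate     9906/tcp

    # /etc/inetd.conf - run the stop script whenever anything connects,
    # then send inetd a HUP so it rereads its configuration
    noreplicate stream tcp nowait root /usr/local/bin/noreplicate noreplicate

Re-enabling replication afterwards is just a matter of touching /var/mysql/replicate again and letting cron take over.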

