Hi Chris,
I have run into this myself before as well, where the partition housing
the bin log filled up. In our case, there weren't failed queries on the
master; the integrity of data on the master was fine, and the slave was
simply out of date, but not error-full. However, even in that case (and
it sounds like your case is more extreme than ours was), we had to
rebuild the slaves. Without *any* log of what queries were run on the
master, we were unable to bring the slave up to date.
Unless you have some other log (mysql's general query log, or if your
application logs all queries somewhere independent of mysql), you won't
be able to restore replication. Even if you do, parsing through such a
query log and figuring out what the slaves have/have not run will be
difficult and error-prone.
I don't understand what would cause DELETE queries to fail; that sounds
like it could have caused corruption of your tables on the master
itself, particularly if you are not using a transactional database
engine. Really, I don't know, I've never seen that error myself.
As per recreating the slaves and restarting replication fresh, there are
some good methods of doing that. If you are using the InnoDB storage
engine, then you can easily create a new slave w/o interrupting service
on the master. See http://www.innodb.com/index.php for commercial
software, and see http://dev.mysql.com/doc/mysql/en/mysqldump.html and
http://dev.mysql.com/doc/mysql/en/backup.html for an explanation of how
to use mysqldump utility program to do the same thing if you are running
mysql 4.1.8 or newer.
One final point, I highly recommend creating a monitoring program that
watches disk usage and the status of replication between your master and
slave servers, and removes old binary log files before the disk fills
up. I wrote a perl script that does this simply by executing "FLUSH
MASTER LOGS TO 'filename'", and removes only the oldest binary log file
if the disk is above 80% capacity. That way, this dreadful sort of
problem should never come up again.
Good luck!
Devananda vdv
Chris Knipe wrote:
Hi all,
The moral of the story, is don't run out of disk space, but it's a bit to
late for that now.
A quick scenario.... One master server, two backups replicating from the
master. Our data and bin logs are on two different partitions, and the
partition holding the bin logs, ran out of disk space. We saw allot of
errors in the mysql log on the master, stating that DELETE queries failed
because it was unable to write this to the bin log.
Question... Why would only DELETE fail? If it cannot write to the bin log
because it is out of disk space, shouldn't INSERT / UPDATE also fail?
Now, our slaves are going completely crazy right now. The data is beyond
inconsistent, and we're desperately trying to figure out a way to restore
the replication, without having to manually execute a good couple of
million
of DELETE queries on two seperate slaves, OR to take new snapshots from the
master and redo the replication setup. It would SEEM to us that the bin
log
has gotten corrupted some time during the lack of disk space.
Thus, I want to know now...
- Generally, our slaves are missing ALLOT of DELETE queries, and the slave
is now failing because it is getting duplicate records.
- Running the slave with skip-errors untill it is up to date, is not a
option. We NEED the DELETE queries to execute, because certain rows are
DELETED and then RE-INSERTED with new values. Yes, I know we should use
update, I'm just a administrator, not a programmer / developer. This is
something that the developers needs to take up.
- *IF* push comes to pull and we need to re-setup the slaves and
replication, is there a way to take a snapshot from the master, WITHOUT
having to shut down the database, OR lock the tables for long periods of
time (We are talking about a DB that executes a good 20 queries per second
on a slow day).
- Can replication be 're-started' from the CURRENT bin-log position on the
master, and if that has been done, can the 'missing' gaps in the two bin
log
positions (place of failure and place of current position) be manually /
semi automatically replicated?
I hope there is someone with some wise ideas.... I can use allot of them
right now.
Thanks,
Chris.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]