Re: database problems.

Devananda Wed, 20 Jul 2005 12:45:35 -0700

Hi Chris,

I have run into this myself before, where the partition housing the bin log filled up. In our case, there weren't failed queries on the master; the integrity of data on the master was fine, and the slave was simply out of date, not in an error state. However, even in that case (and it sounds like your case is more extreme than ours was), we had to rebuild the slaves. Without *any* log of what queries were run on the master, we were unable to bring the slave up to date.

Unless you have some other log (mysql's general query log, or if your application logs all queries somewhere independent of mysql), you won't be able to restore replication. Even if you do, parsing through such a query log and figuring out what the slaves have/have not run will be difficult and error-prone.

I don't understand what would cause DELETE queries to fail; that sounds like it could have caused corruption of your tables on the master itself, particularly if you are not using a transactional database engine. Really, I don't know, I've never seen that error myself.

As for recreating the slaves and restarting replication fresh, there are some good methods of doing that. If you are using the InnoDB storage engine, then you can easily create a new slave without interrupting service on the master. See http://www.innodb.com/index.php for commercial software, and see http://dev.mysql.com/doc/mysql/en/mysqldump.html and http://dev.mysql.com/doc/mysql/en/backup.html for an explanation of how to use the mysqldump utility to do the same thing if you are running mysql 4.1.8 or newer.
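
A minimal sketch of what the mysqldump route might look like, written in Python only to assemble the command line. The function name and its parameters are hypothetical. The options themselves (--single-transaction, --master-data, --all-databases) are standard mysqldump options, but check the manual pages linked above for how they behave on your exact version.

```python
def snapshot_command(user, host="localhost"):
    """Build a mysqldump argument list for seeding a new slave.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return [
        "mysqldump",
        "--host=" + host,
        "--user=" + user,
        "--password",            # no value given, so mysqldump prompts for it
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--master-data",         # record the master's binlog coordinates in the dump
        "--all-databases",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing touches a live server.
    print(" ".join(snapshot_command("repl_admin")))
```

The consistent snapshot only covers transactional (InnoDB) tables. The dump would be loaded on the new slave, which then starts replicating from the coordinates recorded in the dump.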

One final point, I highly recommend creating a monitoring program that watches disk usage and the status of replication between your master and slave servers, and removes old binary log files before the disk fills up. I wrote a perl script that does this simply by executing "PURGE MASTER LOGS TO 'filename'", and removes only the oldest binary log file if the disk is above 80% capacity. That way, this dreadful sort of problem should never come up again.
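
A rough sketch of the decision logic such a monitor could use, in Python rather than Perl. It is not the script described above, and all function names are hypothetical. It assumes the list of binary logs comes from SHOW MASTER LOGS on the master and that each slave's current file comes from the Master_Log_File column of SHOW SLAVE STATUS; running those queries is left out. It builds a PURGE MASTER LOGS TO statement, which removes every log older than the named file, so naming the second-oldest file removes only the oldest one.

```python
import shutil

THRESHOLD = 0.80  # act only when the binlog partition is more than 80% full


def disk_fraction_used(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def purge_target(master_logs, slave_log_files):
    """Pick the log name to purge up to, or None if purging is not safe.

    master_logs: binlog names on the master, oldest first.
    slave_log_files: the master log file each slave is currently reading.
    """
    # Be conservative: do nothing with fewer than two logs or no slave info.
    if len(master_logs) < 2 or not slave_log_files:
        return None
    oldest, target = master_logs[0], master_logs[1]
    for name in slave_log_files:
        # A slave still on the oldest file, or on an unknown file, blocks the purge.
        if name == oldest or name not in master_logs:
            return None
    return target


def plan(fraction_used, master_logs, slave_log_files):
    """Return the SQL statement to run on the master, or None for no action."""
    if fraction_used <= THRESHOLD:
        return None
    target = purge_target(master_logs, slave_log_files)
    if target is None:
        return None
    return "PURGE MASTER LOGS TO '%s'" % target


if __name__ == "__main__":
    logs = ["bin.000001", "bin.000002", "bin.000003"]
    print(plan(disk_fraction_used("."), logs, ["bin.000003"]))
```

When the disk is over the threshold but a slave still needs the oldest file, this sketch does nothing; a real monitor would want to raise an alert at that point rather than stay silent.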



Good luck!
Devananda vdv



Chris Knipe wrote:

Hi all,

The moral of the story is don't run out of disk space, but it's a bit too
late for that now.

A quick scenario.... One master server, two backups replicating from the
master. Our data and bin logs are on two different partitions, and the
partition holding the bin logs ran out of disk space.  We saw a lot of
errors in the mysql log on the master, stating that DELETE queries failed
because it was unable to write them to the bin log.

Question... Why would only DELETE fail?  If it cannot write to the bin log
because it is out of disk space, shouldn't INSERT / UPDATE also fail?

Now, our slaves are going completely crazy right now.  The data is beyond
inconsistent, and we're desperately trying to figure out a way to restore
the replication, without having to manually execute a good couple of million
DELETE queries on two separate slaves, OR to take new snapshots from the
master and redo the replication setup. It would SEEM to us that the binlog
has gotten corrupted some time during the lack of disk space.

Thus, I want to know now...
- Generally, our slaves are missing A LOT of DELETE queries, and the slave
is now failing because it is getting duplicate records.
- Running the slave with skip-errors until it is up to date is not an
option.  We NEED the DELETE queries to execute, because certain rows are
DELETED and then RE-INSERTED with new values.  Yes, I know we should use
update, I'm just an administrator, not a programmer / developer.  This is
something that the developers need to take up.
- *IF* push comes to shove and we need to re-setup the slaves and
replication, is there a way to take a snapshot from the master, WITHOUT
having to shut down the database, OR lock the tables for long periods of
time? (We are talking about a DB that executes a good 20 queries per second
on a slow day.)
- Can replication be 're-started' from the CURRENT bin-log position on the
master, and if that has been done, can the 'missing' gaps in the two binlog
positions (place of failure and place of current position) be manually /
semi automatically replicated?

I hope there is someone with some wise ideas.... I can use a lot of them
right now.

Thanks,
Chris.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
