One cause of heavy replication lag we noticed was a misbehaving application blasting updates (and commits) onto the master's InnoDB tables from multiple clients. Since slave replication is single-threaded, the slave couldn't keep up I/O-wise, while the master seemed to show reasonably low load throughout.
The temporary fix was to set innodb_flush_log_at_trx_commit = 2, so that the log file is flushed to disk only once every second instead of at every commit. The result was that the lag went from 5,000 seconds behind and climbing to 0 in literally seconds, and the slave load dropped way below 1 again. The catch (there's always one, of course) is that if the machine crashes (an OS crash or power loss, not just mysqld dying), you could lose up to one second's worth of committed transactions.

Howard

________________________________________
From: Claudio Nanni [claudio.na...@gmail.com]
Sent: Sunday, October 23, 2011 2:27 PM
To: Tyler Poland
Cc: mysql@lists.mysql.com
Subject: Re: 5.1.51 Database Replica Slows Down Suddenly, Lags For Days, and Recovers Without Intervention

Luis,

Very hard to tackle.
In my experience, excluding bottlenecks external to MySQL (hardware, OS, etc.), the 'suspects' are the shared resources 'guarded' by unique mutexes, like the query cache or the key cache.
Since you do not use MyISAM, it cannot be the key cache.
Since you use Percona, the query cache is disabled by default.
You should go a bit lower level and catch the system calls with one of the tools you surely know, to see if there are waits on the semaphores.
I would also like to point out that the 'seconds behind master' reported by the slave is not reliable.

Good luck!

Claudio

2011/10/23 Tyler Poland <tpol...@engineyard.com>

> Luis,
>
> How large is your database? Have you checked for an increase in write
> activity on the master leading up to this? Are you running a backup against
> the replica?
>
> Thank you,
> Tyler
>
> Sent from my Droid Bionic
> On Oct 23, 2011 5:40 AM, "Luis Motta Campos" <luismottacam...@yahoo.co.uk>
> wrote:
>
> > Fellow DBAs and MySQL Users
> >
> > [apologies for any duplicates - I've posted this to
> > percona-discuss...@googlegroups.com also]
> >
> > I've been hunting an issue with my database cluster for several months now
> > without much success. Maybe I'm overlooking something here.
> >
> > I've been observing the database slowing down and lagging behind for
> > thousands of seconds (sometimes over the course of several days) even
> > without any query load besides replication itself.
> >
> > I am running Percona MySQL 5.1.51 (InnoDB plug-in version 1.12) on Dell
> > R710 (6 x 3.5 inch 15K RPM disks in RAID10; 24GB RAM; 2x Quad-core Intel
> > processors) running Debian Lenny. MySQL data, binary logs, relay logs, and
> > InnoDB log files are on separate partitions from each other, on a RAID
> > system separate from the operating system disks.
> >
> > Default Storage Engine is InnoDB, and the usual InnoDB memory structures
> > are stable and look healthy.
> >
> > I have about 500 (read) queries per second on average, and about 10% of
> > this as writes on the master.
> >
> > I've been observing something that looks like between 6 and 10 pending
> > reads per second uniformly on my cacti graphs.
> >
> > The issue is characterized by the server suddenly slowing down writes
> > without any previous warning or change, and lagging behind for several
> > thousand seconds (triggering all sorts of alerts on my monitoring system).
> > I don't observe extra CPU activity, just a reduced disk access ratio (from
> > about 5-6MB/s to 500KB/s) and replication lagging. I could correlate it
> > neither with InnoDB hashing activity, nor with long-running queries, nor
> > with background read/write thread activities.
> >
> > I don't have any clues as to what is causing this behavior, and I'm unable
> > to reproduce it under controlled conditions. I've observed the issue both
> > on servers with and without workload (apart from the usual replication
> > load). I am sure no changes were applied to the server or to the cluster.
> >
> > I'm looking forward to suggestions and theories on the issue - all ideas
> > are welcome.
> >
> > Thank you for your time and attention,
> > Kind regards,
> > --
> > Luis Motta Campos
> > is a DBA, Foodie, and Photographer
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:
> > http://lists.mysql.com/mysql?unsub=tpol...@engineyard.com

--
Claudio
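
For reference, a minimal sketch of the temporary fix described at the top of this thread (innodb_flush_log_at_trx_commit = 2). It assumes the change is made at runtime on the lagging slave from the mysql command-line client, by an account allowed to set global variables. The thread does not say whether the setting was actually applied this way or through my.cnf, so treat the steps as an illustration rather than the exact procedure used.

```sql
-- Check the current value. The default, 1, flushes the InnoDB log to disk
-- at every commit.
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Switch to 2: the log is written at each commit but flushed to disk only
-- about once per second. The variable is dynamic, so no restart is needed.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Watch Seconds_Behind_Master to see whether the slave catches up.
-- (\G is the mysql client's vertical-output terminator.)
SHOW SLAVE STATUS\G

-- Assumption: once the backlog is cleared, restore full durability.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```

A value set with SET GLOBAL is lost when mysqld restarts. To keep it across restarts, the same line, innodb_flush_log_at_trx_commit = 2, would go under the [mysqld] section of my.cnf.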