One cause of heavy replication lag we noticed was a misbehaving application
blasting updates (and commits) onto the master's InnoDB tables from multiple
clients. Since slave replication is single-threaded, it couldn't keep up
I/O-wise, while the master showed reasonably low load throughout.

The temporary fix was to set innodb_flush_log_at_trx_commit = 2, so that the
log file is flushed to disk only once per second instead of at every commit.
As a result, the lag went from 5,000 seconds behind and climbing to 0 in
literally seconds, and the slave load dropped well below 1 again.

The catch (there's always one, of course) is that if the operating system
crashes or the machine loses power, you could lose up to about one second's
worth of committed transactions.
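
For anyone who wants to try the same change without a restart, here is a
minimal sketch in Python. It is not the exact procedure we used. The `conn`
argument is an assumption: any DB-API 2.0 connection from whichever MySQL
driver you prefer, logged in as a user allowed to run SET GLOBAL. The helper
name is made up for illustration.

```python
# Minimal sketch: relax InnoDB log flushing on a running server.
# `conn` is an assumption: any DB-API 2.0 connection opened by a MySQL
# driver of your choice, with enough privileges for SET GLOBAL.

ALLOWED_VALUES = (0, 1, 2)  # 1 is the fully durable default


def set_flush_log_at_trx_commit(conn, value):
    """Apply SET GLOBAL innodb_flush_log_at_trx_commit; return the old value."""
    if value not in ALLOWED_VALUES:
        raise ValueError("value must be 0, 1 or 2, got %r" % (value,))
    cur = conn.cursor()
    try:
        # Remember the current setting so it can be restored later.
        cur.execute("SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit'")
        previous = int(cur.fetchone()[1])
        # The variable is dynamic, so no restart is needed. The change is
        # lost at the next restart unless it is also written to my.cnf.
        cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = %d" % value)
    finally:
        cur.close()
    return previous
```

Keeping the returned value makes it easy to switch back to the previous
setting once the slave has caught up.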

Howard
________________________________________
From: Claudio Nanni [claudio.na...@gmail.com]
Sent: Sunday, October 23, 2011 2:27 PM
To: Tyler Poland
Cc: mysql@lists.mysql.com
Subject: Re: 5.1.51 Database Replica Slows Down Suddenly, Lags For Days, and 
Recovers Without Intervention

Luis,

Very hard to tackle.
In my experience, excluding bottlenecks external to MySQL (hardware, OS,
etc.), the usual suspects are shared resources guarded by a single mutex,
such as the query cache or the key cache.
Since you do not use MyISAM, it cannot be the key cache. Since you use Percona,
the query cache is disabled by default.
You should go a bit lower level and trace the system calls with one of the
tools you surely know, to see whether there are waits on the semaphores.

I would also like to point out that the 'seconds behind master' value
reported by the slave is not reliable.
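
One widely used alternative, which is not something already in use in this
thread, is a heartbeat row: a job on the master writes the current time into
a small table, and the slave compares the replicated value with its own
clock. Below is a minimal sketch in Python under several assumptions: the
`heartbeat` table and both function names are hypothetical, `conn` is any
DB-API 2.0 connection to the slave whose driver returns DATETIME columns as
`datetime` objects, and the clocks of master and slave are kept in sync.

```python
from datetime import datetime, timezone

# Hypothetical heartbeat table (an assumption, not from the original setup):
#   CREATE TABLE heartbeat (id INT PRIMARY KEY, ts DATETIME NOT NULL);
# A job on the master runs UPDATE_SQL every second or so; the row reaches
# the slave through normal replication.
UPDATE_SQL = "REPLACE INTO heartbeat (id, ts) VALUES (1, UTC_TIMESTAMP())"
READ_SQL = "SELECT ts FROM heartbeat WHERE id = 1"


def heartbeat_lag_seconds(replicated_ts, now=None):
    """Seconds between the last replicated heartbeat and `now` (naive UTC)."""
    if now is None:
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    # Clamp at zero so small clock skew never shows up as negative lag.
    return max(0.0, (now - replicated_ts).total_seconds())


def read_lag(conn, now=None):
    """Read the heartbeat row on the slave; return lag in seconds or None."""
    cur = conn.cursor()
    try:
        cur.execute(READ_SQL)
        row = cur.fetchone()
    finally:
        cur.close()
    if row is None:
        return None  # no heartbeat has been replicated yet
    return heartbeat_lag_seconds(row[0], now)
```

The resolution of this measurement is limited by how often the master
updates the row, and it is only as accurate as the clock synchronization
between the two machines.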

Good luck!

Claudio

2011/10/23 Tyler Poland <tpol...@engineyard.com>

> Luis,
>
> How large is your database?  Have you checked for an increase in write
> activity on the master leading up to this? Are you running a backup against
> the replica?
>
> Thank you,
> Tyler
>
> Sent from my Droid Bionic
> On Oct 23, 2011 5:40 AM, "Luis Motta Campos" <luismottacam...@yahoo.co.uk>
> wrote:
>
> > Fellow DBAs and MySQL Users
> >
> > [apologies for possible duplicates - I've posted this to
> > percona-discuss...@googlegroups.com also]
> >
> > I've been hunting an issue with my database cluster for several months
> > now without much success. Maybe I'm overlooking something here.
> >
> > I've been observing the database slowing down and lagging behind for
> > thousands of seconds (sometimes over the course of several days) even
> > without any query load besides replication itself.
> >
> > I am running Percona MySQL 5.1.51 (InnoDB plug-in version 1.12) on Dell
> > R710 (6 x 3.5 inch 15K RPM disks in RAID10; 24GB RAM; 2x Quad-core Intel
> > processors) running Debian Lenny. MySQL data, binary logs, relay logs,
> > innodb log files are on separate partitions from each other, on a RAID
> > system separate from the operating system disks.
> >
> > Default Storage Engine is InnoDB, and the usual InnoDB memory structures
> > are stable and look healthy.
> >
> > I have about 500 (read) queries per second on average, and about 10% of
> > this as writes on the master.
> >
> > I've been observing something that looks like between 6 and 10 pending
> > reads per second uniformly on my cacti graphs.
> >
> > The issue is characterized by the server suddenly slowing down writes
> > without any previous warning or change, and lagging behind for several
> > thousand seconds (triggering all sorts of alerts on my monitoring system).
> > I don't observe extra CPU activity, just a reduced disk access rate (from
> > about 5-6MB/s to 500KB/s) and replication lagging. I could correlate it
> > neither with InnoDB hashing activity, nor with long-running queries, nor
> > with background read/write thread activity.
> >
> > I don't have any clues as to what is causing this behavior, and I'm unable
> > to reproduce it under controlled conditions. I've observed the issue both
> > on servers with and without workload (apart from the usual replication
> > load). I am sure no changes were applied to the server or to the cluster.
> >
> > I'm looking forward to suggestions and theories on the issue - all ideas
> > are welcome.
> > Thank you for your time and attention,
> > Kind regards,
> > --
> > Luis Motta Campos
> > is a DBA, Foodie, and Photographer
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:
> > http://lists.mysql.com/mysql?unsub=tpol...@engineyard.com
> >
> >
>



--
Claudio

