Hi, On Wednesday, August 15, 2012 12:10:42 PM val...@gmail.com wrote: > The following bug has been logged on the website: > > Bug reference: 7494 > Logged by: Valentine Gogichashvili > Email address: val...@gmail.com > PostgreSQL version: 9.0.7 > Operating system: Linux version 2.6.32-5-amd64 (Debian 2.6.32-41) > Description: > > We are experiencing strange(?) behavior on the replication slave machines. > The master machine has a very heavy update load, where many processes are > updating lots of data. It generates up to 30GB of WAL files per hour. > Normally it is not a problem for the slave machines to replay this amount > of WAL files on time and keep on with the master. But at some moments, the > slaves are “hanging” with 100% CPU usage on the WAL replay process and 3% > IOWait, needing up to 30 seconds to process one WAL file. If this tipping > point is reached, then a huge WAL replication lag is building up quite > fast, that also leads to overfill of the XLOG directory on the slave > machines, as the WAL receiver is putting the WAL files it gets via > streaming replication the XLOG directory (that, in many cases are quite a > limited size separate disk partition). Could you try to get a profile of that 100% cpu time?
Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs