Also, the database isn't restarting itself. It simply becomes unresponsive or slow to respond, to the point where just SSHing to the box takes 30 seconds or longer. Performing a pg_ctl restart on the cluster resolves the issue.

I looked through the logs for segmentation faults and found none. In fact, the only entries in my log that look 'bad' are the following:

Oct 27 08:53:18 <snip> postgres[17517]: [28932839-1] user=<snip>,db=<snip> ERROR: deadlock detected
Oct 27 11:49:22 <snip> postgres[608]: [19-1] user=<snip>,db=<snip> ERROR: could not serialize access due to concurrent update

I don't believe these occurred too close to the slowdown.
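To check how close errors like these fall to a slowdown, one option is to pull every PostgreSQL ERROR line out of syslog with its timestamp and compare it against the time the slowdown was noticed (the thread doesn't give that time). A minimal sketch, assuming syslog's default timestamp format and the `postgres[pid]: [n-1] user=...,db=...` prefix shown in the excerpt above; `find_errors` and `near` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Assumption: syslog's default "Mon DD HH:MM:SS host" prefix, followed by the
# postgres[pid]: [line-n] user=...,db=... prefix seen in the log excerpt above.
ERROR_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ postgres\[(?P<pid>\d+)\]: "
    r"\[[\d-]+\] .*?ERROR:\s+(?P<msg>.*)$"
)

def find_errors(lines, year):
    """Yield (timestamp, pid, message) for every PostgreSQL ERROR line."""
    for line in lines:
        m = ERROR_RE.match(line)
        if m:
            # syslog timestamps carry no year, so the caller supplies one
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            yield ts, int(m["pid"]), m["msg"]

def near(ts, slowdown, window=timedelta(minutes=30)):
    """True if ts lies within `window` of the slowdown time, before or after."""
    return abs(ts - slowdown) <= window
```

For example, `find_errors(open(path), 2011)` would list each error with its time, where `path` is wherever syslog writes PostgreSQL's messages on that box (not stated in the thread); the 30-minute window is an arbitrary default.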

- Brian F

On 10/27/2011 02:09 PM, Brian Fehrle wrote:
On 10/27/2011 01:48 PM, Scott Marlowe wrote:
On Thu, Oct 27, 2011 at 12:39 PM, Brian Fehrle <> wrote:
Looking at top, I see no swap usage, very little I/O wait, and a large number of postmaster processes at 100% CPU usage (which makes sense; at this point there are 150 or so queries executing on the database).

Tasks: 713 total,  44 running, 668 sleeping,   0 stopped,   1 zombie
Cpu(s):  4.4%us, 92.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.3%si,
Mem: 134217728k total, 131229972k used, 2987756k free, 462444k buffers
Swap: 8388600k total, 296k used, 8388304k free, 119029580k cached
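The Cpu(s) percentages in top come from the change in the aggregate `cpu` counters in /proc/stat between two readings. A minimal sketch of roughly that calculation, useful for confirming a figure like 92.0%sy independently of top; it assumes Linux and the standard first eight columns (user, nice, system, idle, iowait, irq, softirq, steal), which top labels us, ni, sy, id, wa, hi, si, st. The function names are hypothetical.

```python
import time

# top's labels for the first eight columns of the aggregate "cpu" line in /proc/stat:
# user nice system idle iowait irq softirq steal.
# guest/guest_nice are already counted inside user, so they are left out.
LABELS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")

def read_cpu_counters(stat_text):
    """Return the first eight counters (clock ticks) of the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # per-core lines are cpu0, cpu1, ...
            return [int(v) for v in parts[1:9]]
    raise ValueError("no aggregate 'cpu' line found")

def cpu_percentages(before, after):
    """Share of elapsed ticks spent in each state between two readings, in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        raise ValueError("no ticks elapsed between readings")
    return {label: 100.0 * d / total for label, d in zip(LABELS, deltas)}

def sample(interval=3.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the percentages."""
    with open(path) as f:
        before = read_cpu_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_counters(f.read())
    return cpu_percentages(before, after)
```

Run during a slowdown, `sample()["sy"]` would be expected to land near top's sy figure.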
OK, a few points.
1: You've got a zombie process. Find out what's causing it; it could be a trigger of some kind for this behaviour.
2: You're at 92% sys. That's bad. It means the OS is chewing up 92% of your 32 cores doing something. What tasks are at the top of the list in top?
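A zombie is a process that has exited but whose parent has not yet reaped it with wait(), so finding out what's causing it mostly means finding its parent. A minimal sketch, assuming a Linux /proc filesystem; `parse_stat` and `find_zombies` are hypothetical helper names. The command name in /proc/<pid>/stat is wrapped in parentheses and may itself contain spaces or ')', so the parser splits on the last ')'.

```python
import os

def parse_stat(text):
    """Parse a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    lpar = text.index("(")
    rpar = text.rindex(")")  # comm may contain ')', so use the last one
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 2:].split()  # field 3 (state) onward
    return pid, comm, fields[0], int(fields[1])

def find_zombies(proc="/proc"):
    """Return (pid, comm, ppid, parent_comm) for every process in state 'Z'."""
    procs = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        procs[pid] = (comm, state, ppid)
    return [
        (pid, comm, ppid, procs.get(ppid, ("?",))[0])
        for pid, (comm, state, ppid) in procs.items()
        if state == "Z"
    ]
```

If the parent turns out to be the postmaster, that points back at PostgreSQL; if it's some other daemon, the zombie is probably unrelated.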

Of the top 50 processes in top, 48 are postmasters, one is syslog, and one is psql. Each of the postmasters has a high %CPU: the top ones are at 80% and higher, and the rest are anywhere between 30% and 60%. Would running postmaster queries be attributed to 'sy' CPU usage, or should they fall under 'us' CPU usage?
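Roughly speaking, time a backend spends running its own code (parsing, planning, executing the query) is charged as user time, while time the kernel spends on its behalf, such as system calls and page-fault handling, is charged as system time. One way to see which bucket the postmasters land in is to read utime and stime (fields 14 and 15 of /proc/<pid>/stat, in clock ticks) twice and compare. A minimal sketch, assuming Linux /proc and that the backends' command name is `postgres` or `postmaster` (adjust `names` if yours differ); the function names are hypothetical.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second used by /proc

def cpu_ticks(stat_text):
    """Return (comm, utime, stime) from a /proc/<pid>/stat line.

    Fields after the closing ')' start at field 3 (state), so utime (field 14)
    and stime (field 15) sit at offsets 11 and 12.
    """
    rpar = stat_text.rindex(")")
    comm = stat_text[stat_text.index("(") + 1:rpar]
    fields = stat_text[rpar + 2:].split()
    return comm, int(fields[11]), int(fields[12])

def snapshot(names=("postgres", "postmaster"), proc="/proc"):
    """Map pid -> (utime, stime) for processes whose comm is in `names`."""
    snap = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, ut, st = cpu_ticks(f.read())
        except OSError:
            continue  # process exited mid-scan
        if comm in names:
            snap[int(entry)] = (ut, st)
    return snap

def user_sys_split(before, after):
    """Seconds of user and system CPU used between snapshots (pids in both)."""
    user = sys_ = 0
    for pid, (ut1, st1) in after.items():
        if pid in before:
            ut0, st0 = before[pid]
            user += ut1 - ut0
            sys_ += st1 - st0
    return user / CLK_TCK, sys_ / CLK_TCK

def report(interval=10.0):
    """Sample twice, `interval` seconds apart, and print the user/sys split."""
    before = snapshot()
    time.sleep(interval)
    user, sys_s = user_sys_split(before, snapshot())
    total = user + sys_s
    if total:
        print(f"user {user:.1f}s  sys {sys_s:.1f}s  ({100 * sys_s / total:.0f}% sys)")
    else:
        print("no CPU time used by matching processes")
```

If `report()` run during a slowdown shows most of the backends' CPU as sys, that would suggest they are spending their time in the kernel rather than in PostgreSQL's own code.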

Try running vmstat 10 for a minute or so, then look at the cs and in columns. If cs or in is well over 100k, there could be an issue with thrashing, where your app is making some change to the db that requires all backends to be awoken at once, and the machine just falls over under the load.
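This rule of thumb can be checked mechanically. A minimal sketch that parses the text output of, say, `vmstat 10 6` (six samples ten seconds apart), averages the `cs` (context switches per second) and `in` (interrupts per second) columns, and flags any sample over 100k. It skips the first data row, because vmstat reports that one as averages since boot. The function names and the hard-coded threshold are assumptions taken from the advice above, not from any tool.

```python
THRESHOLD = 100_000  # the "well over 100k" rule of thumb from the advice above

def parse_vmstat(text):
    """Return one dict (column name -> int) per data row of `vmstat N` output.

    The first data row is skipped because vmstat reports it as averages since
    boot. Banner lines and periodically repeated headers are ignored.
    """
    header = None
    rows = []
    seen_first = False
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if "cs" in tokens and "in" in tokens:
            header = tokens  # column names: r b swpd ... in cs us sy ...
            continue
        if header is None or not all(t.isdigit() for t in tokens):
            continue  # "procs ---memory---" banner or other non-numeric line
        if not seen_first:
            seen_first = True  # since-boot averages
            continue
        rows.append(dict(zip(header, map(int, tokens))))
    return rows

def summarize(rows):
    """Return (average cs, average in, rows where either exceeds THRESHOLD)."""
    if not rows:
        raise ValueError("no samples after the since-boot row")
    n = len(rows)
    avg_cs = sum(r["cs"] for r in rows) / n
    avg_in = sum(r["in"] for r in rows) / n
    spikes = [r for r in rows if r["cs"] > THRESHOLD or r["in"] > THRESHOLD]
    return avg_cs, avg_in, spikes
```

Because the averages hide short bursts, the `spikes` list is the more useful signal for the all-backends-wake-at-once pattern.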

We've restarted the PostgreSQL cluster, so the issue isn't happening at the moment, but running vmstat 10 gave a 'cs' average of about 3K and an 'in' average of around 9.5K.

- Brian F

Sent via pgsql-general mailing list