Hello,

Redhat EL4 update 8, 2.6.9-89.0.23.ELsmp
Quad Proc, Dual Core Xeon, 16GB RAM
Postgres 8.1.18

I'm having some trouble pinning down exactly what is causing our Postgres cluster to run slowly. After some initial investigation, I noticed that the disk write activity is consistently high, and (if I'm reading the output of dstat correctly) the majority of it is being caused by the stats collector process. Here's a snippet of what I typically see in dstat.

./dstat -cd --top-bio
----total-cpu-usage---- -dsk/total- --------most-expensive---------
usr sys idl wai hiq siq| read  writ|      block i/o process
 2   2  87  10   0   0|3964k   19M|postgres: stats coll   0    35M
 2   1  85  12   0   0|4612k   20M|postgres: stats coll   0    18M
 2   2  85  11   0   0|2360k   36M|postgres: stats coll   0    24M
 1   2  83  14   0   0|1564k   36M|postgres: stats coll   0    29M
 1   1  84  13   0   0|5556k   21M|postgres: stats coll   0    20M
 2   2  82  14   0   0|  10M   19M|postgres: stats coll   0    33M
 2   1  87  10   0   0|9864k   35M|postgres: stats coll   0    24M
 2   2  87  10   0   0|  10M   19M|postgres: stats coll   0    29M
 2   1  86  11   0   0|  10M   19M|postgres: stats coll   0    24M
 3   2  84  12   0   0|8096k   19M|postgres: stats coll   0    29M
 2   1  86  10   0   0|5432k   33M|postgres: stats coll   0    32M
 2   2  86  10   0   0|9200k   19M|postgres: stats coll   0    21M
 2   1  82  14   0   0|3344k   34M|postgres: stats coll   0    21M
 2   2  86  11   0   0|8600k   19M|postgres: stats coll   0    31M
 2   1  82  15   0   0|5392k   19M|postgres: stats coll   0    29M

If there are no queries going on, then the disk usage is virtually nothing, but it only takes a query or two to make it shoot up to this level. I have the following stats related options enabled in postgresql.conf

stats_command_string = on
stats_row_level = on

When I disabled stats_row_level (and even stats_command_string, I believe) and restarted, I was still seeing some high disk I/O. If I disable stats_start_collector, I'm pretty sure the I/O dropped completely off (I can't verify right now since I'd need a maintenance window). However, this make Postgres unable to keep track of database connections/queries in pg_stat_activity, which is very important for us. The odd thing is that when I was playing around with these options, I restarted multiple times to apply them, eventually ending back where I started, but after the final restart, the disk I/O actually dropped to reasonable levels. This lasted for about a day, then went back up to it's current levels (and once again showing the stats collector at the top).

I saw some previous posts with similar conditions (but different Postgres version, high CPU load, not disk I/O, etc). http://archives.postgresql.org/pgsql-performance/2010-04/msg00163.php
http://archives.postgresql.org/pgsql-general/2010-01/msg01076.php
http://archives.postgresql.org/pgsql-performance/2009-06/msg00088.php

I don't think there are a lot of CREATE/DROP table statements, but I do know there are some larger update queries that run inside transactions (large in the sense of data they have to read, not the number of queries). Autovacuum is enabled on the server, and I haven't tweaked any settings from the defaults. My $PGDATA/global/pgstat.stat file is about 18MB, if that helps. Does it really rewrite this entire file every 500ms? Alvaro suggested resetting the stats, but I'm having trouble figuring out how to do that. Seems like pg_stat_reset() is post- 8.1 ...?

I have a strong suspicion it's ultimately due to some usage pattern of the database, but I'm not sure what it could be. What type of operations would typically cause the stats collector to be doing this much writing to the filesystem? Is there any way to "see" what it's writing? Are there other config options that can/should be tweaked to help this?

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to