Hello,
Redhat EL4 update 8, 2.6.9-89.0.23.ELsmp
Quad Proc, Dual Core Xeon, 16GB RAM
Postgres 8.1.18
I'm having some trouble pinning down exactly what is causing our
Postgres cluster to run slowly. After some initial investigation, I
noticed that the disk write activity is consistently high, and (if I'm
reading the output of dstat correctly) the majority of it is being
caused by the stats collector process. Here's a snippet of what I
typically see in dstat.
./dstat -cd --top-bio
----total-cpu-usage---- -dsk/total- --------most-expensive---------
usr sys idl wai hiq siq| read writ| block i/o process
2 2 87 10 0 0|3964k 19M|postgres: stats coll 0 35M
2 1 85 12 0 0|4612k 20M|postgres: stats coll 0 18M
2 2 85 11 0 0|2360k 36M|postgres: stats coll 0 24M
1 2 83 14 0 0|1564k 36M|postgres: stats coll 0 29M
1 1 84 13 0 0|5556k 21M|postgres: stats coll 0 20M
2 2 82 14 0 0| 10M 19M|postgres: stats coll 0 33M
2 1 87 10 0 0|9864k 35M|postgres: stats coll 0 24M
2 2 87 10 0 0| 10M 19M|postgres: stats coll 0 29M
2 1 86 11 0 0| 10M 19M|postgres: stats coll 0 24M
3 2 84 12 0 0|8096k 19M|postgres: stats coll 0 29M
2 1 86 10 0 0|5432k 33M|postgres: stats coll 0 32M
2 2 86 10 0 0|9200k 19M|postgres: stats coll 0 21M
2 1 82 14 0 0|3344k 34M|postgres: stats coll 0 21M
2 2 86 11 0 0|8600k 19M|postgres: stats coll 0 31M
2 1 82 15 0 0|5392k 19M|postgres: stats coll 0 29M
If there are no queries going on, then the disk usage is virtually
nothing, but it only takes a query or two to make it shoot up to this
level. I have the following stats related options enabled in postgresql.conf
stats_command_string = on
stats_row_level = on
When I disabled stats_row_level (and even stats_command_string, I
believe) and restarted, I was still seeing some high disk I/O. If I
disable stats_start_collector, I'm pretty sure the I/O dropped
completely off (I can't verify right now since I'd need a maintenance
window). However, this make Postgres unable to keep track of database
connections/queries in pg_stat_activity, which is very important for us.
The odd thing is that when I was playing around with these options, I
restarted multiple times to apply them, eventually ending back where I
started, but after the final restart, the disk I/O actually dropped to
reasonable levels. This lasted for about a day, then went back up to
it's current levels (and once again showing the stats collector at the top).
I saw some previous posts with similar conditions (but different
Postgres version, high CPU load, not disk I/O, etc).
http://archives.postgresql.org/pgsql-performance/2010-04/msg00163.php
http://archives.postgresql.org/pgsql-general/2010-01/msg01076.php
http://archives.postgresql.org/pgsql-performance/2009-06/msg00088.php
I don't think there are a lot of CREATE/DROP table statements, but I do
know there are some larger update queries that run inside transactions
(large in the sense of data they have to read, not the number of
queries). Autovacuum is enabled on the server, and I haven't tweaked any
settings from the defaults. My $PGDATA/global/pgstat.stat file is about
18MB, if that helps. Does it really rewrite this entire file every
500ms? Alvaro suggested resetting the stats, but I'm having trouble
figuring out how to do that. Seems like pg_stat_reset() is post- 8.1 ...?
I have a strong suspicion it's ultimately due to some usage pattern of
the database, but I'm not sure what it could be. What type of operations
would typically cause the stats collector to be doing this much writing
to the filesystem? Is there any way to "see" what it's writing? Are
there other config options that can/should be tweaked to help this?
Thanks.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general