On Apr 16, 2010, at 9:48 AM, Tom Lane wrote:

> Josh Kupershmidt <schmi...@gmail.com> writes:
>> On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Wow.  Well, we have a smoking gun here: for some reason, autovacuum
>>> isn't running, or isn't doing its job if it is.  If it's not running
>>> at all, that would explain failure to prune the stats collector's file
>>> too.
> 
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks. See current pg_stat_activity
>> output attached, which shows the three autovacuum workers running plus
>> two manual VACUUM ANALYZEs I started yesterday.
> 
> Two weeks?  What have you got the autovacuum cost delays set to?
> 
> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed.  So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run during
> AV worker launch).  So what we need to figure out is why it's taking so
> obscenely long to vacuum these tables ...
> 

On any large system with good I/O, I have had to significantly increase the
aggressiveness of autovacuum. Even with the settings below it doesn't interfere
with other activity (the I/O subsystem here can sustain ~2200 random IOPS and
~900 MB/sec sequential).

My relevant autovacuum parameters are (from 'show *'):

 autovacuum                      | on        | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor | 0.1       | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold    | 50        | Minimum number of tuple inserts, updates or deletes prior to analyze.
 autovacuum_freeze_max_age       | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers          | 3         | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_naptime              | 1min      | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay    | 20ms      | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit    | 2000      | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor  | 0.2       | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold     | 50        | Minimum number of tuple updates or deletes prior to vacuum.
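
In postgresql.conf form that boils down to roughly the sketch below (values
copied from the listing above; the cost limit is the main departure from stock,
since autovacuum's effective default falls back to vacuum_cost_limit = 200):

    # postgresql.conf -- autovacuum tuned for a machine with fast I/O
    autovacuum = on
    autovacuum_max_workers = 3
    autovacuum_naptime = 1min
    autovacuum_vacuum_cost_delay = 20ms
    autovacuum_vacuum_cost_limit = 2000    # ~10x the effective default of 200
    autovacuum_vacuum_scale_factor = 0.2
    autovacuum_vacuum_threshold = 50
    autovacuum_analyze_scale_factor = 0.1
    autovacuum_analyze_threshold = 50
    autovacuum_freeze_max_age = 200000000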




For what it's worth, I just went onto one of my systems -- one with lots of 
partition tables and heavy temp table creation/destruction -- and looked at the 
system tables in question there.

Postgres 8.4, using \dt+ (result below trimmed to the interesting tables):

   Schema   |          Name           | Type  |  Owner   |    Size    | Description 
------------+-------------------------+-------+----------+------------+-------------
 pg_catalog | pg_attrdef              | table | postgres | 195 MB     | 
 pg_catalog | pg_attribute            | table | postgres | 1447 MB    | 
 pg_catalog | pg_class                | table | postgres | 1694 MB    | 
 pg_catalog | pg_constraint           | table | postgres | 118 MB     | 
 pg_catalog | pg_depend               | table | postgres | 195 MB     | 
 pg_catalog | pg_statistic            | table | postgres | 2300 MB    | 
 pg_catalog | pg_type                 | table | postgres | 181 MB     | 
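
If you'd rather not eyeball \dt+ output, a roughly equivalent query (works on
8.4; note this sketch uses pg_total_relation_size, so indexes and TOAST are
counted and the numbers will read larger than \dt+'s):

    -- Catalog tables over 100 MB, largest first
    SELECT c.relname,
           pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE n.nspname = 'pg_catalog'
       AND c.relkind = 'r'
       AND pg_total_relation_size(c.oid) > 100 * 1024 * 1024
     ORDER BY pg_total_relation_size(c.oid) DESC;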


So I ran a VACUUM FULL; REINDEX TABLE; ANALYZE sequence on each of these 
(sketched below). I wish I could just CLUSTER them, but the above works.
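
Spelled out for one of the tables, the sequence looks like this (the others are
the same; note VACUUM FULL takes an exclusive lock, so pick a quiet window):

    -- Rewrite the table to reclaim dead space (exclusive lock!)
    VACUUM FULL pg_catalog.pg_attribute;
    -- Pre-9.0 VACUUM FULL tends to bloat indexes, so rebuild them
    REINDEX TABLE pg_catalog.pg_attribute;
    -- Refresh planner statistics for the compacted table
    ANALYZE pg_catalog.pg_attribute;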

Now the tables are:
   Schema   |          Name           | Type  |  Owner   |    Size    | Description 
------------+-------------------------+-------+----------+------------+-------------
 pg_catalog | pg_attrdef              | table | postgres | 44 MB      | 
 pg_catalog | pg_attribute            | table | postgres | 364 MB     | 
 pg_catalog | pg_class                | table | postgres | 1694 MB    | 
 pg_catalog | pg_constraint           | table | postgres | 118 MB     | 
 pg_catalog | pg_depend               | table | postgres | 195 MB     | 
 pg_catalog | pg_statistic            | table | postgres | 656 MB     | 
 pg_catalog | pg_type                 | table | postgres | 45 MB      | 


I've learned to accept roughly 50% bloat (2x the compacted size) in Postgres as 
just the way it usually is on a busy table, but the 3x to 4x bloat of 
pg_statistic, pg_attrdef, and pg_attribute has me wondering.
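
One way to tell whether that extra space is dead rows a plain vacuum could
still reclaim, or free space vacuum has already found but can't give back to
the OS, is the stats collector's row counts (estimates only, but good enough
to spot a pattern):

    -- Estimated live vs. dead rows in the catalogs
    SELECT relname, n_live_tup, n_dead_tup
      FROM pg_stat_all_tables
     WHERE schemaname = 'pg_catalog'
     ORDER BY n_dead_tup DESC
     LIMIT 10;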

I have had some 'idle in transaction' connections hanging around from time to 
time that have caused issues on this machine, and they could explain the 
perma-bloat above (vacuum cannot reclaim rows still visible to an open 
transaction). That is one thing that could affect the case reported here as 
well. The worst thing about those connections is that you can't force-kill 
them from within Postgres (pg_cancel_backend doesn't work on them, since there 
is no running query to cancel, and killing them via the OS bounces Postgres 
...), so you have to hunt down the offending client.
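
The stats view at least tells you where to look (8.4 column names; later
releases rename procpid to pid and report state = 'idle in transaction'
instead of tagging current_query). 8.4 did also add pg_terminate_backend(),
which SIGTERMs just the one backend and may be worth a try before resorting
to the OS:

    -- Sessions sitting idle inside an open transaction, oldest first
    SELECT procpid, usename, client_addr, xact_start
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction'
     ORDER BY xact_start;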


>                       regards, tom lane


