Josh Berkus wrote:
Or should it perhaps be a per-table counter in pg_stat_user_tables,
given your statement above?
Or even a timestamp: last_autovacuum_attempt, which would record the
last time autovacuum was tried. If that's fairly recent and you have a
large number of dead rows, you know what kind of problem you have and
can turn on debug.
These are both reasonable ideas. But there was just some kickback on
Tomas's "keeping timestamp of the lasts stats reset" patch recently,
from the perspective of trying to limit per-table stats bloat. I think
it's relatively easy to make a case that this situation is difficult
enough to diagnose that a little bit of extra low-level logging is
worthwhile. That Josh and I have both been bit by it enough to be
thinking about patches to make it easier to diagnost suggests it's
obviously too hard to nail down. But is this so common and difficult to
recognize that it's worth making all the table stats bigger? That's a
harder call.
It's already possible to detect the main symptom--dead row percentage is
much higher than the autovacuum threshold, but there's been no recent
autovacuum. That makes me less enthusiastic that there's such a genuine
need to justify the overhead of storing more table stats just to detect
the same thing a little more easily. I've been playing with the Munin
PG plug-in more recently, and I was just thinking of adding a dead row
trend graph/threshold to it to address this general area instead.
We could argue both sides of the trade-off of tracking this directly in
stats for some time, and I'd never expect there to be a clear victory
for either perspective. I've run into this vacuum problem a few times,
but certainly less than I've run into "why is the stats table so huge?"
--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books