I wrote: > 3. Ideally autovacuum would know enough to perform ANALYZEs on > inheritance parents after enough churn has occurred in their child > table(s). I am not entirely clear about a good way to do that. > We could have it just directly force an ANALYZE on parent(s) of any > table it has chosen to ANALYZE, but that might be overkill --- in > particular leading to excess ANALYZEs when several children receive > a lot of updates.
I've been looking at this for a bit, and I think the only reasonable way to do it is to make the pgstats mechanism track the need for ANALYZE on a parent table. A hack like I suggested above would make the autovacuum.c code even messier than it already is, and it seems inevitable that we'd get duplicate analyze actions from different autovac workers. Now, I don't really want to add Yet Another per-table counter to pgstats for this. The stats are big enough already. However, the existing mechanism for triggering ANALYZE looks pretty bogus to me as I look at it now: there's a last_anl_tuples value with a very hazy definition, and what's worse it's being computed off numbers that may be only crude estimates from ANALYZE. What I propose doing is to replace that counter with a "changes_since_analyze" counter, which can be managed very simply: * when a tabstat message comes in, increment changes_since_analyze by the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted; * when an analyze report message comes in, reset changes_since_analyze to zero. This gives us a number that is actually pretty credible and can still be compared to the analyze threshold the same as before. I think the current definition dates from before we had accurate insert/delete/update tracking, but now that we have that, we should use it. Now, having done that, what I would suggest doing is having autovacuum propagate the changes_since_analyze count that it sees up to the parent table(s) whenever it does an autoanalyze. (This would require adding a new message type that allows reporting a changes_since_analyze increment independently of inserted/updated/deleted, or else adding changes_since_analyze as an independent field in regular tabstat messages.) In most cases, with the parent table probably smaller than the child tables, this would immediately make the parent a candidate for analyze. That might be overkill, in which case we could try multiplying the count by some sort of derating factor, but getting hold of a good derating factor might be more expensive than it's worth --- I think you'd have to look at all the other children of the same parent to see how big the current one is compared to the rest. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers