On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> Am I wrong to be frightened by the implications of updating this value >> only once in a blue moon? > > It's not great, but I think it's probably not catastrophic either. > Keep in mind that all we need from reltuples is that the ratio > reltuples/relpages be a reasonable estimate of the density of live > tuples, because what the planner actually uses is > GetRelationNumberOfBlocks() * reltuples / relpages. > So for example an append-only table isn't a big problem, even if it's > been quite a while since we updated reltuples and relpages.
My first reaction was to be relieved by this explanation, but on further thought I'm not sure I was right to be relieved. The frequency of anti-wraparound vacuums is so low that it doesn't seem inconceivable that this ratio could change considerably on a far shorter time scale. (For me, at least in some cases, it's less than the frequency with which I dump+reload for maintenance reasons like OS upgrade, PG upgrade, HW upgrade...) > There was some mention of having a partial vacuum extrapolate a value of > reltuples and update pg_class with that. I'm afraid that that could be > a seriously bad idea; because there is no very good reason to suppose > that the subset of recently-modified pages forms a good sample of the > whole table as far as live-tuple density goes. I think you're right about that. > [ thinks a bit and reads the code some more ... ] There is a > considerably safer alternative, which is to let ANALYZE update the > reltuples estimate based on the pages it sampled; which should be a > considerably less biased sample than the pages a partial vacuum would > have looked at. And we have already got the code doing that, either > in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned > out to be partial. I'm confused with how this squares with the previous discussion - I thought you observed a case where this wasn't happening. > So actually I think we are in better shape than I realized, so far as > the heap reltuples numbers go. The case that's a bit nasty is where > we are propagating the heap reltuples number to the index reltuples > number for a GIN index. (Remember this only matters for a partial > index.) As the code stands today, what we'll be propagating is the > reltuples estimate from the most recent ANALYZE, not the ANALYZE that > we might be about to conduct. This is not great; but considering that > we are completely ignoring the first-order problem of the partial index > predicate's selectivity, quibbling about a second-order effect like the > estimate being out of date is pretty pointless. Being one ANALYZE behind shouldn't be too bad. >> Do we have any reasonable manual way of forcing >> VACUUM to scan the entire heap? > > You could use VACUUM FREEZE, for instance. That'll generate a fair amount of I/O. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers