On 28.12.2010 00:04, Kevin Grittner wrote:
> Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Well, first, those scans occur only once every few hundred million
>> transactions, which is not likely a suitable timescale for
>> maintaining statistics.
>
> I was assuming that the pass of the entire table was priming for the
> incremental updates described at the start of this thread. I'm not
> clear on how often the base needs to be updated for the incremental
> updates to keep the numbers "close enough".
Well, that really depends on the workload. If you never remove all
occurrences of a given value (e.g. a ZIP code), you don't need to rescan
the table at all. All you need is to build the stats once and then
update them incrementally - and I believe this could be handled by
autovacuum.

>> And second, we keep on having discussions about rejiggering
>> the whole tuple-freezing strategy. Even if piggybacking on those
>> scans looked useful, it'd be unwise to assume it'll continue to
>> work the same way it does now.
>
> Sure, it might need to trigger its own scan in the face of heavy
> deletes anyway, since the original post points out that the
> algorithm handles inserts better than deletes, but as long as we
> currently have some sequential pass of the data, it seemed sane to
> piggyback on it when possible. And maybe we should be considering
> things like this when we weigh the pros and cons of rejiggering.
> This issue of correlated values comes up pretty often....

Again, there are two types of stats - one of them needs to scan the
whole table (estimate of distinct values), the other one does not
(multi-dimensional histograms). These two cases are independent - you
don't necessarily need both.

Better ndistinct estimates would actually solve some of the current
issues, it's not just a matter of cross-column stats.

regards
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
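
[Editor's illustration, not part of the original post or PostgreSQL code:
the incremental idea above - keep per-value occurrence counts so the
ndistinct estimate can be adjusted on each insert/delete instead of
rescanning the table - can be sketched in Python. A real implementation
would use a bounded-memory estimator rather than exact counts; this
just shows the bookkeeping.]

```python
# Hypothetical sketch of incremental ndistinct maintenance via exact
# per-value counts. Not PostgreSQL code; a production version would
# need a bounded-memory structure (e.g. a probabilistic counter).
from collections import Counter

class NDistinctTracker:
    def __init__(self):
        self.counts = Counter()  # value -> number of occurrences

    def insert(self, value):
        # New occurrence; ndistinct grows only if value was unseen.
        self.counts[value] += 1

    def delete(self, value):
        # ndistinct drops only when the LAST occurrence is removed,
        # which is why deletes are harder to handle than inserts.
        self.counts[value] -= 1
        if self.counts[value] <= 0:
            del self.counts[value]

    @property
    def ndistinct(self):
        return len(self.counts)

t = NDistinctTracker()
t.insert("90210")
t.insert("90210")
t.insert("10001")
t.delete("90210")   # one copy of "90210" remains, ndistinct unchanged
```

Note how deleting one of two "90210" rows leaves ndistinct at 2; only
removing the last occurrence of a value changes the estimate.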