On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > What bothers me considerably more is the issue about how specific > queries might see an all-visible fraction that's very substantially > different from the table's overall ratio, especially in examples such as > historical-data tables where most of the update and query activity has > to do with recently-added rows. I don't see any practical way to attack > that problem with statistics; we're just going to have to adopt some > estimation rule. What I suggest as a first cut for that is: simply > derate the visibility fraction as the fraction of the table expected to > be scanned gets smaller. That is, if the query fetches nearly all of > the table, take the stored visibility ratio at face value; if it fetches > only one block, never believe that that will be an all-visible block; > and in general if we're expecting to read a fraction f of the pages, > multiply the whole-table visibility ratio by f before using it in the > cost estimate. This amounts to assuming that the historical-data case > is the usual case, but I'm not sure that's unfair.
I don't think that's an unfair assumption -- in fact I think it's exactly the right assumption -- but I'm worried about how the math works out with that specific proposal. - Suppose VACUUM processes the table and makes it all-visible. Then, somebody comes along and updates one tuple on every page, making them all not-all-visible, but not trigger VACUUM because we're nowhere close the 20% threshold. Now COUNT(*) will think it should use an index-scan, but really... not so much. In fact, even if it's only that a tuple has been updated on 25% of the pages, we're probably in trouble. - Suppose the table has a million rows and we're going to read 100 of them, or 0.01%. Now it might appear that a covering index has a negligible advantage over a non-covering index, but in fact I think we still want to err on the side of trying to use the covering index. In fact, even if we're only reading a single row, we probably still generally want to pick up the covering index, to cater to the case where someone is doing primary key fetches against a gigantic table and hoping that index-only scans will save them from random I/O hell. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers