On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> What bothers me considerably more is the issue about how specific
> queries might see an all-visible fraction that's very substantially
> different from the table's overall ratio, especially in examples such as
> historical-data tables where most of the update and query activity has
> to do with recently-added rows.  I don't see any practical way to attack
> that problem with statistics; we're just going to have to adopt some
> estimation rule.  What I suggest as a first cut for that is: simply
> derate the visibility fraction as the fraction of the table expected to
> be scanned gets smaller.  That is, if the query fetches nearly all of
> the table, take the stored visibility ratio at face value; if it fetches
> only one block, never believe that that will be an all-visible block;
> and in general if we're expecting to read a fraction f of the pages,
> multiply the whole-table visibility ratio by f before using it in the
> cost estimate.  This amounts to assuming that the historical-data case
> is the usual case, but I'm not sure that's unfair.

I don't think that's an unfair assumption -- in fact I think it's
exactly the right assumption -- but I'm worried about how the math
works out with that specific proposal.
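
For concreteness, here's roughly how I read the proposed rule (a
hypothetical sketch, not actual costsize.c code; the names are made
up):

    /*
     * Hypothetical sketch of the proposed derating rule: scale the
     * stored whole-table all-visible fraction by the fraction of the
     * table's pages the scan is expected to touch.
     */
    static double
    derated_allvisfrac(double allvisfrac, double pages_fetched,
                       double rel_pages)
    {
        double      frac_scanned;

        if (rel_pages <= 0)
            return 0.0;
        frac_scanned = pages_fetched / rel_pages;
        if (frac_scanned > 1.0)
            frac_scanned = 1.0;
        return allvisfrac * frac_scanned;
    }

With f near 1.0 the stored ratio is taken at face value; with f near
zero the estimated all-visible fraction goes to essentially zero.  Two
cases where that worries me: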

- Suppose VACUUM processes the table and makes it all-visible.  Then,
somebody comes along and updates one tuple on every page, making them
all not-all-visible, but not triggering VACUUM, because we're nowhere
close to the 20% threshold.  Now COUNT(*) will think it should use an
index-only scan, but really... not so much.  In fact, even if a tuple
has been updated on only 25% of the pages, we're probably in trouble.
(Rough numbers below.)

- Suppose the table has a million rows and we're going to read 100 of
them, or 0.01%.  Now it might appear that a covering index has a
negligible advantage over a non-covering one, but I think we still
want to err on the side of using the covering index.  Even if we're
only reading a single row, we probably still want to pick the covering
index, to cater to the case where someone is doing primary-key fetches
against a gigantic table and hoping that index-only scans will save
them from random I/O hell.  (Again, rough numbers below.)
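
Putting rough numbers on the first case (the page count is invented
for illustration): say the table has 100,000 pages and VACUUM left it
100% all-visible, so the stored ratio is 1.0.  After one tuple is
updated on every page, every visibility-map bit is cleared, but the
stored ratio hasn't been refreshed.  A full-table COUNT(*) scans
f = 1.0 of the pages, so:

    estimated all-visible fraction = 1.0 * 1.0 = 1.0
    actual all-visible fraction    = 0.0

The planner expects to skip essentially every heap fetch, and instead
has to visit every page.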
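
And on the second case (page count and visibility ratio likewise
invented): a million-row table spread over 50,000 pages, 90%
all-visible, and a query fetching 100 scattered rows, so roughly 100
pages touched:

    f                              = 100 / 50,000 = 0.002
    estimated all-visible fraction = 0.9 * 0.002  = 0.0018

The covering index gets credit for avoiding only about 0.2% of the
heap fetches, even though in reality most of those 100 pages may well
be all-visible.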

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
