There's been some previous discussion of getting rid of the pg_class
columns relpages and reltuples, in favor of having the planner check the
current relation block count directly (RelationGetNumberOfBlocks) and
extrapolate the current tuple count based on the most recently measured
tuples-per-page density.  A couple of past threads are
http://archives.postgresql.org/pgsql-performance/2004-10/msg00367.php
http://archives.postgresql.org/pgsql-general/2004-08/msg01422.php
and the point came up again today:
http://archives.postgresql.org/pgsql-performance/2004-11/msg00401.php
where we were again reminded of the problems incurred by obsolete
estimates.

It occurs to me that we could get most of the bang for the buck without
making any incompatible changes: just leave the existing fields in place
but make the planner use reltuples-divided-by-relpages as the density
estimate.  That is, where we currently have

        rel->pages = relation->rd_rel->relpages;
        rel->tuples = relation->rd_rel->reltuples;

in plancat.c, just do

        rel->pages = RelationGetNumberOfBlocks(relation);
        if (relation->rd_rel->relpages > 0)
            density = relation->rd_rel->reltuples / relation->rd_rel->relpages;
        else
            density = some_default_estimate;
        rel->tuples = round(rel->pages * density);

In addition to this we'd perhaps want to hack VACUUM so that when the
table is empty, it doesn't simply zero out relpages/reltuples, but
somehow preserves the previous density value so we don't have to fall
back to the default density estimate.  (This of course assumes that we
will refill the table with a density roughly similar to the last measured
density, which might be wrong but it's still better than just using a
default, I think.)  One way to do that is to set relpages = zero
(truthfully) but set reltuples to the previously estimated density
(we can do this because it's already a float field).  It might look a
little funny to have nonzero reltuples when relpages is zero, but I
think it wouldn't break anything.  Then the above logic becomes

        rel->pages = RelationGetNumberOfBlocks(relation);
        if (relation->rd_rel->relpages > 0)
            density = relation->rd_rel->reltuples / relation->rd_rel->relpages;
        else if (relation->rd_rel->reltuples > 0)  /* already a density */
            density = relation->rd_rel->reltuples;
        else
            density = some_default_estimate;
        rel->tuples = round(rel->pages * density);
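To make the three cases concrete, here is a standalone sketch of the same
logic outside the backend.  StoredStats, DEFAULT_DENSITY and the function
names are made up for illustration (they are not planner code), the value
10.0 is only a placeholder for some_default_estimate, and cur_pages stands
in for whatever RelationGetNumberOfBlocks would return.

```c
/* Hypothetical stand-in for the two pg_class fields the planner reads. */
typedef struct
{
    int   relpages;   /* page count as of the last VACUUM/ANALYZE */
    float reltuples;  /* tuple count as of the last VACUUM/ANALYZE,
                       * or a bare density when relpages == 0 */
} StoredStats;

/* Placeholder for some_default_estimate; the value is an assumption. */
#define DEFAULT_DENSITY 10.0

/* Tuples-per-page density derived from the stored statistics. */
static double
estimate_density(const StoredStats *s)
{
    if (s->relpages > 0)
        return (double) s->reltuples / s->relpages;
    if (s->reltuples > 0)       /* already a density */
        return s->reltuples;
    return DEFAULT_DENSITY;     /* no statistics at all */
}

/* Round a non-negative value to the nearest integer, avoiding libm. */
static double
round_nonneg(double x)
{
    return (double) (long long) (x + 0.5);
}

/* Extrapolate the tuple count from the current physical page count. */
static double
estimate_tuples(const StoredStats *s, unsigned cur_pages)
{
    return round_nonneg((double) cur_pages * estimate_density(s));
}
```

With stored stats of 100 pages and 1000 tuples, a table that has since
grown to 250 pages would be estimated at 2500 tuples; a table vacuumed
while empty with reltuples left at 12.5 would be estimated at 500 tuples
once it has been refilled to 40 pages.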

A variant of this is to set reltuples = density, relpages = 1 instead
of 0, which makes the relpages value a lie but would be even less likely
to confuse client-side code.
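A rough sketch of that variant, again with made-up names (StoredStats,
stats_for_emptied_table and density_from_ratio are not existing backend
functions): what VACUUM might store for a table it finds empty, and the
plain ratio computation that would then recover the density without
needing the extra "already a density" branch.

```c
/* Hypothetical stand-in for the two pg_class fields involved. */
typedef struct
{
    int   relpages;
    float reltuples;
} StoredStats;

/*
 * What a VACUUM of a now-empty table might store under this variant:
 * relpages = 1 (a lie) and reltuples = the previous density.  If there
 * are no usable previous stats, leave them alone so the planner still
 * falls back to its default.
 */
static StoredStats
stats_for_emptied_table(StoredStats prev)
{
    StoredStats out = prev;

    if (prev.relpages > 0)
    {
        out.reltuples = prev.reltuples / prev.relpages;
        out.relpages = 1;
    }
    return out;
}

/* The plain reltuples/relpages computation, with a caller-supplied default. */
static double
density_from_ratio(StoredStats s, double default_density)
{
    if (s.relpages > 0)
        return (double) s.reltuples / s.relpages;
    return default_density;
}
```

Note that the encoding is stable under repeated vacuums of an empty table:
dividing the stored density by relpages = 1 gives the same density back.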

Comments?  Does this seem like something reasonable to do for 8.0?

                        regards, tom lane
