"Zeugswetter Andreas SB SD" <[EMAIL PROTECTED]> writes:
> Imho one of the biggest sources for problems is people creating new
> indexes on populated tables when the rest of the db/table has badly
> outdated statistics or even only default statistics in place.
> In this situation the optimizer is badly misguided, because it now
> sees completely inconsistent statistics to work on.
> (e.g. old indexes on that table may seem way too cheap compared 
> to table scan) 

I don't think any of this is correct.  We don't have per-index
statistics.  The only stats updated by CREATE INDEX are the same ones
updated by plain VACUUM, viz the number-of-tuples and number-of-pages
counts in pg_class.  I believe it's reasonable to update those stats
more often than the pg_statistic stats (in fact, if we could keep them
constantly up-to-date at a reasonable cost, we'd do so).  The
pg_statistic stats are designed as much as possible to be independent
of the absolute number of rows in the table, so that it's okay if they
are out of sync with the pg_class stats.

The major reason why "you vacuumed but you never analyzed" is such a
killer is that in the absence of any pg_statistic data, the default
selectivity estimates are such that you may get either an index or seq
scan depending on how big the table is.  The cost estimates are
nonlinear (correctly so, IMHO, though I wouldn't necessarily defend the
exact shape of the curve) and ye olde default 0.01 will give you an
indexscan for a small table but not for a big one.  In 7.2 I have
reduced the default selectivity estimate to 0.005, for a number of
reasons but mostly to get it out of the range where the decision will
flip-flop.  Observe:

test71=# create table foo (f1 int);
CREATE
test71=# create index fooi on foo(f1);
CREATE
test71=# explain select * from foo where f1 = 42;
NOTICE:  QUERY PLAN:

Index Scan using fooi on foo  (cost=0.00..8.14 rows=10 width=4)

EXPLAIN
test71=# select reltuples,relpages from pg_class where relname = 'foo';
 reltuples | relpages
-----------+----------
      1000 |       10
(1 row)

EXPLAIN
test71=# update pg_class set reltuples = 100000, relpages = 1000 where relname = 'foo';
UPDATE 1
test71=# explain select * from foo where f1 = 42;
NOTICE:  QUERY PLAN:

Index Scan using fooi on foo  (cost=0.00..1399.04 rows=1000 width=4)

EXPLAIN
test71=# update pg_class set reltuples = 1000000, relpages = 10000 where relname = 
'foo';
UPDATE 1
test71=# explain select * from foo where f1 = 42;
NOTICE:  QUERY PLAN:

Seq Scan on foo  (cost=0.00..22500.00 rows=10000 width=4)

EXPLAIN
test71=#

In current sources you keep getting an indexscan as you increase the
number of tuples...

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Reply via email to