"Zeugswetter Andreas SB SD" <[EMAIL PROTECTED]> writes: > Imho one of the biggest sources for problems is people creating new > indexes on populated tables when the rest of the db/table has badly > outdated statistics or even only default statistics in place. > In this situation the optimizer is badly misguided, because it now > sees completely inconsistent statistics to work on. > (e.g. old indexes on that table may seem way too cheap compared > to table scan)
I don't think any of this is correct. We don't have per-index statistics. The only stats updated by CREATE INDEX are the same ones updated by plain VACUUM, viz the number-of-tuples and number-of-pages counts in pg_class. I believe it's reasonable to update those stats more often than the pg_statistic stats (in fact, if we could keep them constantly up-to-date at a reasonable cost, we'd do so). The pg_statistic stats are designed as much as possible to be independent of the absolute number of rows in the table, so that it's okay if they are out of sync with the pg_class stats. The major reason why "you vacuumed but you never analyzed" is such a killer is that in the absence of any pg_statistic data, the default selectivity estimates are such that you may get either an index or seq scan depending on how big the table is. The cost estimates are nonlinear (correctly so, IMHO, though I wouldn't necessarily defend the exact shape of the curve) and ye olde default 0.01 will give you an indexscan for a small table but not for a big one. In 7.2 I have reduced the default selectivity estimate to 0.005, for a number of reasons but mostly to get it out of the range where the decision will flip-flop. Observe: test71=# create table foo (f1 int); CREATE test71=# create index fooi on foo(f1); CREATE test71=# explain select * from foo where f1 = 42; NOTICE: QUERY PLAN: Index Scan using fooi on foo (cost=0.00..8.14 rows=10 width=4) EXPLAIN test71=# select reltuples,relpages from pg_class where relname = 'foo'; reltuples | relpages -----------+---------- 1000 | 10 (1 row) EXPLAIN test71=# update pg_class set reltuples = 100000, relpages = 1000 where relname = 'foo'; UPDATE 1 test71=# explain select * from foo where f1 = 42; NOTICE: QUERY PLAN: Index Scan using fooi on foo (cost=0.00..1399.04 rows=1000 width=4) EXPLAIN test71=# update pg_class set reltuples = 1000000, relpages = 10000 where relname = 'foo'; UPDATE 1 test71=# explain select * from foo where f1 = 42; NOTICE: QUERY PLAN: Seq Scan on foo (cost=0.00..22500.00 rows=10000 width=4) EXPLAIN test71=# In current sources you keep getting an indexscan as you increase the number of tuples... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly