Simon Waters wrote:

The best advice is to "upgrade at your earliest convenience" with performance questions and 7.4 - you're missing a *lot* of improvements. You say you're planning to anyway, and I'd recommend putting effort into the upgrade rather than wasting effort on tuning a system you're leaving.
> I assume that the histogram_bounds for strings are alphabetical in order,
> so that "DEMOSTART" falls between "DELETE" and "IDEMAIL". Even on a worst
> case of including both these common values, the planner ought to have
> assumed that less than 10% of records were likely covered by the value
> selected, so it seems unlikely to me that not using the index would be a
> good idea.

Well, the real question is how many blocks need to be read to find those
DEMOSTART rows. At some point around 5-10% of the table it's easier just to
read the whole table than to go back and forth between index and table. The
precise point will depend on how much RAM you have, disk speeds etc.

> => SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
> (...lots of time passes...)
>  count
> -------
>   1432
> (1 row)

OK, not many. The crucial bit is below though. These are the 10 values it
will hold stats on, and all it knows is that DEMOSTART has fewer than about
57000 entries. OK, it's more complicated than that, but basically there are
values it tracks and everything else. So - it assumes that all other values
have the same chance of occurring.

> => SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count;
>
>  count  |   event
> --------+-----------
[snip]
>   57022 | NEWUSR
>   64907 | PUBREC0
>   65449 | UNPUBLISH
>   92843 | LOGOUT
>   99018 | KILLSESS
>  128900 | UPLOAD
>  134994 | LOGIN
>  137608 | NEWPAGE
>  447556 | PUBREC1
>  489572 | PUBLISH

Which is why it guesses 20436 rows below.
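To make the "everything else shares the leftover probability" point concrete, here's a rough Python sketch of the estimate the planner makes for a value that isn't in its most-common-values list. The table size, distinct-value count and frequencies below are made-up illustrations, not the real pg_stats contents of this table:

```python
def estimate_rows(total_rows, mcv_freqs, n_distinct, value_freq=None):
    """Rough mimic of the planner's row estimate for WHERE col = const.

    mcv_freqs: frequencies of the tracked most-common values.
    value_freq: the value's own frequency if it IS in the MCV list,
    else None (the DEMOSTART case).
    """
    if value_freq is not None:
        # Tracked value: the planner knows its frequency directly.
        return total_rows * value_freq
    # Untracked value: spread the leftover probability mass evenly
    # over all the remaining distinct values.
    leftover = 1.0 - sum(mcv_freqs)
    other_distinct = max(n_distinct - len(mcv_freqs), 1)
    return total_rows * leftover / other_distinct

# Hypothetical figures: 2 million rows, 10 tracked values covering ~86%
# of the table, 24 distinct event types in total.
mcv_freqs = [0.245, 0.224, 0.069, 0.067, 0.064,
             0.050, 0.046, 0.033, 0.032, 0.029]
print(estimate_rows(2_000_000, mcv_freqs, 24))
# Roughly 20000 - the same kind of five-figure guess as the 20436 above,
# even though the true DEMOSTART count is only 1432.
```

With a statistics target of 10 that list is all the planner has, so every untracked value - DEMOSTART included - gets the same guess; raising the target lets DEMOSTART's own (much smaller) frequency be sampled directly.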
If you'd done "SET enable_seqscan = off" and then run the explain again, it
should have estimated a cost for the index scan of more than 54317.14.

> => EXPLAIN SELECT * FROM log WHERE event='DEMOSTART';
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on log  (cost=0.00..54317.14 rows=20436 width=93)
>    Filter: (event = 'DEMOSTART'::text)
> (2 rows)
>
> => ALTER TABLE log ALTER COLUMN events SET STATISTICS 50; ANALYSE
> LOG(event);
> ALTER TABLE
> ANALYZE
>
> => EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
>                                   QUERY PLAN
> -----------------------------------------------------------------------------------
>  Aggregate  (cost=5101.43..5101.43 rows=1 width=0)
>    ->  Index Scan using log_event on log  (cost=0.00..5098.15 rows=1310 width=0)
>          Index Cond: (event = 'DEMOSTART'::text)
> (3 rows)

Not bad - now it knows roughly how many rows it will find, and it sees that
the index is cheaper. It's not completely accurate - it uses a statistical
sampling (and of course it's out of date as soon as you update the table).

HTH
--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance