Sean Chittenden <[EMAIL PROTECTED]> said: >> Here is the analysis of the query: >> --------- >> explain analyze SELECT distinct ip FROM rawdata WHERE sid='2'; > > <puts_on_database_consulting_hat/>
Ah, the hat is XHTML 1.0 Strict, I see ;-) > The selectivity of your qual is terrible. sid=2::SMALLINT is returning > half of the table as a potential match! It should be no surprise then > that the planner decided to do a sequential scan. You have a *lot* of > duplicate IP addresses in that table. [snip] > different table that uniquifies the IPs that way the database only has > to go over a small number of IPs that are unique instead of a large [snip] This never came up in our discussions of whether to use one or two tables for fastheaders (one table in key/value pairs, or two tables in id/key + keyid/value relationship)... but it seems to me that we'd have a problem if 10% of the keys were From, another 10% To, then Subject, Date, Received... and then the last 10% a bunch of random weird headers, like from spam checkers or whatever else. (Obviously these are example figures, there are many more common headers). Trying to search for all keys in a given mailbox with such and such criteria might also trigger a sequential scan of death, no? Aaron