Sean Chittenden <[EMAIL PROTECTED]> said:

>> Here is the analysis of the query:
>> ---------
>> explain analyze SELECT distinct ip FROM rawdata WHERE sid='2';
> 
> <puts_on_database_consulting_hat/>

Ah, the hat is XHTML 1.0 Strict, I see ;-)

> The selectivity of your qual is terrible.  sid=2::SMALLINT is returning 
> half of the table as a potential match!  It should be no surprise then 
> that the planner decided to do a sequential scan.  You have a *lot* of 
> duplicate IP addresses in that table.
[snip]
> different table that uniquifies the IPs that way the database only has 
> to go over a small number of IPs that are unique instead of a large 
[snip]

This never came up in our discussions of whether to use one or two tables
for fastheaders (one table in key/value pairs, or two tables in id/key +
keyid/value relationship)... but it seems to me that we'd have a problem
if 10% of the keys were From, another 10% To, then Subject, Date,
Received... and then the last 10% a bunch of random weird headers, like
from spam checkers or whatever else. (Obviously these are example figures,
there are many more common headers). Trying to search for all keys in a
given mailbox with such and such criteria might also trigger a sequential
scan of death, no?

Aaron

Reply via email to