Here is the analysis of the query:
---------
explain analyze SELECT distinct ip FROM rawdata WHERE sid='2';
<puts_on_database_consulting_hat/>
The selectivity of your qual is terrible. sid=2::SMALLINT is returning
half of the table as a potential match! It should be no surprise then
that the planner decided to do a sequential scan. You have a *lot* of
duplicate IP addresses in that table.
Look at the estimated rows (566130) and actual rows (2232934) for the
sid qual. You could potentially improve the estimates by increasing
the stats target for your sid column.
ALTER TABLE tbl ALTER COLUMN sid SET STATISTICS 20;
Assuming you're at the default of 10. Increase them by 2x at a time
until you get something that gives you a good estimate. But you'll
likely never get the planner to use anything but a sequential scan
given the number of rows it has to examine.
You'll likely need to use a stored procedure and shovel off IPs to a
different table that uniquifies the IPs that way the database only has
to go over a small number of IPs that are unique instead of a large
number of IPs that contains duplicates. It'd be faster potentially for
SELECTs if you're doing this frequently, but you're essentially asking
for a sequential scan given your data set.
Hope that helps. If you'd like more assistance than what's above, send
me an email off list. -sc
--
Sean Chittenden