Here is the analysis of the query:
---------
explain analyze SELECT distinct ip FROM rawdata WHERE sid='2';

<puts_on_database_consulting_hat/>

The selectivity of your qual is terrible. sid=2::SMALLINT is returning half of the table as a potential match! It should be no surprise then that the planner decided to do a sequential scan. You have a *lot* of duplicate IP addresses in that table.

Look at the estimated rows (566130) and actual rows (2232934) for the sid qual. You could potentially improve the estimates by increasing the stats target for your sid column.

ALTER TABLE tbl ALTER COLUMN sid SET STATISTICS 20;

Assuming you're at the default of 10. Increase them by 2x at a time until you get something that gives you a good estimate. But you'll likely never get the planner to use anything but a sequential scan given the number of rows it has to examine.

You'll likely need to use a stored procedure and shovel off IPs to a different table that uniquifies the IPs that way the database only has to go over a small number of IPs that are unique instead of a large number of IPs that contains duplicates. It'd be faster potentially for SELECTs if you're doing this frequently, but you're essentially asking for a sequential scan given your data set.

Hope that helps. If you'd like more assistance than what's above, send me an email off list. -sc

--
Sean Chittenden

Reply via email to