I am sorry for asking this here, but I thought it might also benefit dbmail in the end.
I have database with ~560.000 rows. One of the columns is an IP address (type:inet). Now, I want to only get the unique ip addresses out. That takes about 3minutes on a P4 2.66Ghz! Here is the analysis of the query: --------- explain analyze SELECT distinct ip FROM rawdata WHERE sid='2'; Unique (cost=66182.54..69013.19 rows=125 width=11) (actual time=140856.389..183592.925 rows=125 loops=1) -> Sort (cost=66182.54..67597.86 rows=566130 width=11) (actual time=140856.375..162975.490 rows=2232934 loops=1) Sort Key: ip -> Seq Scan on rawdata (cost=0.00..12086.62 rows=566130 width=11) (actual time=0.041..51072.133 rows=2232934 loops=1) Filter: (sid = 2::smallint) Total runtime: 183836.671 ms (6 rows) And the description of the table: -------- \d rawdata Table "public.rawdata" Column | Type | Modifiers -----------+-----------------------------+---------------------------------------------------------- rid | bigint | not null default nextval('public.rawdata_rid_seq'::text) sid | smallint | not null direction | smallint | not null ip | inet | not null bytes | integer | not null timestamp | timestamp without time zone | not null month | smallint | default date_part('month'::text, now()) Indexes: "rawdata_sid_dir" btree (sid, direction) "rawdata_sid_ip_dir" btree (sid, ip, direction) Now, as you can see the problem here is that pgsql first sorts all the 560k rows before applying the index to them in order to find out what is really relevant here. Sorting takes up 2/3 of the time spent on this query. Now, any ideas to speed this up except having a separate table for the unique ip's? Adding a separate index for only the ip field does not help, it does not get used. -HK