I am sorry for asking this here, but I thought it might
also benefit dbmail in the end.

I have database with ~560.000 rows. One of the columns is an
IP address (type:inet). Now, I want to only get the unique ip
addresses out. That takes about 3minutes on a P4 2.66Ghz!

Here is the analysis of the query:
---------
explain analyze SELECT distinct ip FROM rawdata WHERE sid='2';

 Unique  (cost=66182.54..69013.19 rows=125 width=11) (actual
time=140856.389..183592.925 rows=125 loops=1)
   ->  Sort  (cost=66182.54..67597.86 rows=566130 width=11) (actual
time=140856.375..162975.490 rows=2232934 loops=1)
         Sort Key: ip
         ->  Seq Scan on rawdata  (cost=0.00..12086.62 rows=566130
width=11) (actual time=0.041..51072.133 rows=2232934 loops=1)
               Filter: (sid = 2::smallint)
 Total runtime: 183836.671 ms
(6 rows)


And the description of the table:
--------
 \d rawdata
                                       Table "public.rawdata"
  Column   |            Type             |                       
Modifiers
-----------+-----------------------------+----------------------------------------------------------
 rid       | bigint                      | not null default
nextval('public.rawdata_rid_seq'::text)
 sid       | smallint                    | not null
 direction | smallint                    | not null
 ip        | inet                        | not null
 bytes     | integer                     | not null
 timestamp | timestamp without time zone | not null
 month     | smallint                    | default
date_part('month'::text, now())
Indexes:
    "rawdata_sid_dir" btree (sid, direction)
    "rawdata_sid_ip_dir" btree (sid, ip, direction)


Now, as you can see the problem here is that pgsql first sorts all the
560k rows before applying the index to them in order to find out what
is really relevant here.

Sorting takes up 2/3 of the time spent on this query. Now, any ideas
to speed this up except having a separate table for the unique ip's?

Adding a separate index for only the ip field does not help, it does
not get used.

-HK

Reply via email to