[GENERAL] DISTINCT vs EXISTS performance

Joseph Shraibman Tue, 08 Jul 2003 19:17:15 -0700

I have a query where I want to select the usertable records that have a matching entry in an event table. There are two ways to do this.

1) SELECT COUNT(DISTINCT u.uid) FROM usertable u, eventlog e WHERE u.uid = e.uid AND e.type = XX; 2) SELECT COUNT(u.uid) FROM usertable u WHERE EXISTS(SELECT 1 FROM eventlog e WHERE u.uid = e.uid AND e.type = XX);

In a real life query 1 took 46284.58 msec and 45856.66 msec for first and second runs 
and
query 2 took 38736.77 msec and 32833.08 msec.

Here are the explain analyse outputs:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=180116.76..180116.76 rows=1 width=20) (actual time=46267.93..46267.93 rows=1 loops=1) -> Nested Loop (cost=195.13..180116.75 rows=1 width=20) (actual time=46094.32..46265.81 rows=8 loops=1) -> Nested Loop (cost=195.13..180112.96 rows=1 width=16) (actual time=46064.86..46189.25 rows=8 loops=1) -> Hash Join (cost=195.13..179856.40 rows=43 width=12) (actual time=46054.33..46153.15 rows=8 loops=1) Hash Cond: ("outer".typeid = "inner".id) -> Seq Scan on eventlog (cost=0.00..174675.16 rows=664742 width=8) (actual time=357.04..45349.36 rows=580655 loops=1) Filter: ("type" = 4) -> Hash (cost=194.97..194.97 rows=65 width=4) (actual time=21.83..21.83 rows=0 loops=1) -> Index Scan using clickthru_jid_and_id_key on clickthru (cost=0.00..194.97 rows=65 width=4) (actual time=21.47..21.71 rows=63 loops=1) Index Cond: (jobid = 7899) -> Index Scan using usertable_pkey on usertable u (cost=0.00..5.91 rows=1 width=4) (actual time=4.50..4.50 rows=1 loops=8) Index Cond: (("outer".uid = u.userkey) AND (u.podkey = 259)) -> Index Scan using d_pkey on d (cost=0.00..3.78 rows=1 width=4) (actual time=9.56..9.56 rows=1 loops=8) Index Cond: ("outer".uid = d.ukey) Total runtime: 46284.58 msec 45856.66 msec (15 rows)

QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4325054.14..4325054.14 rows=1 width=8) (actual time=38736.62..38736.62 rows=1 loops=1) -> Nested Loop (cost=0.00..4325052.01 rows=852 width=8) (actual time=12451.09..38736.58 rows=6 loops=1) -> Index Scan using usertable_podkey_key on usertable u (cost=0.00..4321822.44 rows=852 width=4) (actual time=12450.95..38735.85 rows=6 loops=1) Index Cond: (pkey = 259) Filter: (subplan) SubPlan -> Nested Loop (cost=0.00..2532.25 rows=1 width=8) (actual time=49.59..49.59 rows=0 loops=752) -> Index Scan using eventlog_uid_and_jid_and_type_key on eventlog (cost=0.00..2343.37 rows=62 width=4) (actual time=29.64..48.91 rows=4 loops=752) Index Cond: (uid = $0) Filter: ("type" = 4) -> Index Scan using clickthru_pkey on clickthru (cost=0.00..3.02 rows=1 width=4) (actual time=0.18..0.18 rows=0 loops=2725) Index Cond: ("outer".typeid = clickthru.id) Filter: (jobid = 7899) -> Index Scan using directory_pkey on d (cost=0.00..3.78 rows=1 width=4) (actual time=0.11..0.11 rows=1 loops=6) Index Cond: (d.ukey = "outer".userkey) Total runtime: 38736.77 msec 32833.08 msec (16 rows)

... so the questions are:

Why are the plans so different? Why does the planner think query 2 will take so long when it ends up being faster than query 1? Is there anything I can do to speed up the query?

version is: 7.3.3

BTW does this belong on the performance list or is that list about tuning the database config paramaters only?


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

[GENERAL] DISTINCT vs EXISTS performance

Reply via email to