I have a query where I want to select the usertable records that have a matching entry in an event table. There are two ways to do this.

1) SELECT COUNT(DISTINCT u.uid) FROM usertable u, eventlog e WHERE u.uid = e.uid AND e.type = XX;
2) SELECT COUNT(u.uid) FROM usertable u WHERE EXISTS(SELECT 1 FROM eventlog e WHERE u.uid = e.uid AND e.type = XX);


In a real life query 1 took 46284.58 msec and 45856.66 msec for first and second runs 
and
query 2 took 38736.77 msec and 32833.08 msec.

Here are the explain analyse outputs:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180116.76..180116.76 rows=1 width=20) (actual time=46267.93..46267.93 rows=1 loops=1)
-> Nested Loop (cost=195.13..180116.75 rows=1 width=20) (actual time=46094.32..46265.81 rows=8 loops=1)
-> Nested Loop (cost=195.13..180112.96 rows=1 width=16) (actual time=46064.86..46189.25 rows=8 loops=1)
-> Hash Join (cost=195.13..179856.40 rows=43 width=12) (actual time=46054.33..46153.15 rows=8 loops=1)
Hash Cond: ("outer".typeid = "inner".id)
-> Seq Scan on eventlog (cost=0.00..174675.16 rows=664742 width=8) (actual time=357.04..45349.36 rows=580655 loops=1)
Filter: ("type" = 4)
-> Hash (cost=194.97..194.97 rows=65 width=4) (actual time=21.83..21.83 rows=0 loops=1)
-> Index Scan using clickthru_jid_and_id_key on clickthru (cost=0.00..194.97 rows=65 width=4) (actual time=21.47..21.71 rows=63 loops=1)
Index Cond: (jobid = 7899)
-> Index Scan using usertable_pkey on usertable u (cost=0.00..5.91 rows=1 width=4) (actual time=4.50..4.50 rows=1 loops=8)
Index Cond: (("outer".uid = u.userkey) AND (u.podkey = 259))
-> Index Scan using d_pkey on d (cost=0.00..3.78 rows=1 width=4) (actual time=9.56..9.56 rows=1 loops=8)
Index Cond: ("outer".uid = d.ukey)
Total runtime: 46284.58 msec
45856.66 msec
(15 rows)




QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4325054.14..4325054.14 rows=1 width=8) (actual time=38736.62..38736.62 rows=1 loops=1)
-> Nested Loop (cost=0.00..4325052.01 rows=852 width=8) (actual time=12451.09..38736.58 rows=6 loops=1)
-> Index Scan using usertable_podkey_key on usertable u (cost=0.00..4321822.44 rows=852 width=4) (actual time=12450.95..38735.85 rows=6 loops=1)
Index Cond: (pkey = 259)
Filter: (subplan)
SubPlan
-> Nested Loop (cost=0.00..2532.25 rows=1 width=8) (actual time=49.59..49.59 rows=0 loops=752)
-> Index Scan using eventlog_uid_and_jid_and_type_key on eventlog (cost=0.00..2343.37 rows=62 width=4) (actual time=29.64..48.91 rows=4 loops=752)
Index Cond: (uid = $0)
Filter: ("type" = 4)
-> Index Scan using clickthru_pkey on clickthru (cost=0.00..3.02 rows=1 width=4) (actual time=0.18..0.18 rows=0 loops=2725)
Index Cond: ("outer".typeid = clickthru.id)
Filter: (jobid = 7899)
-> Index Scan using directory_pkey on d (cost=0.00..3.78 rows=1 width=4) (actual time=0.11..0.11 rows=1 loops=6)
Index Cond: (d.ukey = "outer".userkey)
Total runtime: 38736.77 msec
32833.08 msec
(16 rows)




... so the questions are:

Why are the plans so different?
Why does the planner think query 2 will take so long when it ends up being faster than query 1?
Is there anything I can do to speed up the query?


version is: 7.3.3

BTW does this belong on the performance list or is that list about tuning the database config paramaters only?


---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Reply via email to