[PERFORM] simple left join slows query more than expected

Jonathan Knopp Thu, 27 Nov 2003 23:37:29 -0800

I'v spent a couple days playing with this problem and searching the mailing lists and
docs etc but come up with nothing. Any help would be much appreciated.


Setup is postgres 7.3.2 on redhat 7.1 on a 1.3GHz Athlon machine with 1G pc133 ram and
SCSI.

Here is the same query with the addition of a left join onto a list of contacts to
grab the last name of each connected contact. I'd think this should be real quick since
it jsut has to grab around 100 names from the list, and if its smart enough to grab 
just
distinct IDs, then it's just like 10 rows it has to grab using the primary field. But 
as
far as i can tell (and i may VERY well be reading the explain syntax wrong), it is
grabbing them all and joining them first, rather than doing the operation that limits
the result rows to a mere 100 and THEN doing the join to contacts. It would be faster
if i did a separate query using a big IN(id1,id2,...) condition, which makes no sense 
to
me. Plus i REALLY want to avoid this as the selected fields and the joins and 
conditions
are all variable and controlled (indirectly and transparently) by the user.

Point is, why does a simple left join slow things down so much? in my experience
(primarily with mysql but also over a year with postgre) simple left joins are usually
quite quick. I can only guess that a bad plan is being chosen. PLEASE don't tell me i
need to store a copy of the names in the events table to get acceptable speed, cause
this would be plain sacrilegious in terms of DB design. Or is this simply as fast as
these queries can go? Just seems too long for the work that's being done IME.

events table has 12355 rows
contacts has 20064
event_managers has 8502

All fields with conditions (object_ids, contact, event_id, user_id, deleted_on) are 
indexed with btree.

Here is the query with the left join.

sauce=# explain analyze SELECT top.object_id , top.who, top.datetime, top.priority, 
top.subject, top.action, top_contact_.last_name, top.object_id, top_contact_.object_id
                        FROM event_managers AS managers
                                JOIN ONLY events AS top 
ON(managers.event_id=top.object_id)
                                LEFT JOIN contacts AS top_contact_ 
ON(top.contact=top_contact_.object_id and top_contact_.deleted_on IS NULL)
                        WHERE true AND managers.user_id=238;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=5569.24..5671.22 rows=100 width=91) (actual time=485.95..526.25 
rows=208 loops=1)
   Merge Cond: ("outer".contact = "inner".object_id)
   Join Filter: ("inner".deleted_on IS NULL)
   ->  Sort  (cost=2467.17..2467.42 rows=100 width=60) (actual time=143.67..143.75 
rows=208 loops=1)
         Sort Key: top.contact
         ->  Hash Join  (cost=143.63..2463.83 rows=100 width=60) (actual 
time=0.89..142.64 rows=208 loops=1)
               Hash Cond: ("outer".object_id = "inner".event_id)
               ->  Seq Scan on events top  (cost=0.00..1830.19 rows=12219 width=56) 
(actual time=0.05..131.33 rows=12219 loops=1)
               ->  Hash  (cost=143.45..143.45 rows=69 width=4) (actual time=0.65..0.65 
rows=0 loops=1)
                     ->  Index Scan using event_managers_user_id on event_managers 
managers  (cost=0.00..143.45 rows=69 width=4) (actual time=0.14..0.50 rows=139 loops=1)
                           Index Cond: (user_id = 238)
   ->  Sort  (cost=3102.07..3152.23 rows=20064 width=31) (actual time=342.23..360.29 
rows=19964 loops=1)
         Sort Key: top_contact_.object_id
         ->  Append  (cost=0.00..1389.64 rows=20064 width=31) (actual 
time=0.06..115.63 rows=20064 loops=1)
               ->  Seq Scan on contacts top_contact_  (cost=0.00..1383.43 rows=20043 
width=31) (actual time=0.06..101.04 rows=20043 loops=1)
               ->  Seq Scan on users top_contact_  (cost=0.00..6.21 rows=21 width=31) 
(actual time=0.05..0.29 rows=21 loops=1)
 Total runtime: 527.47 msec
(17 rows)


The same thing but without the left join. Much faster. Anything slower than
this would be unacceptable, especailly given how small the tables are at this
point. They are expected to grow ALOT bigger within a year.

sauce=# explain analyze SELECT top.object_id , top.who, top.datetime, top.priority, 
top.subject, top.action, top.object_id
                        FROM event_managers AS managers
                                JOIN ONLY events AS top 
ON(managers.event_id=top.object_id)
                        WHERE true AND managers.user_id=238;
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=143.63..2463.83 rows=100 width=56) (actual time=1.48..137.74 
rows=208 loops=1)
   Hash Cond: ("outer".object_id = "inner".event_id)
   ->  Seq Scan on events top  (cost=0.00..1830.19 rows=12219 width=52) (actual 
time=0.06..125.80 rows=12219 loops=1)
   ->  Hash  (cost=143.45..143.45 rows=69 width=4) (actual time=1.20..1.20 rows=0 
loops=1)
         ->  Index Scan using event_managers_user_id on event_managers managers  
(cost=0.00..143.45 rows=69 width=4) (actual time=0.21..1.03 rows=139 loops=1)
               Index Cond: (user_id = 238)
 Total runtime: 137.96 msec
(7 rows)

again, many thanks for any feedback!


---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

[PERFORM] simple left join slows query more than expected

Reply via email to