I'v spent a couple days playing with this problem and searching the mailing lists and docs etc but come up with nothing. Any help would be much appreciated.
Setup is postgres 7.3.2 on redhat 7.1 on a 1.3GHz Athlon machine with 1G pc133 ram and SCSI. Here is the same query with the addition of a left join onto a list of contacts to grab the last name of each connected contact. I'd think this should be real quick since it jsut has to grab around 100 names from the list, and if its smart enough to grab just distinct IDs, then it's just like 10 rows it has to grab using the primary field. But as far as i can tell (and i may VERY well be reading the explain syntax wrong), it is grabbing them all and joining them first, rather than doing the operation that limits the result rows to a mere 100 and THEN doing the join to contacts. It would be faster if i did a separate query using a big IN(id1,id2,...) condition, which makes no sense to me. Plus i REALLY want to avoid this as the selected fields and the joins and conditions are all variable and controlled (indirectly and transparently) by the user. Point is, why does a simple left join slow things down so much? in my experience (primarily with mysql but also over a year with postgre) simple left joins are usually quite quick. I can only guess that a bad plan is being chosen. PLEASE don't tell me i need to store a copy of the names in the events table to get acceptable speed, cause this would be plain sacrilegious in terms of DB design. Or is this simply as fast as these queries can go? Just seems too long for the work that's being done IME. events table has 12355 rows contacts has 20064 event_managers has 8502 All fields with conditions (object_ids, contact, event_id, user_id, deleted_on) are indexed with btree. Here is the query with the left join. sauce=# explain analyze SELECT top.object_id , top.who, top.datetime, top.priority, top.subject, top.action, top_contact_.last_name, top.object_id, top_contact_.object_id FROM event_managers AS managers JOIN ONLY events AS top ON(managers.event_id=top.object_id) LEFT JOIN contacts AS top_contact_ ON(top.contact=top_contact_.object_id and top_contact_.deleted_on IS NULL) WHERE true AND managers.user_id=238; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=5569.24..5671.22 rows=100 width=91) (actual time=485.95..526.25 rows=208 loops=1) Merge Cond: ("outer".contact = "inner".object_id) Join Filter: ("inner".deleted_on IS NULL) -> Sort (cost=2467.17..2467.42 rows=100 width=60) (actual time=143.67..143.75 rows=208 loops=1) Sort Key: top.contact -> Hash Join (cost=143.63..2463.83 rows=100 width=60) (actual time=0.89..142.64 rows=208 loops=1) Hash Cond: ("outer".object_id = "inner".event_id) -> Seq Scan on events top (cost=0.00..1830.19 rows=12219 width=56) (actual time=0.05..131.33 rows=12219 loops=1) -> Hash (cost=143.45..143.45 rows=69 width=4) (actual time=0.65..0.65 rows=0 loops=1) -> Index Scan using event_managers_user_id on event_managers managers (cost=0.00..143.45 rows=69 width=4) (actual time=0.14..0.50 rows=139 loops=1) Index Cond: (user_id = 238) -> Sort (cost=3102.07..3152.23 rows=20064 width=31) (actual time=342.23..360.29 rows=19964 loops=1) Sort Key: top_contact_.object_id -> Append (cost=0.00..1389.64 rows=20064 width=31) (actual time=0.06..115.63 rows=20064 loops=1) -> Seq Scan on contacts top_contact_ (cost=0.00..1383.43 rows=20043 width=31) (actual time=0.06..101.04 rows=20043 loops=1) -> Seq Scan on users top_contact_ (cost=0.00..6.21 rows=21 width=31) (actual time=0.05..0.29 rows=21 loops=1) Total runtime: 527.47 msec (17 rows) The same thing but without the left join. Much faster. Anything slower than this would be unacceptable, especailly given how small the tables are at this point. They are expected to grow ALOT bigger within a year. sauce=# explain analyze SELECT top.object_id , top.who, top.datetime, top.priority, top.subject, top.action, top.object_id FROM event_managers AS managers JOIN ONLY events AS top ON(managers.event_id=top.object_id) WHERE true AND managers.user_id=238; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=143.63..2463.83 rows=100 width=56) (actual time=1.48..137.74 rows=208 loops=1) Hash Cond: ("outer".object_id = "inner".event_id) -> Seq Scan on events top (cost=0.00..1830.19 rows=12219 width=52) (actual time=0.06..125.80 rows=12219 loops=1) -> Hash (cost=143.45..143.45 rows=69 width=4) (actual time=1.20..1.20 rows=0 loops=1) -> Index Scan using event_managers_user_id on event_managers managers (cost=0.00..143.45 rows=69 width=4) (actual time=0.21..1.03 rows=139 loops=1) Index Cond: (user_id = 238) Total runtime: 137.96 msec (7 rows) again, many thanks for any feedback! ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend