Re: [PERFORM] performance hit when joining with a view?

2003-09-25 Thread Tom Lane
Palle Girgensohn <[EMAIL PROTECTED]> writes:
> Vydefinition: SELECT p.userid, p.giver, p.first_name, p.last_name, p.email, 
> p.default_language, p.created, p.created_by, w.course_id FROM (person p 
> LEFT JOIN wiol w ON ((p.userid = w.userid)));

>  explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts, 
> p.type, case when sender.userid is not null then sender.first_name || ' ' 
> || sender.last_name else null end as sender_name, sender.course_id is not 
> null as is_online from pim p left outer join person_wiol_view sender on 
> (sender.userid = p.sender) where p.recipient = 'axto6551' and p.type >= 0 
> limit 1;

> explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts, 
> p.type, case when sender.userid is not null then sender.first_name || ' ' 
> || sender.last_name else null end as sender_name, w.course_id is not null 
> as is_online from pim p left outer join person sender on (sender.userid = 
> p.sender) left join wiol w on (w.userid=sender.userid) where p.recipient = 
> 'axto6551' and p.type >= 0  limit 1;

These are not actually the same query.  In the former case the implicit
parenthesization of the joins is
pim left join (person left join wiol)
whereas in the latter case the implicit parenthesization is left-to-right:
(pim left join person) left join wiol
Since the only restriction conditions you have provided are on pim, the
first parenthesization implies forming the entire join of person and
wiol :-(.

If you were using plain joins then the two queries would be logically
equivalent, but outer joins are in general not associative, so the
planner will not consider re-ordering them.

There is some work in 7.4 to make the planner smarter about outer joins,
but offhand I don't think any of it will improve results for this
particular example.

I have seen some academic papers about how to prove that a particular
pair of outer join operators can safely be swapped (as I think is true
in this example).  Some knowledge of that sort may eventually get into
the planner, but it ain't there now.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] performance hit when joining with a view?

2003-09-25 Thread Palle Girgensohn
Hi!

A performance question:

I have some tables:

  Tabell "public.person"
 Kolumn  |   Typ|  Modifierare
--+--+---
userid   | text | not null
giver| text |
first_name   | text |
last_name| text |
email| text |
default_language | text | default 'sv'
created  | timestamp with time zone | default now()
created_by   | text |
Index: person_pkey primärnyckel btree (userid),
  person_index unik btree (userid),
  person_giver_idx btree (giver)
Främmande nyckel-villkor: pp_fk9 FOREIGN KEY (giver) REFERENCES 
providers(giver) ON UPDATE CASCADE ON DELETE CASCADE,
 pp_fk2 FOREIGN KEY (created_by) REFERENCES 
person(userid) ON UPDATE CASCADE ON DELETE SET NULL

Tabell "public.wiol"
Kolumn  | Typ |  Modifierare
-+-+---
userid  | text| not null
course_id   | integer |
login_ts| timestamp without time zone | default now()
latest_event_ts | timestamp without time zone | default now()
Främmande nyckel-villkor: pp_fk2 FOREIGN KEY (course_id) REFERENCES 
course(id) ON UPDATE CASCADE ON DELETE CASCADE,
 pp_fk1 FOREIGN KEY (userid) REFERENCES 
person(userid) ON UPDATE CASCADE ON DELETE CASCADE

and a view:

  Vy "public.person_wiol_view"
 Kolumn  |   Typ| Modifierare
--+--+-
userid   | text |
giver| text |
first_name   | text |
last_name| text |
email| text |
default_language | text |
created  | timestamp with time zone |
created_by   | text |
course_id| integer  |
Vydefinition: SELECT p.userid, p.giver, p.first_name, p.last_name, p.email, 
p.default_language, p.created, p.created_by, w.course_id FROM (person p 
LEFT JOIN wiol w ON ((p.userid = w.userid)));

Now, with about 3 tuples in person and about 40 in wiol, executing a 
left outer join with the view gives horrible performance:

explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts, 
p.type, case when sender.userid is not null then sender.first_name || ' ' 
|| sender.last_name else null end as sender_name, sender.course_id is not 
null as is_online from pim p left outer join person_wiol_view sender on 
(sender.userid = p.sender) where p.recipient = 'axto6551' and p.type >= 0 
limit 1;
QUERY PLAN
---
--
Limit  (cost=0.00..1331.26 rows=1 width=180) (actual time=866.14..1135.65 
rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..1331.26 rows=1 width=180) (actual 
time=866.13..1135.63 rows=2 loops=1)
Join Filter: ("inner".userid = "outer".sender)
->  Seq Scan on pim p  (cost=0.00..0.00 rows=1 width=112) (actual 
time=0.05..0.18 rows=2 loops=1)
  Filter: ((recipient = 'axto6551'::text) AND ("type" >= 0))
->  Materialize  (cost=956.15..956.15 rows=30009 width=68) (actual 
time=369.33..437.86 rows=22045 loops=2)
  ->  Hash Join  (cost=0.00..956.15 rows=30009 width=68) 
(actual time=0.45..605.21 rows=30013 loops=1)
Hash Cond: ("outer".userid = "inner".userid)
->  Seq Scan on person p  (cost=0.00..806.09 
rows=30009 width=32) (actual time=0.16..279.28 rows=30009 loops=1)
->  Hash  (cost=0.00..0.00 rows=1 width=36) (actual 
time=0.13..0.13 rows=0 loops=1)
  ->  Seq Scan on wiol w  (cost=0.00..0.00 rows=1 
width=36) (actual time=0.02..0.09 rows=8 loops=1)
Total runtime: 1143.93 msec
(12 rader)

but rewriting the question with an explicit join uses the indices, and runs 
*much* faster:

explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts, 
p.type, case when sender.userid is not null then sender.first_name || ' ' 
|| sender.last_name else null end as sender_name, w.course_id is not null 
as is_online from pim p left outer join person sender on (sender.userid = 
p.sender) left join wiol w on (w.userid=sender.userid) where p.recipient = 
'axto6551' and p.type >= 0  limit 1;
QUERY PLAN
---
-
Limit  (cost=0.00..6.03 rows=