Hello.

We have a view that is very generic, and we noticed that PostgreSQL is not very 
good at removing useless joins, which makes our queries very slow.
We could change our code to avoid the view and write ad-hoc queries to the 
underlying tables, but would prefer not to, if there is a way around it.
(BTW: We are currently using psql 9.4)

Here is a simplified implementation:
# create table a (id int primary key, name varchar(128));
# create table b (id int primary key, name varchar(128));
# create table c (id int primary key, a_id int references a(id), b1_id int 
references b(id), b2_id int references b(id), b3_id int references b(id));
# create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name 
a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c,  a, b b1, b 
b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and 
c.b3_id=b3.id;

When I try to get just info from tables c and b1:
# select id, b1_name from v
it still does all the joins (see below).

I would expect just one join (due to the request of columns from the two 
tables),
since all joins are on foreign constrains referencing primary keys,
there are no filters on the other tables,
so it is guaranteed that the useless joins will always return exactly one 
answer.

Is there a way to tweak the PostgreSQL optimizer to do the proper join removal 
during the planning?
Perhaps tweaking somehow either our schema or our queries (while still keeping 
a generic view)?

Thank you,
  Igor Sfiligoi



# explain select id, b1_name from v;
                                   QUERY PLAN
--------------------------------------------------------------------------------
Nested Loop  (cost=1.02..5.45 rows=1 width=6)
   Join Filter: (c.b3_id = b3.id)
   ->  Nested Loop  (cost=1.02..4.32 rows=1 width=10)
         Join Filter: (c.a_id = a.id)
         ->  Nested Loop  (cost=1.02..3.25 rows=1 width=14)
               Join Filter: (c.b2_id = b2.id)
               ->  Hash Join  (cost=1.02..2.12 rows=1 width=18)
                     Hash Cond: (b1.id = c.b1_id)
                     ->  Seq Scan on b b1  (cost=0.00..1.06 rows=6 width=6)
                     ->  Hash  (cost=1.01..1.01 rows=1 width=20)
                           ->  Seq Scan on c  (cost=0.00..1.01 rows=1 width=20)
               ->  Seq Scan on b b2  (cost=0.00..1.06 rows=6 width=4)
         ->  Seq Scan on a  (cost=0.00..1.03 rows=3 width=4)
   ->  Seq Scan on b b3  (cost=0.00..1.06 rows=6 width=4)
(14 rows)

PS: The tables were very small in this example, but are quite big in the 
production environment.

Reply via email to