Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-05 Thread Sfiligoi, Igor
The best that I can do right now is provide the explain of the three variants 
(see below).

The use of a left join did indeed remove the useless joins, but the selected 
plan is just terrible.

Thanks,
  Igor

-Original Message-
From: Kevin Grittner [mailto:kgri...@gmail.com] 
Sent: Saturday, July 02, 2016 6:28 AM
To: Sfiligoi, Igor <igor.sfili...@ga.com>
Cc: pgsql-general@postgresql.org
Subject: Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless 
joins

On Fri, Jul 1, 2016 at 7:15 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:
> OK.  Will change our query generation code to not use the view.
> (I have tried the LEFT JOIN approach, but it just does not seem to 
> perform.)

> PS: Here are the numbers for the real production query (will not provide 
> details):
> Original query:  300s
> Query on a manually optimized view:1ms
> Using left joins:200s

Please show a self-contained case (i.e., one that can be run against an empty 
database to demonstrate the problem).  

-
Original view, two unused joins
(data_info.true_arrow_id = true_dart.arrow_id && data_info.acl_arrow_id = 
acl_dart.arrow_id)
-
 Merge Join  (cost=121449.22..11705013.57 rows=890 width=63) (actual 
time=326791.858..365059.117 rows=1 loops=1)
   Merge Cond: (locn_info.rock_person_id = bird_rsrc_physical.rock_person_id)
   ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.018..0.020 rows=2 
loops=1)
 Sort Key: locn_info.rock_person_id
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on bird_locn_info locn_info  (cost=0.00..1.02 rows=2 
width=4) (actual time=0.003..0.003 rows=2 loops=1)
   ->  Materialize  (cost=121448.19..14015467.54 rows=890 width=71) (actual 
time=326791.835..365059.092 rows=1 loops=1)
 ->  Merge Join  (cost=121448.19..14015465.32 rows=890 width=71) 
(actual time=326791.833..365059.089 rows=1 loops=1)
   Merge Cond: (bird_rsrc_physical.rock_person_id = 
bird_person_bike.person_id)
   ->  Nested Loop  (cost=121444.20..16786761.58 rows=890 width=67) 
(actual time=326791.779..365059.033 rows=1 loops=1)
 Join Filter: (bird_data_silk.rsrc_id = 
bird_rsrc_physical.phy_rsrc_id)
 ->  Nested Loop  (cost=0.13..15.68 rows=1 width=8) (actual 
time=0.025..0.035 rows=1 loops=1)
   Join Filter: (bird_rsrc_physical.lock_rock_person_id 
= lock_info.rock_person_id)
   Rows Removed by Join Filter: 1
   ->  Nested Loop  (cost=0.13..14.63 rows=1 width=12) 
(actual time=0.022..0.031 rows=1 loops=1)
 Join Filter: (bird_rsrc_physical.rsrc_class_id 
= bird_rsrc_class.rsrc_class_id)
 Rows Removed by Join Filter: 7
 ->  Nested Loop  (cost=0.13..13.45 rows=1 
width=16) (actual time=0.017..0.025 rows=1 loops=1)
   Join Filter: 
((bird_rsrc_physical.rsrc_type_id)::text = (bird_rsrc_type.rsrc_type_id)::text)
   Rows Removed by Join Filter: 11
   ->  Index Scan using 
fidx_1_rsrc_physical on bird_rsrc_physical  (cost=0.13..12.18 rows=1 width=24) 
(actual time=0.009..0.013 rows=1 loops=1)
 Filter: ((rsrc_name)::text = 
'data'::text)
 Rows Removed by Filter: 2
   ->  Seq Scan on bird_rsrc_type  
(cost=0.00..1.12 rows=12 width=8) (actual time=0.002..0.004 rows=12 loops=1)
 ->  Seq Scan on bird_rsrc_class  
(cost=0.00..1.08 rows=8 width=4) (actual time=0.002..0.003 rows=8 loops=1)
   ->  Seq Scan on bird_locn_info lock_info  
(cost=0.00..1.02 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)
 ->  Hash Join  (cost=121444.07..16786712.53 rows=2670 
width=67) (actual time=326791.750..365058.993 rows=1 loops=1)
   Hash Cond: (data_info.true_data_id = 
bird_data_silk.data_id)
   ->  Hash Join  (cost=118019.92..16144855.88 
rows=102144928 width=43) (actual time=1945.934..346470.487 rows=102020209 
loops=1)
 Hash Cond: (data_info.sheet_id = 
bird_sheet.sheet_id)
 ->  Hash Join  (cost=118018.36..14740361.56 
rows=102144928 width=47) (actual time=1945.911..310945.806 rows=102020209 
loops=1)
   

Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-02 Thread Kevin Grittner
On Fri, Jul 1, 2016 at 7:15 PM, Sfiligoi, Igor  wrote:
> OK.  Will change our query generation code to not use the view.
> (I have tried the LEFT JOIN approach, but it just does not seem to perform.)

> PS: Here are the numbers for the real production query (will not provide 
> details):
> Original query:  300s
> Query on a manually optimized view:1ms
> Using left joins:200s

Please show a self-contained case (i.e., one that can be run
against an empty database to demonstrate the problem).  You might
start from this one and modify it until you see the problem that
you describe:

create table a (id int primary key, name varchar(128));
create table b (id int primary key, name varchar(128));
create table c (id int primary key,
a_id int not null references a(id),
b1_id int not null references b(id),
b2_id int not null references b(id),
b3_id int not null references b(id));

create view v as
  select
  c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name,
  b1.name b1_name, b2.name b2_name, b3.name b3_name
  from c
  left join aon a.id = c.a_id
  left join b b1 on b1.id = c.b1_id
  left join b b2 on b2.id = c.b2_id
  left join b b3 on b3.id = c.b3_id;

insert into a values (1, 'a1');
insert into b values (1, 'b1'), (2, 'b2'), (3, 'b3');
insert into c values (1, 1, 1, 2, 3);

vacuum analyze a;
vacuum analyze b;
vacuum analyze c;

select id, b1_name from v;
explain (analyze, buffers, verbose) select id, b1_name from v;

I'm seeing the unreferenced tables pruned from the plan, and a 1ms
execution time for the select from the view.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Sfiligoi, Igor
OK.  Will change our query generation code to not use the view.
(I have tried the LEFT JOIN approach, but it just does not seem to perform.)

Thanks,
  Igor

PS: Here are the numbers for the real production query (will not provide 
details):
Original query:  300s
Query on a manually optimized view:  1ms
Using left joins:200s

I would have gladly paid a few ms in additional planning time!

-Original Message-
From: Kevin Grittner [mailto:kgri...@gmail.com] 
Sent: Friday, July 01, 2016 1:57 PM
To: Sfiligoi, Igor <igor.sfili...@ga.com>
Cc: pgsql-general@postgresql.org
Subject: Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless 
joins

On Fri, Jul 1, 2016 at 3:33 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:
> No, I don't want to use LEFT JOINS.
> I want to use regular joins.
>
> But (as mentioned in my other follow-up), all the fields are not null 
> (was not in the original email, sorry), and are foreign keys, so it is 
> guaranteed to always match.

In that case there is no difference between the inner join and the left join 
except that the left join currently supports and optimization that makes your 
query faster if the optional table is not reference.  Whether you want to take 
advantage of that is up to you.

> The key part (in my mind) is that I am not filtering on any of the 
> useless tables, and I am not returning any columns from those tables 
> either.
> Both is known at planning time.

The fact that something can be determined at planning time doesn't mean that 
checking for it is free.

> is my logic still broken?

Your logic seems OK with the table definitions you are now showing.

Whether we ever decide it is OK to omit tables which use an inner join rather 
than only considering omitting them when the query specifies that the join is 
optional is anybody's guess.  If it is important enough to you you could submit 
a patch or fund development of such a feature; but since it would add at least 
some small amount of planning time to every inner join just to avoid specifying 
that the join is an optional one when writing the query, it seems to me 
unlikely to be accepted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Kevin Grittner
On Fri, Jul 1, 2016 at 3:33 PM, Sfiligoi, Igor  wrote:
> No, I don't want to use LEFT JOINS.
> I want to use regular joins.
>
> But (as mentioned in my other follow-up), all the fields are not
> null (was not in the original email, sorry), and are foreign
> keys, so it is guaranteed to always match.

In that case there is no difference between the inner join and the
left join except that the left join currently supports and
optimization that makes your query faster if the optional table is
not reference.  Whether you want to take advantage of that is up to
you.

> The key part (in my mind) is that I am not filtering on any of
> the useless tables, and I am not returning any columns from those
> tables either.
> Both is known at planning time.

The fact that something can be determined at planning time doesn't
mean that checking for it is free.

> is my logic still broken?

Your logic seems OK with the table definitions you are now showing.

Whether we ever decide it is OK to omit tables which use an inner
join rather than only considering omitting them when the query
specifies that the join is optional is anybody's guess.  If it is
important enough to you you could submit a patch or fund
development of such a feature; but since it would add at least some
small amount of planning time to every inner join just to avoid
specifying that the join is an optional one when writing the query,
it seems to me unlikely to be accepted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Sfiligoi, Igor
Nope, no difference how I express the joins:
create view v1 as 
select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name a_name, b1.name 
b1_name, b2.name b2_name, b3.name b3_name 
from c join  a on (c.a_id=a.id) join  b b1 on (c.b1_id=b1.id) join b b2 on 
(c.b2_id=b2.id) join b b3 on (c.b3_id=b3.id);

# explain select id, b1_name from v1;
   QUERY PLAN   

 Nested Loop  (cost=1.02..5.47 rows=1 width=7)
   Join Filter: (c.b3_id = b3.id)
   ->  Nested Loop  (cost=1.02..4.34 rows=1 width=11)
 Join Filter: (c.a_id = a.id)
 ->  Nested Loop  (cost=1.02..3.25 rows=1 width=15)
   Join Filter: (c.b2_id = b2.id)
   ->  Hash Join  (cost=1.02..2.12 rows=1 width=19)
 Hash Cond: (b1.id = c.b1_id)
 ->  Seq Scan on b b1  (cost=0.00..1.06 rows=6 width=7)
 ->  Hash  (cost=1.01..1.01 rows=1 width=20)
   ->  Seq Scan on c  (cost=0.00..1.01 rows=1 width=20)
   ->  Seq Scan on b b2  (cost=0.00..1.06 rows=6 width=4)
 ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4)
   ->  Seq Scan on b b3  (cost=0.00..1.06 rows=6 width=4)
(14 rows)

Igor

PS: Here are the updated table definitions:
create table a (id int not null primary key, name varchar(128));
create table b (id int not null primary key, name varchar(128));
create table c (id int not null primary key, a_id int not null references 
a(id), b1_id int not null references b(id), b2_id int not null references 
b(id), b3_id int not null references b(id));

-Original Message-
From: Adrian Klaver [mailto:adrian.kla...@aklaver.com] 
Sent: Friday, July 01, 2016 1:38 PM
To: Sfiligoi, Igor <igor.sfili...@ga.com>; Merlin Moncure <mmonc...@gmail.com>
Cc: pgsql-general@postgresql.org
Subject: Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless 
joins

On 07/01/2016 01:28 PM, Sfiligoi, Igor wrote:
> Sorry... the example was incomplete.
>
> All the fields are defined as not-null.
> So it is guaranteed to always match the join.
>
> And PostgreSQL release notes claim that PGSQL can do at least partial join 
> removal:
> https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.0#Join_Rem
> oval

Those examples use explicit joins, so you might try that in your view 
definition.

>
> I was hoping this use case would fit in.
>
> Any suggestions?
>
> Igor
>
> -Original Message-
> From: Merlin Moncure [mailto:mmonc...@gmail.com]
> Sent: Friday, July 01, 2016 12:42 PM
> To: Sfiligoi, Igor <igor.sfili...@ga.com>
> Cc: pgsql-general@postgresql.org
> Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious 
> useless joins
>
> On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:
>> Hello.
>>
>> We have a view that is very generic, and we noticed that PostgreSQL 
>> is not very good at removing useless joins, which makes our queries very 
>> slow.
>>
>> We could change our code to avoid the view and write ad-hoc queries 
>> to the underlying tables, but would prefer not to, if there is a way around 
>> it.
>>
>> (BTW: We are currently using psql 9.4)
>>
>> Here is a simplified implementation:
>>
>> # create table a (id int primary key, name varchar(128));
>>
>> # create table b (id int primary key, name varchar(128));
>>
>> # create table c (id int primary key, a_id int references a(id), 
>> b1_id int references b(id), b2_id int references b(id), b3_id int 
>> references b(id));
>>
>> # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, 
>> a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from 
>> c,  a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and 
>> c.b2_id=b2.id and c.b3_id=b3.id;
>>
>> When I try to get just info from tables c and b1:
>>
>> # select id, b1_name from v
>>
>> it still does all the joins (see below).
>>
>> I would expect just one join (due to the request of columns from the 
>> two tables),
>>
>> since all joins are on foreign constrains referencing primary keys,
>>
>> there are no filters on the other tables, so it is guaranteed that 
>> the useless joins will always return exactly one answer.
>
> I think what you're asking for is a lot more complex than it sounds, and 
> incorrect.  The precise state of the data influences how many records come 
> back (in this case, either 1 or 0), for example if b3_id is null you get zero 
> rows.  More to the point, you *instructed* the server to make the join.  
> There are st

Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Adrian Klaver

On 07/01/2016 01:28 PM, Sfiligoi, Igor wrote:

Sorry... the example was incomplete.

All the fields are defined as not-null.
So it is guaranteed to always match the join.

And PostgreSQL release notes claim that PGSQL can do at least partial join 
removal:
https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.0#Join_Removal


Those examples use explicit joins, so you might try that in your view 
definition.




I was hoping this use case would fit in.

Any suggestions?

Igor

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com]
Sent: Friday, July 01, 2016 12:42 PM
To: Sfiligoi, Igor <igor.sfili...@ga.com>
Cc: pgsql-general@postgresql.org
Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:

Hello.

We have a view that is very generic, and we noticed that PostgreSQL is
not very good at removing useless joins, which makes our queries very slow.

We could change our code to avoid the view and write ad-hoc queries to
the underlying tables, but would prefer not to, if there is a way around it.

(BTW: We are currently using psql 9.4)

Here is a simplified implementation:

# create table a (id int primary key, name varchar(128));

# create table b (id int primary key, name varchar(128));

# create table c (id int primary key, a_id int references a(id), b1_id
int references b(id), b2_id int references b(id), b3_id int references
b(id));

# create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id,
a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from
c,  a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and
c.b2_id=b2.id and c.b3_id=b3.id;

When I try to get just info from tables c and b1:

# select id, b1_name from v

it still does all the joins (see below).

I would expect just one join (due to the request of columns from the
two tables),

since all joins are on foreign constrains referencing primary keys,

there are no filters on the other tables, so it is guaranteed that the
useless joins will always return exactly one answer.


I think what you're asking for is a lot more complex than it sounds, and 
incorrect.  The precise state of the data influences how many records come back 
(in this case, either 1 or 0), for example if b3_id is null you get zero rows.  
More to the point, you *instructed* the server to make the join.  There are 
strategies to make joins 'optional' at run time with respect to a query, but 
they are more complicated than simply withdrawing columns from the select list.

Stepping back a bit, the query needs to be planned before peeking at the data 
in the tables.  The planner is able to make assumptions against a statistical 
picture of the data but shouldn't be expected to actually inspect precise 
result data in order to generate a better plan.

merlin





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Sfiligoi, Igor
No, I don't want to use LEFT JOINS.
I want to use regular joins.

But (as mentioned in my other follow-up), all the fields are not null (was not 
in the original email, sorry), and are foreign keys, so it is guaranteed to 
always match.

The key part (in my mind) is that I am not filtering on any of the useless 
tables, and I am not returning any columns from those tables either.
Both is known at planning time.

Or is my logic still broken?

Thanks,
 Igor

-Original Message-
From: Kevin Grittner [mailto:kgri...@gmail.com] 
Sent: Friday, July 01, 2016 1:29 PM
To: Sfiligoi, Igor <igor.sfili...@ga.com>
Cc: pgsql-general@postgresql.org
Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:

> We have a view that is very generic, and we noticed that PostgreSQL is 
> not very good at removing useless joins, which makes our queries very slow.
>
> We could change our code to avoid the view and write ad-hoc queries to 
> the underlying tables, but would prefer not to, if there is a way around it.

If it did not do the joins it could not provide the information you are 
actually asking to see.  Of course, there is a very good chance that what you 
are asking to see is not what you *want* to see.

test=# create table a (id int primary key, name varchar(128)); CREATE TABLE 
test=# create table b (id int primary key, name varchar(128)); CREATE TABLE 
test=# create table c (id int primary key,
test(# a_id int references a(id),
test(# b1_id int references b(id),
test(# b2_id int references b(id),
test(# b3_id int references b(id));
CREATE TABLE
test=#
test=# create view v_broken as
test-#   select
test-#c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name,
test-#b1.name b1_name, b2.name b2_name, b3.name b3_name
test-# from c, a, b b1, b b2, b b3
test-# where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and
c.b3_id=b3.id;
CREATE VIEW
test=#
test=# create view v as
test-#   select
test-#   c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name,
test-#   b1.name b1_name, b2.name b2_name, b3.name b3_name
test-#   from c
test-#   left join aon a.id = c.a_id
test-#   left join b b1 on b1.id = c.b1_id
test-#   left join b b2 on b2.id = c.b2_id
test-#   left join b b3 on b3.id = c.b3_id;
CREATE VIEW
test=#
test=# insert into a values (1, 'a1');
INSERT 0 1
test=# insert into b values (1, 'b1'), (2, 'b2'), (3, 'b3'); INSERT 0 3 test=# 
insert into c values (1, 1, 1, 2, 3), (2, 1, 1, 2, null); INSERT 0 2 test=# 
test=# select id, b1_name from v_broken;  id | b1_name
+-
  1 | b1
(1 row)

test=# explain analyze select id, b1_name from v_broken;
 QUERY PLAN

 Hash Join  (cost=63.40..183.90 rows=1700 width=278) (actual
time=0.049..0.052 rows=1 loops=1)
   Hash Cond: (c.b3_id = b3.id)
   ->  Hash Join  (cost=47.55..144.68 rows=1700 width=282) (actual
time=0.030..0.033 rows=2 loops=1)
 Hash Cond: (c.b2_id = b2.id)
 ->  Hash Join  (cost=31.70..105.45 rows=1700 width=286) (actual 
time=0.018..0.020 rows=2 loops=1)
   Hash Cond: (c.b1_id = b1.id)
   ->  Hash Join  (cost=15.85..66.22 rows=1700 width=16) (actual 
time=0.010..0.012 rows=2 loops=1)
 Hash Cond: (c.a_id = a.id)
 ->  Seq Scan on c  (cost=0.00..27.00 rows=1700
width=20) (actual time=0.001..0.001 rows=2 loops=1)
 ->  Hash  (cost=12.60..12.60 rows=260 width=4) (actual 
time=0.003..0.003 rows=1 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Seq Scan on a  (cost=0.00..12.60
rows=260 width=4) (actual time=0.002..0.003 rows=1 loops=1)
   ->  Hash  (cost=12.60..12.60 rows=260 width=278) (actual 
time=0.005..0.005 rows=3 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on b b1  (cost=0.00..12.60 rows=260
width=278) (actual time=0.002..0.003 rows=3 loops=1)
 ->  Hash  (cost=12.60..12.60 rows=260 width=4) (actual
time=0.006..0.006 rows=3 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Seq Scan on b b2  (cost=0.00..12.60 rows=260
width=4) (actual time=0.004..0.004 rows=3 loops=1)
   ->  Hash  (cost=12.60..12.60 rows=260 width=4) (actual
time=0.013..0.013 rows=3 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on b b3  (cost=0.00..12.60 rows=260 width=4) (actual 
time=0.008..0.009 rows=3 loops=1)  Planning time: 0.729 ms  Execution time: 
0.153 ms
(23 rows)

test=# select id, b1_name from v;
 id | b1_name
+

Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Kevin Grittner
On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor  wrote:

> We have a view that is very generic, and we noticed that PostgreSQL is not
> very good at removing useless joins, which makes our queries very slow.
>
> We could change our code to avoid the view and write ad-hoc queries to the
> underlying tables, but would prefer not to, if there is a way around it.

If it did not do the joins it could not provide the information you
are actually asking to see.  Of course, there is a very good chance
that what you are asking to see is not what you *want* to see.

test=# create table a (id int primary key, name varchar(128));
CREATE TABLE
test=# create table b (id int primary key, name varchar(128));
CREATE TABLE
test=# create table c (id int primary key,
test(# a_id int references a(id),
test(# b1_id int references b(id),
test(# b2_id int references b(id),
test(# b3_id int references b(id));
CREATE TABLE
test=#
test=# create view v_broken as
test-#   select
test-#c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name,
test-#b1.name b1_name, b2.name b2_name, b3.name b3_name
test-# from c, a, b b1, b b2, b b3
test-# where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and
c.b3_id=b3.id;
CREATE VIEW
test=#
test=# create view v as
test-#   select
test-#   c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name,
test-#   b1.name b1_name, b2.name b2_name, b3.name b3_name
test-#   from c
test-#   left join aon a.id = c.a_id
test-#   left join b b1 on b1.id = c.b1_id
test-#   left join b b2 on b2.id = c.b2_id
test-#   left join b b3 on b3.id = c.b3_id;
CREATE VIEW
test=#
test=# insert into a values (1, 'a1');
INSERT 0 1
test=# insert into b values (1, 'b1'), (2, 'b2'), (3, 'b3');
INSERT 0 3
test=# insert into c values (1, 1, 1, 2, 3), (2, 1, 1, 2, null);
INSERT 0 2
test=#
test=# select id, b1_name from v_broken;
 id | b1_name
+-
  1 | b1
(1 row)

test=# explain analyze select id, b1_name from v_broken;
 QUERY PLAN

 Hash Join  (cost=63.40..183.90 rows=1700 width=278) (actual
time=0.049..0.052 rows=1 loops=1)
   Hash Cond: (c.b3_id = b3.id)
   ->  Hash Join  (cost=47.55..144.68 rows=1700 width=282) (actual
time=0.030..0.033 rows=2 loops=1)
 Hash Cond: (c.b2_id = b2.id)
 ->  Hash Join  (cost=31.70..105.45 rows=1700 width=286)
(actual time=0.018..0.020 rows=2 loops=1)
   Hash Cond: (c.b1_id = b1.id)
   ->  Hash Join  (cost=15.85..66.22 rows=1700 width=16)
(actual time=0.010..0.012 rows=2 loops=1)
 Hash Cond: (c.a_id = a.id)
 ->  Seq Scan on c  (cost=0.00..27.00 rows=1700
width=20) (actual time=0.001..0.001 rows=2 loops=1)
 ->  Hash  (cost=12.60..12.60 rows=260 width=4)
(actual time=0.003..0.003 rows=1 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Seq Scan on a  (cost=0.00..12.60
rows=260 width=4) (actual time=0.002..0.003 rows=1 loops=1)
   ->  Hash  (cost=12.60..12.60 rows=260 width=278)
(actual time=0.005..0.005 rows=3 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on b b1  (cost=0.00..12.60 rows=260
width=278) (actual time=0.002..0.003 rows=3 loops=1)
 ->  Hash  (cost=12.60..12.60 rows=260 width=4) (actual
time=0.006..0.006 rows=3 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Seq Scan on b b2  (cost=0.00..12.60 rows=260
width=4) (actual time=0.004..0.004 rows=3 loops=1)
   ->  Hash  (cost=12.60..12.60 rows=260 width=4) (actual
time=0.013..0.013 rows=3 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on b b3  (cost=0.00..12.60 rows=260 width=4)
(actual time=0.008..0.009 rows=3 loops=1)
 Planning time: 0.729 ms
 Execution time: 0.153 ms
(23 rows)

test=# select id, b1_name from v;
 id | b1_name
+-
  1 | b1
  2 | b1
(2 rows)

test=# explain analyze select id, b1_name from v;
  QUERY PLAN
---
 Hash Left Join  (cost=15.85..66.22 rows=1700 width=278) (actual
time=0.017..0.018 rows=2 loops=1)
   Hash Cond: (c.b1_id = b1.id)
   ->  Seq Scan on c  (cost=0.00..27.00 rows=1700 width=20) (actual
time=0.005..0.005 rows=2 loops=1)
   ->  Hash  (cost=12.60..12.60 rows=260 width=278) (actual
time=0.006..0.006 rows=3 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on b b1  (cost=0.00..12.60 rows=260 width=278)
(actual time=0.002..0.003 rows=3 loops=1)
 Planning time: 0.177 ms
 Execution time: 0.044 ms
(8 

Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Sfiligoi, Igor
Sorry... the example was incomplete.

All the fields are defined as not-null.
So it is guaranteed to always match the join.

And PostgreSQL release notes claim that PGSQL can do at least partial join 
removal:
https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.0#Join_Removal 

I was hoping this use case would fit in.

Any suggestions?

Igor

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Friday, July 01, 2016 12:42 PM
To: Sfiligoi, Igor <igor.sfili...@ga.com>
Cc: pgsql-general@postgresql.org
Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:
> Hello.
>
> We have a view that is very generic, and we noticed that PostgreSQL is 
> not very good at removing useless joins, which makes our queries very slow.
>
> We could change our code to avoid the view and write ad-hoc queries to 
> the underlying tables, but would prefer not to, if there is a way around it.
>
> (BTW: We are currently using psql 9.4)
>
> Here is a simplified implementation:
>
> # create table a (id int primary key, name varchar(128));
>
> # create table b (id int primary key, name varchar(128));
>
> # create table c (id int primary key, a_id int references a(id), b1_id 
> int references b(id), b2_id int references b(id), b3_id int references 
> b(id));
>
> # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, 
> a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from 
> c,  a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and 
> c.b2_id=b2.id and c.b3_id=b3.id;
>
> When I try to get just info from tables c and b1:
>
> # select id, b1_name from v
>
> it still does all the joins (see below).
>
> I would expect just one join (due to the request of columns from the 
> two tables),
>
> since all joins are on foreign constrains referencing primary keys,
>
> there are no filters on the other tables, so it is guaranteed that the 
> useless joins will always return exactly one answer.

I think what you're asking for is a lot more complex than it sounds, and 
incorrect.  The precise state of the data influences how many records come back 
(in this case, either 1 or 0), for example if b3_id is null you get zero rows.  
More to the point, you *instructed* the server to make the join.  There are 
strategies to make joins 'optional' at run time with respect to a query, but 
they are more complicated than simply withdrawing columns from the select list.

Stepping back a bit, the query needs to be planned before peeking at the data 
in the tables.  The planner is able to make assumptions against a statistical 
picture of the data but shouldn't be expected to actually inspect precise 
result data in order to generate a better plan.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Merlin Moncure
On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor  wrote:
> Hello.
>
> We have a view that is very generic, and we noticed that PostgreSQL is not
> very good at removing useless joins, which makes our queries very slow.
>
> We could change our code to avoid the view and write ad-hoc queries to the
> underlying tables, but would prefer not to, if there is a way around it.
>
> (BTW: We are currently using psql 9.4)
>
> Here is a simplified implementation:
>
> # create table a (id int primary key, name varchar(128));
>
> # create table b (id int primary key, name varchar(128));
>
> # create table c (id int primary key, a_id int references a(id), b1_id int
> references b(id), b2_id int references b(id), b3_id int references b(id));
>
> # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name
> a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c,  a, b b1,
> b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and
> c.b3_id=b3.id;
>
> When I try to get just info from tables c and b1:
>
> # select id, b1_name from v
>
> it still does all the joins (see below).
>
> I would expect just one join (due to the request of columns from the two
> tables),
>
> since all joins are on foreign constrains referencing primary keys,
>
> there are no filters on the other tables,
> so it is guaranteed that the useless joins will always return exactly one
> answer.

I think what you're asking for is a lot more complex than it sounds,
and incorrect.  The precise state of the data influences how many
records come back (in this case, either 1 or 0), for example if b3_id
is null you get zero rows.  More to the point, you *instructed* the
server to make the join.  There are strategies to make joins
'optional' at run time with respect to a query, but they are more
complicated than simply withdrawing columns from the select list.

Stepping back a bit, the query needs to be planned before peeking at
the data in the tables.  The planner is able to make assumptions
against a statistical picture of the data but shouldn't be expected to
actually inspect precise result data in order to generate a better
plan.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Sfiligoi, Igor
Hello.

We have a view that is very generic, and we noticed that PostgreSQL is not very 
good at removing useless joins, which makes our queries very slow.
We could change our code to avoid the view and write ad-hoc queries to the 
underlying tables, but would prefer not to, if there is a way around it.
(BTW: We are currently using psql 9.4)

Here is a simplified implementation:
# create table a (id int primary key, name varchar(128));
# create table b (id int primary key, name varchar(128));
# create table c (id int primary key, a_id int references a(id), b1_id int 
references b(id), b2_id int references b(id), b3_id int references b(id));
# create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name 
a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c,  a, b b1, b 
b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and 
c.b3_id=b3.id;

When I try to get just info from tables c and b1:
# select id, b1_name from v
it still does all the joins (see below).

I would expect just one join (due to the request of columns from the two 
tables),
since all joins are on foreign constrains referencing primary keys,
there are no filters on the other tables,
so it is guaranteed that the useless joins will always return exactly one 
answer.

Is there a way to tweak the PostgreSQL optimizer to do the proper join removal 
during the planning?
Perhaps tweaking somehow either our schema or our queries (while still keeping 
a generic view)?

Thank you,
  Igor Sfiligoi



# explain select id, b1_name from v;
   QUERY PLAN

Nested Loop  (cost=1.02..5.45 rows=1 width=6)
   Join Filter: (c.b3_id = b3.id)
   ->  Nested Loop  (cost=1.02..4.32 rows=1 width=10)
 Join Filter: (c.a_id = a.id)
 ->  Nested Loop  (cost=1.02..3.25 rows=1 width=14)
   Join Filter: (c.b2_id = b2.id)
   ->  Hash Join  (cost=1.02..2.12 rows=1 width=18)
 Hash Cond: (b1.id = c.b1_id)
 ->  Seq Scan on b b1  (cost=0.00..1.06 rows=6 width=6)
 ->  Hash  (cost=1.01..1.01 rows=1 width=20)
   ->  Seq Scan on c  (cost=0.00..1.01 rows=1 width=20)
   ->  Seq Scan on b b2  (cost=0.00..1.06 rows=6 width=4)
 ->  Seq Scan on a  (cost=0.00..1.03 rows=3 width=4)
   ->  Seq Scan on b b3  (cost=0.00..1.06 rows=6 width=4)
(14 rows)

PS: The tables were very small in this example, but are quite big in the 
production environment.