Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
The best that I can do right now is provide the explain of the three variants (see below). The use of a left join did indeed remove the useless joins, but the selected plan is just terrible. Thanks, Igor -Original Message- From: Kevin Grittner [mailto:kgri...@gmail.com] Sent: Saturday, July 02, 2016 6:28 AM To: Sfiligoi, Igor <igor.sfili...@ga.com> Cc: pgsql-general@postgresql.org Subject: Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins On Fri, Jul 1, 2016 at 7:15 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote: > OK. Will change our query generation code to not use the view. > (I have tried the LEFT JOIN approach, but it just does not seem to > perform.) > PS: Here are the numbers for the real production query (will not provide > details): > Original query: 300s > Query on a manually optimized view:1ms > Using left joins:200s Please show a self-contained case (i.e., one that can be run against an empty database to demonstrate the problem). - Original view, two unused joins (data_info.true_arrow_id = true_dart.arrow_id && data_info.acl_arrow_id = acl_dart.arrow_id) - Merge Join (cost=121449.22..11705013.57 rows=890 width=63) (actual time=326791.858..365059.117 rows=1 loops=1) Merge Cond: (locn_info.rock_person_id = bird_rsrc_physical.rock_person_id) -> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.018..0.020 rows=2 loops=1) Sort Key: locn_info.rock_person_id Sort Method: quicksort Memory: 25kB -> Seq Scan on bird_locn_info locn_info (cost=0.00..1.02 rows=2 width=4) (actual time=0.003..0.003 rows=2 loops=1) -> Materialize (cost=121448.19..14015467.54 rows=890 width=71) (actual time=326791.835..365059.092 rows=1 loops=1) -> Merge Join (cost=121448.19..14015465.32 rows=890 width=71) (actual time=326791.833..365059.089 rows=1 loops=1) Merge Cond: (bird_rsrc_physical.rock_person_id = bird_person_bike.person_id) -> Nested Loop (cost=121444.20..16786761.58 rows=890 width=67) (actual time=326791.779..365059.033 rows=1 loops=1) Join Filter: (bird_data_silk.rsrc_id = bird_rsrc_physical.phy_rsrc_id) -> Nested Loop (cost=0.13..15.68 rows=1 width=8) (actual time=0.025..0.035 rows=1 loops=1) Join Filter: (bird_rsrc_physical.lock_rock_person_id = lock_info.rock_person_id) Rows Removed by Join Filter: 1 -> Nested Loop (cost=0.13..14.63 rows=1 width=12) (actual time=0.022..0.031 rows=1 loops=1) Join Filter: (bird_rsrc_physical.rsrc_class_id = bird_rsrc_class.rsrc_class_id) Rows Removed by Join Filter: 7 -> Nested Loop (cost=0.13..13.45 rows=1 width=16) (actual time=0.017..0.025 rows=1 loops=1) Join Filter: ((bird_rsrc_physical.rsrc_type_id)::text = (bird_rsrc_type.rsrc_type_id)::text) Rows Removed by Join Filter: 11 -> Index Scan using fidx_1_rsrc_physical on bird_rsrc_physical (cost=0.13..12.18 rows=1 width=24) (actual time=0.009..0.013 rows=1 loops=1) Filter: ((rsrc_name)::text = 'data'::text) Rows Removed by Filter: 2 -> Seq Scan on bird_rsrc_type (cost=0.00..1.12 rows=12 width=8) (actual time=0.002..0.004 rows=12 loops=1) -> Seq Scan on bird_rsrc_class (cost=0.00..1.08 rows=8 width=4) (actual time=0.002..0.003 rows=8 loops=1) -> Seq Scan on bird_locn_info lock_info (cost=0.00..1.02 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1) -> Hash Join (cost=121444.07..16786712.53 rows=2670 width=67) (actual time=326791.750..365058.993 rows=1 loops=1) Hash Cond: (data_info.true_data_id = bird_data_silk.data_id) -> Hash Join (cost=118019.92..16144855.88 rows=102144928 width=43) (actual time=1945.934..346470.487 rows=102020209 loops=1) Hash Cond: (data_info.sheet_id = bird_sheet.sheet_id) -> Hash Join (cost=118018.36..14740361.56 rows=102144928 width=47) (actual time=1945.911..310945.806 rows=102020209 loops=1)
Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
On Fri, Jul 1, 2016 at 7:15 PM, Sfiligoi, Igorwrote: > OK. Will change our query generation code to not use the view. > (I have tried the LEFT JOIN approach, but it just does not seem to perform.) > PS: Here are the numbers for the real production query (will not provide > details): > Original query: 300s > Query on a manually optimized view:1ms > Using left joins:200s Please show a self-contained case (i.e., one that can be run against an empty database to demonstrate the problem). You might start from this one and modify it until you see the problem that you describe: create table a (id int primary key, name varchar(128)); create table b (id int primary key, name varchar(128)); create table c (id int primary key, a_id int not null references a(id), b1_id int not null references b(id), b2_id int not null references b(id), b3_id int not null references b(id)); create view v as select c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c left join aon a.id = c.a_id left join b b1 on b1.id = c.b1_id left join b b2 on b2.id = c.b2_id left join b b3 on b3.id = c.b3_id; insert into a values (1, 'a1'); insert into b values (1, 'b1'), (2, 'b2'), (3, 'b3'); insert into c values (1, 1, 1, 2, 3); vacuum analyze a; vacuum analyze b; vacuum analyze c; select id, b1_name from v; explain (analyze, buffers, verbose) select id, b1_name from v; I'm seeing the unreferenced tables pruned from the plan, and a 1ms execution time for the select from the view. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
OK. Will change our query generation code to not use the view. (I have tried the LEFT JOIN approach, but it just does not seem to perform.) Thanks, Igor PS: Here are the numbers for the real production query (will not provide details): Original query: 300s Query on a manually optimized view: 1ms Using left joins:200s I would have gladly paid a few ms in additional planning time! -Original Message- From: Kevin Grittner [mailto:kgri...@gmail.com] Sent: Friday, July 01, 2016 1:57 PM To: Sfiligoi, Igor <igor.sfili...@ga.com> Cc: pgsql-general@postgresql.org Subject: Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins On Fri, Jul 1, 2016 at 3:33 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote: > No, I don't want to use LEFT JOINS. > I want to use regular joins. > > But (as mentioned in my other follow-up), all the fields are not null > (was not in the original email, sorry), and are foreign keys, so it is > guaranteed to always match. In that case there is no difference between the inner join and the left join except that the left join currently supports and optimization that makes your query faster if the optional table is not reference. Whether you want to take advantage of that is up to you. > The key part (in my mind) is that I am not filtering on any of the > useless tables, and I am not returning any columns from those tables > either. > Both is known at planning time. The fact that something can be determined at planning time doesn't mean that checking for it is free. > is my logic still broken? Your logic seems OK with the table definitions you are now showing. Whether we ever decide it is OK to omit tables which use an inner join rather than only considering omitting them when the query specifies that the join is optional is anybody's guess. If it is important enough to you you could submit a patch or fund development of such a feature; but since it would add at least some small amount of planning time to every inner join just to avoid specifying that the join is an optional one when writing the query, it seems to me unlikely to be accepted. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
On Fri, Jul 1, 2016 at 3:33 PM, Sfiligoi, Igorwrote: > No, I don't want to use LEFT JOINS. > I want to use regular joins. > > But (as mentioned in my other follow-up), all the fields are not > null (was not in the original email, sorry), and are foreign > keys, so it is guaranteed to always match. In that case there is no difference between the inner join and the left join except that the left join currently supports and optimization that makes your query faster if the optional table is not reference. Whether you want to take advantage of that is up to you. > The key part (in my mind) is that I am not filtering on any of > the useless tables, and I am not returning any columns from those > tables either. > Both is known at planning time. The fact that something can be determined at planning time doesn't mean that checking for it is free. > is my logic still broken? Your logic seems OK with the table definitions you are now showing. Whether we ever decide it is OK to omit tables which use an inner join rather than only considering omitting them when the query specifies that the join is optional is anybody's guess. If it is important enough to you you could submit a patch or fund development of such a feature; but since it would add at least some small amount of planning time to every inner join just to avoid specifying that the join is an optional one when writing the query, it seems to me unlikely to be accepted. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
Nope, no difference how I express the joins: create view v1 as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c join a on (c.a_id=a.id) join b b1 on (c.b1_id=b1.id) join b b2 on (c.b2_id=b2.id) join b b3 on (c.b3_id=b3.id); # explain select id, b1_name from v1; QUERY PLAN Nested Loop (cost=1.02..5.47 rows=1 width=7) Join Filter: (c.b3_id = b3.id) -> Nested Loop (cost=1.02..4.34 rows=1 width=11) Join Filter: (c.a_id = a.id) -> Nested Loop (cost=1.02..3.25 rows=1 width=15) Join Filter: (c.b2_id = b2.id) -> Hash Join (cost=1.02..2.12 rows=1 width=19) Hash Cond: (b1.id = c.b1_id) -> Seq Scan on b b1 (cost=0.00..1.06 rows=6 width=7) -> Hash (cost=1.01..1.01 rows=1 width=20) -> Seq Scan on c (cost=0.00..1.01 rows=1 width=20) -> Seq Scan on b b2 (cost=0.00..1.06 rows=6 width=4) -> Seq Scan on a (cost=0.00..1.04 rows=4 width=4) -> Seq Scan on b b3 (cost=0.00..1.06 rows=6 width=4) (14 rows) Igor PS: Here are the updated table definitions: create table a (id int not null primary key, name varchar(128)); create table b (id int not null primary key, name varchar(128)); create table c (id int not null primary key, a_id int not null references a(id), b1_id int not null references b(id), b2_id int not null references b(id), b3_id int not null references b(id)); -Original Message- From: Adrian Klaver [mailto:adrian.kla...@aklaver.com] Sent: Friday, July 01, 2016 1:38 PM To: Sfiligoi, Igor <igor.sfili...@ga.com>; Merlin Moncure <mmonc...@gmail.com> Cc: pgsql-general@postgresql.org Subject: Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins On 07/01/2016 01:28 PM, Sfiligoi, Igor wrote: > Sorry... the example was incomplete. > > All the fields are defined as not-null. > So it is guaranteed to always match the join. > > And PostgreSQL release notes claim that PGSQL can do at least partial join > removal: > https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.0#Join_Rem > oval Those examples use explicit joins, so you might try that in your view definition. > > I was hoping this use case would fit in. > > Any suggestions? > > Igor > > -Original Message- > From: Merlin Moncure [mailto:mmonc...@gmail.com] > Sent: Friday, July 01, 2016 12:42 PM > To: Sfiligoi, Igor <igor.sfili...@ga.com> > Cc: pgsql-general@postgresql.org > Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious > useless joins > > On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote: >> Hello. >> >> We have a view that is very generic, and we noticed that PostgreSQL >> is not very good at removing useless joins, which makes our queries very >> slow. >> >> We could change our code to avoid the view and write ad-hoc queries >> to the underlying tables, but would prefer not to, if there is a way around >> it. >> >> (BTW: We are currently using psql 9.4) >> >> Here is a simplified implementation: >> >> # create table a (id int primary key, name varchar(128)); >> >> # create table b (id int primary key, name varchar(128)); >> >> # create table c (id int primary key, a_id int references a(id), >> b1_id int references b(id), b2_id int references b(id), b3_id int >> references b(id)); >> >> # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, >> a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from >> c, a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and >> c.b2_id=b2.id and c.b3_id=b3.id; >> >> When I try to get just info from tables c and b1: >> >> # select id, b1_name from v >> >> it still does all the joins (see below). >> >> I would expect just one join (due to the request of columns from the >> two tables), >> >> since all joins are on foreign constrains referencing primary keys, >> >> there are no filters on the other tables, so it is guaranteed that >> the useless joins will always return exactly one answer. > > I think what you're asking for is a lot more complex than it sounds, and > incorrect. The precise state of the data influences how many records come > back (in this case, either 1 or 0), for example if b3_id is null you get zero > rows. More to the point, you *instructed* the server to make the join. > There are st
Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
On 07/01/2016 01:28 PM, Sfiligoi, Igor wrote: Sorry... the example was incomplete. All the fields are defined as not-null. So it is guaranteed to always match the join. And PostgreSQL release notes claim that PGSQL can do at least partial join removal: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.0#Join_Removal Those examples use explicit joins, so you might try that in your view definition. I was hoping this use case would fit in. Any suggestions? Igor -Original Message- From: Merlin Moncure [mailto:mmonc...@gmail.com] Sent: Friday, July 01, 2016 12:42 PM To: Sfiligoi, Igor <igor.sfili...@ga.com> Cc: pgsql-general@postgresql.org Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote: Hello. We have a view that is very generic, and we noticed that PostgreSQL is not very good at removing useless joins, which makes our queries very slow. We could change our code to avoid the view and write ad-hoc queries to the underlying tables, but would prefer not to, if there is a way around it. (BTW: We are currently using psql 9.4) Here is a simplified implementation: # create table a (id int primary key, name varchar(128)); # create table b (id int primary key, name varchar(128)); # create table c (id int primary key, a_id int references a(id), b1_id int references b(id), b2_id int references b(id), b3_id int references b(id)); # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c, a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and c.b3_id=b3.id; When I try to get just info from tables c and b1: # select id, b1_name from v it still does all the joins (see below). I would expect just one join (due to the request of columns from the two tables), since all joins are on foreign constrains referencing primary keys, there are no filters on the other tables, so it is guaranteed that the useless joins will always return exactly one answer. I think what you're asking for is a lot more complex than it sounds, and incorrect. The precise state of the data influences how many records come back (in this case, either 1 or 0), for example if b3_id is null you get zero rows. More to the point, you *instructed* the server to make the join. There are strategies to make joins 'optional' at run time with respect to a query, but they are more complicated than simply withdrawing columns from the select list. Stepping back a bit, the query needs to be planned before peeking at the data in the tables. The planner is able to make assumptions against a statistical picture of the data but shouldn't be expected to actually inspect precise result data in order to generate a better plan. merlin -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
No, I don't want to use LEFT JOINS. I want to use regular joins. But (as mentioned in my other follow-up), all the fields are not null (was not in the original email, sorry), and are foreign keys, so it is guaranteed to always match. The key part (in my mind) is that I am not filtering on any of the useless tables, and I am not returning any columns from those tables either. Both is known at planning time. Or is my logic still broken? Thanks, Igor -Original Message- From: Kevin Grittner [mailto:kgri...@gmail.com] Sent: Friday, July 01, 2016 1:29 PM To: Sfiligoi, Igor <igor.sfili...@ga.com> Cc: pgsql-general@postgresql.org Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote: > We have a view that is very generic, and we noticed that PostgreSQL is > not very good at removing useless joins, which makes our queries very slow. > > We could change our code to avoid the view and write ad-hoc queries to > the underlying tables, but would prefer not to, if there is a way around it. If it did not do the joins it could not provide the information you are actually asking to see. Of course, there is a very good chance that what you are asking to see is not what you *want* to see. test=# create table a (id int primary key, name varchar(128)); CREATE TABLE test=# create table b (id int primary key, name varchar(128)); CREATE TABLE test=# create table c (id int primary key, test(# a_id int references a(id), test(# b1_id int references b(id), test(# b2_id int references b(id), test(# b3_id int references b(id)); CREATE TABLE test=# test=# create view v_broken as test-# select test-#c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name, test-#b1.name b1_name, b2.name b2_name, b3.name b3_name test-# from c, a, b b1, b b2, b b3 test-# where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and c.b3_id=b3.id; CREATE VIEW test=# test=# create view v as test-# select test-# c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name, test-# b1.name b1_name, b2.name b2_name, b3.name b3_name test-# from c test-# left join aon a.id = c.a_id test-# left join b b1 on b1.id = c.b1_id test-# left join b b2 on b2.id = c.b2_id test-# left join b b3 on b3.id = c.b3_id; CREATE VIEW test=# test=# insert into a values (1, 'a1'); INSERT 0 1 test=# insert into b values (1, 'b1'), (2, 'b2'), (3, 'b3'); INSERT 0 3 test=# insert into c values (1, 1, 1, 2, 3), (2, 1, 1, 2, null); INSERT 0 2 test=# test=# select id, b1_name from v_broken; id | b1_name +- 1 | b1 (1 row) test=# explain analyze select id, b1_name from v_broken; QUERY PLAN Hash Join (cost=63.40..183.90 rows=1700 width=278) (actual time=0.049..0.052 rows=1 loops=1) Hash Cond: (c.b3_id = b3.id) -> Hash Join (cost=47.55..144.68 rows=1700 width=282) (actual time=0.030..0.033 rows=2 loops=1) Hash Cond: (c.b2_id = b2.id) -> Hash Join (cost=31.70..105.45 rows=1700 width=286) (actual time=0.018..0.020 rows=2 loops=1) Hash Cond: (c.b1_id = b1.id) -> Hash Join (cost=15.85..66.22 rows=1700 width=16) (actual time=0.010..0.012 rows=2 loops=1) Hash Cond: (c.a_id = a.id) -> Seq Scan on c (cost=0.00..27.00 rows=1700 width=20) (actual time=0.001..0.001 rows=2 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=4) (actual time=0.003..0.003 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on a (cost=0.00..12.60 rows=260 width=4) (actual time=0.002..0.003 rows=1 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=278) (actual time=0.005..0.005 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on b b1 (cost=0.00..12.60 rows=260 width=278) (actual time=0.002..0.003 rows=3 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=4) (actual time=0.006..0.006 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on b b2 (cost=0.00..12.60 rows=260 width=4) (actual time=0.004..0.004 rows=3 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=4) (actual time=0.013..0.013 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on b b3 (cost=0.00..12.60 rows=260 width=4) (actual time=0.008..0.009 rows=3 loops=1) Planning time: 0.729 ms Execution time: 0.153 ms (23 rows) test=# select id, b1_name from v; id | b1_name +
Re: [GENERAL] PSQL does not remove obvious useless joins
On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igorwrote: > We have a view that is very generic, and we noticed that PostgreSQL is not > very good at removing useless joins, which makes our queries very slow. > > We could change our code to avoid the view and write ad-hoc queries to the > underlying tables, but would prefer not to, if there is a way around it. If it did not do the joins it could not provide the information you are actually asking to see. Of course, there is a very good chance that what you are asking to see is not what you *want* to see. test=# create table a (id int primary key, name varchar(128)); CREATE TABLE test=# create table b (id int primary key, name varchar(128)); CREATE TABLE test=# create table c (id int primary key, test(# a_id int references a(id), test(# b1_id int references b(id), test(# b2_id int references b(id), test(# b3_id int references b(id)); CREATE TABLE test=# test=# create view v_broken as test-# select test-#c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name, test-#b1.name b1_name, b2.name b2_name, b3.name b3_name test-# from c, a, b b1, b b2, b b3 test-# where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and c.b3_id=b3.id; CREATE VIEW test=# test=# create view v as test-# select test-# c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name, test-# b1.name b1_name, b2.name b2_name, b3.name b3_name test-# from c test-# left join aon a.id = c.a_id test-# left join b b1 on b1.id = c.b1_id test-# left join b b2 on b2.id = c.b2_id test-# left join b b3 on b3.id = c.b3_id; CREATE VIEW test=# test=# insert into a values (1, 'a1'); INSERT 0 1 test=# insert into b values (1, 'b1'), (2, 'b2'), (3, 'b3'); INSERT 0 3 test=# insert into c values (1, 1, 1, 2, 3), (2, 1, 1, 2, null); INSERT 0 2 test=# test=# select id, b1_name from v_broken; id | b1_name +- 1 | b1 (1 row) test=# explain analyze select id, b1_name from v_broken; QUERY PLAN Hash Join (cost=63.40..183.90 rows=1700 width=278) (actual time=0.049..0.052 rows=1 loops=1) Hash Cond: (c.b3_id = b3.id) -> Hash Join (cost=47.55..144.68 rows=1700 width=282) (actual time=0.030..0.033 rows=2 loops=1) Hash Cond: (c.b2_id = b2.id) -> Hash Join (cost=31.70..105.45 rows=1700 width=286) (actual time=0.018..0.020 rows=2 loops=1) Hash Cond: (c.b1_id = b1.id) -> Hash Join (cost=15.85..66.22 rows=1700 width=16) (actual time=0.010..0.012 rows=2 loops=1) Hash Cond: (c.a_id = a.id) -> Seq Scan on c (cost=0.00..27.00 rows=1700 width=20) (actual time=0.001..0.001 rows=2 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=4) (actual time=0.003..0.003 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on a (cost=0.00..12.60 rows=260 width=4) (actual time=0.002..0.003 rows=1 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=278) (actual time=0.005..0.005 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on b b1 (cost=0.00..12.60 rows=260 width=278) (actual time=0.002..0.003 rows=3 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=4) (actual time=0.006..0.006 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on b b2 (cost=0.00..12.60 rows=260 width=4) (actual time=0.004..0.004 rows=3 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=4) (actual time=0.013..0.013 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on b b3 (cost=0.00..12.60 rows=260 width=4) (actual time=0.008..0.009 rows=3 loops=1) Planning time: 0.729 ms Execution time: 0.153 ms (23 rows) test=# select id, b1_name from v; id | b1_name +- 1 | b1 2 | b1 (2 rows) test=# explain analyze select id, b1_name from v; QUERY PLAN --- Hash Left Join (cost=15.85..66.22 rows=1700 width=278) (actual time=0.017..0.018 rows=2 loops=1) Hash Cond: (c.b1_id = b1.id) -> Seq Scan on c (cost=0.00..27.00 rows=1700 width=20) (actual time=0.005..0.005 rows=2 loops=1) -> Hash (cost=12.60..12.60 rows=260 width=278) (actual time=0.006..0.006 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on b b1 (cost=0.00..12.60 rows=260 width=278) (actual time=0.002..0.003 rows=3 loops=1) Planning time: 0.177 ms Execution time: 0.044 ms (8
Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins
Sorry... the example was incomplete. All the fields are defined as not-null. So it is guaranteed to always match the join. And PostgreSQL release notes claim that PGSQL can do at least partial join removal: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.0#Join_Removal I was hoping this use case would fit in. Any suggestions? Igor -Original Message- From: Merlin Moncure [mailto:mmonc...@gmail.com] Sent: Friday, July 01, 2016 12:42 PM To: Sfiligoi, Igor <igor.sfili...@ga.com> Cc: pgsql-general@postgresql.org Subject: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote: > Hello. > > We have a view that is very generic, and we noticed that PostgreSQL is > not very good at removing useless joins, which makes our queries very slow. > > We could change our code to avoid the view and write ad-hoc queries to > the underlying tables, but would prefer not to, if there is a way around it. > > (BTW: We are currently using psql 9.4) > > Here is a simplified implementation: > > # create table a (id int primary key, name varchar(128)); > > # create table b (id int primary key, name varchar(128)); > > # create table c (id int primary key, a_id int references a(id), b1_id > int references b(id), b2_id int references b(id), b3_id int references > b(id)); > > # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, > a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from > c, a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and > c.b2_id=b2.id and c.b3_id=b3.id; > > When I try to get just info from tables c and b1: > > # select id, b1_name from v > > it still does all the joins (see below). > > I would expect just one join (due to the request of columns from the > two tables), > > since all joins are on foreign constrains referencing primary keys, > > there are no filters on the other tables, so it is guaranteed that the > useless joins will always return exactly one answer. I think what you're asking for is a lot more complex than it sounds, and incorrect. The precise state of the data influences how many records come back (in this case, either 1 or 0), for example if b3_id is null you get zero rows. More to the point, you *instructed* the server to make the join. There are strategies to make joins 'optional' at run time with respect to a query, but they are more complicated than simply withdrawing columns from the select list. Stepping back a bit, the query needs to be planned before peeking at the data in the tables. The planner is able to make assumptions against a statistical picture of the data but shouldn't be expected to actually inspect precise result data in order to generate a better plan. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PSQL does not remove obvious useless joins
On Fri, Jul 1, 2016 at 12:17 PM, Sfiligoi, Igorwrote: > Hello. > > We have a view that is very generic, and we noticed that PostgreSQL is not > very good at removing useless joins, which makes our queries very slow. > > We could change our code to avoid the view and write ad-hoc queries to the > underlying tables, but would prefer not to, if there is a way around it. > > (BTW: We are currently using psql 9.4) > > Here is a simplified implementation: > > # create table a (id int primary key, name varchar(128)); > > # create table b (id int primary key, name varchar(128)); > > # create table c (id int primary key, a_id int references a(id), b1_id int > references b(id), b2_id int references b(id), b3_id int references b(id)); > > # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name > a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c, a, b b1, > b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and > c.b3_id=b3.id; > > When I try to get just info from tables c and b1: > > # select id, b1_name from v > > it still does all the joins (see below). > > I would expect just one join (due to the request of columns from the two > tables), > > since all joins are on foreign constrains referencing primary keys, > > there are no filters on the other tables, > so it is guaranteed that the useless joins will always return exactly one > answer. I think what you're asking for is a lot more complex than it sounds, and incorrect. The precise state of the data influences how many records come back (in this case, either 1 or 0), for example if b3_id is null you get zero rows. More to the point, you *instructed* the server to make the join. There are strategies to make joins 'optional' at run time with respect to a query, but they are more complicated than simply withdrawing columns from the select list. Stepping back a bit, the query needs to be planned before peeking at the data in the tables. The planner is able to make assumptions against a statistical picture of the data but shouldn't be expected to actually inspect precise result data in order to generate a better plan. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] PSQL does not remove obvious useless joins
Hello. We have a view that is very generic, and we noticed that PostgreSQL is not very good at removing useless joins, which makes our queries very slow. We could change our code to avoid the view and write ad-hoc queries to the underlying tables, but would prefer not to, if there is a way around it. (BTW: We are currently using psql 9.4) Here is a simplified implementation: # create table a (id int primary key, name varchar(128)); # create table b (id int primary key, name varchar(128)); # create table c (id int primary key, a_id int references a(id), b1_id int references b(id), b2_id int references b(id), b3_id int references b(id)); # create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c, a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and c.b3_id=b3.id; When I try to get just info from tables c and b1: # select id, b1_name from v it still does all the joins (see below). I would expect just one join (due to the request of columns from the two tables), since all joins are on foreign constrains referencing primary keys, there are no filters on the other tables, so it is guaranteed that the useless joins will always return exactly one answer. Is there a way to tweak the PostgreSQL optimizer to do the proper join removal during the planning? Perhaps tweaking somehow either our schema or our queries (while still keeping a generic view)? Thank you, Igor Sfiligoi # explain select id, b1_name from v; QUERY PLAN Nested Loop (cost=1.02..5.45 rows=1 width=6) Join Filter: (c.b3_id = b3.id) -> Nested Loop (cost=1.02..4.32 rows=1 width=10) Join Filter: (c.a_id = a.id) -> Nested Loop (cost=1.02..3.25 rows=1 width=14) Join Filter: (c.b2_id = b2.id) -> Hash Join (cost=1.02..2.12 rows=1 width=18) Hash Cond: (b1.id = c.b1_id) -> Seq Scan on b b1 (cost=0.00..1.06 rows=6 width=6) -> Hash (cost=1.01..1.01 rows=1 width=20) -> Seq Scan on c (cost=0.00..1.01 rows=1 width=20) -> Seq Scan on b b2 (cost=0.00..1.06 rows=6 width=4) -> Seq Scan on a (cost=0.00..1.03 rows=3 width=4) -> Seq Scan on b b3 (cost=0.00..1.06 rows=6 width=4) (14 rows) PS: The tables were very small in this example, but are quite big in the production environment.