EXPLAIN ANALYZE didn't give result even after three hours.
-----Original Message----- From: Vivekanand Joshi [mailto:vjo...@zetainteractive.com] Sent: Tuesday, March 17, 2015 1:11 PM To: 'Jim Nasby'; 'Tomas Vondra'; 'Scott Marlowe'; 'Varadharajan Mukundan' Cc: 'pgsql-performance@postgresql.org' Subject: RE: [PERFORM] Performance issues Hi Guys, Next level of query is following: If this works, I guess 90% of the problem will be solved. SELECT COUNT(DISTINCT TARGET_ID) FROM S_V_F_PROMOTION_HISTORY_EMAIL PH INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH ON PH.TOUCHPOINT_EXECUTION_ID = CH.TOUCHPOINT_EXECUTION_ID WHERE 1=1 AND SEND_DT >= '2014-03-13' AND SEND_DT <= '2015-03-14' In this query, I am joining two views which were made earlier with CTEs. I have replaced the CTE's with subqueries. The view were giving me output in around 5-10 minutes and now I am getting the same result in around 3-4 seconds. But when I executed the query written above, I am again stuck. I am attaching the query plan as well the link. http://explain.depesz.com/s/REeu I can see most of the time is spending inside a nested loop and total costs comes out be cost=338203.81..338203.82. How to take care of this? I need to run this query in a report so I cannot create a table like select * from views and then join the table. If I do that I am getting the answer of whole big query in some 6-7 seconds. But that is not feasible. A report (Jasper can have only one single (big/small query). Let me know if you need any other information. Thanks a ton! Vivek -----Original Message----- From: Jim Nasby [mailto:jim.na...@bluetreble.com] Sent: Tuesday, March 17, 2015 5:36 AM To: Tomas Vondra; vjo...@zetainteractive.com; Scott Marlowe; Varadharajan Mukundan Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Performance issues On 3/16/15 3:59 PM, Tomas Vondra wrote: > On 16.3.2015 20:43, Jim Nasby wrote: >> On 3/13/15 7:12 PM, Tomas Vondra wrote: >>> (4) I suspect many of the relations referenced in the views are not >>> actually needed in the query, i.e. the join is performed but >>> then it's just discarded because those columns are not used. >>> Try to simplify the views as much has possible - remove all the >>> tables that are not really necessary to run the query. If two >>> queries need different tables, maybe defining two views is >>> a better approach. >> >> A better alternative with multi-purpose views is to use an outer join >> instead of an inner join. With an outer join if you ultimately don't >> refer to any of the columns in a particular table Postgres will >> remove the table from the query completely. > > Really? Because a quick test suggests otherwise: > > db=# create table test_a (id int); > CREATE TABLE > db=# create table test_b (id int); > CREATE TABLE > db=# explain select test_a.* from test_a left join test_b using (id); > QUERY PLAN > ---------------------------------------------------------------------- > Merge Left Join (cost=359.57..860.00 rows=32512 width=4) > Merge Cond: (test_a.id = test_b.id) > -> Sort (cost=179.78..186.16 rows=2550 width=4) > Sort Key: test_a.id > -> Seq Scan on test_a (cost=0.00..35.50 rows=2550 width=4) > -> Sort (cost=179.78..186.16 rows=2550 width=4) > Sort Key: test_b.id > -> Seq Scan on test_b (cost=0.00..35.50 rows=2550 width=4) > (8 rows) > > Also, how would that work with duplicate rows in the referenced table? Right, I neglected to mention that the omitted table must also be unique on the join key: decibel@decina.attlocal=# create table a(a_id serial primary key); CREATE TABLE decibel@decina.attlocal=# create table b(a_id int); CREATE TABLE decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id); QUERY PLAN -------------------------------------------------------------------------- --------------------------------- Hash Right Join (cost=67.38..137.94 rows=2550 width=4) (actual time=0.035..0.035 rows=0 loops=1) Hash Cond: (b.a_id = a.a_id) -> Seq Scan on b (cost=0.00..35.50 rows=2550 width=4) (never executed) -> Hash (cost=35.50..35.50 rows=2550 width=4) (actual time=0.002..0.002 rows=0 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 32kB -> Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1) Planning time: 0.380 ms Execution time: 0.086 ms (8 rows) decibel@decina.attlocal=# alter table b add primary key(a_id); ALTER TABLE decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id); QUERY PLAN -------------------------------------------------------------------------- --------------------- Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1) Planning time: 0.247 ms Execution time: 0.029 ms (3 rows) decibel@decina.attlocal=# alter table a drop constraint a_pkey; ALTER TABLE decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id); QUERY PLAN -------------------------------------------------------------------------- --------------------- Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1) Planning time: 0.098 ms Execution time: 0.011 ms (3 rows) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance