Re: [GENERAL] View with an outer join - is there any way to optimise

2005-12-13 Thread Rich Doughty

Tom Lane wrote:

Rich Doughty <[EMAIL PROTECTED]> writes:


I have a view vw_tokens defined as
...
I cannot however perform a meaningful join against this view.
...
PG forms the full output of the view.



You seem to be wishing that PG would push the INNER JOIN down inside the
nested LEFT JOINs.  In general, rearranging inner and outer joins like
that can change the results.  There are limited cases where it can be
done without breaking the query semantics, but the planner doesn't
currently have any logic to analyze whether it's safe or not, so it just
doesn't try.

Improving this situation is (or ought to be) on the TODO list, but I dunno
when it will happen.


ok, thanks. as i suspected, i don't think i'm going to be able to views for
this. when the query is ultimately returning only 100 or so rows, i cannot
afford a full 4 million row table scan to form the full view when a nested
loop might make more sense (anything makes more sense than the full view :-)

i have a workaround (of sorts). instead of

  WHERE token_id IN (SELECT token_id FROM ta_tokenhist WHERE sarreport_id = 9)

if i perform the subquery manually, then create a second query of the form

  WHERE token_id IN (?,?,?,?,?)

i get decent results. it's pretty ugly but it works. i doubt that it will
scale up to 500 or more results (if that), but thankfully in general, neither
do the query results.

cheers anyway

  - Rich Doughty

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] View with an outer join - is there any way to optimise this?

2005-12-12 Thread Tom Lane
Rich Doughty <[EMAIL PROTECTED]> writes:
> I have a view vw_tokens defined as
> ...
> I cannot however perform a meaningful join against this view.
> ...
> PG forms the full output of the view.

You seem to be wishing that PG would push the INNER JOIN down inside the
nested LEFT JOINs.  In general, rearranging inner and outer joins like
that can change the results.  There are limited cases where it can be
done without breaking the query semantics, but the planner doesn't
currently have any logic to analyze whether it's safe or not, so it just
doesn't try.

Improving this situation is (or ought to be) on the TODO list, but I dunno
when it will happen.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] View with an outer join - is there any way to optimise

2005-12-12 Thread Rich Doughty

John McCawley wrote:
You should be able to use my trick...the join that is giving you the 
problem is:


SELECT *
 FROM
 tokens.ta_tokenhist h INNER JOIN
 tokens.vw_tokenst ON h.token_id = t.token_id
 WHERE
 h.sarreport_id = 9 ;


ta_tokenhist is already part of your view, right?  So you should be able 
to include the sarreport_id as part of your view, and then restructure 
your query as:



SELECT *
 FROM
 tokens.ta_tokenhist INNER JOIN
 tokens.vw_tokens ON tokens.ta_tokenhist.token_id = 
tokens.vw_tokens.token_id

 WHERE
 tokens.vw_tokens.sarreport_id = 9 ;

I removed the aliases because they confuse me ;)


i don't think i can do that. basically i want to run a variety of queries
on the vw_tokens view. for example, joins i hope to do may include:

  tokens.ta_tokenhist h INNER JOIN tokens.vw_tokens WHERE h.customer_id = ?
  tokens.ta_tokenhist h INNER JOIN tokens.vw_tokens WHERE h.histdate between 
'then' and 'now'
  tokens.vw_tokens WHERE number = ?

i just want vw_tokens to give me a constant resultset. i have a feeling
though that views aren't go to be able to give me what i need.

i suppose i could go for a set returning function, or just write the
queries manually.


--

  - Rich Doughty

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] View with an outer join - is there any way to optimise

2005-12-12 Thread John McCawley
You should be able to use my trick...the join that is giving you the 
problem is:


SELECT *
 FROM
 tokens.ta_tokenhist h INNER JOIN
 tokens.vw_tokenst ON h.token_id = t.token_id
 WHERE
 h.sarreport_id = 9 ;


ta_tokenhist is already part of your view, right?  So you should be able 
to include the sarreport_id as part of your view, and then restructure 
your query as:



SELECT *
 FROM
 tokens.ta_tokenhist INNER JOIN
 tokens.vw_tokens ON tokens.ta_tokenhist.token_id = 
tokens.vw_tokens.token_id

 WHERE
 tokens.vw_tokens.sarreport_id = 9 ;

I removed the aliases because they confuse me ;)



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] View with an outer join - is there any way to optimise

2005-12-12 Thread John McCawley

First of all, check out this thread:

http://archives.postgresql.org/pgsql-general/2005-11/msg00734.php

I had a similar performance issue with a view.  Look at my solution and 
it might help you out.


Second, you might want to change your token history status from a string 
to an integer that references a status table.  If your view is causing a 
sequential scan, you're going to end up will bazillions of string 
comparisons.  I don't know if Postgres has some form of black magic 
string comparison optimization, but I generally avoid string comparisons 
when I am dealing with a few known values, as would be the case in a 
status table.



Rich Doughty wrote:



I have a view vw_tokens defined as

  CREATE VIEW tokens.vw_tokens AS SELECT
  -- too many columns to mention
  FROM
  tokens.ta_tokens  t LEFT JOIN
  tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
 i.status   = 'issued'   LEFT JOIN
  tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
 s.status   = 'sold' LEFT JOIN
  tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
   r.status   = 'redeemed'
  ;


the ta_tokens table contains approx 4 million records, and ta_tokenhist
approx 10 millions. queries against the view itself on the primary key
execute with no issues at all.

I cannot however perform a meaningful join against this view. when i
execute

  SELECT *
  FROM
  tokens.ta_tokenhist h INNER JOIN
  tokens.vw_tokenst ON h.token_id = t.token_id
  WHERE
  h.sarreport_id = 9
  ;

PG forms the full output of the view. the query plan is


 Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 
width=470)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
   Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
   ->  Hash Left Join  (cost=559931.55..1093783.71 
rows=4052907 width=174)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)
 ->  Hash  (cost=459239.41..459239.41 rows=4114456 
width=152)
   ->  Seq Scan on ta_tokenhist i  
(cost=0.00..459239.41 rows=4114456 width=152)
 Filter: ((status)::text = 
'issued'::text)
   ->  Hash  (cost=459239.41..459239.41 rows=3905186 
width=152)
 ->  Seq Scan on ta_tokenhist s  
(cost=0.00..459239.41 rows=3905186 width=152)

   Filter: ((status)::text = 'sold'::text)
 ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
   ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 
rows=2617645 width=152)

 Filter: ((status)::text = 'redeemed'::text)
   ->  Hash  (cost=6.01..6.01 rows=1 width=236)
 ->  Index Scan using fkx_tokenhist__sarreports on 
ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)

   Index Cond: ((sarreport_id)::integer = 9)


I have also tried explicitly querying token_id in the view, hoping
to force a nested loop:


  EXPLAIN
  SELECT *
  FROM
  tokens.vw_tokens__user
  WHERE
token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE 
sarreport_id = 9);



QUERY PLAN
-- 


 Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 
width=470)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
   Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
   ->  Hash Left Join  (cost=559931.55..1093783.71 
rows=4052907 width=174)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)
 ->  Hash  (cost=459239.41..459239.41 rows=4114456 
width=152)
   ->  Seq Scan on ta_tokenhist i  
(cost=0.00..459239.41 rows=4114456 width=152)
 Filter: ((status)::text = 
'issued'::text)
   ->  Hash  (cost=459239.41..459239.41 rows=3905186 
width=152)
 ->  Seq Scan on ta_tokenhist s  
(cost=0.00..459239.41 rows=3905186 width=152)

  

Re: [GENERAL] View with an outer join - is there any way to optimise

2005-12-12 Thread Rich Doughty

John McCawley wrote:

First of all, check out this thread:

http://archives.postgresql.org/pgsql-general/2005-11/msg00734.php

I had a similar performance issue with a view.  Look at my solution and 
it might help you out.


i'm not sure that'll help in my case as ta_tokens has a 1-to-many
relationship with ta_tokenhist.

there are various indexed tokenhist columns that i want to query on (in
addition to salesorder_id). none of them will return more than 100 or so
rows. it'd probably be easier to abandon the view altogether (which isn't
something i'd really like to do)

Second, you might want to change your token history status from a string 
to an integer that references a status table.  If your view is causing a 
sequential scan, you're going to end up will bazillions of string 
comparisons.  I don't know if Postgres has some form of black magic 
string comparison optimization, but I generally avoid string comparisons 
when I am dealing with a few known values, as would be the case in a 
status table.


interesting thought. of course, i'd rather postgres didn't do a full
sequential scan ;-)




Rich Doughty wrote:



I have a view vw_tokens defined as

  CREATE VIEW tokens.vw_tokens AS SELECT
  -- too many columns to mention
  FROM
  tokens.ta_tokens  t LEFT JOIN
  tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
 i.status   = 'issued'   LEFT JOIN
  tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
 s.status   = 'sold' LEFT JOIN
  tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
   r.status   = 'redeemed'
  ;


the ta_tokens table contains approx 4 million records, and ta_tokenhist
approx 10 millions. queries against the view itself on the primary key
execute with no issues at all.

I cannot however perform a meaningful join against this view. when i
execute

  SELECT *
  FROM
  tokens.ta_tokenhist h INNER JOIN
  tokens.vw_tokenst ON h.token_id = t.token_id
  WHERE
  h.sarreport_id = 9
  ;

PG forms the full output of the view. the query plan is


 Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 
width=470)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
   Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
   ->  Hash Left Join  (cost=559931.55..1093783.71 
rows=4052907 width=174)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)
 ->  Hash  (cost=459239.41..459239.41 rows=4114456 
width=152)
   ->  Seq Scan on ta_tokenhist i  
(cost=0.00..459239.41 rows=4114456 width=152)
 Filter: ((status)::text = 
'issued'::text)
   ->  Hash  (cost=459239.41..459239.41 rows=3905186 
width=152)
 ->  Seq Scan on ta_tokenhist s  
(cost=0.00..459239.41 rows=3905186 width=152)

   Filter: ((status)::text = 'sold'::text)
 ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
   ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 
rows=2617645 width=152)

 Filter: ((status)::text = 'redeemed'::text)
   ->  Hash  (cost=6.01..6.01 rows=1 width=236)
 ->  Index Scan using fkx_tokenhist__sarreports on 
ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)

   Index Cond: ((sarreport_id)::integer = 9)


I have also tried explicitly querying token_id in the view, hoping
to force a nested loop:


  EXPLAIN
  SELECT *
  FROM
  tokens.vw_tokens__user
  WHERE
token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE 
sarreport_id = 9);



QUERY PLAN
-- 


 Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 
width=470)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
   Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
   ->  Hash Left Join  (cost=559931.55..1093783.71 
rows=4052907 width=174)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)

[GENERAL] View with an outer join - is there any way to optimise this?

2005-12-12 Thread Rich Doughty


I have a view vw_tokens defined as

  CREATE VIEW tokens.vw_tokens AS SELECT
  -- too many columns to mention
  FROM
  tokens.ta_tokens  t LEFT JOIN
  tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
 i.status   = 'issued'   LEFT JOIN
  tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
 s.status   = 'sold' LEFT JOIN
  tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
   r.status   = 'redeemed'
  ;


the ta_tokens table contains approx 4 million records, and ta_tokenhist
approx 10 millions. queries against the view itself on the primary key
execute with no issues at all.

I cannot however perform a meaningful join against this view. when i
execute

  SELECT *
  FROM
  tokens.ta_tokenhist h INNER JOIN
  tokens.vw_tokenst ON h.token_id = t.token_id
  WHERE
  h.sarreport_id = 9
  ;

PG forms the full output of the view. the query plan is


 Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 width=470)
 Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
 ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
   Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
   ->  Hash Left Join  (cost=559931.55..1093783.71 rows=4052907 
width=174)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)
 ->  Hash  (cost=459239.41..459239.41 rows=4114456 
width=152)
   ->  Seq Scan on ta_tokenhist i  
(cost=0.00..459239.41 rows=4114456 width=152)
 Filter: ((status)::text = 'issued'::text)
   ->  Hash  (cost=459239.41..459239.41 rows=3905186 width=152)
 ->  Seq Scan on ta_tokenhist s  (cost=0.00..459239.41 
rows=3905186 width=152)
   Filter: ((status)::text = 'sold'::text)
 ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
   ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 
rows=2617645 width=152)
 Filter: ((status)::text = 'redeemed'::text)
   ->  Hash  (cost=6.01..6.01 rows=1 width=236)
 ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist h  
(cost=0.00..6.01 rows=1 width=236)
   Index Cond: ((sarreport_id)::integer = 9)


I have also tried explicitly querying token_id in the view, hoping
to force a nested loop:


  EXPLAIN
  SELECT *
  FROM
tokens.vw_tokens__user
  WHERE
token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE 
sarreport_id = 9);


QUERY PLAN
--
 Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 width=470)
 Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
 ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
   Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
   ->  Hash Left Join  (cost=559931.55..1093783.71 rows=4052907 
width=174)
 Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
 ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)
 ->  Hash  (cost=459239.41..459239.41 rows=4114456 
width=152)
   ->  Seq Scan on ta_tokenhist i  
(cost=0.00..459239.41 rows=4114456 width=152)
 Filter: ((status)::text = 'issued'::text)
   ->  Hash  (cost=459239.41..459239.41 rows=3905186 width=152)
 ->  Seq Scan on ta_tokenhist s  (cost=0.00..459239.41 
rows=3905186 width=152)
   Filter: ((status)::text = 'sold'::text)
 ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
   ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 
rows=2617645 width=152)
 Filter: ((status)::text = 'redeemed'::text)
   ->  Hash  (cost=6.01..6.01 rows=1 width=4)
 ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist  
(cost=0.00..6.01 rows=1 width=4)
   Index Cond: ((sarreport_id)::integer = 9)


Setting enable_mergejoin and enable_hashjoin to off results in a nested
but still forms the view output.


I can achieve the results i need be eliminating the view and writing
the query manually