[PERFORM] performance change from 8.3.1 to later releases

2010-04-20 Thread Roger Ging
uot;  ->  Materialize  (cost=19800.37..21112.29 
rows=104954 width=27) (actual time=2878.550..5291.205 rows=914636 loops=1)"
"->  Sort  (cost=19800.37..20062.75 
rows=104954 width=27) (actual time=2878.538..3607.201 rows=104624 loops=1)"

"  Sort Key: vc2.short_desc_75"
"  Sort Method:  external merge  
Disk: 3776kB"
"  ->  Seq Scan on vendor_catalog 
vc2  (cost=0.00..8534.52 rows=104954 width=27) (actual 
time=0.018..392.270 rows=104624 loops=1)"

"Filter: (cutoff_date IS NULL)"
"Total runtime: 45145.977 ms"



on any version from 8.3.8 on, this query has never returned, and explain 
analyze never returns, so I am only posting explain output


explain --analyze
select vc.* from traderhank.vendor_catalog vc
where vc.th_sku not in
(
select vc1.th_sku from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku 
is not null

group by vc1.th_sku
)
and vc.th_sku not in
(select vc_th_Sku from traderhank.xc_products where vc_th_sku is not null)


"Seq Scan on vendor_catalog vc  (cost=140413.05..91527264.28 rows=36838 
width=309)"

"  Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1)))"
"  SubPlan 2"
"->  Seq Scan on xc_products  (cost=0.00..1716.99 rows=5132 width=8)"
"  Filter: (vc_th_sku IS NOT NULL)"
"  SubPlan 1"
"->  Materialize  (cost=138683.23..139734.64 rows=75541 width=8)"
"  ->  Group  (cost=134997.43..138311.69 rows=75541 width=8)"
"->  Sort  (cost=134997.43..136654.56 rows=662853 width=8)"
"  Sort Key: vc1.th_sku"
"  ->  Merge Join  (cost=39600.73..52775.08 
rows=662853 width=8)"
"Merge Cond: ((vc1.short_desc_75)::text = 
(vc2.short_desc_75)::text)"
"Join Filter: ((vc2.th_sku)::text <> 
(vc1.th_sku)::text)"
"->  Sort  (cost=19800.37..20062.75 
rows=104954 width=27)"

"  Sort Key: vc1.short_desc_75"
"  ->  Seq Scan on vendor_catalog vc1  
(cost=0.00..8534.52 rows=104954 width=27)"
"Filter: ((cutoff_date IS NULL) 
AND (th_sku IS NOT NULL))"
"->  Materialize  (cost=19800.37..21112.29 
rows=104954 width=27)"
"  ->  Sort  (cost=19800.37..20062.75 
rows=104954 width=27)"

"Sort Key: vc2.short_desc_75"
"->  Seq Scan on vendor_catalog 
vc2  (cost=0.00..8534.52 rows=104954 width=27)"
"  Filter: (cutoff_date IS 
NULL)"





I've also tried changing the code to not exists, but that query never 
comes back on any version I have available:


explain --analyze
select vc.* from traderhank.vendor_catalog vc
where not exists
(
select 1 from traderhank.vendor_catalog vc1
join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
and vc2.th_sku != vc1.th_sku
where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku 
= vc.th_sku

group by vc1.th_sku
)
and not exists
(select 1 from traderhank.xc_products where vc_th_sku is not null and 
vc_th_sku = vc.th_sku)


"Nested Loop Anti Join  (cost=63650.74..93617.53 rows=1 width=309)"
"  Join Filter: ((xc_products.vc_th_sku)::text = (vc.th_sku)::text)"
"  ->  Hash Anti Join  (cost=63650.74..91836.39 rows=1 width=309)"
"Hash Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
"->  Seq Scan on vendor_catalog vc  (cost=0.00..8534.52 
rows=147352 width=309)"

"->  Hash  (cost=52775.08..52775.08 rows=662853 width=8)"
"  ->  Merge Join  (cost=39600.73..52775.08 rows=662853 
width=8)"
"Merge Cond: ((vc1.short_desc_75)::text = 
(vc2.short_desc_75)::text)"
"Join Filter: ((vc2.th_sku)::text <> 
(vc1.th_sku)::text)"
"->  Sort  (cost=19800.37..20062.75 rows=104954 
width=27)"

"  Sort Key: vc1.short_desc_75"
"  ->  Seq Scan on vendor_catalog vc1  
(cost=0.00..8534.52 rows=104954 width=27)"

"Filter: (cutoff_date IS NULL)"
"->  Materialize  (cost=19800.37..21112.29 
rows=104954 width=27)"
"  ->  Sort  (cost=19800.37..20062.75 
rows=104954 width=27)"

"Sort Key: vc2.short_desc_75"
"->  Seq Scan on vendor_catalog vc2  
(cost=0.00..8534.52 rows=104954 width=27)"

"  Filter: (cutoff_date IS NULL)"
"  ->  Seq Scan on xc_products  (cost=0.00..1716.99 rows=5132 width=8)"
"Filter: (xc_products.vc_th_sku IS NOT NULL)"



So, my question is, do I need  to re-write all of my in() and not in () 
queries to left joins, is this something that might get resolved in 
another release in the future?


Thanks for any help.

Roger Ging



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow query

2004-10-22 Thread Roger Ging
The following query has never finished.  I have let it run for over 24 
hours.  This is a one time update that is part of a conversion script 
from MSSQL data.  All of the tables are freshly built and inserted 
into.  I have not run explain analyze because it does not return in a 
reasonable time.  Explain output is posted below.  Any suggestions on 
syntax changes or anything else to improve this would be appreciated.

Dual PIII 1Ghz
4 GB RAM
4 spindle IDE RAID 0 on LSI controller.
Postgres 7.4.5
Linux version 2.6.3-7mdk-p3-smp-64GB
postgresql.cong snip:
tcpip_socket = true
max_connections = 40
shared_buffers = 1000
sort_mem = 65536 
fsync = true

source_song_title +-10,500,000 rows
source_song +-9,500,000 rows
source_system 10 rows
source_title +- 5,600,000
Code run right before this query:
create index ssa_source_song_id on source_song_artist (source_song_id);
analyze source_song_artist;
create index sa_artist_id on source_artist (artist_id);
analyze source_artist;
create index ss_source_song_id on source_song (source_song_id);
analyze source_song;
create index st_title_id on source_title (title_id);
analyze source_title;
source_song.source_song_id = int4
source_song_title.source_song_id = int4
source_title.title_id = int4
source_song_title.title_id = int4
update source_song_title set
source_song_title_id = nextval('source_song_title_seq')
,licensing_match_order = (select licensing_match_order from 
source_system where source_system_id = ss.source_system_id)
,affiliation_match_order = (select affiliation_match_order from 
source_system where source_system_id = ss.source_system_id)
,title = st.title
from source_song_title sst
join source_song ss on ss.source_song_id = sst.source_song_id
join source_title st on st.title_id = sst.title_id
where source_song_title.source_song_id = sst.source_song_id;

Explain output:
"Hash Join  (cost=168589.60..16651072.43 rows=6386404 width=335)"
"  Hash Cond: ("outer".title_id = "inner".title_id)"
"  ->  Merge Join  (cost=0.00..1168310.61 rows=6386403 width=311)"
"Merge Cond: ("outer".source_song_id = "inner".source_song_id)"
"->  Merge Join  (cost=0.00..679279.40 rows=6386403 width=16)"
"  Merge Cond: ("outer".source_song_id = 
"inner".source_song_id)"
"  ->  Index Scan using source_song_title_pkey on 
source_song_title sst  (cost=0.00..381779.37 rows=10968719 width=8)"
"  ->  Index Scan using ss_source_song_id on source_song ss  
(cost=0.00..190583.36 rows=6386403 width=8)"
"->  Index Scan using source_song_title_pkey on 
source_song_title  (cost=0.00..381779.37 rows=10968719 width=303)"
"  ->  Hash  (cost=117112.08..117112.08 rows=5513808 width=32)"
"->  Seq Scan on source_title st  (cost=0.00..117112.08 
rows=5513808 width=32)"
"  SubPlan"
"->  Seq Scan on source_system  (cost=0.00..1.14 rows=2 width=4)"
"  Filter: (source_system_id = $0)"
"->  Seq Scan on source_system  (cost=0.00..1.14 rows=2 width=2)"
"  Filter: (source_system_id = $0)"

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] How can one see what queries are running withing a

2004-06-30 Thread Roger Ging

P.A.M. van Dam wrote:
Hi!
I'd like to know if there is a way to see what queries are running
within a certain postgres instance and how much resources (cpu/memory)
etc. they are using. Right now it's impossible to see what is happening
within postgres when it's binaries are using 100% CPU.
In Sybase there is a command which let's you view what 'processes' are
running within the server and how much cpu (according to Sybase) they
are using. It also provides you with a stored procedure to kill off some
bad behaving queries. How can one do this within postgres?
Thanks in advance!
Best regards,
Pascal
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
 

select * from pg_stat_activity.  If you want to see the command that was 
run, you will need to turn on stats_command_string = true in 
postgresql.conf and re-start server.  PID shows up, so you can kill bad 
queries from terminal and see CUP % in top

Roger Ging
V.P., Information Technology
Music Reports, Inc.
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Followup - expression (functional) index use in joins

2003-12-01 Thread Roger Ging




Tom,

Turning enable_hashjoin off made the query run as it had on v7.3.  We
have worked around this by changing the index from a function call to a
direct index on a new column with the results of the function
maintained by a trigger.  Would there be performance issues from
leaving enable_hashjoin off, or do you recomend enabling it, and
working around function calls in indices?

See results below.  I was not sure if I was supposed to reply-all, or
just to the list.  Sorry if the protocol is incorrect.



ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
  
QUERY PLAN
-
 Hash Join  (cost=69.89..19157.06 rows=2322 width=28) (actual
time=500.905..1473.748 rows=242 loops=1)
   Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text =
("inner".program_id)::text)
   ->  Seq Scan on program p  (cost=0.00..16888.98 rows=173998
width=40) (actual time=98.371..532.184 rows=173998 loops=1)
   ->  Hash  (cost=69.84..69.84 rows=17 width=9) (actual
time=65.817..65.817 rows=0 loops=1)
 ->  Index Scan using idx_logfile_station_air_date on
logfile l  (cost=0.00..69.84 rows=17 width=9) (actual
time=24.499..65.730 rows=32 loops=1)
   Index Cond: (((station)::text = 'KABC'::text) AND
(air_date = '2001-01-30 00:00:00'::timestamp without time zone))
 Total runtime: 1474.067 ms
(7 rows)

ppl=# set enable_mergejoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
 
QUERY PLAN
---
 Hash Join  (cost=69.89..19157.06 rows=2322 width=28) (actual
time=444.834..1428.815 rows=242 loops=1)
   Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text =
("inner".program_id)::text)
   ->  Seq Scan on program p  (cost=0.00..16888.98 rows=173998
width=40) (actual time=105.977..542.870 rows=173998 loops=1)
   ->  Hash  (cost=69.84..69.84 rows=17 width=9) (actual
time=1.197..1.197 rows=0 loops=1)
 ->  Index Scan using idx_logfile_station_air_date on
logfile l  (cost=0.00..69.84 rows=17 width=9) (actual time=0.574..1.151
rows=32 loops=1)
   Index Cond: (((station)::text = 'KABC'::text) AND
(air_date = '2001-01-30 00:00:00'::timestamp without time zone))
 Total runtime: 1429.111 ms
(7 rows)

ppl=# set enable_hashjoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
 
QUERY PLAN
--
 Nested Loop  (cost=0.00..58104.34 rows=2322 width=28) (actual
time=0.480..5.357 rows=242 loops=1)
   ->  Index Scan using idx_logfile_station_air_date on logfile l 
(cost=0.00..69.84 rows=17 width=9) (actual time=0.176..0.754 rows=32
loops=1)
 Index Cond: (((station)::text = 'KABC'::text) AND (air_date =
'2001-01-30 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_program_mri_id_no_program on program p 
(cost=0.00..3400.74 rows=870 width=40) (actual time=0.041..0.127 rows=8
loops=32)
 Index Cond: (("outer".program_id)::text =
(music.fn_mri_id_no_program(p.mri_id_no))::text)
 Total runtime: 5.637 ms
(6 rows)


Tom Lane wrote:

  Roger Ging <[EMAIL PROTECTED]> writes:
  
  
Ran vacuum analyse on both program and logfile tables.  Estimates are 
more in line with reality now,

  
  
And they are what now?  You really can't expect to get useful help here
when you're being so miserly with the details ...

FWIW, I suspect you could force 7.4 to generate 7.3's plan by setting
enable_mergejoin to off (might have to also set enable_hashjoin to off,
if it then tries for a hash join).  7.3 could not even consider those
join types in this example, while 7.4 can.  The interesting question
from my perspective is why the planner is guessing wrong about the
relative costs of the plans.  EXPLAIN ANALYZE results wi

Re: [PERFORM] Followup - expression (functional) index use in joins

2003-12-01 Thread Roger Ging





Turning enable_hashjoin off made the query run as it had on v7.3.  We
have worked around this by changing the index from a function call to a
direct index on a new column with the results of the function maintained
by a trigger.  Would there be performance issues from leaving
enable_hashjoin off, or do you recomend enabling it, and working around
function calls in indices?

See results below.



ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
  
QUERY PLAN
-
 Hash Join  (cost=69.89..19157.06 rows=2322 width=28) (actual
time=500.905..1473.748 rows=242 loops=1)
   Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text =
("inner".program_id)::text)
   ->  Seq Scan on program p  (cost=0.00..16888.98 rows=173998
width=40) (actual time=98.371..532.184 rows=173998 loops=1)
   ->  Hash  (cost=69.84..69.84 rows=17 width=9) (actual
time=65.817..65.817 rows=0 loops=1)
 ->  Index Scan using idx_logfile_station_air_date on
logfile l  (cost=0.00..69.84 rows=17 width=9) (actual
time=24.499..65.730 rows=32 loops=1)
   Index Cond: (((station)::text = 'KABC'::text) AND
(air_date = '2001-01-30 00:00:00'::timestamp without time zone))
 Total runtime: 1474.067 ms
(7 rows)

ppl=# set enable_mergejoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
 
QUERY PLAN
---
 Hash Join  (cost=69.89..19157.06 rows=2322 width=28) (actual
time=444.834..1428.815 rows=242 loops=1)
   Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text =
("inner".program_id)::text)
   ->  Seq Scan on program p  (cost=0.00..16888.98 rows=173998
width=40) (actual time=105.977..542.870 rows=173998 loops=1)
   ->  Hash  (cost=69.84..69.84 rows=17 width=9) (actual
time=1.197..1.197 rows=0 loops=1)
 ->  Index Scan using idx_logfile_station_air_date on
logfile l  (cost=0.00..69.84 rows=17 width=9) (actual time=0.574..1.151
rows=32 loops=1)
   Index Cond: (((station)::text = 'KABC'::text) AND
(air_date = '2001-01-30 00:00:00'::timestamp without time zone))
 Total runtime: 1429.111 ms
(7 rows)

ppl=# set enable_hashjoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
 
QUERY PLAN
--
 Nested Loop  (cost=0.00..58104.34 rows=2322 width=28) (actual
time=0.480..5.357 rows=242 loops=1)
   ->  Index Scan using idx_logfile_station_air_date on logfile l 
(cost=0.00..69.84 rows=17 width=9) (actual time=0.176..0.754 rows=32
loops=1)
 Index Cond: (((station)::text = 'KABC'::text) AND (air_date =
'2001-01-30 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_program_mri_id_no_program on program p 
(cost=0.00..3400.74 rows=870 width=40) (actual time=0.041..0.127 rows=8
loops=32)
 Index Cond: (("outer".program_id)::text =
(music.fn_mri_id_no_program(p.mri_id_no))::text)
 Total runtime: 5.637 ms
(6 rows)


Tom Lane wrote:

  Roger Ging <[EMAIL PROTECTED]> writes:
  
  
Ran vacuum analyse on both program and logfile tables.  Estimates are 
more in line with reality now,

  
  
And they are what now?  You really can't expect to get useful help here
when you're being so miserly with the details ...

FWIW, I suspect you could force 7.4 to generate 7.3's plan by setting
enable_mergejoin to off (might have to also set enable_hashjoin to off,
if it then tries for a hash join).  7.3 could not even consider those
join types in this example, while 7.4 can.  The interesting question
from my perspective is why the planner is guessing wrong about the
relative costs of the plans.  EXPLAIN ANALYZE results with each type of
join forced would be useful to look at.

			regards, tom lane


Re: [PERFORM] Followup - expression (functional) index use in joins

2003-11-26 Thread Roger Ging




Ran vacuum analyse on both program and logfile tables.  Estimates are
more in line with reality now, but query still takes 10 seconds on v7.4
and 10 ms on v7.3.  Function is marked as immutable and returns
varchar(5).  I am wondering why the planner would choose a merge join
(v7.4) as opposed to a nested loop (v7.3) given the small number of
rows in the top level table (logfile) based upon the where clause (
L.air_date = '04/12/2002'::TIMESTAMP
)
there are typically only 30 rows per station/air_date.  What am I
missing here?

Richard Huxton wrote:

  On Wednesday 26 November 2003 18:39, Roger Ging wrote:
  
  
version 7.4 results:

explain analyse SELECT L.row_id FROM music.logfile L LEFT JOIN
music.program P ON
music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE  L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;

  
  
  
  
->  Seq Scan on program p  (cost=0.00..15192.35
rows=4335 width=20) (actual time=109.045..1955.882 rows=173998 loops=1)

  
  
The estimated number of rows here (4335) is *way* off (173998 actually). If 
you only had 4335 rows, then this might be a more sensible plan.

First step is to run:
  VACUUM ANALYSE program;
Then, check the definition of your function fn_mri_id_no_program() and make 
sure it is marked immutable/stable (depending on what it does) and that it's 
returning a varchar.


  





[PERFORM] Followup - expression (functional) index use in joins

2003-11-26 Thread Roger Ging
version 7.4 results:

explain analyse SELECT L.row_id FROM music.logfile L LEFT JOIN 
music.program P ON
music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE  L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;

QUERY PLAN
-
 Sort  (cost=17595.99..17608.23 rows=4894 width=12) (actual 
time=8083.719..8083.738 rows=30 loops=1)
   Sort Key: l.chron_start, l.chron_end
   ->  Merge Left Join  (cost=17135.92..17296.07 rows=4894 width=12) 
(actual time=7727.590..8083.349 rows=30 loops=1)
 Merge Cond: ("outer"."?column5?" = "inner"."?column3?")
 Filter: ("inner".cutoff_date IS NULL)
 ->  Sort  (cost=1681.69..1682.73 rows=414 width=21) (actual 
time=1.414..1.437 rows=30 loops=1)
   Sort Key: (l.program_id)::text
   ->  Index Scan using idx_logfile_station_air_date on 
logfile l  (cost=0.00..1663.70 rows=414 width=21) (actual 
time=0.509..1.228 rows=30 loops=1)
 Index Cond: (((station)::text = 'KABC'::text) AND 
(air_date = '2002-04-12 00:00:00'::timestamp without time zone))
 ->  Sort  (cost=15454.22..15465.06 rows=4335 width=20) (actual 
time=7718.612..7869.874 rows=152779 loops=1)
   Sort Key: (music.fn_mri_id_no_program(p.mri_id_no))::text
   ->  Seq Scan on program p  (cost=0.00..15192.35 
rows=4335 width=20) (actual time=109.045..1955.882 rows=173998 loops=1)
 Total runtime: 8194.290 ms
(13 rows)

version 7.3 results:

explain analyse SELECT L.row_id FROM music.logfile L LEFT JOIN 
music.program P ON
music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE  L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;

 QUERY PLAN
--
 Sort  (cost=55765.51..55768.33 rows=1127 width=41) (actual 
time=7.74..7.75 rows=30 loops=1)
   Sort Key: l.chron_start, l.chron_end
   ->  Nested Loop  (cost=0.00..55708.36 rows=1127 width=41) (actual 
time=0.21..7.62 rows=30 loops=1)
 Filter: ("inner".cutoff_date IS NULL)
 ->  Index Scan using idx_logfile_station_air_date on logfile l 
 (cost=0.00..71.34 rows=17 width=21) (actual time=0.14..0.74 rows=30 
loops=1)
   Index Cond: ((station = 'KABC'::character varying) AND 
(air_date = '2002-04-12 00:00:00'::timestamp without time zone))
 ->  Index Scan using idx_program_mri_id_no_program on program 
p  (cost=0.00..3209.16 rows=870 width=20) (actual time=0.05..0.22 rows=9 
loops=30)
   Index Cond: (music.fn_mri_id_no_program(p.mri_id_no) = 
"outer".program_id)
 Total runtime: 7.86 msec

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] expression (functional) index use in joins

2003-11-26 Thread Roger Ging
I just installed v7.4 and restored a database from v7.3.4.  I have an 
index based on a function that the planner is using on the old version, 
but doing seq scans on left joins in the new version.  I have run 
analyze on the table post restore. the query returns in less than 1 
second on version 7.3.4 and takes over 10 seconds on version 7.4.  Any 
help will be appreciated.

Roger Ging

Query:

SELECT L.row_id FROM music.logfile L LEFT JOIN music.program P ON
music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE  L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;
planner results on 7.4:

 Sort  (cost=17595.99..17608.23 rows=4894 width=12)
   Sort Key: l.chron_start, l.chron_end
   ->  Merge Left Join  (cost=17135.92..17296.07 rows=4894 width=12)
 Merge Cond: ("outer"."?column5?" = "inner"."?column3?")
 Filter: ("inner".cutoff_date IS NULL)
 ->  Sort  (cost=1681.69..1682.73 rows=414 width=21)
   Sort Key: (l.program_id)::text
   ->  Index Scan using idx_logfile_station_air_date on 
logfile l  (cost=0.00..1663.70 rows=414 width=21)
 Index Cond: (((station)::text = 'KABC'::text) AND 
(air_date = '2002-04-12 00:00:00'::timestamp without time zone))
 ->  Sort  (cost=15454.22..15465.06 rows=4335 width=20)
   Sort Key: (music.fn_mri_id_no_program(p.mri_id_no))::text
   ->  Seq Scan on program p  (cost=0.00..15192.35 
rows=4335 width=20)

planner results on 7.3.4:

 Sort  (cost=55765.51..55768.33 rows=1127 width=41)
   Sort Key: l.chron_start, l.chron_end
   ->  Nested Loop  (cost=0.00..55708.36 rows=1127 width=41)
 Filter: ("inner".cutoff_date IS NULL)
 ->  Index Scan using idx_logfile_station_air_date on logfile l 
 (cost=0.00..71.34 rows=17 width=21)
   Index Cond: ((station = 'KABC'::character varying) AND 
(air_date = '2002-04-12 00:00:00'::timestamp without time zone))
 ->  Index Scan using idx_program_mri_id_no_program on program 
p  (cost=0.00..3209.16 rows=870 width=20)
   Index Cond: (music.fn_mri_id_no_program(p.mri_id_no) = 
"outer".program_id)

table "Program" details:

 Column |Type | Modifiers
+-+---
 record_id  | integer |
 title  | character varying(40)   |
 mri_id_no  | character varying(8)|
 ascap_cat  | character varying(1)|
 ascap_mult | numeric(5,3)|
 ascap_prod | character varying(10)   |
 npa_ind| character varying(3)|
 non_inc_in | character varying(1)|
 as_pr_su   | character varying(1)|
 as_1st_run | character varying(1)|
 as_cue_st  | character varying(1)|
 bmi_cat| character varying(2)|
 bmi_mult   | numeric(6,2)|
 bmi_prod   | character varying(7)|
 year   | integer |
 prog_type  | character varying(1)|
 total_ep   | integer |
 last_epis  | character varying(3)|
 syndicator | character varying(6)|
 station| character varying(4)|
 syn_loc| character varying(1)|
 spdb_ver   | character varying(4)|
 as_filed   | character varying(4)|
 bmidb_ver  | character varying(4)|
 cutoff_date| timestamp without time zone |
 effective_date | timestamp without time zone |
 program_id | character varying(5)|
Indexes:
"idx_program_mri_id_no" btree (mri_id_no)
"idx_program_mri_id_no_program" btree 
(music.fn_mri_id_no_program(mri_id_no))
"idx_program_program_id" btree (program_id)
"program_mri_id_no" btree (mri_id_no)
"program_oid" btree (oid)

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match