[PERFORM] Stored Procedure Performance

2006-04-11 Thread Simon Dale

Hi,



I'm trying to evaluate PostgreSQL as a database that
will have to store a high volume of data and access that data frequently. One
of the features on our wish list is to be able to use stored procedures to
access the data and I was wondering if it is usual for stored procedures to
perform slower on PostgreSQL than raw SQL?



A simple example of this can be shown with the following
commands:



First I created a test table:



CREATE TABLE test (
    id int8,
    name varchar(128),
    description varchar(500),
    constraint pk_test primary key (id)
);



Then the function I want to test:



CREATE OR REPLACE FUNCTION readTest() RETURNS SETOF test AS
$$
DECLARE
    row test%ROWTYPE;
BEGIN
    FOR row IN SELECT * FROM test LOOP
        RETURN NEXT row;
    END LOOP;

    RETURN;
END;
$$ LANGUAGE plpgsql;



Firstly, I ran EXPLAIN ANALYZE on the raw SQL to see how long that
takes to access the database; the results are as follows:



EXPLAIN ANALYZE SELECT * FROM test;

Seq Scan on test (cost=0.00..10.90 rows=90 width=798)
(actual time=0.003..0.003 rows=0 loops=1)

Total runtime: 0.074 ms

(2 rows)



Secondly, I ran EXPLAIN ANALYZE on the function created above and
the results are as follows:



EXPLAIN ANALYZE SELECT * FROM readTest();

Function Scan on readtest (cost=0.00..12.50 rows=1000 width=798)
(actual time=0.870..0.870 rows=0 loops=1)

Total runtime: 0.910 ms

(2 rows)



I know that the function is planned the first time it is
executed, so I ran the same command again to remove that processing from
the timings; the results are as follows:



EXPLAIN ANALYZE SELECT * FROM readTest();

Function Scan on readtest (cost=0.00..12.50 rows=1000 width=798)
(actual time=0.166..0.166 rows=0 loops=1)

Total runtime: 0.217 ms

(2 rows)



Even with the planning removed, the function still performs
significantly slower than the raw SQL. Is that normal or am I doing something wrong
with the creation or calling of the function?



Thanks for your help,



Simon


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread soni de

I have flushed the database, so currently the lan table holds 665280
records, but it can grow to more than 1 GB of data, and in that case the query takes more than 1 hour.

Below is explain analyze output taken from the table having 665280 records

pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE (
( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate
>= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;
NOTICE: QUERY PLAN:
Sort (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan (cost=0.00..17.12 rows=1 width=16) (actual time=7564.44..619121.61 rows=288 loops=1)
Total runtime: 619140.76 msec

EXPLAIN

bsdb=# explain analyze SELECT DISTINCT sdate, stime, rbts from lan
WHERE ( ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) ) )

ORDER BY sdate, stime ;
NOTICE: QUERY PLAN:

Unique (cost=17.13..17.14 rows=1 width=16) (actual time=610546.66..610564.31 rows=288 loops=1)
  ->  Sort (cost=17.13..17.13 rows=1 width=16) (actual time=610546.65..610546.75 rows=288 loops=1)
        ->  Index Scan using lan_pkey on lan (cost=0.00..17.12 rows=1 width=16) (actual time=7524.47..610533.50 rows=288 loops=1)

Total runtime: 610565.51 msec

EXPLAIN

pdb=# explain analyze SELECT ALL sdate, stime, rbts from lan WHERE ( ( bname = 'neptune' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) ) ) ORDER BY sdate, stime ;

NOTICE: QUERY PLAN:

Sort (cost=17.13..17.13 rows=1 width=16) (actual time=1260756.66..1260756.76 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan (cost=0.00..17.12 rows=1 width=16) (actual time=7725.97..1260752.47 rows=288 loops=1)

Total runtime: 1260757.09 msec


pdb=# \d lan
              Table "lan"
  Column  |         Type          | Modifiers
----------+-----------------------+-----------
 bname    | character varying(64) | not null
 sdate    | date                  | not null
 stime    | integer               | not null
 cno      | smallint              | not null
 pno      | smallint              | not null
 rbts     | bigint                |
 tbts     | bigint                |
 u_inpkt  | bigint                |
 u_outpkt | bigint                |
 m_inpkt  | bigint                |
 m_outpkt | bigint                |
 b_inpkt  | bigint                |
 b_outpkt | bigint                |
Primary key: lan_pkey
Check constraints: lan_stime ((stime >= 0) AND (stime < 86400))
On 4/10/06, Joshua D. Drake [EMAIL PROTECTED] wrote:
 Rajesh Kumar Mallah wrote:
  what is the query ? use LIMIT or a restricting where clause.
 You could also use a cursor.

 Joshua D. Drake

  regds
  mallah.

  On 4/10/06, soni de [EMAIL PROTECTED] wrote:
   Hello,
   I have difficulty in fetching the records from the database. The
   database table contains more than 1 GB of data. Fetching the records
   takes more than 1 hour and that's why it is slowing down the
   performance. Please provide some help regarding improving the
   performance and how do I run the query so that records will be
   fetched in less time.

 === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive PostgreSQL solutions since 1997
 http://www.commandprompt.com/


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread hubert depesz lubaczewski
On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:
 I'm trying to evaluate PostgreSQL as a database that will have to store
 a high volume of data and access that data frequently. One of the
 features on our wish list is to be able to use stored procedures to
 access the data and I was wondering if it is usual for stored procedures
 to perform slower on PostgreSQL than raw SQL?

Sorry, but your benchmark is completely flawed.
1st. the tables are empty. Will you ever run the real code on empty tables?
2nd. do you really need a stored procedure for such a simple query?
Testing something that's far from real usage will not do you any good.
RETURN NEXT will of course show up as slower than a standard select. The
thing is - will the relative slowness of RETURN NEXT matter to you once
you put more logic in the procedure?
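
A quick way to get a non-empty table for the benchmark (a sketch - assumes
8.0+ for generate_series, and the row count is arbitrary):

INSERT INTO test (id, name, description)
SELECT g, 'name_' || g, 'description for row ' || g
FROM generate_series(1, 100000) g;
ANALYZE test;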
depesz


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Rajesh Kumar Mallah
On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:



 Hi,



 I'm trying to evaluate PostgreSQL as a database that will have to store a
 high volume of data and access that data frequently. One of the features on
 our wish list is to be able to use stored procedures to access the data and
 I was wondering if it is usual for stored procedures to perform slower on
 PostgreSQL than raw SQL?


No.

RETURN NEXT keeps accumulating the data before returning.
I am not sure if any optimisations have been done to that effect.

In general functions are *NOT* slower than RAW SQL.

Regds
mallah.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Richard Huxton

Rajesh Kumar Mallah wrote:

On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:


I'm trying to evaluate PostgreSQL as a database that will have to store a
high volume of data and access that data frequently. One of the features on
our wish list is to be able to use stored procedures to access the data and
I was wondering if it is usual for stored procedures to perform slower on
PostgreSQL than raw SQL?


No.

RETURN NEXT keeps accumulating the data before returning.
I am not sure if any optimisations have been done to that effect.

In general functions are *NOT* slower than RAW SQL.


Actually, in cases where there is a simple way to state the query in raw
SQL, I'd expect that a procedural solution IS slower. After all,
you're adding another layer of processing.


Of course, you normally wouldn't write a procedural solution to a simple 
query.


Added to this is the difference that plpgsql is planned once whereas raw 
sql will be planned on each query. This means you save planning costs 
with the plpgsql but have the chance to get better plans with the raw sql.
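
For what it's worth, raw SQL can get similar plan reuse with a prepared
statement - a minimal sketch against Simon's test table:

PREPARE read_test AS SELECT * FROM test;
EXECUTE read_test;   -- planned once, at PREPARE time
EXECUTE read_test;   -- reuses the stored plan
DEALLOCATE read_test;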

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread Richard Huxton

soni de wrote:

I have flushed the database, so currently the lan table holds 665280
records, but it can grow to more than 1 GB of data, and in that case the
query takes more than 1 hour.

Below is explain analyze output taken from the table having 665280 records

pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE (
 ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate
 >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime ;

NOTICE:  QUERY PLAN:
Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7564.44..619121.61 rows=288 loops=1)

Total runtime: 619140.76 msec


OK - there is clearly something wrong here when you take 10 minutes to 
fetch 288 rows from an index.


1. VACUUM FULL VERBOSE lan;
2. test again, and if that doesn't work...
3. REINDEX TABLE lan;
4. test again

I'm guessing you have a *lot* of dead rows in there.
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Restore performance?

2006-04-11 Thread Jesper Krogh
 Well, your pg_dump command lost your BLOBs since the plain text
 format doesn't support them.

Well, no... they are stored as BYTEA, not Large Objects. They are encoded
as ASCII in the pg_dump output.

 But once you use the -Fc format on your dump and enable blob backups,
 you can speed up reloads by increasing your checkpoint segments to a big
 number like 256 and the checkpoint timeout to something like 10 minutes.
 All other normal tuning parameters should be what you plan
 to use for your normal operations, too.
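
For reference, those suggestions would look something like this in
postgresql.conf (values taken from the advice above; revert them after
the restore):

checkpoint_segments = 256   # default is 3
checkpoint_timeout = 600    # seconds, i.e. 10 minutes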

Thanks.

Jesper
-- 
Jesper Krogh


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread Merlin Moncure
 pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE (
  ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate
  >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime ;

this query would benefit from an index on
(bname, cno, pno, sdate):

create index lan_idx on lan(bname, cno, pno, sdate);


 pdb=# explain analyze SELECT ALL sdate, stime, rbts from lan WHERE (  (
 bname = 'neptune' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >=
 '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime ;

ditto above.  Generally, the closer the fields in the where clause are
matched by the index, the more it will speed up your query.

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Merlin Moncure
On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:
 I'm trying to evaluate PostgreSQL as a database that will have to store a
 high volume of data and access that data frequently. One of the features on
 our wish list is to be able to use stored procedures to access the data and
 I was wondering if it is usual for stored procedures to perform slower on
 PostgreSQL than raw SQL?

pl/pgsql procedures are a very thin layer over the query engine.
Generally, they run about the same speed as SQL, but you are not making
an apples-to-apples comparison.  One of the few but annoying limitations
of pl/pgsql procedures is that you can't return a select directly from
the query engine but have to go through the return/return next
paradigm, which will be slower than a raw query for obvious reasons.

You can however return a refcursor, and you may want to look at them in
situations where you want to return arbitrary sets outside the query
engine or between pl/pgsql functions.  An example of using refcursors
in that way is on my blog at
http://people.planetpostgresql.org/merlin/index.php?/archives/2-Dealing-With-Recursive-Sets-With-PLPGSQL.html
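
A minimal sketch of the refcursor approach (the cursor name is arbitrary,
and the FETCH must happen in the same transaction):

CREATE OR REPLACE FUNCTION readtest_cur() RETURNS refcursor AS $$
DECLARE
    c refcursor := 'test_cur';
BEGIN
    OPEN c FOR SELECT * FROM test;
    RETURN c;  -- hand the open cursor back to the caller
END;
$$ LANGUAGE plpgsql;

BEGIN;
SELECT readtest_cur();
FETCH ALL FROM test_cur;
COMMIT;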

Generally, in my opinion if you want to really unlock the power of
postgresql you have to master pl/pgsql.  Go for it...it will work and
work well.

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread Tom Lane
Richard Huxton dev@archonet.com writes:
 soni de wrote:
 NOTICE:  QUERY PLAN:
  Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
    ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7564.44..619121.61 rows=288 loops=1)
 
  Total runtime: 619140.76 msec

 OK - there is clearly something wrong here when you take 10 minutes to 
 fetch 288 rows from an index.

 I'm guessing you have a *lot* of dead rows in there.

Yeah.  The other small problem here is that EXPLAIN output hasn't looked
like that since PG 7.2 (unless Soni has just omitted the index-condition
lines).  I'd recommend updating to something modern.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Christopher Browne
[EMAIL PROTECTED] (Simon Dale) wrote:
 Even with the planning removed, the function still performs
 significantly slower than the raw SQL. Is that normal or am I doing
 something wrong with the creation or calling of the function?

I'd expect this, yes.

You're doing something via stored procedure logic that would be done
more directly via straight SQL; of course it won't be faster.

In effect, pl/pgsql involves (planning once and) then running each line
of logic.  You turned one query (select * from some table) into 90
queries.  Yup, there's extra cost there.

There's not some magic by which stored procedures provide results
faster as a natural matter of course; the performance benefits
generally fall out of two improvements:

 1.  You eliminate client-to-server round trips.

A stored proc that runs 8 queries saves you 8 round trips over
submitting the 8 queries directly.  Saving you latency time.

 2.  You can eliminate the marshalling and transmission of unnecessary
 data.

   A stored proc that runs 8 queries, and only returns summarized
   results that all come from the last table queried will eliminate
   the need to marshall and transmit (possibly over a slow link) the
   data for the 7 preceding queries.

The case that you tried can benefit from neither of those effects;
your stored procedure eliminates NO round trips, and NO
marshalling/transmission.
-- 
(format nil [EMAIL PROTECTED] cbbrowne gmail.com)
http://linuxdatabases.info/info/rdbms.html
Rules of  the Evil Overlord  #228.  If the  hero claims he  wishes to
confess  in public  or to  me  personally, I  will remind  him that  a
notarized deposition will serve just as well.
http://www.eviloverlord.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Alvaro Herrera
Merlin Moncure wrote:
 On 4/11/06, Simon Dale [EMAIL PROTECTED] wrote:
  I'm trying to evaluate PostgreSQL as a database that will have to store a
  high volume of data and access that data frequently. One of the features on
  our wish list is to be able to use stored procedures to access the data and
  I was wondering if it is usual for stored procedures to perform slower on
  PostgreSQL than raw SQL?
 
 pl/pgsql procedures are a very thin layer over the query engine. 
 Generally, they run about the same speed as SQL but you are not making
 apples to apples comparison.  One of the few but annoying limitations
 of pl/pgsql procedures is that you can't return a select directly from
 the query engine but have to go through the return/return next
 paradigm which will be slower than raw query for obvious reasons.

There's one problem that hasn't been mentioned.  For the optimizer a
PL/pgSQL function (really, a function in any language except SQL) is a
black box.  If you have a complex join of two or three functions, and
they don't return 1000 rows (the planner's default estimate for a
set-returning function), it's very likely that the optimizer is
going to get it wrong.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Indexes with descending date columns

2006-04-11 Thread Markus Schaber
Hi, Bruce,

Bruce Momjian wrote:

Ahh. There's a hack to do that by defining a new opclass that reverses 
< and >, and then doing ORDER BY project_id, id, date USING new_opclass.

I think there's a TODO about this, but I'm not sure...
 
 Yes, and updated:
 
   * Allow the creation of indexes with mixed ascending/descending
 specifiers
   
 This is possible now by creating an operator class with reversed sort
 operators.  One complexity is that NULLs would then appear at the 
 start
 of the result set, and this might affect certain sort types, like
 merge join.
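
For reference, the hack looks roughly like this for a date column (a
sketch - date_cmp is the built-in btree comparison function; the opclass,
table and index names here are made up):

CREATE FUNCTION date_cmp_desc(date, date) RETURNS integer
    AS 'SELECT date_cmp($2, $1)' LANGUAGE sql IMMUTABLE STRICT;

CREATE OPERATOR CLASS date_desc_ops FOR TYPE date USING btree AS
    OPERATOR 1 > ,
    OPERATOR 2 >= ,
    OPERATOR 3 = ,
    OPERATOR 4 <= ,
    OPERATOR 5 < ,
    FUNCTION 1 date_cmp_desc(date, date);

CREATE INDEX sometable_mixed_idx
    ON sometable (project_id, id, date date_desc_ops);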

I think it would be better to allow index zig-zag scans for
multi-column indexes.[1]

So it traverses in a given order on the higher-order column, and the
subtrees for each specific high-order value are traversed in reversed
order. From my knowledge at least of BTrees, and given correct commutator
definitions, this should be not so complicated to implement.[2]

This would allow the query planner to use the same index for arbitrary
ASC/DESC combinations of the given columns.


Just a thought,
Markus


[1] It may make sense to implement the mixed specifiers on indices as
well, to allow CLUSTERing on mixed search order.

[2] But I admit that I currently don't have enough knowledge in
PostgreSQL index scan internals to know whether it really is easy to
implement.


-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Rodrigo Sakai

 Hi,

 I think this is an old question, but I want to know if it really is
well worth it to not create some foreign keys and deal with the
referential integrity at application level?

 Specifically, the system we are developing is a server/client
architecture in which the server is the database and the fat client is
an application developed in DELPHI!!!

 Thanks in advance!!


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Scott Marlowe
On Tue, 2006-04-11 at 14:13, Rodrigo Sakai wrote:
   Hi,
  
   I think this is an old question, but I want to know if it really is
 well worth it to not create some foreign keys and deal with the
 referential integrity at application level?
   Specifically, the system we are developing is a server/client
 architecture in which the server is the database and the fat client is an
 application developed in DELPHI!!!
  

If ref integrity is important, you'll have to do it either in the app or
the database.

Almost always, it's faster to let the database do it, as there's less
traffic across the wire required to maintain ref integrity, plus, the
guys who wrote the database have spent years making sure race conditions
won't scram your data.

For simple, straightforward FK-PK relationships, you will likely NOT
be able to beat the database in terms of either reliability or
performance with your own code.
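
For illustration, declaring it in the database is one statement per
relationship (table and column names here are hypothetical):

ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id);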

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Encouraging multi-table join order

2006-04-11 Thread Dan Harris

Tom Lane wrote:

SNIP
So it's estimating 5775 cost units per probe into eventactivity, which
is pretty high --- it must think that a lot of rows will be retrieved by
the index (way more than the 20 or so it thinks will get past the filter
condition). 



 What does the pg_stats entry for eventactivity.incidentid
contain?
select * from pg_stats where tablename = 'eventactivity' and 
attname='incidentid';
schemaname        | public
tablename         | eventactivity
attname           | incidentid
null_frac         | 0
avg_width         | 14
n_distinct        | 8157
most_common_vals  | {P043190299,P051560740,P052581036,P052830218,P053100679,P053190889,P060370845,P042070391,P042690319,P043290117}
most_common_freqs | {0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013}
histogram_bounds  | {P022140319,P030471058,P033090308,P041961082,P042910689,P050311006,P051350254,P052261148,P053270945,P060240316,P061000287}
correlation       | 0.241737



  It might be worth increasing the statistics target for that
 column to try to get a better estimate.

How high should I set this?  I read the default is 10, but I'm not sure
whether doubling it would make a difference or if I should be using a much
larger number.  There are approx 45 million rows in the table, if that
matters.



Thanks again,
Dan

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Merlin Moncure
On 4/11/06, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Merlin Moncure wrote:
  pl/pgsql procedures are a very thin layer over the query engine.
  Generally, they run about the same speed as SQL but you are not making
  apples to apples comparison.  One of the few but annoying limitations
  of pl/pgsql procedures is that you can't return a select directly from
  the query engine but have to go through the return/return next
  paradigm which will be slower than raw query for obvious reasons.

 There's one problem that hasn't been mentioned.  For the optimizer a
 PL/pgSQL function (really, a function in any language except SQL) is a
 black box.  If you have a complex join of two or three functions, and
 they don't return 1000 rows, it's very likely that the optimizer is
 going to get it wrong.

This doesn't bother me that much.  Those cases usually have a high
overlap with views.  You just have to plan on the function being fully
materialized before it is involved further.  What drives me crazy is that
I have to do 'select * from plpgsql_srf()' but am allowed to do the much
friendlier and more versatile 'select sql_srf()', even if they do
more or less the same thing.

On the flip side, what drives me crazy about sql functions is that all
tables have to be in the search path for the validator.  Since I
frequently use the trick of having multiple schemas with one set of
functions this is annoying.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Encouraging multi-table join order

2006-04-11 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 What does the pg_stats entry for eventactivity.incidentid
 contain?

 {P043190299,P051560740,P052581036,P052830218,P053100679,P053190889,P060370845,P042070391,P042690319,P043290117}
  
 | 
 {0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013}
  

 How high should I set this?  I read the default is 10, but I'm not sure 
 if doubling this would make a difference or if I should be doing a much 
 larger number. There's approx 45 million rows in the table, if that matters.

What the stats entry is saying is that the most common entries occur
about 75000 times apiece (0.0017 * 45e6), which is what's scaring
the planner here ;-).  I think those frequencies are artificially high
though.  The default statistics sample size is 3000 rows (300 *
statistics target, actually), so those numbers correspond to 5 or 4
rows in the sample, which is probably just random chance.

Try increasing the stats targets for this table to 100, then re-ANALYZE
and see what you get.  The most_common_freqs entries might drop as much
as a factor of 10.
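
Something like (using the column from your pg_stats output):

ALTER TABLE eventactivity ALTER COLUMN incidentid SET STATISTICS 100;
ANALYZE eventactivity;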

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread markir
Quoting Harry Hehl [EMAIL PROTECTED]:

 Mark, 
 
 (snippage) However I am still getting seq scans instead of index scans for other queries
 
 For example:
 
 select * from omfile where ( objectid in ( select distinct(ref_oid)
 from
 ts ) ); 
 objectid & ref_oid are non-unique indexes
 omimagefile & omclipfile inherit from omfile
 

  Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
    Join Filter: ("outer".objectid = "inner".ref_oid)
    ->  Append  (cost=0.00..8454.10 rows=204910 width=217)
          ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320 width=217)
          ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270 width=217)
          ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320 width=217)
    ->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
          ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
                ->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
                      Sort Key: ts.ref_oid
                      ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041 width=16)

 (11 rows)
 Time: 164.232 ms
 
 BTW set enable_seqscan=off has no effect, i.e. it still uses seq scans.
 
 If I do a simple query, it is very quick, no sequencial scans. 
 So how can I get index scans to work consistently with joins?
 
 explain select * from omfile where
 objectid='65ef0be3-bf02-46b6-bae9-5bd015ffdb79'; 
 
  Result  (cost=2.00..7723.30 rows=102903 width=217)
    ->  Append  (cost=2.00..7723.30 rows=102903 width=217)
          ->  Bitmap Heap Scan on omfile  (cost=2.00..7697.60 rows=102608 width=217)
                Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
                ->  Bitmap Index Scan on omfile_objectid_idx  (cost=0.00..2.00 rows=102608 width=0)
                      Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
          ->  Bitmap Heap Scan on omimagefile omfile  (cost=1.00..12.69 rows=135 width=217)
                Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
                ->  Bitmap Index Scan on omimagefile_objectid_idx  (cost=0.00..1.00 rows=135 width=0)
                      Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
          ->  Bitmap Heap Scan on omclipfile omfile  (cost=1.00..13.00 rows=160 width=217)
                Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
                ->  Bitmap Index Scan on omclipfile_objectid_idx  (cost=0.00..1.00 rows=160 width=0)
                      Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)

 (14 rows)
 Time: 5.164 ms
 


Hmm - that first query needs to do a sort, so you might want to experiment with
the sort_mem parameter. Could you show us output from explain analyze for both
the above queries?

At face value, selecting ~200000 rows (assuming the estimates are accurate)
may mean that a seqscan is the best plan! But we'll know more after seeing
the explain analyze...

Cheers


Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread markir
Quoting [EMAIL PROTECTED] [EMAIL PROTECTED]:


 Hmm - that first query needs to do a sort, so you might want to
 experiment with
 the sort_mem parameter

Oops - I mean work_mem...
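
For example, something like this for the session (a sketch - the value is
in KB, and it applies to each sort individually, so size it with care):

SET work_mem = 65536;   -- 64 MB for this session
EXPLAIN ANALYZE SELECT * FROM omfile
WHERE objectid IN (SELECT DISTINCT ref_oid FROM ts);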

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread Harry Hehl
Mark,

 If you can upgrade to 8.1.(3), then the planner can consider paths that
 use *both* the indexes on srcobj and dstobj (which would probably be the
 business!).

Yes, 8.1.3 resolved this issue. Thanks.

However I am still getting seq scans instead of index scans for other queries.

For example:

select * from omfile where ( objectid in ( select distinct(ref_oid) from
ts ) );
objectid & ref_oid are non-unique indexes
omimagefile & omclipfile inherit from omfile




 Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
   Join Filter: ("outer".objectid = "inner".ref_oid)
   ->  Append  (cost=0.00..8454.10 rows=204910 width=217)
         ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320 width=217)
         ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270 width=217)
         ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320 width=217)
   ->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
         ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
               ->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
                     Sort Key: ts.ref_oid
                     ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041 width=16)

(11 rows)
Time: 164.232 ms

BTW set enable_seqscan=off has no effect, i.e. it still uses seq scans.

If I do a simple query, it is very quick, no sequencial scans. 
So how can I get index scans to work consistently with joins?

explain select * from omfile where
objectid='65ef0be3-bf02-46b6-bae9-5bd015ffdb79'; 




 Result  (cost=2.00..7723.30 rows=102903 width=217)
   ->  Append  (cost=2.00..7723.30 rows=102903 width=217)
         ->  Bitmap Heap Scan on omfile  (cost=2.00..7697.60 rows=102608 width=217)
               Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
               ->  Bitmap Index Scan on omfile_objectid_idx  (cost=0.00..2.00 rows=102608 width=0)
                     Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
         ->  Bitmap Heap Scan on omimagefile omfile  (cost=1.00..12.69 rows=135 width=217)
               Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
               ->  Bitmap Index Scan on omimagefile_objectid_idx  (cost=0.00..1.00 rows=135 width=0)
                     Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
         ->  Bitmap Heap Scan on omclipfile omfile  (cost=1.00..13.00 rows=160 width=217)
               Recheck Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
               ->  Bitmap Index Scan on omclipfile_objectid_idx  (cost=0.00..1.00 rows=160 width=0)
                     Index Cond: (objectid = '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)

(14 rows)
Time: 5.164 ms



-Original Message-
From: Mark Kirkwood [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 07, 2006 12:04 AM
To: Harry Hehl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sequencial scan instead of using index

Harry Hehl wrote:
 There seems to be many posts on this issue but I have not yet found an
answer to the seq scan issue.
 
 I am having an issue with joins. I am using 8.0.3 on FC4
 
 Query: select * from ommemberrelation where srcobj='somevalue' and 
 dstobj in (select objectid from omfilesysentry where 
 name='dir15_file80');
 
 Columns srcobj, dstobj & name are all indexed.
 
 

The planner is over-estimating the number of rows here (33989 vs 100):

->  Seq Scan on ommemberrelation  (cost=0.00..2394.72 rows=33989
width=177) (actual time=0.078..70.887 rows=100 loops=1)

The usual way to attack this is to up the sample size for ANALYZE:

ALTER TABLE ommemberrelation ALTER COLUMN srcobj SET STATISTICS 100;
ALTER TABLE ommemberrelation ALTER COLUMN dstobj SET STATISTICS 100;
-- or even 1000.
ANALYZE ommemberrelation;

Then try EXPLAIN ANALYZE again.


If you can upgrade to 8.1.(3), then the planner can consider paths that 
use *both* the indexes on srcobj and dstobj (which would probably be the

business!).

Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread Tom Lane
Harry Hehl [EMAIL PROTECTED] writes:
  Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
    Join Filter: ("outer".objectid = "inner".ref_oid)
    ->  Append  (cost=0.00..8454.10 rows=204910 width=217)
          ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320 width=217)
          ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270 width=217)
          ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320 width=217)
    ->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
          ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
                ->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
                      Sort Key: ts.ref_oid
                      ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041 width=16)

 (11 rows)
 Time: 164.232 ms

 So how can I get index scans to work consistently with joins?

It's not the join that's the problem, it's the inheritance.  I recently
improved the planner so that it can consider appended indexscans for an
inheritance tree on the inside of a join, but no pre-8.2 release can do
it.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Michael Glaesemann


On Apr 12, 2006, at 4:13 , Rodrigo Sakai wrote:

  I think this is an old question, but I want to know if it really
is well worth it to not create some foreign keys and deal with the
referential integrity at application level?


If I had to choose between one or the other, I'd leave all  
referential integrity in the database and deal with the errors thrown  
when referential integrity is violated in the application. PostgreSQL  
is designed to handle these kinds of issues. Anything you code in  
your application is more likely to contain bugs or miss corner cases  
that would allow referential integrity to be violated. PostgreSQL has  
been pounded on for years by a great many users and developers,  
making the likelihood of bugs still remaining much smaller.


Of course, you can add some referential integrity checks in your  
application code, but those should be in addition to your database- 
level checks.


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote:
 
 On Apr 12, 2006, at 4:13 , Rodrigo Sakai wrote:
 
    I think this is an old question, but I want to know if it really
  is well worth it to not create some foreign keys and deal with the
  referential integrity at application level?
 
 If I had to choose between one or the other, I'd leave all  
 referential integrity in the database and deal with the errors thrown  
 when referential integrity is violated in the application. PostgreSQL  
 is designed to handle these kinds of issues. Anything you code in  
 your application is more likely to contain bugs or miss corner cases  
 that would allow referential integrity to be violated. PostgreSQL has  
 been pounded on for years by a great many users and developers,  
 making the likelihood of bugs still remaining much smaller.

It's also pretty unlikely that you can make RI in the application
perform better than in the database.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote:
 ... Anything you code in  
 your application is more likely to contain bugs or miss corner cases  
 that would allow referential integrity to be violated. PostgreSQL has  
 been pounded on for years by a great many users and developers,  
 making the likelihood of bugs still remaining much smaller.

 It's also pretty unlikely that you can make RI in the application
 perform better than in the database.

I think the traditional assumption among the "you should do RI in the
application" crowd is that the application has higher-level knowledge
that lets it understand when it can skip doing an RI check entirely.
Skipping an RI check is always faster than doing it --- so that's right,
it's faster.  As long as you don't make any mistakes.

The question you have to ask yourself is whether you are really that
smart ... not just today, but every single time.  To quote Clint
Eastwood: "Do you feel lucky, punk?  Well, do you?"

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly