Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote:
>> ... Anything you code in  
>> your application is more likely to contain bugs or miss corner cases  
>> that would allow referential integrity to be violated. PostgreSQL has  
>> been pounded on for years by a great many users and developers,  
>> making the likelihood of bugs still remaining much smaller.

> It's also pretty unlikely that you can make RI in the application
> perform better than in the database.

I think the traditional assumption among the "you should do RI in the
application" crowd is that the application has higher-level knowledge
that lets it understand when it can skip doing an RI check entirely.
Skipping an RI check is always faster than doing it --- so that's right,
it's faster.  As long as you don't make any mistakes.

The question you have to ask yourself is whether you are really that
smart ... not just today, but every single time.  To quote Clint
Eastwood: "Do you feel lucky, punk?  Well, do you?"
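
For what it's worth, there is a database-side middle ground: instead of
skipping the check in the application, declare the constraint DEFERRABLE and
batch the verification to commit time. A sketch, with hypothetical tables:

```sql
-- Hypothetical tables: the FK check is deferred to COMMIT rather than skipped.
CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child (
    parent_id integer REFERENCES parent (id)
        DEFERRABLE INITIALLY IMMEDIATE
);

BEGIN;
SET CONSTRAINTS ALL DEFERRED;
INSERT INTO child VALUES (1);   -- no matching parent yet; no error raised here
INSERT INTO parent VALUES (1);  -- the parent arrives later in the transaction
COMMIT;                         -- referential integrity is verified here
```

That way the database still backstops the application's cleverness.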

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Restore performance?

2006-04-11 Thread Christopher Kings-Lynne

Well, your pg_dump command lost your BLOBs since the plain text
format doesn't support them.


Well, no.. they are stored as BYTEA not Large Objects.. They are encoded
in ASCII in the pg_dump output.


As a side note: the plain text dump format in 8.1 supports LOBs
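
For comparison, a custom-format dump does carry large objects; something like
this (database names hypothetical):

```sh
# Custom-format dump including large objects, restored via pg_restore
pg_dump -Fc -b mydb > mydb.dump
pg_restore -d newdb mydb.dump
```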


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote:
> 
> On Apr 12, 2006, at 4:13 , Rodrigo Sakai wrote:
> 
> >  I think this is an old question, but I want to know whether it is  
> >really worthwhile to not create some foreign keys and deal with the  
> >referential integrity at the application level?
> 
> If I had to choose between one or the other, I'd leave all  
> referential integrity in the database and deal with the errors thrown  
> when referential integrity is violated in the application. PostgreSQL  
> is designed to handle these kinds of issues. Anything you code in  
> your application is more likely to contain bugs or miss corner cases  
> that would allow referential integrity to be violated. PostgreSQL has  
> been pounded on for years by a great many users and developers,  
> making the likelihood of bugs still remaining much smaller.

It's also pretty unlikely that you can make RI in the application
perform better than in the database.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Michael Glaesemann


On Apr 12, 2006, at 4:13 , Rodrigo Sakai wrote:

  I think this is an old question, but I want to know whether it is  
really worthwhile to not create some foreign keys and deal with the  
referential integrity at the application level?


If I had to choose between one or the other, I'd leave all  
referential integrity in the database and deal with the errors thrown  
when referential integrity is violated in the application. PostgreSQL  
is designed to handle these kinds of issues. Anything you code in  
your application is more likely to contain bugs or miss corner cases  
that would allow referential integrity to be violated. PostgreSQL has  
been pounded on for years by a great many users and developers,  
making the likelihood of bugs still remaining much smaller.


Of course, you can add some referential integrity checks in your  
application code, but those should be in addition to your database- 
level checks.


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] freebsd/softupdates for data dir

2006-04-11 Thread Jim C. Nasby
On Thu, Apr 06, 2006 at 09:45:34AM -0400, Vivek Khera wrote:
> 
> On Apr 5, 2006, at 6:07 PM, Jim Nasby wrote:
> 
> >
> >More importantly, it allows the system to come up and do fsck in  
> >the background. If you've got a large database that's a pretty big  
> >benefit.
> 
> That's a UFS2 feature, not a soft-updates feature.

It's both. You can't background fsck with soft updates disabled.
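
For reference, soft updates are toggled per filesystem with tunefs, run
against an unmounted (or read-only mounted) filesystem; the device name below
is only an example:

```sh
# Enable soft updates on a UFS filesystem (must not be mounted read-write)
tunefs -n enable /dev/ad0s1d
# Background fsck additionally requires UFS2, chosen at newfs time (newfs -O2)
```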
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] pgmemcache

2006-04-11 Thread Jim C. Nasby
On Tue, Apr 04, 2006 at 12:24:42AM -0700, C Storm wrote:
> I was wondering if anyone on the list has a successful installation of
> pgmemcache running
> that uses LISTEN/NOTIFY to signal a successfully completed transaction,
> i.e., to get around the fact
> that TRIGGERS are transaction unaware.  Or perhaps any other
> information regarding a successful
> deployment of pgmemcache.

The problem with attempting that is that you'd have a window between
transaction commit and when the cache was invalidated. If that's
acceptable then it shouldn't be too difficult to set something up using
LISTEN/NOTIFY like you describe.
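
A rough sketch of that approach (table and channel names hypothetical; note it
does not close the window described above, since listeners only see the NOTIFY
after commit):

```sql
-- Statement-level trigger that signals cache invalidation at commit time.
CREATE OR REPLACE FUNCTION notify_cache_invalidation() RETURNS trigger AS $$
BEGIN
    NOTIFY cache_invalidation;  -- delivered to listeners only on commit
    RETURN NULL;                -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_cache_invalidate
    AFTER INSERT OR UPDATE OR DELETE ON users
    FOR EACH STATEMENT EXECUTE PROCEDURE notify_cache_invalidation();

-- The cache-maintenance daemon would run: LISTEN cache_invalidation;
```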
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread Tom Lane
"Harry Hehl" <[EMAIL PROTECTED]> writes:
>  Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
>Join Filter: ("outer".objectid = "inner".ref_oid)
>->  Append  (cost=0.00..8454.10 rows=204910 width=217)
>  ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320
> width=217)
>  ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270
> width=217)
>  ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320
> width=217)
>->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
>  ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
>->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
>  Sort Key: ts.ref_oid
>  ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041
> width=16)

> (11 rows) 
> Time: 164.232 ms 

> So how can I get index scans to work consistently with joins?

It's not the join that's the problem, it's the inheritance.  I recently
improved the planner so that it can consider appended indexscans for an
inheritance tree on the inside of a join, but no pre-8.2 release can do
it.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread Harry Hehl
Mark, 

>If you can upgrade to 8.1.(3), then the planner can consider paths that
>use *both* the indexes on srcobj and dstobj (which would probably be the
>business!).

Yes, 8.1.3 resolved this issue. Thanks.

However, I am still getting seq scans on indexed columns for other queries.

For example:

select * from omfile where ( objectid in ( select distinct(ref_oid) from
ts ) ); 
objectid & ref_oid are non-unique indexes 
omimagefile & omclipfile inherit from omfile




 Nested Loop IN Join  (cost=21432.32..951981.42 rows=204910 width=217)
   Join Filter: ("outer".objectid = "inner".ref_oid)
   ->  Append  (cost=0.00..8454.10 rows=204910 width=217)
 ->  Seq Scan on omfile  (cost=0.00..8428.20 rows=204320
width=217)
 ->  Seq Scan on omimagefile omfile  (cost=0.00..12.70 rows=270
width=217)
 ->  Seq Scan on omclipfile omfile  (cost=0.00..13.20 rows=320
width=217)
   ->  Materialize  (cost=21432.32..21434.32 rows=200 width=16)
 ->  Unique  (cost=20614.91..21430.12 rows=200 width=16)
   ->  Sort  (cost=20614.91..21022.52 rows=163041 width=16)
 Sort Key: ts.ref_oid
 ->  Seq Scan on ts  (cost=0.00..3739.41 rows=163041
width=16)

(11 rows) 
Time: 164.232 ms 

BTW, set enable_seqscan=off has no effect, i.e. it still uses seq scans.

If I do a simple query, it is very quick, with no sequential scans. 
So how can I get index scans to work consistently with joins?

explain select * from omfile where
objectid='65ef0be3-bf02-46b6-bae9-5bd015ffdb79'; 




 Result  (cost=2.00..7723.30 rows=102903 width=217)
   ->  Append  (cost=2.00..7723.30 rows=102903 width=217)
 ->  Bitmap Heap Scan on omfile  (cost=2.00..7697.60 rows=102608
width=217)
   Recheck Cond: (objectid =
'65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
   ->  Bitmap Index Scan on omfile_objectid_idx
(cost=0.00..2.00 rows=102608 width=0)
 Index Cond: (objectid =
'65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
 ->  Bitmap Heap Scan on omimagefile omfile  (cost=1.00..12.69
rows=135 width=217)
   Recheck Cond: (objectid =
'65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
   ->  Bitmap Index Scan on omimagefile_objectid_idx
(cost=0.00..1.00 rows=135 width=0)
 Index Cond: (objectid =
'65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
 ->  Bitmap Heap Scan on omclipfile omfile  (cost=1.00..13.00
rows=160 width=217)
   Recheck Cond: (objectid =
'65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
   ->  Bitmap Index Scan on omclipfile_objectid_idx
(cost=0.00..1.00 rows=160 width=0)
 Index Cond: (objectid =
'65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)

(14 rows) 
Time: 5.164



-Original Message-
From: Mark Kirkwood [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 07, 2006 12:04 AM
To: Harry Hehl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sequencial scan instead of using index

Harry Hehl wrote:
> There seem to be many posts on this issue, but I have not yet found an
> answer to the seq scan issue.
> 
> I am having an issue with joins. I am using 8.0.3 on FC4
> 
> Query: select * from ommemberrelation where srcobj='somevalue' and 
> dstobj in (select objectid from omfilesysentry where 
> name='dir15_file80');
> 
> Columns srcobj, dstobj & name are all indexed.
> 
> 

The planner is over-estimating the number of rows here (33989 vs 100):

->  Seq Scan on ommemberrelation  (cost=0.00..2394.72 rows=33989
width=177) (actual time=0.078..70.887 rows=100 loops=1)

The usual way to attack this is to up the sample size for ANALYZE:

ALTER TABLE ommemberrelation ALTER COLUMN srcobj SET STATISTICS 100;
ALTER TABLE ommemberrelation ALTER COLUMN dstobj SET STATISTICS 100;
-- or even 1000.
ANALYZE ommemberrelation;

Then try EXPLAIN ANALYZE again.


If you can upgrade to 8.1.(3), then the planner can consider paths that 
use *both* the indexes on srcobj and dstobj (which would probably be the

business!).

Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread markir
Quoting "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>:


> Hmm - that first query needs to do a sort, so you might want to
> experiment with
> the sort_mem parameter

Oops - I mean work_mem...

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Sequencial scan instead of using index

2006-04-11 Thread markir
Quoting Harry Hehl <[EMAIL PROTECTED]>:

> Mark, 
> 
> (snippage)However I am still getting seq scans on indexes for other queries
> 
> For example:
> 
> select * from omfile where ( objectid in ( select distinct(ref_oid)
> from
> ts ) ); 
> objectid & ref_oid are non-unique indexes 
> omimagefile & omclipfile inherit from omfile
> 
> --
> --
> 
> 
>  Nested Loop IN Join (cost=21432.32..951981.42 rows=204910 width=217)
>  Join Filter: ("outer".objectid = "inner".ref_oid)
>  -> Append (cost=0.00..8454.10 rows=204910 width=217)
>  -> Seq Scan on omfile (cost=0.00..8428.20 rows=204320
> width=217)
>  -> Seq Scan on omimagefile omfile (cost=0.00..12.70 rows=270
> width=217)
>  -> Seq Scan on omclipfile omfile (cost=0.00..13.20 rows=320
> width=217)
>  -> Materialize (cost=21432.32..21434.32 rows=200 width=16)
>  -> Unique (cost=20614.91..21430.12 rows=200 width=16)
>  -> Sort (cost=20614.91..21022.52 rows=163041 width=16)
>  Sort Key: ts.ref_oid
>  -> Seq Scan on ts (cost=0.00..3739.41 rows=163041
> width=16)
> 
> (11 rows) 
> Time: 164.232 ms 
> 
> BTW, set enable_seqscan=off has no effect, i.e. it still uses seq scans.
> 
> If I do a simple query, it is very quick, with no sequential scans. 
> So how can I get index scans to work consistently with joins?
> 
> explain select * from omfile where
> objectid='65ef0be3-bf02-46b6-bae9-5bd015ffdb79'; 
> 
> 
> 
> 
> 
>  Result (cost=2.00..7723.30 rows=102903 width=217)
>  -> Append (cost=2.00..7723.30 rows=102903 width=217)
>  -> Bitmap Heap Scan on omfile (cost=2.00..7697.60 rows=102608
> width=217)
>  Recheck Cond: (objectid =
> '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>  -> Bitmap Index Scan on omfile_objectid_idx
> (cost=0.00..2.00 rows=102608 width=0)
>  Index Cond: (objectid =
> '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>  -> Bitmap Heap Scan on omimagefile omfile (cost=1.00..12.69
> rows=135 width=217)
>  Recheck Cond: (objectid =
> '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>  -> Bitmap Index Scan on omimagefile_objectid_idx
> (cost=0.00..1.00 rows=135 width=0)
>  Index Cond: (objectid =
> '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>  -> Bitmap Heap Scan on omclipfile omfile (cost=1.00..13.00
> rows=160 width=217)
>  Recheck Cond: (objectid =
> '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
>  -> Bitmap Index Scan on omclipfile_objectid_idx
> (cost=0.00..1.00 rows=160 width=0)
>  Index Cond: (objectid =
> '65ef0be3-bf02-46b6-bae9-5bd015ffdb79'::capsa_sys.uuid)
> 
> (14 rows) 
> Time: 5.164
> 
>

Hmm - that first query needs to do a sort, so you might want to experiment with
the sort_mem parameter. Could you show us output from explain analyze for both
the above queries?

At face value, selecting 20 rows (assuming the estimates are accurate) may
mean that a seqscan is the best plan! But we'll know more after seeing the
explain analyze...

Cheers


Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Encouraging multi-table join order

2006-04-11 Thread Tom Lane
Dan Harris <[EMAIL PROTECTED]> writes:
> I wonder why the estimates were so far off the first time?  This table 
> has been ANALYZED regularly ever since creation.

Probably just that you need a bigger sample size for such a large table.
We've been arguing ever since 7.2 about what the default statistics
target ought to be --- a lot of people think 10 is too small.  (It could
also be that the fixed 300X multiplier ought to depend on table size
instead.  The math that told us 300X was OK was really about getting the
histogram right, not about whether the most-common-values stats would be
any good.)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Encouraging multi-table join order

2006-04-11 Thread Dan Harris

Tom Lane wrote:

What the stats entry is saying is that the most common entries occur
about 75000 times apiece (0.0017 * 45e6), which is what's scaring
the planner here ;-).  I think those frequencies are artificially high
though.  The default statistics sample size is 3000 rows (300 *
statistics target, actually), so those numbers correspond to 5 or 4
rows in the sample, which is probably just random chance.

Try increasing the stats targets for this table to 100, then re-ANALYZE
and see what you get.  The most_common_freqs entries might drop as much
as a factor of 10.

regards, tom lane
  


Tom:

I believe this was the problem.  I upped the statistics to 100, for a 
sample size of 30k and now the planner does the correct nested 
loop/index scan and takes only 30 seconds!  This is a HUGE performance 
increase.


I wonder why the estimates were so far off the first time?  This table 
has been ANALYZED regularly ever since creation.


Once again, thank you and all of the developers for your hard work on 
PostgreSQL.  This is by far the most pleasant management experience of 
any database I've worked on.


-Dan


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Encouraging multi-table join order

2006-04-11 Thread Tom Lane
Dan Harris <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> What does the pg_stats entry for eventactivity.incidentid
>> contain?

> {P043190299,P051560740,P052581036,P052830218,P053100679,P053190889,P060370845,P042070391,P042690319,P043290117}
> | {0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013}

> How high should I set this?  I read the default is 10, but I'm not sure 
> if doubling this would make a difference or if I should be doing a much 
> larger number. There's approx 45 million rows in the table, if that matters.

What the stats entry is saying is that the most common entries occur
about 75000 times apiece (0.0017 * 45e6), which is what's scaring
the planner here ;-).  I think those frequencies are artificially high
though.  The default statistics sample size is 3000 rows (300 *
statistics target, actually), so those numbers correspond to 5 or 4
rows in the sample, which is probably just random chance.

Try increasing the stats targets for this table to 100, then re-ANALYZE
and see what you get.  The most_common_freqs entries might drop as much
as a factor of 10.
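
In SQL, that would be roughly:

```sql
ALTER TABLE eventactivity ALTER COLUMN incidentid SET STATISTICS 100;
ANALYZE eventactivity;
```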

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Merlin Moncure
On 4/11/06, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> Merlin Moncure wrote:
> > pl/pgsql procedures are a very thin layer over the query engine.
> > Generally, they run about the same speed as SQL but you are not making
> > apples to apples comparison.  One of the few but annoying limitations
> > of pl/pgsql procedures is that you can't return a select directly from
> > the query engine but have to go through the return/return next
> > paradigm which will be slower than raw query for obvious reasons.
>
> There's one problem that hasn't been mentioned.  For the optimizer a
> PL/pgSQL function (really, a function in any language except SQL) is a
> black box.  If you have a complex join of two or three functions, and
> they don't return 1000 rows, it's very likely that the optimizer is
> going to get it wrong.

This doesn't bother me that much. Those cases usually have a high
overlap with views.  You just have to plan on the function being fully
materialized before it is involved further.  What drives me crazy is that
I have to do 'select * from plpgsql_srf()' but am allowed to do the
much friendlier and more versatile 'select sql_srf()', even though they
do more or less the same thing.

On the flip side, what drives me crazy about sql functions is that all
tables have to be in the search path for the validator.  Since I
frequently use the trick of having multiple schemas with one set of
functions this is annoying.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Encouraging multi-table join order

2006-04-11 Thread Dan Harris

Tom Lane wrote:


So it's estimating 5775 cost units per probe into eventactivity, which
is pretty high --- it must think that a lot of rows will be retrieved by
the index (way more than the 20 or so it thinks will get past the filter
condition). 



 What does the pg_stats entry for eventactivity.incidentid
contain?
select * from pg_stats where tablename = 'eventactivity' and 
attname='incidentid';
schemaname        | public
tablename         | eventactivity
attname           | incidentid
null_frac         | 0
avg_width         | 14
n_distinct        | 8157
most_common_vals  | {P043190299,P051560740,P052581036,P052830218,P053100679,P053190889,P060370845,P042070391,P042690319,P043290117}
most_common_freqs | {0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013}
histogram_bounds  | {P022140319,P030471058,P033090308,P041961082,P042910689,P050311006,P051350254,P052261148,P053270945,P060240316,P061000287}
correlation       | 0.241737



  It might be worth increasing the statistics target for that
column to try to get a better estimate.
  
How high should I set this?  I read the default is 10, but I'm not sure 
if doubling this would make a difference or if I should be doing a much 
larger number. There's approx 45 million rows in the table, if that matters.



Thanks again,
Dan

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Scott Marlowe
On Tue, 2006-04-11 at 14:13, Rodrigo Sakai wrote:
>   Hi,
>  
>   I think this is an old question, but I want to know whether it is
> really worthwhile to not create some foreign keys and deal with the
> referential integrity at the application level?
>   Specifically, the system we are developing is a server/client
> architecture in which the server is the database and the fat client is an
> application developed in DELPHI!!!
>  

If ref integrity is important, you'll have to do it either in the app or
the database.

Almost always, it's faster to let the database do it, as there's less
traffic across the wire required to maintain ref integrity, plus, the
guys who wrote the database have spent years making sure race conditions
won't scram your data.

For simple, straightforward FK->PK relationships, you will likely NOT
be able to beat the database in terms of either reliability or
performance with your own code.
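
For illustration, a straightforward FK of that kind (hypothetical tables):

```sql
CREATE TABLE customers (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id          integer PRIMARY KEY,
    customer_id integer NOT NULL
        REFERENCES customers (id) ON UPDATE CASCADE ON DELETE RESTRICT
);
```

One declarative line replaces all the locking and race-condition handling an
application-side check would need.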

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-11 Thread Rodrigo Sakai



  Hi,
 
  I think this is an old question, but I want to know whether it is really
worthwhile to not create some foreign keys and deal with the referential
integrity at the application level?
  Specifically, the system we are developing is a server/client
architecture in which the server is the database and the fat
client is an application developed in DELPHI!!!
 
  Thanks in advance!!


Re: [PERFORM] Indexes with descending date columns

2006-04-11 Thread Markus Schaber
Hi, Bruce,

Bruce Momjian wrote:

>>Ahh. There's a hack to do that by defining a new opclass that reverses <
>>and >, and then doing ORDER BY project_id, id, date USING new_opclass.
>>
>>I think there's a TODO about this, but I'm not sure...
> 
> Yes, and updated:
> 
>   * Allow the creation of indexes with mixed ascending/descending
> specifiers
>   
> This is possible now by creating an operator class with reversed sort
> operators.  One complexity is that NULLs would then appear at the 
> start
> of the result set, and this might affect certain sort types, like
> merge join.
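
For context, the query-side form of that reversed-sort trick can be written
with per-column sort operators (hypothetical table; without the reversed
opclass an index still cannot serve it, but the ordering itself is
expressible):

```sql
-- Mixed ascending/descending ordering via explicit sort operators
SELECT *
FROM tasks
ORDER BY project_id USING <, id USING <, date USING >;
```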

I think it would be better to allow "index zig-zag scans" for
multi-column indexes.[1]

So it traverses the higher-order column in a given order, and the
subtrees for each specific high-order value are traversed in reversed
order. From my knowledge at least of BTrees, and given correct commutator
definitions, this should be not so complicated to implement.[2]

This would allow the query planner to use the same index for arbitrary
ASC/DESC combinations of the given columns.


Just a thought,
Markus


[1] It may make sense to implement the mixed specifiers on indices as
well, to allow CLUSTERing on mixed search order.

[2] But I admit that I currently don't have enough knowledge in
PostgreSQL index scan internals to know whether it really is easy to
implement.


-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Alvaro Herrera
Merlin Moncure wrote:
> On 4/11/06, Simon Dale <[EMAIL PROTECTED]> wrote:
> > I'm trying to evaluate PostgreSQL as a database that will have to store a
> > high volume of data and access that data frequently. One of the features on
> > our wish list is to be able to use stored procedures to access the data and
> > I was wondering if it is usual for stored procedures to perform slower on
> > PostgreSQL than raw SQL?
> 
> pl/pgsql procedures are a very thin layer over the query engine. 
> Generally, they run about the same speed as SQL but you are not making
> apples to apples comparison.  One of the few but annoying limitations
> of pl/pgsql procedures is that you can't return a select directly from
> the query engine but have to go through the return/return next
> paradigm which will be slower than raw query for obvious reasons.

There's one problem that hasn't been mentioned.  For the optimizer a
PL/pgSQL function (really, a function in any language except SQL) is a
black box.  If you have a complex join of two or three functions, and
they don't return 1000 rows, it's very likely that the optimizer is
going to get it wrong.

-- 
Alvaro Herrera      http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread H.J. Sanders

Hello

On my little machine (Pentium 4, 2.8 GHz, 256 MB RAM, SuSE Linux 9)
I can process about 10.000 records a minute using the following setup:

begin work
for each record:
    process the record
    if 10.000 records processed:
        commit work
        begin work
    end if
end for
commit work (!)

Regards

Henk Sanders


 

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Merlin Moncure
> Sent: Tuesday, 11 April 2006 15:50
> To: Simon Dale
> CC: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Stored Procedure Performance
> 
> 
> On 4/11/06, Simon Dale <[EMAIL PROTECTED]> wrote:
> > I'm trying to evaluate PostgreSQL as a database that will have to store a
> > high volume of data and access that data frequently. One of the features on
> > our wish list is to be able to use stored procedures to access the data and
> > I was wondering if it is usual for stored procedures to perform slower on
> > PostgreSQL than raw SQL?
> 
> pl/pgsql procedures are a very thin layer over the query engine. 
> Generally, they run about the same speed as SQL but you are not making
> apples to apples comparison.  One of the few but annoying limitations
> of pl/pgsql procedures is that you can't return a select directly from
> the query engine but have to go through the return/return next
> paradigm which will be slower than raw query for obvious reasons.
> 
> You can however return a refcursor and you may want to look at them in
> situations where you want to return arbitrary sets outside the query
> engine or between pl/pgsql functions.  An example of using refcurors
> in that way is on my blog at
> http://people.planetpostgresql.org/merlin/index.php?/archives/2-Dealing-With-Recursive-Sets-With-PLPGSQL.html
> 
> Generally, in my opinion if you want to really unlock the power of
> postgresql you have to master pl/pgsql.  Go for it...it will work and
> work well.
> 
> merlin
> 
> merlin
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
> 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Christopher Browne
[EMAIL PROTECTED] ("Simon Dale") wrote:
> Even with the planning removed, the function still performs
> significantly slower than the raw SQL. Is that normal, or am I doing
> something wrong with the creation or calling of the function?

I'd expect this, yes.

You're doing something via "stored procedure logic" that would be done
more directly via straight SQL; of course it won't be faster.

In effect, pl/pgsql involves planning once and then running each line of
logic.  You replaced one query (select * from some table) with 90
queries.  Yup, there's extra cost there.

There's not some "magic" by which stored procedures provide results
faster as a natural "matter of course;" the performance benefits
generally fall out of two improvements:

 1.  You eliminate client-to-server round trips.

A stored proc that runs 8 queries saves you 8 round trips over
submitting the 8 queries directly.  Saving you latency time.

 2.  You can eliminate the marshalling and transmission of unnecessary
 data.

   A stored proc that runs 8 queries, and only returns summarized
   results that all come from the last table queried will eliminate
   the need to marshall and transmit (possibly over a slow link) the
   data for the 7 preceding queries.

The case that you tried can benefit from neither of those effects;
your stored procedure eliminates NO round trips, and NO
marshalling/transmission.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "gmail.com")
http://linuxdatabases.info/info/rdbms.html
Rules of  the Evil Overlord  #228.  "If the  hero claims he  wishes to
confess  in public  or to  me  personally, I  will remind  him that  a
notarized deposition will serve just as well."


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread Tom Lane
Richard Huxton  writes:
> soni de wrote:
>> NOTICE:  QUERY PLAN:
>> Sort  (cost=17.13..17.13 rows=1 width=16) (actual 
>> time=619140.18..619140.29rows
>> =288 loops=1)
>> ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16)
>> (actual time=7564.44..619121.61 rows=288 loops=1)
>> 
>> Total runtime: 619140.76 msec

> OK - there is clearly something wrong here when you take 10 minutes to 
> fetch 288 rows from an index.

> I'm guessing you have a *lot* of dead rows in there.

Yeah.  The other small problem here is that EXPLAIN output hasn't looked
like that since PG 7.2 (unless Soni has just omitted the index-condition
lines).  I'd recommend updating to something modern.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Merlin Moncure
On 4/11/06, Simon Dale <[EMAIL PROTECTED]> wrote:
> I'm trying to evaluate PostgreSQL as a database that will have to store a
> high volume of data and access that data frequently. One of the features on
> our wish list is to be able to use stored procedures to access the data and
> I was wondering if it is usual for stored procedures to perform slower on
> PostgreSQL than raw SQL?

pl/pgsql procedures are a very thin layer over the query engine. 
Generally, they run about the same speed as SQL but you are not making
apples to apples comparison.  One of the few but annoying limitations
of pl/pgsql procedures is that you can't return a select directly from
the query engine but have to go through the return/return next
paradigm which will be slower than raw query for obvious reasons.
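
A sketch of the two styles side by side (hypothetical table foo):

```sql
-- pl/pgsql pushes rows out one at a time via RETURN NEXT:
CREATE FUNCTION plpgsql_srf() RETURNS SETOF foo AS $$
DECLARE
    r foo%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM foo LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- a SQL-language function hands the query straight to the engine:
CREATE FUNCTION sql_srf() RETURNS SETOF foo AS $$
    SELECT * FROM foo;
$$ LANGUAGE sql;
```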

You can however return a refcursor and you may want to look at them in
situations where you want to return arbitrary sets outside the query
engine or between pl/pgsql functions.  An example of using refcursors
in that way is on my blog at
http://people.planetpostgresql.org/merlin/index.php?/archives/2-Dealing-With-Recursive-Sets-With-PLPGSQL.html

Generally, in my opinion if you want to really unlock the power of
postgresql you have to master pl/pgsql.  Go for it...it will work and
work well.
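A rough sketch of the two approaches described above (the `items` table and function names are illustrative, not from the thread):

```sql
-- 1. RETURN NEXT: rows are accumulated inside the function
--    before the whole set is handed back to the caller.
CREATE OR REPLACE FUNCTION get_items() RETURNS SETOF items AS $$
DECLARE
    r items%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM items LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- 2. Refcursor: the caller fetches rows directly from a cursor,
--    avoiding the accumulate-then-return step.
CREATE OR REPLACE FUNCTION get_items_cur() RETURNS refcursor AS $$
DECLARE
    c refcursor;
BEGIN
    OPEN c FOR SELECT * FROM items;
    RETURN c;
END;
$$ LANGUAGE plpgsql;

-- Usage (must be inside a transaction; the portal name returned
-- by the SELECT is what you FETCH from):
-- BEGIN;
-- SELECT get_items_cur();
-- FETCH ALL IN "<unnamed portal 1>";
-- COMMIT;
```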

merlin



Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread Merlin Moncure
> pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE (
>
>  ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate
>
>  >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime
> ;

this query would benefit from an index on
bname, cno, pno, sdate:

create index lan_idx on lan(bname, cno, pno, sdate);


> pdb=# explain analyze SELECT ALL sdate, stime, rbts from lan WHERE (  (
> bname = 'neptune' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >=
> '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime ;

ditto above.  Generally, the more closely the index matches the fields
in the where clause, the more it will speed up your query.

Merlin



Re: [PERFORM] Restore performance?

2006-04-11 Thread Jesper Krogh
> Well, your pg_dump command lost your BLOBs since the plain text
> format doesn't support them.

Well, no.. they are stored as BYTEA not Large Objects.. They are encoded
in ASCII in the pg_dump output.

> But once you use the -Fc format on your dump and enable blob backups,
> you can speed up reloads by increasing your checkpoint segments to a big
> number like 256 and the checkpoint timeout to something like 10 minutes.
> All other normal tuning parameters should be what you plan
> to use for your normal operations, too.
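For reference, a minimal sketch of the settings quoted above as they would appear in an 8.1-era postgresql.conf (the values are the ones suggested in the quote, not a general recommendation):

```
checkpoint_segments = 256    # default is 3; large value for bulk reload only
checkpoint_timeout = 600     # in seconds, i.e. 10 minutes
```

Remember to restore these to your normal operational values after the reload.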

Thanks.

Jesper
-- 
Jesper Krogh




Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread Richard Huxton

soni de wrote:

I have flushed the database, so currently records in the "lan" table are:
665280

but records can be increased more than 1GB and in that case it takes more
than 1 hour

Below is explain analyze output taken from the table having 665280 records

pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE (
 ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate
 >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime
;

NOTICE:  QUERY PLAN:
Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16)
(actual time=7564.44..619121.61 rows=288 loops=1)

Total runtime: 619140.76 msec


OK - there is clearly something wrong here when you take 10 minutes to 
fetch 288 rows from an index.


1. VACUUM FULL VERBOSE lan;
2. test again, and if that doesn't work...
3. REINDEX TABLE lan;
4. test again

I'm guessing you have a *lot* of dead rows in there.
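One rough way to confirm the dead-row theory (assuming statistics are reasonably current) is to compare the planner's estimates for the table against the actual live row count:

```sql
-- Planner's view of the table (updated by VACUUM/ANALYZE):
SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'lan';

-- Actual live rows; a reltuples estimate far above this, or a huge
-- relpages figure for ~665k narrow rows, suggests heavy bloat.
SELECT count(*) FROM lan;
```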
--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Richard Huxton

Rajesh Kumar Mallah wrote:

On 4/11/06, Simon Dale <[EMAIL PROTECTED]> wrote:


I'm trying to evaluate PostgreSQL as a database that will have to store a
high volume of data and access that data frequently. One of the features on
our wish list is to be able to use stored procedures to access the data and
I was wondering if it is usual for stored procedures to perform slower on
PostgreSQL than raw SQL?


No.

RETURN NEXT keeps accumulating the data before returning.
I am not sure if any optimisations have been done to that effect.

In general functions are *NOT* slower than RAW SQL.


Actually, in cases where there is a simple way to state the query in raw
SQL, I'd expect a procedural solution to be slower. After all, you're
adding another layer of processing.


Of course, you normally wouldn't write a procedural solution to a simple 
query.


Added to this is the difference that plpgsql is planned once whereas raw 
sql will be planned on each query. This means you save planning costs 
with the plpgsql but have the chance to get better plans with the raw sql.
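For completeness: raw SQL can also be planned once per session with PREPARE, which narrows that particular gap (statement name and parameter are illustrative):

```sql
PREPARE by_id (int8) AS SELECT * FROM test WHERE id = $1;
EXECUTE by_id(42);
DEALLOCATE by_id;
```

The trade-off is the same one described above: a plan saved at PREPARE time may be worse than one built with knowledge of the actual parameter values.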

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread Rajesh Kumar Mallah
On 4/11/06, Simon Dale <[EMAIL PROTECTED]> wrote:
>
>
>
> Hi,
>
>
>
> I'm trying to evaluate PostgreSQL as a database that will have to store a
> high volume of data and access that data frequently. One of the features on
> our wish list is to be able to use stored procedures to access the data and
> I was wondering if it is usual for stored procedures to perform slower on
> PostgreSQL than raw SQL?


No.

RETURN NEXT keeps accumulating the data before returning.
I am not sure if any optimisations have been done to that effect.

In general functions are *NOT* slower than RAW SQL.

Regds
mallah.



Re: [PERFORM] Stored Procedure Performance

2006-04-11 Thread hubert depesz lubaczewski
On 4/11/06, Simon Dale <[EMAIL PROTECTED]> wrote:
> I'm trying to evaluate PostgreSQL as a database that will have to store a
> high volume of data and access that data frequently. One of the features on
> our wish list is to be able to use stored procedures to access the data and
> I was wondering if it is usual for stored procedures to perform slower on
> PostgreSQL than raw SQL?

sorry, but your benchmark is completely flawed.
1st. the tables are empty. will you ever run the real code on empty tables?
2nd. do you really need a stored procedure for such a simple query?
testing something that's far from real usage will not give you anything good.

return next will of course show up as slower than standard select. the thing
is - will the relative slowness of return next matter to you when you will
put more logic in the procedure?

depesz
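A sketch of what a more realistic benchmark setup could look like, populating the test table from the original post before timing anything (the row count is arbitrary):

```sql
-- Fill the table with sample rows, then refresh the statistics:
INSERT INTO test (id, name, description)
SELECT g, 'name_' || g, repeat('x', 200)
FROM generate_series(1, 100000) AS g;

ANALYZE test;

-- Now re-run the comparison on non-empty data:
EXPLAIN ANALYZE SELECT * FROM test;
EXPLAIN ANALYZE SELECT * FROM readTest();
```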


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-11 Thread soni de
 
I have flushed the database, so currently records in the "lan" table are: 665280
but records can be increased more than 1GB and in that case it takes more than 1 hour
 
Below is explain analyze output taken from the table having 665280 records
 
pdb=# explain analyze SELECT sdate, stime, rbts from lan WHERE (
 ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate
 >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime ;
NOTICE:  QUERY PLAN:
Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=619140.18..619140.29 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7564.44..619121.61 rows=288 loops=1)
Total runtime: 619140.76 msec
 
EXPLAIN
 
bsdb=# explain analyze SELECT DISTINCT sdate, stime, rbts from lan
 WHERE (  ( bname = 'pluto' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  

 ORDER BY sdate, stime ;
NOTICE:  QUERY PLAN:
 
Unique  (cost=17.13..17.14 rows=1 width=16) (actual time=610546.66..610564.31 rows=288 loops=1)
  ->  Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=610546.65..610546.75 rows=288 loops=1)
    ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7524.47..610533.50 rows=288 loops=1)

Total runtime: 610565.51 msec
 
EXPLAIN
 
pdb=# explain analyze SELECT ALL sdate, stime, rbts from lan WHERE (  ( bname = 'neptune' ) AND ( cno = 17 ) AND ( pno = 1 ) AND ( ( sdate >= '2004-07-21' ) AND ( sdate <= '2004-07-21' ) )  )  ORDER BY sdate, stime ;

NOTICE:  QUERY PLAN:
 
Sort  (cost=17.13..17.13 rows=1 width=16) (actual time=1260756.66..1260756.76 rows=288 loops=1)
  ->  Index Scan using lan_pkey on lan  (cost=0.00..17.12 rows=1 width=16) (actual time=7725.97..1260752.47 rows=288 loops=1)

Total runtime: 1260757.09 msec
 
 
pdb=# \d lan
              Table "lan"
  Column  |         Type          | Modifiers
----------+-----------------------+-----------
 bname    | character varying(64) | not null
 sdate    | date                  | not null
 stime    | integer               | not null
 cno      | smallint              | not null
 pno      | smallint              | not null
 rbts     | bigint                |
 tbts     | bigint                |
 u_inpkt  | bigint                |
 u_outpkt | bigint                |
 m_inpkt  | bigint                |
 m_outpkt | bigint                |
 b_inpkt  | bigint                |
 b_outpkt | bigint                |
Primary key: lan_pkey
Check constraints: "lan_stime" ((stime >= 0) AND (stime < 86400))
On 4/10/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Rajesh Kumar Mallah wrote:
>> what is the query ?
>> use LIMIT or a restricting where clause.
>
> You could also use a cursor.
>
> Joshua D. Drake
>
>> regds
>> mallah.
>>
>> On 4/10/06, soni de <[EMAIL PROTECTED]> wrote:
>>> Hello,
>>>
>>> I have difficulty in fetching the records from the database.
>>> Database table contains more than 1 GB data.
>>> For fetching the records it is taking more than 1 hour and that's why
>>> it is slowing down the performance.
>>> please provide some help regarding improving the performance and how
>>> do I run query so that records will be fetched in a less time.
>
> --
> === The PostgreSQL Company: Command Prompt, Inc. ===
>   Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
>   Providing the most comprehensive PostgreSQL solutions since 1997
>   http://www.commandprompt.com/


[PERFORM] Stored Procedure Performance

2006-04-11 Thread Simon Dale








Hi,

 

I’m trying to evaluate PostgreSQL as a database that
will have to store a high volume of data and access that data frequently. One
of the features on our wish list is to be able to use stored procedures to
access the data and I was wondering if it is usual for stored procedures to
perform slower on PostgreSQL than raw SQL?

 

A simple example of this can be shown with the following
commands:

 

First I created a test table:

 

CREATE TABLE test (
    id int8,
    name varchar(128),
    description varchar(500),
    constraint "pk_test" primary key (id)
);

 

Then the function I want to test:

 

CREATE OR REPLACE FUNCTION readTest() RETURNS SETOF test AS
$$
DECLARE
    row test%ROWTYPE;
BEGIN
    FOR row IN SELECT * FROM test LOOP
        RETURN NEXT row;
    END LOOP;

    RETURN;
END;
$$ LANGUAGE plpgsql;

 

Firstly, I ran EXPLAIN on the raw SQL to see how long that
takes to access the database the results are as follows:

 

EXPLAIN ANALYZE SELECT * FROM test;

Seq Scan on test  (cost=0.00..10.90 rows=90 width=798)
(actual time=0.003..0.003 rows=0 loops=1)

Total runtime: 0.074 ms

(2 rows)

 

Secondly, I ran EXPLAIN on the function created above and
the results are as follows:

 

EXPLAIN ANALYZE SELECT * FROM readTest();

Function Scan on readtest  (cost=0.00..12.50 rows=1000 width=798)
(actual time=0.870..0.870 rows=0 loops=1)

Total runtime: 0.910 ms

(2 rows)

 

I know that the function is planned the first time it is
executed so I ran the same command again to remove that processing from the
timings and the results are as follows:

 

EXPLAIN ANALYZE SELECT * FROM readTest();

Function Scan on readtest  (cost=0.00..12.50 rows=1000 width=798)
(actual time=0.166..0.166 rows=0 loops=1)

Total runtime: 0.217 ms

(2 rows)

 

Even with the planning removed, the function still performs
significantly slower than the raw SQL. Is that normal, or am I doing
something wrong with the creation or calling of the function?

 

Thanks for your help,

 

Simon

 




