Re: [GENERAL] Performance question

2014-11-22 Thread Anil Menon
Thanks Laurenz, very good point!

Luckily (phew!) the business scenario is such that race conditions cannot
occur (and the transaction table is append only). There is business
workflow to address duplicates but
1) it occurs extremely rarely (it would be a deliberate sabotage if it
occurs)
2) there is no impact on business

Yours
Anil



On Fri, Nov 21, 2014 at 5:16 PM, Albe Laurenz laurenz.a...@wien.gv.at
wrote:

 Anil Menon wrote:
  I would like to ask from your experience which would be the best
 generic method for checking if row
  sets of a certain condition exists in a PLPGSQL function.
 
  I know of 4 methods so far (please feel free to add if I missed out any
 others)
 [...]

 Are you aware that all of these methods have a race condition unless
 you use isolation level READ STABILITY or better?

 It may be that rows are added or removed between the check and the
 corresponding action.

 Yours,
 Laurenz Albe



Re: [GENERAL] Performance question

2014-11-21 Thread Albe Laurenz
Anil Menon wrote:
 I would like to ask from your experience which would be the best generic 
 method for checking if row
 sets of a certain condition exists in a PLPGSQL function.
 
 I know of 4 methods so far (please feel free to add if I missed out any 
 others)
[...]

Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?

It may be that rows are added or removed between the check and the
corresponding action.
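
As an illustration only (the table "t" and column "k" below are invented, not from this thread), one way to close that gap is to run the check and the action inside a SERIALIZABLE transaction and retry on serialization failures:

BEGIN ISOLATION LEVEL SERIALIZABLE;

-- the check
SELECT EXISTS (SELECT 1 FROM t WHERE k = 42) AS already_there;

-- ...the corresponding action (INSERT/UPDATE/DELETE) goes here...
-- Concurrent changes that would invalidate the check now surface as
-- serialization failures (SQLSTATE 40001) instead of silent races,
-- so the application must be prepared to retry.

COMMIT;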

Yours,
Laurenz Albe



Re: [GENERAL] Performance question

2014-11-20 Thread Anil Menon
Thanks Adrian


On Thu, Nov 20, 2014 at 3:46 AM, Adrian Klaver adrian.kla...@aklaver.com
wrote:

 On 11/19/2014 08:26 AM, Anil Menon wrote:

 Hello,

 I would like to ask from your experience which would be the best
 generic method for checking if row sets of a certain condition exists
 in a PLPGSQL function.

 I know of 4 methods so far (please feel free to add if I missed out any
 others)

 1) get a count (my previous experience with ORCL shaped this option)

 select count(*) into vcnt
 from table
 where condition
 if vcnt > 0 then
do X
 else
do y
 end if
 Cons : It seems doing a count(*) is not the best option for PG



 Well that would depend on the table size, whether it was 100 rows vs
 1,000,000 rows



The table is estimated/guesstimated to be ~900 million rows (~30M a day,
90 days of history, though initially it would be ~30M), though the WHERE
part of the query would return between 0 and ~2 rows.





 2) Use a non-count option
 select primary_key_Col into vcnt
 from table
 where condition
 if found then
do X
 else
do y
 end if
 Cons: Some people seem not to prefer this as (AFAIU) it causes
 plpgsql-sql-plpgsql context switches


 plpgsql is fairly tightly coupled to SQL, so I have not really seen any
 problems. But then I am not working on large datasets.


I think that ~900M rows would most likely constitute a large data set.




 3) using perform
 perform primary_key_Col into vcnt
 from table
 where condition
 if found then
do X
 else
do y
 end if

 Seems to remove the above (item 2) issues (if any)


 AFAIK, you cannot do the above as written. PERFORM does not return a
 result:

 http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

 It would have to be more like:

 perform primary_key_Col from table where condition


You are absolutely right - my bad.


 4) using exists
 if exists ( select 1 from table where condition ) then
   do x
 else
   do y
 end if


 My question is what would be the best (in terms of performance) method
 to use? My gut feeling is to use option 4 for PG. Am I right or is there
 any other method?


 All of the above is context specific. To know for sure you will need to
 test on actual data.


Absolutely right; it's just that I want to ensure that I follow the most
optimal method before the DB goes into production, after which priorities
change as to what can still be changed.




 --
 Adrian Klaver
 adrian.kla...@aklaver.com


I guess the best answer would be that it's very context specific, but
picking the brains of experienced people helps :-)

Thanks again
Anil


[GENERAL] Performance question

2014-11-19 Thread Anil Menon
Hello,

I would like to ask from your experience which would be the best generic
method for checking if row sets of a certain condition exists in a PLPGSQL
function.

I know of 4 methods so far (please feel free to add if I missed out any
others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where condition
if vcnt > 0 then
  do X
else
  do y
end if
Cons : It seems doing a count(*) is not the best option for PG

2) Use a non-count option
select primary_key_Col into vcnt
from table
where condition
if found then
  do X
else
  do y
end if
Cons: Some people seem not to prefer this as (AFAIU) it causes
plpgsql-sql-plpgsql context switches

3) using perform
perform primary_key_Col into vcnt
from table
where condition
if found then
  do X
else
  do y
end if

Seems to remove the above (item 2) issues (if any)

4) using exists
if exists ( select 1 from table where condition ) then
 do x
else
 do y
end if


My question is what would be the best (in terms of performance) method to
use? My gut feeling is to use option 4 for PG. Am I right or is there any
other method?

Thanks in advance
Anil


Re: [GENERAL] Performance question

2014-11-19 Thread Francisco Olarte
Hi Anil:

On Wed, Nov 19, 2014 at 5:26 PM, Anil Menon gakme...@gmail.com wrote:

 Cons : It seems doing a count(*) is not the best option for PG


For this and some of the following options, if you are going to just test
for existence, I would consider adding a LIMIT 1 somewhere on the query,
to let the optimizer know you only need one row so it can abort the scan on
the first hit. Probably not needed if you are going to give a query which uses
a unique index, but it shouldn't hurt.
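
To make that concrete, here is a small sketch (purely illustrative; "big_table" and "some_col" are made-up names) of the count-style check capped with LIMIT 1 inside a PL/pgSQL block:

DO $$
DECLARE
  hit boolean;
BEGIN
  -- The LIMIT 1 lets the planner stop at the first matching row;
  -- an EXISTS(...) test does the same thing implicitly.
  SELECT count(*) > 0 INTO hit
    FROM (SELECT 1 FROM big_table WHERE some_col = 42 LIMIT 1) AS s;

  IF hit THEN
    RAISE NOTICE 'at least one matching row';
  ELSE
    RAISE NOTICE 'no matching row';
  END IF;
END;
$$;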

Francisco Olarte.


Re: [GENERAL] Performance question

2014-11-19 Thread Adrian Klaver

On 11/19/2014 08:26 AM, Anil Menon wrote:

Hello,

I would like to ask from your experience which would be the best
generic method for checking if row sets of a certain condition exists
in a PLPGSQL function.

I know of 4 methods so far (please feel free to add if I missed out any
others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where condition
if vcnt > 0 then
   do X
else
   do y
end if
Cons : It seems doing a count(*) is not the best option for PG



Well that would depend on the table size, whether it was 100 rows vs 
1,000,000 rows




2) Use a non-count option
select primary_key_Col into vcnt
from table
where condition
if found then
   do X
else
   do y
end if
Cons: Some people seem not to prefer this as (AFAIU) it causes
plpgsql-sql-plpgsql context switches


plpgsql is fairly tightly coupled to SQL, so I have not really seen any 
problems. But then I am not working on large datasets.




3) using perform
perform primary_key_Col into vcnt
from table
where condition
if found then
   do X
else
   do y
end if

Seems to remove the above (item 2) issues (if any)


AFAIK, you cannot do the above as written. PERFORM does not return a result:

http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

It would have to be more like:

perform primary_key_Col from table where condition
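
For reference, a complete version of option 3 along those lines might look like the following sketch (the table and column names are placeholders, not from the thread):

CREATE OR REPLACE FUNCTION check_and_act() RETURNS void AS $$
BEGIN
    -- PERFORM runs the query, discards the rows, and sets FOUND
    PERFORM 1 FROM some_table WHERE some_col = 42;
    IF FOUND THEN
        RAISE NOTICE 'do X';
    ELSE
        RAISE NOTICE 'do Y';
    END IF;
END;
$$ LANGUAGE plpgsql;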



4) using exists
if exists ( select 1 from table where condition ) then
  do x
else
  do y
end if


My question is what would be the best (in terms of performance) method
to use? My gut feeling is to use option 4 for PG. Am I right or is there
any other method?


All of the above is context specific. To know for sure you will need to 
test on actual data.




Thanks in advance
Anil



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Performance question: Commit or rollback?

2011-12-24 Thread vinny
On Sun, 2011-12-18 at 14:05 +1100, Chris Angelico wrote:
 On Sun, Dec 18, 2011 at 2:02 PM, Chris Travers chris.trav...@gmail.com 
 wrote:
  I do not believe there are performance penalties for either.  All
  commit or rollback does is determine visibility of changes made.
 
 Thanks. (And thanks for the incredibly quick response!)
 
 My framework has a read-only mode (determined by user-level access),
 in which it begins a read-only transaction. At the end of it, I
 currently have it rolling the transaction back (to make absolutely
 sure that no changes will be made), but was concerned that this might
 place unnecessary load on the system. I'll stick with rolling back,
 since it's not going to hurt!
 
 Chris Angelico
 

The actual rollback won't hurt as long as you have not made any
modifications to any records. But opening the transaction could have side
effects for other processes that want to modify the records that you want
to protect in your read-only transaction.

How about using a database user that has its create/update/delete rights
revoked? That will cause an error if the supposedly read-only routine
does try to change data.
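
A minimal sketch of such a role (the role, database and schema names, and the exact grants, are assumptions for illustration, not taken from this thread):

CREATE ROLE app_readonly LOGIN PASSWORD 'secret';
GRANT CONNECT ON DATABASE appdb TO app_readonly;
GRANT USAGE ON SCHEMA public TO app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;
-- With no INSERT/UPDATE/DELETE privileges granted, any attempted write
-- from the supposedly read-only code fails with a permission error.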




Re: [GENERAL] Performance question: Commit or rollback?

2011-12-24 Thread Chris Angelico
On Sat, Dec 24, 2011 at 11:46 PM, vinny vi...@xs4all.nl wrote:
 The actual rollback won't hurt as long as you have not made any
 modifications to any records. But opening the transaction could have side
 effects for other processes that want to modify the records that you want
 to protect in your read-only transaction.

 How about using a database user that has its create/update/delete rights
 revoked? That will cause an error if the supposedly read-only routine
 does try to change data.

The readonly-ness of the session is defined based on information
stored in the database, so that would entail the cost of
re-authenticating. Also, we want to minimize debugging time by having
both read-only and read-write access use almost exactly the same code
and DB access, meaning that we should not need to test every module in
every mode.

ChrisA



Re: [GENERAL] Performance question: Commit or rollback?

2011-12-24 Thread vinny
On Sat, 2011-12-24 at 23:49 +1100, Chris Angelico wrote:
 On Sat, Dec 24, 2011 at 11:46 PM, vinny vi...@xs4all.nl wrote:
  The actual rollback won't hurt as long as you have not made any
  modifications to any records. But opening the transaction could have side
  effects for other processes that want to modify the records that you want
  to protect in your read-only transaction.

  How about using a database user that has its create/update/delete rights
  revoked? That will cause an error if the supposedly read-only routine
  does try to change data.
 
 The readonly-ness of the session is defined based on information
 stored in the database, so that would entail the cost of
 re-authenticating.

Yes, you would have to re-authenticate; you'd have to weigh the time cost
of that against any performance hits the transaction might cause.

 Also, we want to minimize debugging time by having
 both read-only and read-write access use almost exactly the same code
 and DB access, meaning that we should not need to test every module in
 every mode.

So, your read-only mode is basically a flag that forces your code to
always issue a rollback at the end, instead of a commit for read/write
mode.

I find that a bit scary. :-)

regards,
Vincent.






Re: [GENERAL] Performance question: Commit or rollback?

2011-12-24 Thread Chris Angelico
On Sun, Dec 25, 2011 at 12:00 AM, vinny vi...@xs4all.nl wrote:
 So, your read-only mode is basically a flag that forces your code to
 always issue a rollback at the end, instead of a commit for read/write
 mode.

 I find that a bit scary. :-)

It's three things:

1) BEGIN TRANSACTION READ ONLY instead of BEGIN TRANSACTION
2) A high level flag that tells the PHP code that it ought not to change things
3) ROLLBACK instead of COMMIT
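
As a sketch, the SQL side of points 1 and 3 looks roughly like this (the table name is invented):

BEGIN TRANSACTION READ ONLY;

SELECT * FROM accounts WHERE id = 1;    -- reads behave normally

-- UPDATE accounts SET balance = 0 WHERE id = 1;
-- would fail here with a "cannot execute UPDATE in a read-only
-- transaction" error

ROLLBACK;                               -- end with ROLLBACK, not COMMIT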

ChrisA



Re: [GENERAL] Performance question: Commit or rollback?

2011-12-24 Thread Darren Duncan

Chris Angelico wrote:

On Sat, Dec 24, 2011 at 11:46 PM, vinny vi...@xs4all.nl wrote:

How about using a database user that has its create/update/delete rights
revoked? That will cause an error if the supposedly read-only routine
does try to change data.


Also, we want to minimize debugging time by having
both read-only and read-write access use almost exactly the same code
and DB access, meaning that we should not need to test every module in
every mode.


You can do it correctly while reusing all of your code; you just have different 
arguments at connect time and otherwise your code uses the connection handle in 
the same way afterwards.  It's fine to have flags in the app so the app just 
tries acceptable things, but privileges in the database are the only way to 
actually be safe and resilient against accidental changes. -- Darren Duncan




Re: [GENERAL] Performance question: Commit or rollback?

2011-12-24 Thread Chris Travers
On Sat, Dec 24, 2011 at 11:34 AM, Darren Duncan dar...@darrenduncan.net wrote:



 You can do it correctly while reusing all of your code; you just have
 different arguments at connect time and otherwise your code uses the
 connection handle in the same way afterwards.  Its fine to have flags in the
 app so the app just tries acceptable things, but privileges in the database
 are the only way to actually be safe and resilient against accidental
 changes. -- Darren Duncan

Agreed.  Roles in the db are important (we use them extensively in
LedgerSMB).  However here is a case where it doesn't work so well:

Suppose I want to run read-write test cases against a production
database to look for things that are wrong but we want to be
mathematically sure that the test cases do not commit data to the
database.

So I don't entirely know the best way to do this in other languages,
but here is what we did in Perl:

1) Our application normally uses DBI (and DBD::Pg).
2)  We have a special module (LedgerSMB::DBTest) which basically
exposes the portions of the DBI/DBD::Pg interface we are using, but
lies to the higher levels about committing.  Basically it returns
true, but does nothing database-wise, providing the appearance of a
consistent set of changes but in fact the changes are still to be
rolled back.
3)  In our test case scripts, we switch out the DBI database handles
with instances of LedgerSMB::DBTest.

This works well because it fails safely.  If we omit something from
the DBTest module, our code will error.  If we have test cases that
depend on some transactions committing and others rolling back, we
have to order the test cases appropriately or the test cases fail.  In
other words, the worst we can get are test case failures, not spurious
commits, and the changes necessary to make this happen are in the test
case files themselves.  This makes it easy to verify.

Best Wishes,
Chris Travers



[GENERAL] Performance question: Commit or rollback?

2011-12-17 Thread Chris Angelico
Some of my code involves transactions which will not have significant
effect on the database. It might be a read-only transaction (possibly
declared as one, but possibly not), or perhaps a completely empty
transaction - I have a framework that will always open a transaction,
then call on other code, and then complete the transaction at the
bottom.

In these instances, is it better to commit or to rollback? Are there
performance penalties to either option?

Similarly, what about when the first action in a transaction puts it
in an error state? Is it better to commit (which should have no effect
- nothing succeeded) or to roll back?

I could test these things experimentally, but am afraid I'll skew my
results based on the data I use. Hoping that somebody here knows for
certain - there does seem to be a wealth of expertise here.

Chris Angelico



Re: [GENERAL] Performance question: Commit or rollback?

2011-12-17 Thread Chris Travers
On Sat, Dec 17, 2011 at 6:53 PM, Chris Angelico ros...@gmail.com wrote:
 Some of my code involves transactions which will not have significant
 effect on the database. It might be a read-only transaction (possibly
 declared as one, but possibly not), or perhaps a completely empty
 transaction - I have a framework that will always open a transaction,
 then call on other code, and then complete the transaction at the
 bottom.

 In these instances, is it better to commit or to rollback? Are there
 performance penalties to either option?

I do not believe there are performance penalties for either.  All
commit or rollback does is determine visibility of changes made.

Best Wishes,
Chris Travers



Re: [GENERAL] Performance question: Commit or rollback?

2011-12-17 Thread Chris Angelico
On Sun, Dec 18, 2011 at 2:02 PM, Chris Travers chris.trav...@gmail.com wrote:
 I do not believe there are performance penalties for either.  All
 commit or rollback does is determine visibility of changes made.

Thanks. (And thanks for the incredibly quick response!)

My framework has a read-only mode (determined by user-level access),
in which it begins a read-only transaction. At the end of it, I
currently have it rolling the transaction back (to make absolutely
sure that no changes will be made), but was concerned that this might
place unnecessary load on the system. I'll stick with rolling back,
since it's not going to hurt!

Chris Angelico



[GENERAL] Performance question

2007-08-16 Thread Marcelo de Moraes Serpa
Hello list,

If I've got a trigger that calls a function each time there is a DELETE or
UPDATE operation on a table in my system, and in this function I retrieve
some boolean information from another table and, based on this information,
additional code will be run or not in this function. Could the sole fact
of calling the function and selecting the data from another table (and the
trigger on each update and delete on any table) affect the overall db
performance in a noticeable manner?


Re: [GENERAL] Performance question

2007-08-16 Thread Decibel!
On Thu, Aug 16, 2007 at 08:52:02AM -0300, Marcelo de Moraes Serpa wrote:
 Hello list,
 
 If I've got a trigger that calls a function each time there is a DELETE or
 UPDATE operation on a table in my system, and in this function I retrieve
 some boolean information from another table and, based on this information,
 additional code will be run or not in this function. Could the sole fact
 of calling the function and selecting the data from another table (and the
 trigger on each update and delete on any table) affect the overall db
 performance in a noticeable manner?

Of course, you're adding at least one extra query to each UPDATE and
DELETE. Plus the overhead of the trigger itself.

The real question is: so what? If you need that logic to happen, you
need it to happen. Unless you'll be updating or deleting scores of rows
a second, I wouldn't worry too much about it.

Remember the first rule of performance tuning: don't. :)
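
For illustration, a sketch of the kind of trigger described above (every name here is invented): it reads a flag from another table on each UPDATE/DELETE and only does the extra work when the flag is set.

CREATE OR REPLACE FUNCTION act_if_enabled() RETURNS trigger AS $$
DECLARE
    enabled boolean;
BEGIN
    -- the extra query that runs for every affected row
    SELECT s.feature_enabled INTO enabled
      FROM settings s
     WHERE s.table_name = TG_TABLE_NAME;

    IF enabled THEN
        INSERT INTO audit_log (table_name, op, changed_at)
        VALUES (TG_TABLE_NAME, TG_OP, now());
    END IF;

    RETURN NULL;   -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER act_if_enabled_trg
    AFTER UPDATE OR DELETE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE act_if_enabled();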
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] Performance Question - Table Row Size

2007-07-12 Thread Mike
I see. Thank you for the elaborate response. I have a clearer idea of
what is going on now. In designing my application I was thinking of
storing pieces of my data as serialized python data structures in a
binary field (no more than 15KB), while a friend was arguing I should
store the data in other tables and relate the tables together. He was
arguing that storing binary data in a table, even though it is not
queried, slows down other queries.

Thanks again,
Mike




Re: [GENERAL] Performance Question - Table Row Size

2007-07-12 Thread Douglas McNaught
Mike [EMAIL PROTECTED] writes:

 I see. Thank you for the elaborate response. I have a clearer idea of
 what is going on now. In designing my application I was thinking of
 storing pieces of my data as serialized python data structures in a
 binary field (no more than 15KB), while a friend was arguing I should
 store the data in other tables and relate the tables together. He was
 arguing that storing binary data in a table, even though it is not
 queried, slows down other queries.

A 15KB column value is going to be stored out-of-line in the TOAST
table anyway, so your table tuple will just contain a pointer to it,
which isn't very big.  If you don't use that column value in a given
query its effect will be minimal.
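
A small sketch of what that looks like in practice (hypothetical schema): queries that skip the wide column never have to fetch the out-of-line data.

CREATE TABLE documents (
    id         serial PRIMARY KEY,
    title      text,
    created_at timestamp DEFAULT now(),
    payload    bytea        -- the ~15KB serialized structure, stored out of line
);

-- reads only the small main-table row:
SELECT id, title, created_at FROM documents WHERE id = 1;

-- additionally pulls the out-of-line value from the TOAST table:
SELECT payload FROM documents WHERE id = 1;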

-Doug



[GENERAL] Performance Question - Table Row Size

2007-07-09 Thread Mike
Hi,

I am designing my database and I was wondering whether my table row
size affects the performance of querying my table. Please note that my
table is being designed to hold a high volume of records and I do not
plan to do (select *) for retrieving them. That is, I plan to only
query a few of those fields at a given time, but each row contains
significantly more data that is not being queried at the time.

Thanks,
Mike




Re: [GENERAL] Performance Question - Table Row Size

2007-07-09 Thread Alexander Staubo

On 7/9/07, Mike [EMAIL PROTECTED] wrote:

I am designing my database and I was wondering whether my table row
size affects the performance of querying my table. Please note that my
table is being designed to hold a high volume of records and I do not
plan to do (select *) for retrieving them. That is, I plan to only
query a few of those fields at a given time, but each row contains
significantly more data that is not being queried at the time.


Obvious observation: Since PostgreSQL's unit of data storage is the
page, selects -- even on single attributes -- result in entire pages
being loaded into memory and then read.

Since the cache (PostgreSQL's shared buffers plus the OS file system
cache) holds pages, not individual attributes, more data per tuple
(row) means fewer tuples to fit in the cache.

As far as the CPU cache goes, as I understand it, the fact that you're
reading just a few attributes from each tuple (maybe even just a few
from each page) is inefficient -- you will be forcing lots of data
into the cache that is never used.

In general, you might be better off normalizing your table, if
possible, or partitioning it into subtables.

But these are the broad strokes -- how many columns are we talking
about exactly, and of what data types?
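
As a sketch of the partition-into-subtables idea (all names invented): keep the hot, frequently queried attributes in a narrow table and move the bulky, rarely read data into a 1:1 side table.

CREATE TABLE items (
    id     serial PRIMARY KEY,
    name   text,
    status integer
);

CREATE TABLE item_details (
    item_id integer PRIMARY KEY REFERENCES items (id),
    blob    bytea            -- large, rarely queried payload
);

-- the hot-path query touches only the narrow table, so more tuples fit per page
SELECT id, name FROM items WHERE status = 1;

-- the wide data is joined in only when it is actually needed
SELECT i.name, d.blob
  FROM items i
  JOIN item_details d ON d.item_id = i.id
 WHERE i.id = 42;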

Alexander.



Re: [GENERAL] Performance Question - Table Row Size

2007-07-09 Thread Gregory Stark
Mike [EMAIL PROTECTED] writes:

 I am designing my database and I was wondering whether my table row
 size effects the performance of querying my table. 

yes

If your table is large and you're reading all the rows then you'll be limited
by the i/o rate. If your rows are twice as big it will take twice as much i/o
to read and it will take twice as long.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [GENERAL] Performance Question

2006-06-16 Thread Patrick TJ McPhee
In article [EMAIL PROTECTED],
Terry Lee Tucker [EMAIL PROTECTED] wrote:

% elements of 50 thousand records on 8 structurally identical databases. We 
% threw together the script and decided to just delete the record and re-insert 
% it with the data that was brought into sync. Now the question: Is it just as 
% fast to do it this way, or is there some hidden advantage to performing an 
% update?

If you have foreign key relationships to the table being updated, then
deleting from that table will often be slower than updating.


-- 

Patrick TJ McPhee
North York  Canada
[EMAIL PROTECTED]



[GENERAL] Performance Question

2006-06-14 Thread Terry Lee Tucker
Hello List:

I've been told that an update to a record is equivalent to a delete and insert 
operation. We have a utility written in Perl that brings into sync certain 
elements of 50 thousand records on 8 structurally identical databases. We 
threw together the script and decided to just delete the record and re-insert 
it with the data that was brought into sync. Now the question: Is it just as 
fast to do it this way, or is there some hidden advantage to performing an 
update?

Just curious.

TIA

master=# select version();
   version
--
 PostgreSQL 7.4.6 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.2.3 
20030502 (Red Hat Linux 3.2.3-49)



Re: [GENERAL] Performance Question

2006-06-14 Thread Douglas McNaught
Terry Lee Tucker [EMAIL PROTECTED] writes:

 Hello List:

 I've been told that an update to a record is equivalent to a delete
 and insert operation. We have a utility written in Perl that brings
 into sync certain elements of 50 thousand records on 8 structurally
 identical databases. We threw together the script and decided to
 just delete the record and re-insert it with the data that was
 brought into sync. Now the question: Is it just as fast to do it
 this way, or is there some hidden advantage to performing an update?

UPDATE will probably be somewhat faster because it's only one SQL
statement to parse, plan and execute.

-Doug



Re: [GENERAL] Performance Question

2006-06-14 Thread Greg Stark
Terry Lee Tucker [EMAIL PROTECTED] writes:

 Hello List:
 
 I've been told that an update to a record is equivalent to a delete and 
 insert 
 operation. We have a utility written in Perl that brings into sync certain 
 elements of 50 thousand records on 8 structurally identical databases. We 
 threw together the script and decided to just delete the record and re-insert 
 it with the data that was brought into sync. Now the question: Is it just as 
 fast to do it this way, or is there some hidden advantage to performing an 
 update?

If you're doing the whole DELETE/INSERT as a single transaction then it should
be roughly comparable. The UPDATE operation tries to keep the records on the
same page which makes it a faster operation all else being equal, but all else
is rarely equal.

One way it would be unequal is if you can do your DELETE as a single query and
the insert operation using a single large COPY FROM. Even if you issue 50
thousand INSERTs and a single big DELETE that would be better than issuing 50
thousand separate UPDATEs that have to use index lookups to track down the
tuples being updated. 

Just be sure not to be issuing 50 thousand separate transactions, that will be
*much* slower.
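
A sketch of that shape (the table names and the file path are invented): one DELETE, one COPY, one transaction.

BEGIN;

-- one statement removes all rows being resynced
DELETE FROM target_table
 WHERE id IN (SELECT id FROM rows_to_sync);

-- one bulk load re-inserts the corrected versions
-- (COPY ... FROM a server-side file needs elevated privileges; \copy from
--  the client does not)
COPY target_table FROM '/tmp/synced_rows.copy';

COMMIT;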

-- 
greg




Re: [GENERAL] Performance Question

2006-06-14 Thread Alan Hodgson
On Wednesday 14 June 2006 13:24, Greg Stark [EMAIL PROTECTED] wrote:
 One way it would be unequal is if you can do your DELETE as a single
 query and the insert operation as using a single large COPY FROM. 

This is definitely the fastest way to update tens of thousands of rows if 
you know they all need to be replaced.  It saves on index lookups and also 
network latency to the feeding app.

I have also had measurable success COPYing data into a temp table and then 
using joins against that to delete, update, or insert only the rows that 
actually need to be processed in the real table (saving unnecessary index 
updates).
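
A rough sketch of that approach (all names and columns are made up):

BEGIN;

CREATE TEMP TABLE staging (LIKE target_table) ON COMMIT DROP;
COPY staging FROM '/tmp/incoming.copy';

-- update only rows whose contents actually differ
UPDATE target_table t
   SET col1 = s.col1,
       col2 = s.col2
  FROM staging s
 WHERE t.id = s.id
   AND (t.col1, t.col2) IS DISTINCT FROM (s.col1, s.col2);

-- insert the rows that do not exist yet
INSERT INTO target_table
SELECT s.*
  FROM staging s
 WHERE NOT EXISTS (SELECT 1 FROM target_table t WHERE t.id = s.id);

COMMIT;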

-- 
In a truly free society, Alcohol, Tobacco and Firearms would be a
convenience store chain.




Re: [GENERAL] Performance Question

2006-06-14 Thread Terry Lee Tucker
On Wednesday 14 June 2006 03:57 pm, Terry Lee Tucker [EMAIL PROTECTED] thus 
communicated:
-- Hello List:
--
-- I've been told that an update to a record is equivalent to a delete and insert
-- operation. We have a utility written in Perl that brings into sync certain
-- elements of 50 thousand records on 8 structurally identical databases. We
-- threw together the script and decided to just delete the record and re-insert
-- it with the data that was brought into sync. Now the question: Is it just as
-- fast to do it this way, or is there some hidden advantage to performing an
-- update?
--
-- Just curious.
--
-- TIA

Thanks for the answers. This list is a BIG help to us all :o]



[GENERAL] Performance question (FOR loop)

2005-09-01 Thread vishal saberwal
hi,
I have this performance question.

create view test_v as select 'text'::varchar as Field1, 'text'::varchar as Field2;

create or replace function test() returns setof test_v as $$
declare
  res test_v%ROWTYPE;
begin
  for res in
    select t1.field1, t1.field2 from table1 t1
  loop
    return next res;
  end loop;
  return;
end;
$$ language plpgsql;

where table1 has fields other than field1 and field2.

I can run this query at the prompt, but I do not want the application layer to know my database schema.
The only way I know I can hide the database architecture is giving them
the stored procedure name to call (in this case: test() ).

The query I am actually trying to optimize is long and has a few joins (for normalization), hence I didn't copy it here.
The function structure is similar to the one above.

(a) Am I right in thinking that if I eliminate the for loop, some performance gain can be achieved?
(b) Is there a way to eliminate this for loop?
(c) Is there any other way anyone has implemented where they have Application layer API accessing the database 
with its schema hidden?

thanks,
vish


Re: [GENERAL] Performance question (FOR loop)

2005-09-01 Thread Tom Lane
vishal saberwal [EMAIL PROTECTED] writes:
 The query I am actually trying to optimize is long and has a few joins (for
 normalization), hence I didn't copy it here.
 The function structure is similar to the one above.

 (a) Am I right in thinking that if I eliminate the for loop, some
 performance gain can be achieved?
 (b) Is there a way to eliminate this for loop?

Is the plpgsql layer actually doing anything useful, or just passing
back the results of the SQL command?  If the latter, use a SQL function
instead.  Or perhaps even better, replace the function by a view.
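
A sketch of the SQL-function form (using the same hypothetical test_v and table1 as above, and assuming the column types match): no loop, the rows are streamed straight through, and a view works the same way.

CREATE OR REPLACE FUNCTION test() RETURNS SETOF test_v AS $$
    SELECT t1.field1, t1.field2 FROM table1 t1;
$$ LANGUAGE sql;

-- or, if no procedural logic is ever needed:
-- CREATE VIEW test_api AS SELECT t1.field1, t1.field2 FROM table1 t1;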

regards, tom lane



[GENERAL] Performance question

2005-05-24 Thread LiSim: Rainer Mokros
Hello,
Has anyone tried to install Postgres on an AMD Opteron (dual core) box with 4
CPUs, 64GB main memory and a 3TB RAID 10 (24 disks), running Linux (2.6), to
run a data warehouse of 1TB?
Any pointers?

Kind regards
Rainer




Re: [GENERAL] Performance question

2005-05-24 Thread Scott Marlowe
On Mon, 2005-05-23 at 14:07, LiSim: Rainer Mokros wrote:
 Hello,
 Has anyone tried to install Postgres on an AMD Opteron (dual core) box with 4
 CPUs, 64GB main memory and a 3TB RAID 10 (24 disks), running Linux (2.6), to
 run a data warehouse of 1TB?
 Any pointers?

No, but if you buy my ticket I'd fly out and install it for free just to
play on such a machine for an afternoon.  Seriously.

For that many CPUs, you might want to try running the latest snapshot or
daily CVS tip, as Tom made some changes that seem to really increase
performance on larger SMP systems.

For performance tuning, look at the tuning guide on varlena first:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

I'd also make sure you're getting the best RAID controller you can
afford, with a reliable and fast driver for your OS.



Re: [GENERAL] Performance question

2003-07-02 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes:
 On Wed, 2 Jul 2003, Tom Lane wrote:
 You might find it useful to read the slides from my talk at last
 year's O'Reilly conference about this and related concurrency
 problems:
 http://conferences.oreillynet.com/cs/os2002/view/e_sess/2681

 I'd like to see all presentations in one collections. It'd be nice
 addition to documentation.

Yes.  Last year I asked Vince to put those slides up somewhere on the
postgresql.org website, but he never got around to it (I think he got
stuck wondering where they should go).  Bruce has materials for several
different talks he's given that should be there somewhere, too.

Perhaps someone in the new webmastering crew would like to give the
idea some thought.  Or would techdocs be the right place to go?

regards, tom lane



Re: [GENERAL] Performance question

2003-07-02 Thread Joe Conway
Tom Lane wrote:
Yes.  Last year I asked Vince to put those slides up somewhere on the
postgresql.org website, but he never got around to it (I think he got
stuck wondering where they should go).  Bruce has materials for several
different talks he's given that should be there somewhere, too.
Perhaps someone in the new webmastering crew would like to give the
idea some thought.  Or would techdocs be the right place to go?
I've always thought what the PHP group does with presentation materials 
is nice (simple but sufficient). See:

  http://conf.php.net/

Could we have a conf.postgresql.org with links from the home page?

Joe



Re: [webmaster] [GENERAL] Performance question

2003-07-02 Thread Bruce Momjian
Tom Lane wrote:
 Oleg Bartunov [EMAIL PROTECTED] writes:
  On Wed, 2 Jul 2003, Tom Lane wrote:
  You might find it useful to read the slides from my talk at last
  year's O'Reilly conference about this and related concurrency
  problems:
  http://conferences.oreillynet.com/cs/os2002/view/e_sess/2681
 
  I'd like to see all presentations in one collections. It'd be nice
  addition to documentation.
 
 Yes.  Last year I asked Vince to put those slides up somewhere on the
 postgresql.org website, but he never got around to it (I think he got
 stuck wondering where they should go).  Bruce has materials for several
 different talks he's given that should be there somewhere, too.
 
 Perhaps someone in the new webmastering crew would like to give the
 idea some thought.  Or would techdocs be the right place to go?

All my stuff is on my home page under Writings/Computer.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



[GENERAL] Performance question

2003-07-01 Thread Jean-Christian Imbeault
I'm trying to convince another open-source project (phpOpenTracker) to
modify their current INSERT SQL queries. Currently they just do an
INSERT into a table without first checking if there might be a record
with the same primary key.

The reason for this is that they need fast inserts, and most users, I assume,
are using MySQL, which silently drops INSERT queries that violate primary
key constraints. Postgres, on the other hand (and rightly so), issues
and logs an error.

I have suggested that their current INSERT INTO t VALUES() be changed to:

INSERT INTO
  t
SELECT 'v1', 'v2'
WHERE
  NOT EXISTS (
    SELECT NULL FROM t WHERE pk='v1'
  );

However, one of the developers is worried that this would cause a
performance drop for MySQL users b/c of the extra SELECT in my version
of the INSERT query.

I had thought that the 'extra' SELECT isn't really extra at all, since
*any* DB still has to check whether there is already a record with the
primary key that we are trying to insert. So whereas in my query the
check is explicitly stated, in the regular version of a simple INSERT the
check is still there but implicit, since there is a primary key defined on
the table. So there really shouldn't be much, if any, of a performance drop.

Is there any truth to my logic in the previous paragraph? :)
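
A small sketch of the two behaviours side by side (hypothetical table):

CREATE TABLE t (pk text PRIMARY KEY, val text);
INSERT INTO t VALUES ('v1', 'v2');

-- Plain insert: the primary key index is consulted implicitly and a
-- duplicate raises a unique-violation error:
-- INSERT INTO t VALUES ('v1', 'v2');

-- Guarded insert: the same index lookup happens in the explicit subquery,
-- and the statement simply inserts zero rows instead of raising an error:
INSERT INTO t
SELECT 'v1', 'v2'
WHERE NOT EXISTS (SELECT NULL FROM t WHERE pk = 'v1');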

I'd like to test my theory on a real MySQL database but I don't have 
access to one right now, and I am not sure how to go about testing a 
MySQL db or even what kind of testing. If I can get a spare machine I 
will give it a go though.

Thanks,

Jean-Christian Imbeault



Re: [GENERAL] Performance question (stripped down the problem)

2001-09-27 Thread gravity

On Thu, Sep 20, 2001 at 11:10:02AM +0200, Herbert Liechti wrote:
 I tried it. See my actions below. The main performance boost is
 reached by creating an index and disabling the sequential scan:
 Without any index;
 real0m18.128s
 user0m0.010s
 sys 0m0.010s
 
 Same statement with index
 real0m18.259s
 user0m0.020s
 sys 0m0.010s
 no difference
 
 now disable seqscan:
 time psql tt <<END
 set enable_seqscan = off;
 real0m3.701s
 user0m0.010s
 sys 0m0.000s


same here (dual PIII-866, Debian, 512 MB, raid1+0)

real0m6.472s
user0m0.000s
sys 0m0.010s

real0m6.195s
user0m0.010s
sys 0m0.000s

real0m2.885s
user0m0.010s
sys 0m0.000s


tinus




Re: [GENERAL] Performance question (stripped down the problem)

2001-09-20 Thread Tille, Andreas

On Wed, 19 Sep 2001, Tom Lane wrote:

 No.  In the first place, there's no extra sort: the planner is well
 aware that our current GROUP BY implementation produces ordered output.
 In the second place, there's no guarantee that GROUP BY will always
 produce ordered output in the future --- we are thinking about changing
 over to a hash-table-based implementation of grouping.  If you want
 ordered output, you should say ORDER BY, not try to outsmart the system.
And even if I did so - sorting 51 items takes almost no time.  So
the problem is somewhere else.  I did further tests and got:

  SELECT Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE 
(((Hauptdaten_Fall.IstAktuell)=20)) ;

  anz

 177458
(1 row)   - 2 Seconds

... that means longer than the MS-SQL server takes with the additional GROUP BY

and

  SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM 
Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20)) GROUP BY 
Hauptdaten_Fall.MeldeKategorie;

... without ORDER BY, but this doesn't matter regarding the result set
(well, it is as ordered as it would be with the ORDER BY clause) and
the time, which stays at 18 seconds (after I increased sort_mem = 2048
it was *slightly* faster - 20 seconds for default sort_mem).

So the real bottleneck seems to be the GROUP BY.

Any chance that this could be changed in future PostgreSQL versions?
This slowness makes the use of PostgreSQL impossible for our application.

Kind regards

 Andreas.




Re: [GENERAL] Performance question (stripped down the problem)

2001-09-20 Thread Tille, Andreas

On Thu, 20 Sep 2001, Herbert Liechti wrote:

 I tried it. See my actions below. The main performance boost is
 reached by creating an index and disabling the sequential scan:
Thanks.  I tried this and it helps indeed (see below).

 ---
 create index ix_1 on hauptdaten_fall(meldekategorie);
I did so before for in other tests.
 ---

 Same statement

 real0m18.259s
 user0m0.020s
 sys 0m0.010s
The same on my machine:

real0m18.128s
user0m0.070s
sys 0m0.010s

 ---
 now disable seqscan:
 ---
 time psql tt <<END
 set enable_seqscan = off;
 SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
 FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY
 Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
 END


 real0m3.701s
 user0m0.010s
 sys 0m0.000s

real0m5.905s
user0m0.060s
sys 0m0.030s

I have no real explanation why I have the same result in the first
case but significantly more time for the second, but it helps for the
first step.

On the other hand if I test my *real* database:

real0m20.539s
user0m0.060s
sys 0m0.060s

and with set enable_seqscan = off;

real0m10.133s
user0m0.040s
sys 0m0.020s

I get in fact an increase of speed by a factor of 2, but it is still far
too slow for our application.  If I start a slightly more complex
query (not to mention that we are far from the amount of data we will
get after a year), I get the following stats:

  -- default -- set enable_seqscan = off; MS-SQL server
real0m30.891s real0m27.165s   about 1s
user0m0.050s  user0m0.080s
sys 0m0.070s  sys 0m0.050s

other query example

real0m53.698s real0m54.481s   about 2.5s
user0m0.190s  user0m0.180s
sys 0m0.040s  sys 0m0.040s

This is about a factor of 20 compared to the MS-SQL server, and I have a
real hard time arguing for PostgreSQL.  In fact the MS-SQL server times
are estimated from inside Access - the plain server would be even
faster.

By the way - the last example shows that enforcing an index scan doesn't
necessarily make things faster - perhaps it could even slow down
other queries?

I would be happy to forward the exact queries which lead to this
measures if someone is interested.

Kind regards

 Andreas.




Re: [GENERAL] Performance question (stripped down the problem)

2001-09-20 Thread Tille, Andreas

On Thu, 20 Sep 2001, Justin Clift wrote:

 Sorry, I haven't seen the history of this thread.  One question which
 might be relevant is, have you adjusted the postgresql.conf file from
 the default memory settings to be something better?
I adjusted two parameters:

shared_buffers = 2048
 (When I tried 4096 I got a connection error. Don't know what this
  means, but anyway increasing this value did not change anything.)

sort_mem = 2048
 (After increasing this value (from default 512) to 1024 I got an
  increase in speed from 20s to 18s - not much but better than nothing.
  Further increase to 2048 did not change anything further so I stopped
  here.)

 If these are the times you're getting from a default configuration, you
 might be able to get far better results by doing performance tuning of
 PostgreSQL and/or the server.
Any other values which might help here?

Kind regards

   Andreas.




Re: [GENERAL] Performance question (stripped down the problem)

2001-09-20 Thread Justin Clift

Hi Andreas,

Sorry, I haven't seen the history of this thread.  One question which
might be relevant is, have you adjusted the postgresql.conf file from
the default memory settings to be something better?

If these are the times you're getting from a default configuration, you
might be able to get far better results by doing performance tuning of
PostgreSQL and/or the server.

What do you think?

Regards and best wishes,

Justin Clift


Tille, Andreas wrote:
 
 On Thu, 20 Sep 2001, Herbert Liechti wrote:
 
  I tried it. See my actions below. The main performance boost is
  reached by creating an index and disabling the sequential scan:
 Thanks.  I tried this and it helps in dead (see below).
 
  ---
  create index ix_1 on hauptdaten_fall(meldekategorie);
 I did so before for in other tests.
  ---
 
  Same statement
 
  real0m18.259s
  user0m0.020s
  sys 0m0.010s
 The same on my machine:
 
 real0m18.128s
 user0m0.070s
 sys 0m0.010s
 
  ---
  now disable seqscan:
  ---
  time psql tt END
  set enable_seqscan = off;
  SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
  FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY
  Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
  END
 
 
  real0m3.701s
  user0m0.010s
  sys 0m0.000s
 
 real0m5.905s
 user0m0.060s
 sys 0m0.030s
 
 I have no real explanation why I have the same result in the first
 case but significant more time for the second. but it helps for the
 first step.
 
 On the other hand if I test my *real* database:
 
 real0m20.539s
 user0m0.060s
 sys 0m0.060s
 
 and with set enable_seqscan = off;
 
 real0m10.133s
 user0m0.040s
 sys 0m0.020s
 
 I get in fact an increase of speed by factor 2, but anyway it is even
 far to slow for our application.  If I start a slightly more complex
 query (not to mention that we are far from the amount of data we will
 get after a year, I get the following stats:
 
   -- default -- set enable_seqscan = off; MS-SQL server
 real0m30.891s real0m27.165s   about 1s
 user0m0.050s  user0m0.080s
 sys 0m0.070s  sys 0m0.050s
 
 other query example
 
 real0m53.698s real0m54.481s   about 2.5s
 user0m0.190s  user0m0.180s
 sys 0m0.040s  sys 0m0.040s
 
 This is about factor 20 compared to the MS-SQL server and I have
 real hard arguing for PostgreSQL.  In fact the MS-SQL server times
 are estimated from inside Access - the plain server would be even
 faster.
 
 By the way - the last example shows that enforcing index scan don´t
 necessarily makes the thing faster - perhaps it could even slow down
 for other queries??
 
 I would be happy to forward the exact queries which lead to this
 measures if someone is interested.
 
 Kind regards
 
  Andreas.
 

-- 
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
 - Indira Gandhi




Re: [GENERAL] Performance question (stripped down the problem)

2001-09-20 Thread Tille, Andreas

On Thu, 20 Sep 2001, Einar Karttunen asked me for query plans for
both M$ SQL and postgresql:

M$ SQL:

  |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1005])))
   |--Stream Aggregate(GROUP BY:([Hauptdaten_Fall].[MeldeKategorie])
DEFINE:([Expr1005]=Count(*)))
|--Index
Scan(OBJECT:([IfSG].[dbo].[Hauptdaten_Fall].[IX_MeldeKategorie]),
ORDERED FORWARD)


Postgresql:

time psql ifsg ...
explain
SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
FROM Hauptdaten_Fall
WHERE (((Hauptdaten_Fall.IstAktuell)=20))
GROUP BY Hauptdaten_Fall.MeldeKategorie
ORDER BY Hauptdaten_Fall.MeldeKategorie;
...


NOTICE:  QUERY PLAN:

Aggregate  (cost=32881.62..33768.91 rows=17746 width=16)
  ->  Group  (cost=32881.62..33325.27 rows=177458 width=16)
        ->  Sort  (cost=32881.62..32881.62 rows=177458 width=16)
              ->  Seq Scan on hauptdaten_fall  (cost=0.00..15024.12 rows=177458 width=16)


real0m1.382s
user0m0.040s
sys 0m0.020s


And the other case with enforcing index scan:

time psql ifsg ...
set enable_seqscan = off;
explain
SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
FROM Hauptdaten_Fall
WHERE (((Hauptdaten_Fall.IstAktuell)=20))
GROUP BY Hauptdaten_Fall.MeldeKategorie
ORDER BY Hauptdaten_Fall.MeldeKategorie;
...


NOTICE:  QUERY PLAN:

Aggregate  (cost=0.00..146770.97 rows=17746 width=16)
  ->  Group  (cost=0.00..146327.32 rows=177458 width=16)
        ->  Index Scan using ix_meldekategorie_hauptdaten_fa on hauptdaten_fall  (cost=0.00..145883.68 rows=177458 width=16)


real0m0.102s  (for sure it's faster to have a plan if enforced ...)
user0m0.030s
sys 0m0.020s


Does this help in any way?  If I'm not completely wrong, the M$ SQL
server also prefers to use the index ix_meldekategorie.

Kind regards

   Andreas.




Re: [GENERAL] Performance question (stripped down the problem)

2001-09-20 Thread Justin Clift

Hi Andreas,

I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of ram, and
running Linux Mandrake 8.0

First thing I did was to increase the amount of shared memory and stuff
which Linux allows things to use :

echo kernel.shmall = 134217728 >> /etc/sysctl.conf
echo kernel.shmmax = 134217728 >> /etc/sysctl.conf

For my system, that'll raise the shared memory limits to 128MB at system
boot time.

btw, the 134217728 figure = 128MB  (128 * 1024 * 1024)

Then I changed the limits for the running system (so no reboot is
necessary) :

echo 134217728 > /proc/sys/kernel/shmall
echo 134217728 > /proc/sys/kernel/shmmax

Then adjusted the postgresql.conf file with these values :

sort_mem = 32768
shared_buffers = 220

Now, that's a bunch of shared_buffers, but at the same time I also
raised the max_connections to 110.

This seems to have dropped my execution times, but I haven't seriously
gotten around to tuning this system.

The key thing I think you've missed is to update the shared memory,
etc.  More info about it can be found at :

http://www.postgresql.org/idocs/index.php?kernel-resources.html

Bruce Momjian also put together some information about optimising things
with PostgreSQL at :

http://www.ca.postgresql.org/docs/hw_performance/

If you want to be able to benchmark things on your system, I use the
Open Source Database Benchmark (Linux only at present), running the
latest CVS version of it, and also tweaked to not use hash indices.  A
tarball of working source code is available at :

http://techdocs.postgresql.org/techdocs/perftuningfigures.php

Hope this is of assistance Andreas.

Regards and best wishes,

Justin Clift


Tille, Andreas wrote:
 
 On Thu, 20 Sep 2001, Justin Clift wrote:
 
  Sorry, I haven't seen the history of this thread.  One question which
  might be relevant is, have you adjusted the postgresql.conf file from
  the default memory settings to be something better?
 I adjusted two parameters:
 
 shared_buffers = 2048
  (When I tried 4096 I´ve got a connection error. Don´t know what this
   means, but anyway increasing of this value did not changed anything.)
 
 sort_mem = 2048
  (After increasing this value (from default 512) to 1024 I got an
   increase in speed from 20s to 18s - not much but better than nothing.
   Further increase to 2048 did not change anything further so I stopped
   here.)
 
  If these are the times you're getting from a default configuration, you
  might be able to get far better results by doing performance tuning of
  PostgreSQL and/or the server.
 Any other values which might help here?
 
 Kind regards
 
Andreas.
 

-- 
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
 - Indira Gandhi




Re: [GENERAL] Performance question (stripped down the problem)

2001-09-19 Thread Peter Eisentraut

Tille, Andreas writes:

 SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
 FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20)) GROUP BY
 Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;

 (which should just measure the time needed for this task).  It took my
 E250 (2x400MHz, 2GB) server 20 seconds and this is definitely much to
 long for our application.

I loaded this into 7.2 development sources and it ran 35 seconds
wall-clock time on a much smaller machine.  After I ran what in 7.1 would
be VACUUM ANALYZE it took about 22 seconds.  The difference was that it
was using the index on hauptdaten_fall.istaktuell when it shouldn't.
(Try EXPLAIN to see what it does in your case.  If it's still using the
index you might want to force enable_indexscan = off.)

I also got a minuscule speed-up by replacing the Count(Hauptdaten_Fall.ID)
with Count(*), which acts differently with respect to nulls, so it depends
whether you want to use it.

Besides that, I don't see anything blatantly obvious to speed this up.

-- 
Peter Eisentraut   [EMAIL PROTECTED]   http://funkturm.homeip.net/~peter





[GENERAL] Performance question

2001-09-10 Thread Tille, Andreas

Hello,

I have ported a database from MS SQL Server to PostgreSQL.  The database has
40 tables and 117 indexes which I defined for the same fields as in MS SQL.
I converted the data using some SQL output from MS SQL server and inserted
it with psql.

Now I started some performance comparisons and did the following statement:

SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
FROM Hauptdaten_Fall
WHERE (((Hauptdaten_Fall.IstAktuell)=20))
GROUP BY Hauptdaten_Fall.MeldeKategorie
ORDER BY Hauptdaten_Fall.MeldeKategorie;

(sorry for the German names used here).

The MS-SQL server presents the result immediately - you do not notice
any delay.  If I do it on the PostgreSQL server it takes 30s on comparable
hardware :-(((.

I really have no idea what causes this big difference in speed, which would
forbid the use of PostgreSQL for our application.  How can I check
whether the indexes I created are really used?  What could be other
reasons for such a big difference in speed?

I'm using PostgreSQL 7.1.2 on Debian GNU/Linux (testing).

Any help appreciated

Andreas.




Re: [GENERAL] Performance question

2001-09-10 Thread Tille, Andreas

On Mon, 10 Sep 2001 [EMAIL PROTECTED] wrote:

 Use explain. Explain tells you the query plan of the optimizer.

 explain SELECT .;
Thanks. I just found the thread "Index usage question" and tried to make
some profit from it:

explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM 
Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20)) GROUP BY 
Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;

NOTICE:  QUERY PLAN:

Aggregate  (cost=35267.33..36154.62 rows=17746 width=16)
  ->  Group  (cost=35267.33..35710.98 rows=177458 width=16)
        ->  Sort  (cost=35267.33..35267.33 rows=177458 width=16)
              ->  Seq Scan on hauptdaten_fall  (cost=0.00..15024.12 rows=177458 width=16)


Now I tried

Vacuum analyze;

but nothing changed :-(

Aggregate  (cost=35267.33..36154.62 rows=17746 width=16)
  ->  Group  (cost=35267.33..35710.98 rows=177458 width=16)
        ->  Sort  (cost=35267.33..35267.33 rows=177458 width=16)
              ->  Seq Scan on hauptdaten_fall  (cost=0.00..15024.12 rows=177458 width=16)

I have nearly no experience with query optimizing but the great difference
in speed tells me that something is wrong here.  There were some hints in
the Index usage question thread about some fields which might be interpreted
as strings.  Could this be a reason, and how can I check this?
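
A quick way to look into this, for what it's worth (a sketch in psql, using
the names above; whether quoting the literal helps depends on the actual
column type):

\d hauptdaten_fall

-- If IstAktuell turns out to be a text-like type, or an int2/int8 column
-- compared against a plain integer literal, a quoted literal sometimes lets
-- the planner use an index that it otherwise ignores:
EXPLAIN SELECT MeldeKategorie, count(ID) FROM Hauptdaten_Fall
 WHERE IstAktuell = '20' GROUP BY MeldeKategorie ORDER BY MeldeKategorie;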

Kind regards

   Andreas.




Re: [GENERAL] Performance question

2001-09-10 Thread Roderick A. Anderson

On Mon, 10 Sep 2001, Tille, Andreas wrote:

 Hello,
 
 Now I started some performance comparisons and did the following statement:

 The MS-SQL server presents the result immediately - you don't notice
 any delay.  If I do it on the PostgreSQL server it takes 30s on comparable
 hardware :-(((.

Did you do a VACUUM ANALYZE after loading the data?



Rod
-- 
  A small clue and no money
  will get you further than
  lots of money and no clue.





Re: [GENERAL] Performance question

2001-09-10 Thread Stephan Szabo


On Mon, 10 Sep 2001, Tille, Andreas wrote:

 On Mon, 10 Sep 2001 [EMAIL PROTECTED] wrote:
 
  Use explain. Explain tells you the query plan of the optimizer.
 
   explain SELECT ...;
 Thanks I just found the thread Index usage question and tried to make
 some profit from it:
 
 explain SELECT Hauptdaten_Fall.MeldeKategorie,
 Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE
 (((Hauptdaten_Fall.IstAktuell)=20)) GROUP BY
 Hauptdaten_Fall.MeldeKategorie ORDER BY
 Hauptdaten_Fall.MeldeKategorie;
 
 Aggregate  (cost=35267.33..36154.62 rows=17746 width=16)
   ->  Group  (cost=35267.33..35710.98 rows=177458 width=16)
         ->  Sort  (cost=35267.33..35267.33 rows=177458 width=16)
               ->  Seq Scan on hauptdaten_fall  (cost=0.00..15024.12 rows=177458 width=16)
 
 I have nearly no experience with query optimizing but the great difference
 in speed tells me that something is wrong here.  There were some hints in
 the Index usage question thread about some fields which might be interpreted
 as strings.  Could this be a reason and how to check this?

What's the schema for the table? How many rows are in the table?  How many
rows actually have IstAktuell=20 (is 177458 a reasonable estimate?).  If
not, is there a common, non-NULL value that is much more common than other
values?
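
For reference, the sort of statements that answer those questions (a sketch;
names taken from the thread):

\d hauptdaten_fall                                             -- schema

SELECT count(*) FROM Hauptdaten_Fall;                          -- total rows
SELECT count(*) FROM Hauptdaten_Fall WHERE IstAktuell = 20;    -- rows matching the filter
SELECT IstAktuell, count(*) FROM Hauptdaten_Fall
 GROUP BY IstAktuell ORDER BY 2 DESC;                          -- value distribution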






Re: [GENERAL] Performance question

2001-09-10 Thread Herbert.Liechti

On Mon, 10 Sep 2001, Tille, Andreas wrote:

 Hello,

 I have ported a database from MS SQL Server to PostgreSQL.  The database has
 40 tables and 117 indexes which I defined for the same fields as in MS SQL.
 I converted the data using some SQL output from MS SQL server and inserted
 it with psql.

 Now I started some performance comparisons and did the following statement:

 SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
 FROM Hauptdaten_Fall
 WHERE (((Hauptdaten_Fall.IstAktuell)=20))
 GROUP BY Hauptdaten_Fall.MeldeKategorie
 ORDER BY Hauptdaten_Fall.MeldeKategorie;

 (sorry for the German names used here).

 The MS-SQL server presents the result immediately - you don't notice
 any delay.  If I do it on the PostgreSQL server it takes 30s on comparable
 hardware :-(((.

 I really have no idea what caused this big difference in speed which would
 forbid the use of PostgreSQL for our application.  How can I check
 whether the indexes I created are really used?  What could be other
 reasons for such a big difference in speed?

Use explain. Explain tells you the query plan of the optimizer.

explain SELECT ...;

Best regards
Herbie
-- 

Herbert Liechti  http://www.thinx.ch
ThinX networked business servicesAdlergasse 5, CH-4500 Solothurn







Re: [GENERAL] Performance question

2001-09-10 Thread Einar Karttunen

On Mon, Sep 10, 2001 at 02:34:25PM +0200, Tille, Andreas wrote:
 On Mon, 10 Sep 2001 [EMAIL PROTECTED] wrote:
 
  Use explain. Explain tells you the query plan of the optimizer.
 
  explain SELECT ...;
 Thanks I just found the thread Index usage question and tried to make
 some profit from it:
 
 explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM 
Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20)) GROUP BY 
Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
 
 NOTICE:  QUERY PLAN:
 
 Aggregate  (cost=35267.33..36154.62 rows=17746 width=16)
   ->  Group  (cost=35267.33..35710.98 rows=177458 width=16)
         ->  Sort  (cost=35267.33..35267.33 rows=177458 width=16)
               ->  Seq Scan on hauptdaten_fall  (cost=0.00..15024.12 rows=177458 width=16)
 
 
There must be a problem with your indexes. I tried the following:
temp=# CREATE TABLE Hauptdaten_Fall (
temp(# MeldeKategorie text,
temp(# ID integer,
temp(# IstAktuell integer);
CREATE
temp=# explain SELECT MeldeKategorie,count(ID) FROM Hauptdaten_Fall
temp-#  WHERE IstAktuell=20 GROUP BY MeldeKategorie
temp-#  ORDER BY MeldeKategorie;
NOTICE:  QUERY PLAN:

Aggregate  (cost=22.67..22.72 rows=1 width=16)
  ->  Group  (cost=22.67..22.69 rows=10 width=16)
        ->  Sort  (cost=22.67..22.67 rows=10 width=16)
              ->  Seq Scan on hauptdaten_fall  (cost=0.00..22.50 rows=10 width=16)

EXPLAIN
temp=# CREATE INDEX hfia ON Hauptdaten_Fall (IstAktuell);
CREATE
temp=# explain SELECT MeldeKategorie,count(ID) FROM Hauptdaten_Fall
temp-#  WHERE IstAktuell=20 GROUP BY MeldeKategorie
temp-#  ORDER BY MeldeKategorie;
NOTICE:  QUERY PLAN:

Aggregate  (cost=8.30..8.35 rows=1 width=16)
  ->  Group  (cost=8.30..8.33 rows=10 width=16)
        ->  Sort  (cost=8.30..8.30 rows=10 width=16)
              ->  Index Scan using hfia on hauptdaten_fall  (cost=0.00..8.14 rows=10 width=16)

EXPLAIN
temp=#

Which shows quite clearly that an index scan will improve the situation drastically.
Even more so for you, because the table seems to have very many rows in it.
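
To check the same thing on the real data, a hedged sketch (the index name is
just an example; on 7.1 the planner's row estimates are only refreshed by
VACUUM ANALYZE, and whether the index actually gets picked depends on how
selective IstAktuell = 20 really is):

CREATE INDEX hauptdaten_fall_istaktuell_idx ON Hauptdaten_Fall (IstAktuell);
VACUUM ANALYZE Hauptdaten_Fall;
EXPLAIN SELECT MeldeKategorie, count(ID) FROM Hauptdaten_Fall
 WHERE IstAktuell = 20 GROUP BY MeldeKategorie ORDER BY MeldeKategorie;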

- Einar Karttunen




VS: [GENERAL] performance question

2000-07-14 Thread Janne Blomqvist

Tom Lane wrote:
 
 ernie cline [EMAIL PROTECTED] writes:
  Quick question. I am using postgres 7.0. When running a select query,
  does postgres lock the row (or table) while doing a select? Only on an
  insert/update/delete, right?
 
 SELECTs certainly don't block each other (except the SELECT FOR UPDATE
 variety).


I kind of didn't think so, but I just wanted to make sure I wasn't
insane. ;) 
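
For what it's worth, a minimal sketch of the FOR UPDATE exception mentioned
above (hypothetical table name, two concurrent sessions):

-- session 1
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;   -- takes a row-level lock

-- session 2: this one blocks until session 1 commits or rolls back
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- a plain SELECT of the same row in session 2 would not block

-- session 1
COMMIT;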



  We are experiencing some performance
  problems when running 2 programs that access the database at once (only
  doing querys, and not even complex ones).
 
 Need more detail ...


What else do you need to know? Postgres 7.0 compiled with default
options, running on a VA Linux Fullon box, kernel 2.2.13, running VA
Linux/Redhat 6.1. Box is a single processor pentium 3 500mhz, 128megs
of ram, apache 1.3.9. Using Perl DBI interface to access postgres. A
very small html page is built (about 4.5k), and it takes about 2.6
seconds to execute and build the html. Running 2 of the pages at once,
it takes about 5 seconds for them to complete. Not total, but 5 for
each one. This is using just unix 'time' command, and moving my mouse
to another rxvt window and hitting enter after first hitting it in
another window ;). 


Is using Perl DBI slow? Would writing the CGI in C speed things up?
(Please no holy wars over Perl vs. C performance! <G>)
Any help would be appreciated ...

-ernie


You could take your query and run it in psql to see if the performance problem is with postgres or with your perl stuff. If it's postgres, redesign your query. Of course, this can be quite difficult...

BTW, does a reasonably complicated SELECT run faster if you implement it using a view or a stored procedure? If I'm not entirely incorrect, high-end commercial DBs like Oracle can perform some kind of statistical optimization on views and stored procs. Is this also true for postgres?
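
As a sketch of the view idea (hypothetical table and view names; note that a
plain Postgres view is rewritten into the underlying query at execution time,
so by itself it gives convenience rather than a caching or pre-optimization
effect):

CREATE VIEW open_orders AS
  SELECT o.id, o.placed_at, c.name
  FROM orders o, customers c
  WHERE c.id = o.customer_id AND o.status = 'open';

SELECT * FROM open_orders WHERE name = 'Smith';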

If your perl stuff is slow then rewriting it in C won't help that much, unless you have made some gross programming error which gets magically fixed in the C version. Usually the culprit is the braindead way in which CGIs work, i.e.:

1. CGI request to apache
2. Apache fires up the perl interpreter which byte-code compiles your script and runs it
3. Script initiates connection to postgres, which forks off a new backend to handle the request.
4. After the script has completed the request, it closes the db connection and gets unloaded. The forked postgres backend also dies, of course.

5. Phew! What a lot of work!
Rewriting in C fixes only point 2, and only to a certain extent. The buzzwords to fix the CGI problem are 1) an in-process script interpreter and 2) persistent database connections. 1) can be solved by using, for example, mod_perl; 2) hmm... should be doable, but I don't know if you have to do it yourself or if some friendly guy has made a library for it. Or you can use AOLserver, which does all this for you (my personal choice), but then you have to rewrite your perl stuff in Tcl or some other language implemented by AOLserver modules (Python, Java and PHP at least, I think). But considering that your box seems quite fast, I suspect the problem is with your SQL anyway...

You could of course also try the postgres tuning tips at

http://www.openacs.org/doc/openacs/html/openacs-install-5.html#ss5.4

but these apply mostly to writes, I think..
If you are planning on running a production server, I also recommend you upgrade your kernel to 2.2.16, which fixes a few serious bugs:

1) The disk trashing bug (Could be a serious problem! Ok this was fixed in 2.2.15.. or was it 2.2.14 already.. I don't remember.)

2) The much halooed root exploit bug (Applies only if you have untrusted accounts, if I remember correctly..)



Janne Blomqvist
[EMAIL PROTECTED]





Re: [GENERAL] performance question

2000-07-14 Thread ernie cline

Arg!  Thanks to everyone for their help.  I followed a few suggestions I
got from here, and installed Time::HiRes on my box (that part was my
idea actually ;)), to monitor how long the queries take.  I know all the
stats show that MySQL is faster, but in my situation, postgres is really
kicking ass.  None of the queries took longer than .03 seconds ;).  Then
again, the database is small ... but anyway, in my script, we also call
an Oracle database.  THAT is slow.  Of the 2.6 seconds it takes to
execute the script, 2.3 is that Oracle call.  Granted it's a huge ~8 line
query, but it's still Oracle's "fault".  Thanks for all your help guys
and gals!

-ernie
P.S. Anyone know of a script to convert an Oracle DB to postgres?  :)  Or
if I dump the Oracle tables, will postgres be able to read them?

Tom Lane wrote:
 
 ernie cline [EMAIL PROTECTED] writes:
  Need more detail ...
 
  What else do you need to know?
 
 What queries are you issuing, *exactly*?  You might find it easiest
 to restart the postmaster with -d2 and collect the queries in the
 postmaster's log...
 
 regards, tom lane



Re: [GENERAL] performance question

2000-07-14 Thread Thomas Lockhart

 ...  I know all the stats show that MySQL is faster...

A story which, if and when true, stays true only for cases with one or a
very few users. We used to just accept MySQL's claims in this regard
without question, but it is finally dawning on us that they are not
doing representative tests for a multi-client environment :)

- Thomas



[GENERAL] Performance Question ODBC vs C

2000-03-19 Thread martin

Hi there, I have been doing some performance testing with ODBC and have
found that ODBC is much slower than using C to call the Psql API.
I don't understand why the results are like this; I thought ODBC would
slow things down a bit, but not by much - all it's doing is sending the SQL
straight to the server, isn't it?

Test

Server: 486 50mhz, 16 mb Ram
Client: P266, 32 mb Ram
SQL: 1000 INSERT INTO's with quite a lot of fields. Read from a text
input file

ODBC Test
Windows 98 running VB6 and the freeware ODBC driver (I forget the name)
Using TCP/IP network.
The VB program will do about 30 records per minute.

C Test
Linux C program, reads the input file, calls the API. Run on the client PC so
it still communicates over the network.
This setup will do 340 records per minute.

My only guess is that the following bottlenecks are in the system:
VB6 code that reads the text file - unlikely, it runs very quickly if you
remove the db.Execute call.
ODBC parsing, maybe
Something crappy about Windows, more likely

Any ideas?










Re: [GENERAL] Performance Question ODBC vs C

2000-03-19 Thread Alex Pilosov

On Sun, 19 Mar 2000, martin wrote:

 Hi there, I have been doing some performance testing with ODBC and have
 found that
 ODBC is much slower than using C to call the Psql API.
 I don't understand why the results are like this; I thought ODBC would
 slow things down a bit, but not by much - all it's doing is sending the SQL
 straight to the server, isn't it?
 
 Test
 
 Server: 486 50mhz, 16 mb Ram
 Client: P266, 32 mb Ram
 SQL: 1000 INSERT INTO's with quite a lot of fields. Read from a text
 input file
 
 ODBC Test
 Windows 98 running VB6 and the freeware ODBC driver (I forget the name)
 Using TCP/IP network.
 The VB program will do about 30 records per minute.
 
 C Test
 Linux C program, reads the input file, calls the API. Run on the client PC so
 it still communicates over the network.
 This setup will do 340 records per minute.
 
 My only guess is that the following bottlenecks are in the system:
 VB6 code that reads the text file - unlikely, it runs very quickly if you
 remove the db.Execute call.
 ODBC parsing, maybe
 Something crappy about Windows, more likely

ODBC sucks on its own, but I'd bet the difference is that you open a
transaction for the inserts on Linux and don't do that on Windows (how do you
open a transaction in ODBC?). The difference is usually about 1:10 when you do
the inserts inside one transaction, since the server doesn't have to fsync
after each insert...
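
A minimal sketch of that difference (placeholder table and values): without an
explicit transaction every INSERT commits, and therefore fsyncs, on its own;
inside one transaction the whole batch shares a single commit.

-- slow: each statement is its own transaction
INSERT INTO items (id, name) VALUES (1, 'a');
INSERT INTO items (id, name) VALUES (2, 'b');

-- usually much faster: one transaction around the whole batch
BEGIN;
INSERT INTO items (id, name) VALUES (1, 'a');
INSERT INTO items (id, name) VALUES (2, 'b');
-- ... remaining rows ...
COMMIT;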


-alex