Re: [HACKERS] Frequently updated tables

2004-06-14 Thread pgsql
 [EMAIL PROTECTED] wrote:


I have been talking about two types of problems which are both based on
PostgreSQL's behavior with frequently updated tables.

Summary table: In the single-row table system, you have to vacuum very
frequently, and this affects performance.

Frequently updated tables: think about the session table for a website.
Each new user gets a new session row. Every time they refresh or act in
the site, the row is updated. When they leave or their session times out,
the row is deleted. I wrote a RAM-only session manager for PHP because
PostgreSQL couldn't handle the volume (2000 hits a second).



 It would be interesting to see if the vacuum delay patch, fsm tuning +
 vacuum scheduling could have changed this situation. Clearly there is an
 issue here (hence a patch...), but ISTM that just as significant is the
 fact that it is difficult to know how to configure the various bits and
 pieces, and also difficult to know if it has been done optimally.

If you have an active site, with hundreds or thousands of hits a second,
vacuuming the table constantly is not practical.

I don't think anyone who has seriously looked at these issues has
concluded that PostgreSQL works fine in these cases. The question is
what, if anything, can be done? The frequent update issue really affects
PostgreSQL's acceptance in web applications, an area where MySQL seems to
do a better job.




 As an aside, I have had similar issues with DB2 and high update tables -
 lock escalations (locklist tuning needed). It is not just
 non-overwriting storage managers that need the magic tuning wand :-)

Funny, I've used DB2 for a few projects, but never for a web session
system. That's an interesting data point, thanks.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Frequently updated tables

2004-06-14 Thread Mark Kirkwood
[EMAIL PROTECTED] wrote:
I have been talking about two types of problems which are both based on
PostgreSQL's behavior with frequently updated tables.
Summary table: In the single-row table system, you have to vacuum very
frequently, and this affects performance.
Frequently updated tables: think about the session table for a website.
Each new user gets a new session row. Every time they refresh or act in the
site, the row is updated. When they leave or their session times out, the
row is deleted. I wrote a RAM-only session manager for PHP because
PostgreSQL couldn't handle the volume. (2000 hits a second)
 

It would be interesting to see if the vacuum delay patch, fsm tuning + 
vacuum scheduling could have changed this situation. Clearly there is an 
issue here (hence a patch...), but ISTM that just as significant is the 
fact that it is difficult to know how to configure the various bits and 
pieces, and also difficult to know if it has been done optimally.

If you have an active site, with hundreds or thousands of hits a second,
vacuuming the table constantly is not practical.
I don't think anyone who has seriously looked at these issues has
concluded that PostgreSQL works fine in these cases. The question is what,
if anything, can be done? The frequent update issue really affects
PostgreSQL's acceptance in web applications, an area where MySQL seems to
do a better job.
 

As an aside, I have had similar issues with DB2 and high update tables - 
lock escalations (locklist tuning needed). It is not just 
non-overwriting storage managers that need the magic tuning wand :-)

regards
Mark


Re: [HACKERS] Frequently updated tables

2004-06-10 Thread Josh Berkus
Mohawksoft:

 I don't think anyone who has seriously looked at these issues has
 concluded that PostgreSQL works fine in these cases. The question is what,
 if anything, can be done? The frequent update issue really affects
 PostgreSQL's acceptance in web applications, an area where MySQL seems to
 do a better job.

I think that we'd welcome any suggestions that don't break MVCC.   Do you have 
any?   

MySQL is able to handle this situation -- in MyISAM tables -- precisely 
because there is no transaction isolation and they regard 97% data integrity 
as good enough.   Essentially, the MyISAM tables are little better than 
delimited text flatfiles.  That's not an approach we can take. 

 IMHO, this issue, a two-stage-commit based replication system, and a real
 and usable setup/configuration system are all that stand between
 PostgreSQL and serious enterprise deployment.

There I have to disagree with you.   The features you mention may be important 
to your clients, but they are not to mine; instead, we're waiting for 
clustering, and table partitioning in addition to what's in 7.5.   Please 
don't assume that all DB applications have the same needs as yours.  The 
problems you raise are legitimate, but not everyone shares your priorities. 

Besides, we already have serious enterprise deployment.  5 of my clients 
are startups which run on PostgreSQL.  The .ORG and .INFO domains run on 
PostgreSQL.  There are two commercial-grade, deployed ERP systems for 
manufacturers which run on PostgreSQL.  What is your definition of 
enterprise deployment, exactly?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Frequently updated tables

2004-06-09 Thread pgsql

 [EMAIL PROTECTED] wrote:

The best phrasing would be the accumulating overhead of deletes and
updates.

Yes.



 Are you using 7.3?

 I am asking because in 7.3 high update / delete tables could suffer
 (index and toast) bloat that was untamable via (lazy) VACUUM and FSM.
 I believe this is fixed in 7.4, so it should be possible to achieve
 on-disk size control of tables / indexes by configuring FSM and (lazy)
 VACUUM. Did you find this not to be the case?

Interesting; the company is using 7.3.4. One single-row summary table got
up to 2 million dead rows. A select from that single-row table took a
quarter of a second. A regular vacuum did not fix it; only a vacuum full
did. However, when the test was re-run with constant vacuums, it did not
get out of hand.
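
To see why those dead rows hurt, here is a toy model of the arithmetic (a
hypothetical sketch, not PostgreSQL's actual storage code): in a
non-overwriting storage manager every UPDATE leaves the old row version
behind, and a sequential scan over the one-row table has to step past
every dead version until a vacuum reclaims them.

```python
class MvccToyTable:
    """Toy model of a non-overwriting storage manager: an UPDATE writes a
    new row version and leaves the old one behind as a dead tuple."""

    def __init__(self):
        self.live = 0   # visible row versions
        self.dead = 0   # dead row versions awaiting vacuum

    def insert(self):
        self.live += 1

    def update_all(self):
        # every live row gets a new version; the old versions become dead
        self.dead += self.live

    def vacuum(self):
        # lazy vacuum reclaims the dead versions
        self.dead = 0

    def scan_cost(self):
        # a sequential scan touches every version, live or dead
        return self.live + self.dead


t = MvccToyTable()
t.insert()                        # the single-row summary table
for _ in range(2_000_000):        # two million updates, never vacuumed
    t.update_all()
print(t.scan_cost())              # 2000001 versions to wade through

t.vacuum()
print(t.scan_cost())              # 1: back to just the visible row
```

The model is crude, but it captures why a quarter-second SELECT on a
one-row table is exactly what constant updates without vacuuming produce.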

My concern is performance, and yes, for inserts PostgreSQL is great. For
data that is constantly being updated, PostgreSQL is a bit weak. Think
about a table with a few million rows that needs to be updated a few
thousand times a minute.

I love PG, I've been using it since version 6x, and it has gotten
fantastic over the years, and in many cases, I would choose it over
Oracle, but for systems that need frequent updates, I have a lot of
concerns.



Re: [HACKERS] Frequently updated tables

2004-06-09 Thread pgsql
 I love PG, I've been using it since version 6x, and it has gotten
 fantastic over the years, and in many cases, I would choose it over
 Oracle, but for systems that need frequent updates, I have a lot of
 concerns.

 ...that's the price you pay for concurrency man...

I think that's a cop-out. Other databases can handle this case fine and
they have MVCC.

Are we not open source free software proponents? Isn't one of our
motivations that we can do it better?

This *is* a problem with PostgreSQL, and it *is* a concern for a
reasonable number of potential deployments.





Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Christopher Kings-Lynne
I love PG, I've been using it since version 6x, and it has gotten
fantastic over the years, and in many cases, I would choose it over
Oracle, but for systems that need frequent updates, I have a lot of
concerns.
...that's the price you pay for concurrency man...
Chris


Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Alvaro Herrera
On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
 I love PG, I've been using it since version 6x, and it has gotten
 fantastic over the years, and in many cases, I would choose it over
 Oracle, but for systems that need frequent updates, I have a lot of
 concerns.
 
 ...that's the price you pay for concurrency man...

Also he said that the problem was solved with enough lazy VACUUM
scheduling.  I don't understand why he doesn't want to use that
solution.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
People get annoyed when you try to debug them.  (Larry Wall)




Re: [HACKERS] Frequently updated tables

2004-06-09 Thread pgsql
 On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
 I love PG, I've been using it since version 6x, and it has gotten
 fantastic over the years, and in many cases, I would choose it over
 Oracle, but for systems that need frequent updates, I have a lot of
 concerns.

 ...that's the price you pay for concurrency man...

 Also he said that the problem was solved with enough lazy VACUUM
 scheduling.  I don't understand why he doesn't want to use that
 solution.


Sigh, because vacuums take away from performance. Imagine a table that has
to be updated on the order of a few thousand times a minute. Think about
the drop in performance during the vacuum.

On a one row table, vacuum is not so bad, but try some benchmarks on a
table with a goodly number of rows.



Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Bruno Wolff III
On Wed, Jun 09, 2004 at 13:41:27 -0400,
  [EMAIL PROTECTED] wrote:
 
 Sigh, because vacuums take away from performance. Imagine a table that has
 to be updated on the order of a few thousand times a minute. Think about
 the drop in performance during the vacuum.
 
 On a one row table, vacuum is not so bad, but try some benchmarks on a
 table with a goodly number of rows.

But you only need to rapidly vacuum the one table that is keeping your
totals record. This isn't going to be a big hit in performance relative
to the updates that are going on. You don't need to vacuum the tables
you are doing the inserts or updates to at that same rate.



Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Joshua D. Drake

Also he said that the problem was solved with enough lazy VACUUM
scheduling.  I don't understand why he doesn't want to use that
solution.
Because even lazy VACUUM is horrendous for performance, but as I said in
a later post this has been pretty much fixed (by Jan, I believe) in 7.5.
Sincerely,
Joshua D. Drake



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL




Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Joshua D. Drake

Sigh, because vacuums take away from performance. 
This is a known issue that has been pretty much resolved for 7.5. Vacuum 
in 7.5 does not consume anywhere near as much I/O.

Imagine a table that has
to be updated on the order of a few thousand times a minute. Think about
the drop in performance during the vacuum.
On a one row table, vacuum is not so bad, but try some benchmarks on a
table with a goodly number of rows.

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL




Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Scott Marlowe
On Wed, 2004-06-09 at 11:41, [EMAIL PROTECTED] wrote:
  On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
  I love PG, I've been using it since version 6x, and it has gotten
  fantastic over the years, and in many cases, I would choose it over
  Oracle, but for systems that need frequent updates, I have a lot of
  concerns.
 
  ...that's the price you pay for concurrency man...
 
  Also he said that the problem was solved with enough lazy VACUUM
  scheduling.  I don't understand why he doesn't want to use that
  solution.
 
 
 Sigh, because vacuums take away from performance. Imagine a table that has
 to be updated on the order of a few thousand times a minute. Think about
 the drop in performance during the vacuum.
 
 On a one row table, vacuum is not so bad, but try some benchmarks on a
 table with a goodly number of rows.

Several points:

All databases pay to clean up the mess they've made, so to speak.  In
PostgreSQL you get to choose when, instead of always paying the price at
the end of each transaction.

Lazy vacuum does not impact performance nearly as much as the old full
vacuum.  With the sleep / delay patch that's been passed around on the
hackers list, its impact on the rest of the database is virtually zero.

Properly set up FSM settings, the pg_autovacuum daemon, and an
installation of the sleep / delay patch mentioned above make this a
non-issue.
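
Concretely, that recipe might look something like the sketch below
(7.x-era parameter names; the FSM values are illustrative placeholders
rather than tuned recommendations, and the `sessions` table name is
hypothetical):

```shell
# postgresql.conf -- size the free space map to cover the dead-tuple churn
#   max_fsm_pages = 200000
#   max_fsm_relations = 1000

# crontab -- a frequent lazy vacuum of just the hot table, plus a nightly
# database-wide pass; the per-minute vacuum stays cheap because it only
# touches that one small relation
* * * * *  vacuumdb --quiet --analyze --table=sessions mydb
0 3 * * *  vacuumdb --quiet --analyze mydb
```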




Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Alvaro Herrera
On Wed, Jun 09, 2004 at 01:41:27PM -0400, [EMAIL PROTECTED] wrote:
  On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:

  Also he said that the problem was solved with enough lazy VACUUM
  scheduling.  I don't understand why he doesn't want to use that
  solution.
 
 Sigh, because vacuums take away from performance. Imagine a table that has
 to be updated on the order of a few thousand times a minute. Think about
 the drop in performance during the vacuum.
 
 On a one row table, vacuum is not so bad, but try some benchmarks on a
 table with a goodly number of rows.

Hmm, this can be a problem if VACUUM pollutes the shared buffer pool.
So what about a new buffer replacement policy that takes this into
account and is not fooled by VACUUM?  This is already implemented in
7.5.  Also, how about a background writer process that writes dirty
buffers so that backends don't have to wait for IO to complete when a
dirty buffer has to be written?  This is also in current CVS.
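
The idea of a replacement policy that is not fooled by VACUUM can be
sketched in a few lines (a toy illustration of the concept, not the
algorithm that actually went into 7.5): pages touched by a bulk scan are
confined to a small ring of buffers, so they cannot flush the working set
out of the cache.

```python
from collections import OrderedDict

class ScanResistantCache:
    """Toy buffer pool: normal reads use LRU over `capacity` slots, while
    bulk-scan reads (e.g. VACUUM) cycle within a small FIFO ring.  For
    brevity the ring is never probed for hits -- a real pool would be."""

    def __init__(self, capacity, ring_size):
        self.lru = OrderedDict()   # page -> None, most recent last
        self.capacity = capacity
        self.ring = []             # FIFO ring reserved for bulk scans
        self.ring_size = ring_size

    def read(self, page, bulk=False):
        if page in self.lru:
            self.lru.move_to_end(page)        # hit: refresh recency
        elif bulk:
            if len(self.ring) >= self.ring_size:
                self.ring.pop(0)              # recycle within the ring
            self.ring.append(page)
        else:
            if len(self.lru) >= self.capacity:
                self.lru.popitem(last=False)  # evict least recent
            self.lru[page] = None


cache = ScanResistantCache(capacity=4, ring_size=2)
for page in ("a", "b", "c", "d"):
    cache.read(page)                          # the hot working set
for page in range(100_000):
    cache.read(("vacuum", page), bulk=True)   # a huge vacuum scan
print(sorted(cache.lru))                      # working set survives
```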


Have you tried and measured how the current CVS code performs?  Jan
Wieck reported a lot of performance improvement some time ago while he
was developing this.  The code has changed since and I have not seen any
measurement.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Oh, oh, the Galacian girls, they'll do it for pearls,
And the Arrakis ones for water! But if you seek ladies
Who burn like flames, try a daughter of Caladan! (Gurney Halleck)




Re: [HACKERS] Frequently updated tables

2004-06-09 Thread Jim C. Nasby
On Tue, Jun 08, 2004 at 07:16:45PM -0400, [EMAIL PROTECTED] wrote:
 
 
  [EMAIL PROTECTED] wrote:
 
 I've been down several roads about how to handle data that has to change
 in a very frequent and rapid manner.
 
 Think about summary tables, web session tables, etc. As great as MVCC is
 for the vast majority of uses, the overhead of updates and deletes can
 kill a project that needs to constantly update tables.
 
 
 
 
 
  Are you saying that MVCC has *by design* a higher overhead for updates
  and deletes? or are you referring to the gradual loss of performance as
  a consequence of many dead tuples?
 
  I am guessing you mean the latter, but best to be sure :-)
 The best phrasing would be the accumulating overhead of deletes and
 updates.
 
 Yes.
 
Doesn't pg_autovacuum largely take care of this issue?
-- 
Jim C. Nasby, Database Consultant  [EMAIL PROTECTED]
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [HACKERS] Frequently updated tables

2004-06-09 Thread pgsql
 On Wed, Jun 09, 2004 at 13:41:27 -0400,
   [EMAIL PROTECTED] wrote:

 Sigh, because vacuums take away from performance. Imagine a table that
 has to be updated on the order of a few thousand times a minute. Think
 about the drop in performance during the vacuum.

 On a one row table, vacuum is not so bad, but try some benchmarks on a
 table with a goodly number of rows.

 But you only need to rapidly vacuum the one table that is keeping your
 totals record. This isn't going to be a big hit in performance relative
 to the updates that are going on. You don't need to vacuum the tables
 you are doing the inserts or updates to at that same rate.


I have been talking about two types of problems which are both based on
PostgreSQL's behavior with frequently updated tables.

Summary table: In the single-row table system, you have to vacuum very
frequently, and this affects performance.

Frequently updated tables: think about the session table for a website.
Each new user gets a new session row. Every time they refresh or act in
the site, the row is updated. When they leave or their session times out,
the row is deleted. I wrote a RAM-only session manager for PHP because
PostgreSQL couldn't handle the volume (2000 hits a second).
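
The RAM-only workaround mentioned above amounts to keeping the whole
churn in memory. A hypothetical minimal reconstruction (in Python, not
the original PHP) looks like this: every hit touches an in-memory map
instead of issuing an UPDATE, so the refresh traffic never reaches the
database at all.

```python
import time

class RamSessionStore:
    """Minimal in-memory session store of the kind described above: every
    hit is a dict write, not a database UPDATE, so there are no dead
    tuples to vacuum.  (A sketch, not the original PHP implementation.)"""

    def __init__(self, timeout_secs=1800):
        self.timeout = timeout_secs
        self.sessions = {}            # session_id -> (data, last_seen)

    def touch(self, session_id, data=None):
        # called on every hit: refresh the timestamp, optionally set data
        old_data, _ = self.sessions.get(session_id, ({}, 0.0))
        self.sessions[session_id] = (data if data is not None else old_data,
                                     time.time())

    def get(self, session_id):
        entry = self.sessions.get(session_id)
        if entry is None:
            return None
        data, last_seen = entry
        if time.time() - last_seen > self.timeout:
            del self.sessions[session_id]   # session timed out
            return None
        return data

    def expire_stale(self):
        # periodic sweep: the in-RAM analogue of DELETE plus VACUUM
        now = time.time()
        for sid in [s for s, (_, seen) in self.sessions.items()
                    if now - seen > self.timeout]:
            del self.sessions[sid]


store = RamSessionStore(timeout_secs=1800)
store.touch("abc123", {"user": "alice"})
print(store.get("abc123"))            # {'user': 'alice'}
```

The obvious trade-off is durability: a restart loses every session, which
is exactly the bargain described here.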

If you have an active site, with hundreds or thousands of hits a second,
vacuuming the table constantly is not practical.

I don't think anyone who has seriously looked at these issues has
concluded that PostgreSQL works fine in these cases. The question is what,
if anything, can be done? The frequent update issue really affects
PostgreSQL's acceptance in web applications, an area where MySQL seems to
do a better job.

IMHO, this issue, a two-stage-commit based replication system, and a real
and usable setup/configuration system are all that stand between
PostgreSQL and serious enterprise deployment.



[HACKERS] Frequently updated tables

2004-06-08 Thread pgsql
I've been down several roads about how to handle data that has to change
in a very frequent and rapid manner.

Think about summary tables, web session tables, etc. As great as MVCC is
for the vast majority of uses, the overhead of updates and deletes can
kill a project that needs to constantly update tables.

In most cases, the answer is just not to use PostgreSQL for that, but then
you are stuck with MySQL or something worse.

Would having a special class of table that is marked for frequent updates,
which causes the system to lock and update in place, make sense? Is that
even possible?

There has to be a way of making PostgreSQL able to handle this class of
problem. Anyone have a reasonable idea?



Re: [HACKERS] Frequently updated tables

2004-06-08 Thread pgsql


 [EMAIL PROTECTED] wrote:

I've been down several roads about how to handle data that has to change
in a very frequent and rapid manner.

Think about summary tables, web session tables, etc. As great as MVCC is
for the vast majority of uses, the overhead of updates and deletes can
kill a project that needs to constantly update tables.





 Are you saying that MVCC has *by design* a higher overhead for updates
 and deletes? or are you referring to the gradual loss of performance as
 a consequence of many dead tuples?

 I am guessing you mean the latter, but best to be sure :-)
The best phrasing would be the accumulating overhead of deletes and
updates.

Yes.



Re: [HACKERS] Frequently updated tables

2004-06-08 Thread Mark Kirkwood
[EMAIL PROTECTED] wrote:
The best phrasing would be the accumulating overhead of deletes and
updates.
Yes.
 

Are you using 7.3?
I am asking because in 7.3 high update / delete tables could suffer 
(index and toast) bloat that was untamable via (lazy) VACUUM and FSM.
I believe this is fixed in 7.4, so it should be possible to achieve 
on-disk size control of tables / indexes by configuring FSM and (lazy) 
VACUUM. Did you find this not to be the case?

regards
Mark