Re: [PERFORM] Minimizing dead tuples caused by update triggers

2007-12-20 Thread Dimitri Fontaine
On Thursday, December 20, 2007, Decibel! wrote:
 A work-around others have used is to have the trigger just insert
 into a 'staging' table and then periodically take the records from
 that table and summarize them somewhere else.

And you can even use the PgQ implementation from Skytools to easily set up
this kind of 'staging' table, with one producer and one or many subscribers.
See these references if you're interested:
http://kaiv.wordpress.com/2007/10/19/skytools-database-scripting-framework-pgq/
  http://skytools.projects.postgresql.org/doc/pgq-sql.html
  http://skytools.projects.postgresql.org/doc/pgq-admin.html
  http://skytools.projects.postgresql.org/doc/pgq-nodupes.html
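
To give the flavour of it, here is a minimal sketch of the PgQ SQL API (the
queue name, event payload, and consumer name are made-up examples, and the
exact function signatures should be checked against the pgq-sql doc above):

  -- Producer side: create a queue; the trigger on the monitored table
  -- inserts an event instead of updating a summary row directly.
  SELECT pgq.create_queue('stats_events');
  SELECT pgq.insert_event('stats_events', 'row_inserted', 'my_table');

  -- Consumer side: register once, then periodically process batches.
  SELECT pgq.register_consumer('stats_events', 'stats_summarizer');
  SELECT pgq.next_batch('stats_events', 'stats_summarizer');
      -- returns a batch id, or NULL if there is nothing to do
  SELECT * FROM pgq.get_batch_events(<batch_id>);
      -- summarize these events into your stats table
  SELECT pgq.finish_batch(<batch_id>);

Here <batch_id> stands for the value returned by pgq.next_batch(); the ticker
daemon takes care of rotating and cleaning the underlying event tables, so
the dead-tuple problem stays out of your main tables.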

Hope this helps, regards,
-- 
dim




[PERFORM] Minimizing dead tuples caused by update triggers

2007-12-19 Thread Dan Harris
I've been fighting with the common workarounds for inadequate response  
times on select count(*) and min(),max() on tables with tens of  
millions of rows for quite a while now and understand the reasons for  
the table scans.


I have applications that regularly poll a table ( ideally, the more
frequently, the better ) to learn the most recent data inside it as well
as the number of rows in it ( among a few other things ).  As the databases
have grown in size, these summarizations could no longer be done on
the fly, so I wrote a database wrapper API that tracks those values
internally.


This wrapper has grown very complex and is difficult to manage across
different systems.  What I'd like to do instead is implement triggers
for inserts, updates, and deletes that check and/or replace values in a
table_stats table, holding each table's row count, min/max dates, and the
results of a few other costly operations, so those values can then be
queried in short order.  I know this is a fairly common thing to do.
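
Roughly along these lines ( just a sketch; the monitored table "events",
its "created" column, and the trigger names are made-up examples, and the
real version would also cover UPDATE and DELETE ):

  CREATE TABLE table_stats (
      table_name  text PRIMARY KEY,
      row_count   bigint NOT NULL DEFAULT 0,
      min_date    timestamp,
      max_date    timestamp
  );

  -- Per-row trigger on the hypothetical monitored table "events".
  CREATE OR REPLACE FUNCTION events_stats_trg() RETURNS trigger AS $$
  BEGIN
      UPDATE table_stats
         SET row_count = row_count + 1,
             min_date = CASE WHEN min_date IS NULL OR NEW.created < min_date
                             THEN NEW.created ELSE min_date END,
             max_date = CASE WHEN max_date IS NULL OR NEW.created > max_date
                             THEN NEW.created ELSE max_date END
       WHERE table_name = 'events';
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER events_stats AFTER INSERT ON events
      FOR EACH ROW EXECUTE PROCEDURE events_stats_trg();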


The thing that concerns me is dead tuples on the table_stats table.  I  
believe that every insert of new data in one of the monitored tables  
will result in an UPDATE of the table_stats table.  When thousands  
( or millions ) of rows are inserted, the select performance ( even  
trying with an index ) on table_stats slows down in a hurry.  If I  
wrap the inserts into large transactions, will it only call the update  
on table_stats when I commit?


Obviously I want to vacuum this table regularly to recover the dead
tuples.  The problem I'm running into is contention between VACUUM ( not
full ) and pg_dump ( version 8.0.12 ).  My system backup takes 6 hours to
run pg_dump on a 400GB cluster directory.  If the vacuum command fires
during the dump, it forces an exclusive lock and any queries will hang
until pg_dump finishes.


If I have to wait until pg_dump is finished before issuing the VACUUM  
command, everything slows down significantly as the dead tuples in  
table_stats pile up.


What strategy could I employ to either:

1. resolve the contention between pg_dump and vacuum, or
2. reduce the dead tuple pile up between vacuums

Thanks for reading

-Dan



Re: [PERFORM] Minimizing dead tuples caused by update triggers

2007-12-19 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 The thing that concerns me is dead tuples on the table_stats table.  I  
 believe that every insert of new data in one of the monitored tables  
 will result in an UPDATE of the table_stats table.  When thousands  
 ( or millions ) of rows are inserted, the select performance ( even  
 trying with an index ) on table_stats slows down in a hurry.

Yup.  FWIW, 8.3's HOT tuple updates might help this quite a lot.
Not a lot to be done about it in 8.0.x though :-(
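
For whenever an upgrade does happen, a minimal sketch of a related knob
( the table name is assumed from this thread; the SET (fillfactor) syntax
needs 8.2+, and the HOT benefit only applies on 8.3 ):

  -- Leave free space on each heap page so HOT can keep the frequently
  -- updated stats row on the same page and avoid index bloat.
  ALTER TABLE table_stats SET (fillfactor = 50);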

regards, tom lane



Re: [PERFORM] Minimizing dead tuples caused by update triggers

2007-12-19 Thread Decibel!

On Dec 19, 2007, at 6:39 PM, Tom Lane wrote:
The thing that concerns me is dead tuples on the table_stats table.  I
believe that every insert of new data in one of the monitored tables
will result in an UPDATE of the table_stats table.  When thousands
( or millions ) of rows are inserted, the select performance ( even
trying with an index ) on table_stats slows down in a hurry.


Yup.  FWIW, 8.3's HOT tuple updates might help this quite a lot.
Not a lot to be done about it in 8.0.x though :-(



A work-around others have used is to have the trigger just insert
into a 'staging' table and then periodically take the records from
that table and summarize them somewhere else. You still have a vacuum
concern on the staging table, but the advantage is that your trigger
path is a simple insert instead of an update, which is effectively a
delete and an insert.
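
A rough sketch of that pattern ( names are illustrative, and it assumes a
table_stats row already exists for each monitored table ):

  -- The per-row trigger only ever INSERTs here, so the hot path never
  -- produces dead tuples in table_stats itself.
  CREATE TABLE table_stats_staging (
      table_name  text NOT NULL,
      created     timestamp NOT NULL
  );

  -- Periodic summarizer, run from cron.  Serializable ( snapshot )
  -- isolation keeps the UPDATE and the DELETE looking at the same set
  -- of staged rows; anything committed mid-run waits for the next run.
  BEGIN;
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE table_stats s
     SET row_count = s.row_count + agg.cnt,
         max_date  = CASE WHEN s.max_date IS NULL OR agg.maxd > s.max_date
                          THEN agg.maxd ELSE s.max_date END
    FROM (SELECT table_name, count(*) AS cnt, max(created) AS maxd
            FROM table_stats_staging
           GROUP BY table_name) agg
   WHERE s.table_name = agg.table_name;
  -- DELETE rather than TRUNCATE so concurrent trigger inserts are not
  -- blocked; the frequent vacuum mentioned below then reclaims the
  -- staging table's dead rows.
  DELETE FROM table_stats_staging;
  COMMIT;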


This is a case where a cron'd vacuum that runs once a minute is  
probably a wise idea.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



