Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-05-04 Thread Jim C. Nasby
On Sat, Apr 29, 2006 at 06:39:21PM -0400, Tom Lane wrote:
 Florian G. Pflug [EMAIL PROTECTED] writes:
  The general problem seems to be that a transaction has no way to promise 
  never to touch a specific table. Maybe some kind of "negative lock" 
  would help here - you'd do "exclude table foo from transaction" at the
  start of your transaction, which would cause postgres to raise an error
  if you indeed tried to access that table. Vacuum could then ignore your
  transaction when deciding which tuples it can safely remove from the 
  table foo.
 
 Unfortunately that really wouldn't help VACUUM at all.  The nasty
 problem for VACUUM is that what it has to figure out is not the oldest
 transaction that it thinks is running, but the oldest transaction that
 anyone else thinks is running.  So what it looks through PGPROC for is
 not the oldest XID, but the oldest XMIN.  And even if it excluded procs
 that had promised not to touch the target table, it would find that
 their XIDs had been factored into other processes' XMINs, resulting
 in no improvement.
 
 As a comparison point, VACUUM already includes code to ignore backends
 in other databases (if it's vacuuming a non-shared table), but it turns
 out that that code is almost entirely useless :-(, because those other
 backends still get factored into the XMINs computed by backends that are
 in the same database as VACUUM.  We've speculated about fixing this by
 having each backend compute and advertise both global and database
 local XMINs, but the extra cycles that'd need to be spent in *every*
 snapshot computation seem like a pretty nasty penalty.  And the approach
 certainly does not scale to anything like per-table exclusions.

I'd actually been thinking about this recently, and had come up with the
following half-baked ideas:

Allow a transaction to specify exactly what tables it will be touching,
perhaps as an extension to BEGIN. Should any action that transaction
takes attempt to access a table not specified, throw an error.

A possible variant on that would be to automatically determine at
transaction start all the tables that would be accessed by that
transaction.

Once that list is available, vacuum should be able to use it to ignore
any transactions that have promised not to touch whatever table it's
vacuuming. While this doesn't help with transactions that don't make any
promises, for short-running transactions we don't really care. As long
as all long-running transactions state their intentions it should allow
for useful vacuuming of queue tables and the like.

Given that we don't care about what tables a short-running transaction
will access, we could delay the (probably expensive) determination of
what tables a backend will access until the transaction is over a
specific age. At that point the list of tables could be built and
(theoretically) the transaction's XMIN adjusted accordingly.
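
As a sketch, such a BEGIN extension might look like the following; the syntax is purely hypothetical and does not exist in PostgreSQL, and the table names are invented for illustration:

```sql
-- Hypothetical syntax, invented for illustration only.
-- Promise at transaction start which tables will be touched:
BEGIN TRANSACTION ONLY TABLES (accounts, orders);

-- Touching an undeclared table would raise an error:
SELECT * FROM queue_table;
-- ERROR: table "queue_table" was excluded from this transaction

-- VACUUM of queue_table could then disregard this transaction
-- when deciding which dead tuples are safe to remove.
COMMIT;
```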

Unfortunately this won't help with pg_dump right now. But an extension
would be to allow passing a table order into pg_dump so that it dumps
queue tables first. Once a table is dumped pg_dump shouldn't need to
access it again, so it's theoretically possible to make the "I promise
not to access these tables" list dynamic during the life of a
transaction; as pg_dump finishes with tables it could then promise not
to touch them again.

Or maybe a better idea would just be to come up with some other means
for people to do things like queue tables and session tables...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-05-04 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 I'd actually been thinking about this recently, and had come up with the
 following half-baked ideas:

 Allow a transaction to specify exactly what tables it will be touching,
 perhaps as an extension to BEGIN. Should any action that transaction
 takes attempt to access a table not specified, throw an error.

 A possible variant on that would be to automatically determine at
 transaction start all the tables that would be accessed by that
 transaction.

 Once that list is available, vacuum should be able to use it to ignore
 any transactions that have promised not to touch whatever table it's
 vacuuming.

No, you missed my point entirely.  The above would help not at all,
unless the restrictions were somehow propagated through XMIN
calculations, which seems impracticable.  (Every backend calculate a
separate XMIN with respect to every table that's being mentioned by any
other backend?  I don't think so...)

regards, tom lane



Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-05-04 Thread Florian G. Pflug

Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:


I'd actually been thinking about this recently, and had come up with the
following half-baked ideas:



Allow a transaction to specify exactly what tables it will be touching,
perhaps as an extension to BEGIN. Should any action that transaction
takes attempt to access a table not specified, throw an error.



A possible variant on that would be to automatically determine at
transaction start all the tables that would be accessed by that
transaction.



Once that list is available, vacuum should be able to use it to ignore
any transactions that have promised not to touch whatever table it's
vacuuming.


No, you missed my point entirely.  The above would help not at all,
unless the restrictions were somehow propagated through XMIN
calculations, which seems impracticable.  (Every backend calculate a
separate XMIN with respect to every table that's being mentioned by any
other backend?  I don't think so...)


So basically the problem is that calculating a cluster-wide XMIN is fine
from a performance point of view, but too coarse from a vacuum point of
view. OTOH, calculating a table-specific XMIN is fine-grained enough for
vacuum to vacuum queue-like tables efficiently, but bad from a
performance point of view.

What about doing the xmin calculations and the xmin exporting per
tablespace instead? This would be more generic than the local/global
split, because all global objects are in the pg_global tablespace, I
believe. OTOH, most people probably don't have more than a handful of
tablespaces, which seems a manageable amount...

greetings, Florian Pflug



Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-05-04 Thread Jim C. Nasby
On Thu, May 04, 2006 at 03:30:34PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  I'd actually been thinking about this recently, and had come up with the
  following half-baked ideas:
 
  Allow a transaction to specify exactly what tables it will be touching,
  perhaps as an extension to BEGIN. Should any action that transaction
  takes attempt to access a table not specified, throw an error.
 
  A possible variant on that would be to automatically determine at
  transaction start all the tables that would be accessed by that
  transaction.
 
  Once that list is available, vacuum should be able to use it to ignore
  any transactions that have promised not to touch whatever table it's
  vacuuming.
 
 No, you missed my point entirely.  The above would help not at all,
 unless the restrictions were somehow propagated through XMIN
 calculations, which seems impracticable.  (Every backend calculate a
 separate XMIN with respect to every table that's being mentioned by any
 other backend?  I don't think so...)

I mentioned it further down the post, as well as the idea that we really
don't care about short-lived transactions, so theoretically we could
just compute this information for transactions that have been running
for more than some period of time. Presumably the overhead of
calculating a separate XMIN for each table wouldn't be that great for a
transaction that's already been running 15 seconds...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-05-02 Thread Csaba Nagy
  I guess what I'm asking for is a kind of "REBUILD TABLE" which is not
  MVCC by definition but it would be useful in the mentioned queue table
  case.
 
 
 vacuum full?

Nope, it won't work: it will still leave in all the dead tuples
potentially visible to old transactions, even if those transactions
have not yet touched the vacuumed table. In my case that means, for some
tables, sometimes 1000x bloat, given that the table always has
few active rows and high insert/update/delete rates...

CLUSTER locks the table exclusively, which means it will wait until
all transactions which actually touched the table are finished, and then
rebuild the table, ignoring dead rows, without caring about old
transactions which could still see the dead rows (at least this is how I
think it works, based on my observations). Perfect for my purpose, but
not MVCC...

Cheers,
Csaba.





Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-29 Thread Florian G. Pflug

Csaba Nagy wrote:

There is, I believe, a problem there; there is a scenario where data
can get dropped out from under those old connections.

This has been added to the TODO...

http://www.postgresql.org/docs/faqs.TODO.html

* Make CLUSTER preserve recently-dead tuples per MVCC requirements


OK, I can see this being a problem in the general case. 


However, for my queue table the current behavior is a life-saver. Would
it be possible to still provide a variant of rebuild which is documented
to not be MVCC compliant? Something like: "I don't care whether old
transactions which have not yet touched this table can see the old data
for this table; I want to rebuild it."


I actually don't care about clustering in this case, only about
rebuilding the table once I can get an exclusive lock on it. I guess
this would be possible with the switch-to-a-copy method, except the new
table is a different entity (as mentioned in another post) and it will
not preserve the dependencies of the original table.

I guess what I'm asking for is a kind of "REBUILD TABLE" which is not
MVCC by definition but it would be useful in the mentioned queue table
case.
The general problem seems to be that a transaction has no way to promise 
never to touch a specific table. Maybe some kind of "negative lock"
would help here - you'd do "exclude table foo from transaction" at the
start of your transaction, which would cause postgres to raise an error
if you indeed tried to access that table. Vacuum could then ignore your
transaction when deciding which tuples it can safely remove from the 
table foo.


This would be a nice feature IMHO - would it be possible to do something
like that, or am I overlooking something?

greetings, Florian Pflug



Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-29 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 The general problem seems to be that a transaction has no way to promise 
 never to touch a specific table. Maybe some kind of "negative lock" 
 would help here - you'd do "exclude table foo from transaction" at the
 start of your transaction, which would cause postgres to raise an error
 if you indeed tried to access that table. Vacuum could then ignore your
 transaction when deciding which tuples it can safely remove from the 
 table foo.

Unfortunately that really wouldn't help VACUUM at all.  The nasty
problem for VACUUM is that what it has to figure out is not the oldest
transaction that it thinks is running, but the oldest transaction that
anyone else thinks is running.  So what it looks through PGPROC for is
not the oldest XID, but the oldest XMIN.  And even if it excluded procs
that had promised not to touch the target table, it would find that
their XIDs had been factored into other processes' XMINs, resulting
in no improvement.

As a comparison point, VACUUM already includes code to ignore backends
in other databases (if it's vacuuming a non-shared table), but it turns
out that that code is almost entirely useless :-(, because those other
backends still get factored into the XMINs computed by backends that are
in the same database as VACUUM.  We've speculated about fixing this by
having each backend compute and advertise both global and database
local XMINs, but the extra cycles that'd need to be spent in *every*
snapshot computation seem like a pretty nasty penalty.  And the approach
certainly does not scale to anything like per-table exclusions.

regards, tom lane



[GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Csaba Nagy
Hi all,

Short background: postgres does not support queue-type tables very well
in an environment where these queue tables are small in size but heavily
inserted/updated/deleted, while there are activities in the system which
cause long-running transactions. The reason is that the queue table
cannot be vacuumed properly due to the long-running transactions, and
suffers bloat.

This was a quite big problem for our postgres systems, until I
discovered that there is an alternative solution to vacuuming, if the
transactions on the queue table are always short. Besides aggressively
vacuuming the queue table (which works fine most of the time, when I
currently have no long-running transaction) I placed a cron job to
cluster the queue table on its PK index. That seems to work well; the
table is kept small even in the presence of long-running transactions.

The only time this does NOT work is when some long-running transaction
actually accesses the queue table. An example is a full DB backup.

I know this made one of my systems drop its load significantly.
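
A minimal sketch of that cron-driven approach, assuming the table and index names used later in this thread (queue_table, pk_queue_table); the database name and schedule are placeholders:

```sql
-- Pre-8.3 syntax is CLUSTER indexname ON tablename.
-- Run periodically, e.g. from cron:
--   */5 * * * *  psql -c "CLUSTER pk_queue_table ON queue_table;" mydb
CLUSTER pk_queue_table ON queue_table;
```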

HTH,
Csaba.





Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread kmh496
On Fri, 2006-04-28 at 14:40 +0200, Csaba Nagy wrote:
 I placed a cron job to
 cluster the queue table on its PK index.
what does that mean?
-- 
my site, http://www.myowndictionary.com (myowndictionary), was
made to help students of many languages learn them faster.








Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Csaba Nagy
On Fri, 2006-04-28 at 15:20, kmh496 wrote:
 On Fri, 2006-04-28 at 14:40 +0200, Csaba Nagy wrote:
  I placed a cron job to
  cluster the queue table on its PK index.
 what does that mean?

Means execute:

CLUSTER pk_queue_table ON queue_table;

See http://www.postgresql.org/docs/8.1/static/sql-cluster.html for more
details on CLUSTER.

This will actually rebuild the table, I guess, once it can lock it
exclusively. Due to the fact that the queue transactions are very
short-lived, the exclusive lock is quickly acquired, and the clustering
operation is also fast, as the table has only a few entries.

I'm not sure how this operation can work in the presence of other
long-running transactions which did not touch the queue table yet, but
it actually does work, I can confirm that. Is it maybe violating MVCC?

Cheers,
Csaba.





Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Vivek Khera


On Apr 28, 2006, at 9:32 AM, Csaba Nagy wrote:


I'm not sure how this operation can work in the presence of other long
running transactions which did not touch the queue table yet, but it
actually does work, I can confirm that. Is it violating MVCC maybe ?


It sounds like it does potentially violate MVCC, but I haven't
thought hard about it. It also has the advantage that it rebuilds
your indexes during the clustering.

And a convenient feature is that you just define those CLUSTER
statements once; then whenever you need it, just run "CLUSTER;" all
by itself, and all tables previously defined as clustered will be
re-clustered on the specified index.
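
For reference, that define-once workflow can be sketched as follows, using the queue table names from earlier in the thread:

```sql
-- Define the clustering index for the table once (8.1 syntax):
CLUSTER pk_queue_table ON queue_table;

-- From then on, a bare CLUSTER re-clusters every table for which
-- the current user has previously defined a clustering index:
CLUSTER;
```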





Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes:
 I'm not sure how this operation can work in the presence of other long
 running transactions which did not touch the queue table yet, but it
 actually does work, I can confirm that. Is it violating MVCC maybe ?

Yes :-(.  I think you can get away with it if all your transactions that
use the queue table run in READ COMMITTED (not serializable) mode, and
if they work like

BEGIN;
LOCK queue_table IN some-suitable-mode;
process queue table;
COMMIT;

Grabbing the lock will ensure that CLUSTER finishes before any snapshot
is taken.
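
Spelled out, that pattern might look like the sketch below. Tom leaves the lock mode open ("some-suitable-mode"); ROW EXCLUSIVE is shown only as one plausible choice, since any table-level lock conflicts with the ACCESS EXCLUSIVE lock CLUSTER holds, and the processing statement is invented for illustration:

```sql
-- Queue consumers run in READ COMMITTED; the explicit LOCK makes the
-- transaction wait out any in-progress CLUSTER before its first
-- snapshot of the queue is taken.
BEGIN;
-- ROW EXCLUSIVE is one plausible "suitable mode": every table lock
-- conflicts with the ACCESS EXCLUSIVE lock that CLUSTER holds.
LOCK TABLE queue_table IN ROW EXCLUSIVE MODE;
-- Process the queue (statement invented for illustration):
DELETE FROM queue_table WHERE id = 42;
COMMIT;
```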

regards, tom lane



Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Csaba Nagy
  actually does work, I can confirm that. Is it violating MVCC maybe ?
 
 Yes :-(.  I think you can get away with it if all your transactions that
[snip]

Well, I actually don't want to get away with it this time :-)

This table is only processed by the queue manager and that uses very
short transactions, basically only atomic insert/update/delete. No long
running transaction will ever touch that table.

So this fits perfectly my queue cleanup needs.

Cheers,
Csaba.





Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Chris Browne
[EMAIL PROTECTED] (Csaba Nagy) writes:
 On Fri, 2006-04-28 at 15:20, kmh496 wrote:
 On Fri, 2006-04-28 at 14:40 +0200, Csaba Nagy wrote:
  I placed a cron job to
  cluster the queue table on its PK index.
 what does that mean?

 Means execute:

 CLUSTER pk_queue_table ON queue_table;

 See http://www.postgresql.org/docs/8.1/static/sql-cluster.html for more
 details on CLUSTER.

 This will actually rebuild the table I guess after it can lock it
 exclusively. Due to the fact that the queue transactions are very short
 lived, the exclusive lock is fast acquired and the clustering operation
 is also fast as the table has only a few entries.

 I'm not sure how this operation can work in the presence of other long
 running transactions which did not touch the queue table yet, but it
 actually does work, I can confirm that. Is it violating MVCC maybe ?

There is, I believe, a problem there; there is a scenario where data
can get dropped out from under those old connections.

This has been added to the TODO...

http://www.postgresql.org/docs/faqs.TODO.html

* Make CLUSTER preserve recently-dead tuples per MVCC requirements
-- 
(format nil [EMAIL PROTECTED] cbbrowne ntlug.org)
http://cbbrowne.com/info/sgml.html
Should vegetarians eat animal crackers? 



Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Csaba Nagy
 There is, I believe, a problem there; there is a scenario where data
 can get dropped out from under those old connections.
 
 This has been added to the TODO...
 
 http://www.postgresql.org/docs/faqs.TODO.html
 
 * Make CLUSTER preserve recently-dead tuples per MVCC requirements

OK, I can see this being a problem in the general case. 

However, for my queue table the current behavior is a life-saver. Would
it be possible to still provide a variant of rebuild which is documented
to not be MVCC compliant? Something like: "I don't care whether old
transactions which have not yet touched this table can see the old data
for this table; I want to rebuild it."

I actually don't care about clustering in this case, only about
rebuilding the table once I can get an exclusive lock on it. I guess
this would be possible with the switch-to-a-copy method, except the new
table is a different entity (as mentioned in another post) and it will
not preserve the dependencies of the original table.

I guess what I'm asking for is a kind of "REBUILD TABLE" which is not
MVCC by definition but it would be useful in the mentioned queue table
case.

Cheers,
Csaba.





Re: [GENERAL] Alternative for vacuuming queue-like tables

2006-04-28 Thread Robert Treat
On Friday 28 April 2006 12:20, Csaba Nagy wrote:
  There is, I believe, a problem there; there is a scenario where data
  can get dropped out from under those old connections.
 
  This has been added to the TODO...
 
  http://www.postgresql.org/docs/faqs.TODO.html
 
  * Make CLUSTER preserve recently-dead tuples per MVCC requirements

 OK, I can see this being a problem in the general case.

 However, for my queue table the current behavior is a life-saver. Would
 it be possible to still provide a variant of rebuild which is documented
 to not be MVCC compliant? Something like: "I don't care whether old
 transactions which have not yet touched this table can see the old data
 for this table; I want to rebuild it."

 I actually don't care about clustering in this case, only about
 rebuilding the table once I can get an exclusive lock on it. I guess
 this would be possible with the switch-to-a-copy method, except the new
 table is a different entity (as mentioned in another post) and it will
 not preserve the dependencies of the original table.

 I guess what I'm asking for is a kind of "REBUILD TABLE" which is not
 MVCC by definition but it would be useful in the mentioned queue table
 case.


vacuum full?

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
