Matthew T. O'Connor wrote:
* Inability to customize thresholds on a per table basis

I ran headlong into this one. IMHO fixing this is critical.


* Inability to set default thresholds on a per database basis
* Inability to exclude specific databases / tables from pg_autovacuum
monitoring

These would be nice to have, but less critical than #1 I think.


* Inability to schedule vacuums during off-peak times

This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal.


I'm not sure how to address all of these concerns, or that they all
should be addressed right now.  One of my big questions is backend
integration.  I am leaning towards leaving pg_autovacuum as a client
application in contrib for one more release.  During this time, I can
continue to tweak and improve pg_autovacuum so that we will have a very
good idea what the final product should be before we make it a standard
backend process.

I really think pg_autovacuum ought to get folded into the backend now, for 7.5. I haven't had time yet to read the entire thread, but I saw others making the same comment. It would make some of the listed problems go away, or at least become far easier to deal with.


For PostgreSQL 7.5, I plan to implement these new features:

 1.Per database defaults and per table thresholds (including total
exclusion)

Great!


 2.Persistent data
 3.Single-Pass Mode (external scheduling from cron etc...)
 4.Off peak scheduling

Great again!


1. Per Database defaults and Per table Thresholds:

1.Store config data inside a special pg_autovacuum table inside
existing databases that wants custom settings.

A natural if folded into the backend.


3.Single-Pass Mode (External Scheduling):

I have received requests to be able to run pg_autovacuum only on request
(not as a daemon) making only one pass over all the tables (not looping
indefinately).  The advantage being that it will operate more like the
current vacuum command except that it will only vacuum tables that need
to be vacuumed.  This feature could be useful as long as pg_autovacuum
exists outside the backend.  If pg_autovacuum gets integrated into the
backend and gets automatically started as a daemon during startup, then
this option will no longer make sense.

It still might make sense. You could have a mode where the daemon essentially sleeps forever, until explicitly woken up by a signal. When woken, it makes one pass, and goes back to infinite sleep. Then provide a simple way to signal the autovacuum process -- maybe an extension of the current VACUUM syntax.


4.Off-Peak Scheduling:

A fundamental advantage of our vacuum system is that the work required
to reclaim table space is taken out of the critical path and can be
moved to and off-peak time when cycles are less precious.  One of the
drawbacks of the current pg_autovacuum is that it doesn't have any way
to factor this in.

In it's simplest form (which I will implement first) I would add the
ability to add a second set of thresholds that will be active only
during an âoff-peakâ time that can be specified in the pg_autovacuum
database, perhaps in a general_settings table.

I don't know how this would work, but it is for sure important. In the recent testing I found that pg_autovacuum (well, lazy vacuum in general, but I was using pg_autovacuum to control it) made a huge difference in performance of batch transactions. They range from 4-5 seconds without vacuum running, to as high as 15 minutes with vacuum running. With the vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as high as 10 minutes. Backing vacuum off any more than that caused it to fall behind the transaction rate unrecoverably. But as I said above, if the transactions could complete without vacuum running in 4-5 seconds, then vacuuming resumes for the 3-to-4 minutes between batches, all would be well.


Joe


---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Reply via email to