* Inability to customize thresholds on a per table basis
I ran headlong into this one. IMHO fixing this is critical.
* Inability to set default thresholds on a per database basis * Inability to exclude specific databases / tables from pg_autovacuum monitoring
These would be nice to have, but less critical than #1 I think.
* Inability to schedule vacuums during off-peak times
This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal.
I'm not sure how to address all of these concerns, or that they all should be addressed right now. One of my big questions is backend integration. I am leaning towards leaving pg_autovacuum as a client application in contrib for one more release. During this time, I can continue to tweak and improve pg_autovacuum so that we will have a very good idea what the final product should be before we make it a standard backend process.
I really think pg_autovacuum ought to get folded into the backend now, for 7.5. I haven't had time yet to read the entire thread, but I saw others making the same comment. It would make some of the listed problems go away, or at least become far easier to deal with.
For PostgreSQL 7.5, I plan to implement these new features:
1.Per database defaults and per table thresholds (including total exclusion)
Great!
2.Persistent data 3.Single-Pass Mode (external scheduling from cron etc...) 4.Off peak scheduling
Great again!
1. Per Database defaults and Per table Thresholds:
1.Store config data inside a special pg_autovacuum table inside
existing databases that wants custom settings.
A natural if folded into the backend.
3.Single-Pass Mode (External Scheduling):
I have received requests to be able to run pg_autovacuum only on request (not as a daemon) making only one pass over all the tables (not looping indefinately). The advantage being that it will operate more like the current vacuum command except that it will only vacuum tables that need to be vacuumed. This feature could be useful as long as pg_autovacuum exists outside the backend. If pg_autovacuum gets integrated into the backend and gets automatically started as a daemon during startup, then this option will no longer make sense.
It still might make sense. You could have a mode where the daemon essentially sleeps forever, until explicitly woken up by a signal. When woken, it makes one pass, and goes back to infinite sleep. Then provide a simple way to signal the autovacuum process -- maybe an extension of the current VACUUM syntax.
4.Off-Peak Scheduling:
A fundamental advantage of our vacuum system is that the work required to reclaim table space is taken out of the critical path and can be moved to and off-peak time when cycles are less precious. One of the drawbacks of the current pg_autovacuum is that it doesn't have any way to factor this in.
In it's simplest form (which I will implement first) I would add the ability to add a second set of thresholds that will be active only during an âoff-peakâ time that can be specified in the pg_autovacuum database, perhaps in a general_settings table.
I don't know how this would work, but it is for sure important. In the recent testing I found that pg_autovacuum (well, lazy vacuum in general, but I was using pg_autovacuum to control it) made a huge difference in performance of batch transactions. They range from 4-5 seconds without vacuum running, to as high as 15 minutes with vacuum running. With the vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as high as 10 minutes. Backing vacuum off any more than that caused it to fall behind the transaction rate unrecoverably. But as I said above, if the transactions could complete without vacuum running in 4-5 seconds, then vacuuming resumes for the 3-to-4 minutes between batches, all would be well.
Joe
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org