This is great news! I will do what I can to continue improving the code and address these concerns as best I can. Many of the items below will need to be addressed by Alvaro, but I will comment where I think I have something useful to say :-)

Tom Lane wrote:

I've applied Alvaro's latest integrated-autovacuum patch.  There are
still a number of loose ends to be dealt with before beta, though:

* Not all the functionality of the current contrib code seems to have
made it in.  In particular I noted the "sleep scaling factor" is
missing, as well as the options to use nondefault vacuum_cost_delay
settings.  (I'm not sure how important the sleep scale factor is,
but the vacuum cost options seem pretty critical for practical use.)
There may be other stuff to move over; Matthew or someone more familiar
than I with the contrib version needs to take a look.  (I have refrained
from removing the contrib module until we're sure we have extracted
everything from it.)

I will take a look for missing features, thanks for not removing it from contrib yet. As for the sleep factor I'm not sure it makes sense. It was initially put in as a way to prevent autovacuum from running more than X% of the time. However, I think the better answer these days is to use the vacuum delay settings.

Speaking of which, I think I mentioned this to Alvaro, but I guess it just didn't make it in. The pg_autovacuum table should have a few additional columns that allow setting vacuum delay settings on a per table basis. I also think that there should be GUC settings for the default autovacuum delay settings which an admin might want to be separate from the system wide default vacuum delay settings.

* The code does not make a provision to ignore temporary tables.
Although vacuum.c and analyze.c will disregard the request to touch
such tables, it'd probably be better to recognize the situation further
upstream.  In particular it seems that autovacuum will continually throw
ANALYZE requests for a temp table due to lack of stats.

Does the stats system track data about temp tables? If it doesn't then autovacuum won't try to vacuum them. Will take a look.

* ANALYZE also refuses to do anything with pg_statistic itself, which
is another case that may need special treatment to avoid useless cycles.

Should be easy enough to tell autovacuum to ignore this table specifically.

* For that matter I'm unconvinced that it's a good idea to try to force
the pgstat DB to pick up every table in every database.  If there's no
entry it's because the table is not getting modified, and therefore it
seems to me that we can just leave well enough alone.  The code really
is not very good about doing nothing where nothing is called for ;-)

I think in a production environment, this won't be an issue, but in a development situation where the postmaster is getting stopped and started fairly often, it could be an issue. Actually, if the stats system doesn't reset it's data on postmaster restart, this shouldn't be a problem. Any thoughts on changing this default?

* The code ignores datallowconn and therefore will periodically vacuum
template0.  I've got mixed emotions about this --- it could save
someone's bacon if they failed to properly VACUUM FREEZE a template
database, but in 99.99% of installations it's just wasted cycles.
Maybe it'd make sense to perform XID-wraparound-prevention vacuuming,
but not anything more, in a template DB.  Thoughts?

Sounds like a good idea. Bacon conservation is clearly one of the goals of autovacuum.

* Or actually, it would vacuum template0, except that since no regular
backend ever connects to template0, there will be no stats DB entry for
it and so the loop in AutoVacMain will ignore it.  This is definitely
BAD as it means that a database that's not been touched since postmaster
start will never be vacuumed, not even for XID wraparound prevention.
That test needs to be weakened.

* I'm still pretty concerned about the handling of shared catalogs.
AFAICS the current pgstats infrastructure simply gets this wrong,
meaning that shared catalogs will not get adequate vacuuming.  We need
to fix that.

This was handled in the contrib version by only vacuuming shared catalogs inside template1, however it would then analyze those tables in each and every database. Is there a reason this solution is not adequate? Or perhaps this concept doesn't translate to the integrated version?

* As Alvaro noted, the default parameter settings need a lookover.
What is in the patch is not what was the default in the contrib module,
but the contrib defaults seem awfully passive.

Alvaro and I talked about this. I suggested these as the new defaults as there seemed to be a consensus that the defaults in the contrib version were not very useful for most people. Hopefully these defaults still a bit conservative, but useful.

* The documentation badly needs work.  I committed some minimal
additions to runtime.sgml and catalogs.sgml, but the chapter about
routine maintenance needs a section added about how to use autovac.


I promised Alvaro that I would do all the documentation. I will work on it in the next few days now that the patch has been applied.
Thanks!

Matthew O'Connor


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq

Reply via email to