Hi,
Gregory Stark wrote:
> That's an interesting thought. I think your caveats are right but with some
> more work it might be possible to work it out. For example if a background
> process processed the WAL and accumulated an array of possibly-dead tuples to
> process in batch. It would wait whenever it sees an xid which isn't yet past
> globalxmin, and keep accumulating until it has enough to make it worthwhile
> doing a pass.

I don't understand why one would want to go via the WAL; that only creates
needless I/O. Better to accumulate the data right away, during the inserts,
updates and deletes. Spilling the accumulated data to disk, if absolutely
required, would presumably still result in less I/O.
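
Just to illustrate what I mean by accumulating the data at DML time, here is a
minimal, made-up sketch (not actual backend code, and all names are invented
for the example): a per-relation bitmap with one bit per heap block, set from
the insert/update/delete path and later consulted by vacuum.

/* Hypothetical sketch only -- not actual PostgreSQL backend code. */
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int BlockNumber;

/* One bit per heap block: "this block may contain dead tuples". */
typedef struct DeadSpaceMap
{
    BlockNumber    nblocks;
    unsigned char *bits;
} DeadSpaceMap;

static DeadSpaceMap *
dsm_create(BlockNumber nblocks)
{
    DeadSpaceMap *dsm = malloc(sizeof(DeadSpaceMap));

    dsm->nblocks = nblocks;
    dsm->bits = calloc((nblocks + 7) / 8, 1);
    return dsm;
}

/* Called from the update/delete path: remember that this block got dirtied. */
static void
dsm_mark_block(DeadSpaceMap *dsm, BlockNumber blkno)
{
    dsm->bits[blkno / 8] |= (unsigned char) (1 << (blkno % 8));
}

/* Vacuum asks: does this block need a visit at all? */
static int
dsm_block_needs_vacuum(const DeadSpaceMap *dsm, BlockNumber blkno)
{
    return (dsm->bits[blkno / 8] >> (blkno % 8)) & 1;
}

int
main(void)
{
    BlockNumber   b;
    DeadSpaceMap *dsm = dsm_create(1000);

    /* Pretend a few updates/deletes touched these blocks. */
    dsm_mark_block(dsm, 17);
    dsm_mark_block(dsm, 42);

    /* A targeted vacuum would skip every block that isn't marked. */
    for (b = 0; b < 1000; b++)
        if (dsm_block_needs_vacuum(dsm, b))
            printf("vacuum block %u\n", b);

    free(dsm->bits);
    free(dsm);
    return 0;
}

With the default 8 kB block size, such a map costs about 16 kB per GB of heap,
so spilling it to disk would still be far less I/O than re-reading the WAL.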

> I think a bigger issue with this approach is that it ties all your tables
> together. You can't process one table frequently while some other table has
> some long-lived deleted tuples.

Don't use the WAL as the source of that information, and that issue is gone.

> I'm also not sure it really buys us anything over having a second
> dead-space-map data structure. The WAL is much larger and serves other
> purposes which would limit what we can do with it.

Exactly.

>> You seem to be assuming that only a few tuples have changed between vacuums,
>> so that the WAL could quickly guide the VACUUM processes to the areas where
>> cleaning is necessary.
>> Let's drop that assumption, because by default autovacuum_scale_factor is
>> 20%, so a VACUUM process normally kicks in after 20% of the tuples have
>> changed (disk space is cheap, I/O isn't). Additionally, there's a default nap
>> time of one minute, and VACUUM is forced to take at least that much of a nap.

> I think this is exactly backwards. The goal should be to improve vacuum, then
> adjust the autovacuum_scale_factor as low as we can. As vacuum gets cheaper
> the scale factor can go lower and lower.

But you can't lower it endlessly; it's still a compromise. A lower scale factor
also means fewer tuples get cleaned per scan, while each scan still has to read
the whole table, which works against the goal of minimizing the overall I/O
cost of vacuuming.

> We shouldn't allow the existing autovacuum behaviour to control the way
> vacuum works.

That's a fair point.

> As a side point, "disk is cheap, I/O isn't" is a weird statement. The more
> disk you use the more I/O you'll have to do to work with the data.

That's only true as long as you need *all* of your data to work with it.

> I still maintain the default autovacuum_scale_factor is *far* too liberal. If
> I had my druthers it would be 5%. But that's mostly informed by TPCC
> experience; in real life the actual value will vary depending on the width of
> your records and the relative length of your transactions versus transaction
> rate. The TPCC experience is with ~ 400 byte records and many short
> transactions.

Hmm, 5% vs. 20% would mean 4x as many vacuum scans, but only about a 15%
difference in table size (peaking at roughly 105% of the live size instead of
120%), right? Granted, that extra 15% also competes for memory and caches,
resulting in additional I/O... Still, these numbers surprise me. Or am I
missing something?
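
For reference, the back-of-the-envelope arithmetic I'm using (assuming vacuum
triggers exactly at the scale factor and each run reclaims all dead tuples,
which is of course a simplification):

  runs:  a run is triggered after scale_factor * live tuples have changed,
         so 0.20 / 0.05 = 4 times as many runs at 5%

  size:  the table peaks at roughly live size * (1 + scale_factor),
         i.e. 105% of the live size at 5% vs. 120% at 20%, which is
         15 percentage points, or about 120/105 - 1 = 14% more bloat
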
Regards
Markus