The topic of improving vacuum for use in heavy-update environments seems
to come up frequently on the list. Has anyone weighed the costs of
allowing VACUUM to reclaim tuples that are not older than the oldest
transaction but are nonetheless invisible to all running transactions?
It seems that it's not that hard....
Currently, a tuple is not eligible to be reclaimed by VACUUM unless it
was deleted by a transaction that committed before the oldest currently
running transaction started. (I.e., its xmax is known to have committed
before the oldest currently running xid started.) Right?
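For concreteness, here is a minimal C sketch of that rule. All of the
names (Xid, did_commit, reclaimable_current) are invented for
illustration and are not PostgreSQL's actual internals; xid wraparound
is ignored throughout these sketches:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t Xid;

    /* Toy commit log: an xid is "committed" here if it is at or
     * below a high-water mark; the real system consults pg_clog. */
    static Xid last_committed = 1500;
    static bool did_commit(Xid x) { return x <= last_committed; }

    /* Current rule: a tuple is reclaimable only if its xmax
     * committed and precedes the oldest currently running xid. */
    static bool reclaimable_current(Xid xmax, Xid oldest_xmin)
    {
        return did_commit(xmax) && xmax < oldest_xmin;
    }

    int main(void)
    {
        Xid xmax = 1201;

        /* A long-running xid of 1000 holds oldest_xmin down: stuck. */
        printf("oldest_xmin 1000: %d\n", reclaimable_current(xmax, 1000));
        /* If that xid could be left out, oldest_xmin might be 2000. */
        printf("oldest_xmin 2000: %d\n", reclaimable_current(xmax, 2000));
        return 0;
    }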
However, it seems that under certain scenarios (heavy updates to small
tables while a long-running transaction is open) there might be a lot
of tuples that are invisible to all transactions but still cannot be
vacuumed under the current method. Example: updating a single row over
and over again while pg_dump is running.
Suppose that in the system, we have a serializable transaction with xid
1000 and a read committed transaction with xid 1001. Other than these
two, the oldest running xid is 2000.
Suppose we consider a tuple with xmin 1200 and xmax 1201. We will
assume that xid 1201 committed before xid 2000 began to run.
So:
(A) This tuple is invisible to the serializable transaction, since its
snapshot can't ever advance.
(B) The read committed transaction might be able to see it. However, if
its current command started AFTER xid 1201 committed, it can't.
Unless I'm missing something, it seems that when vacuuming you can
leave serializable transactions (like pg_dump) out of the calculation
of the "oldest running transaction," so long as you keep a list of them
and check each tuple T against each serializable transaction X to make
sure that either T's xmin is greater than X's xid, or T's xmax
committed before X started to run. Of course this is "a lot" of work,
but it should mitigate the effect of long-running serializable
transactions until processor power becomes your limiting factor.
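A rough sketch of that per-tuple test, with the same hypothetical
names as above. One caveat: "xmax committed before X started" really
needs commit-time ordering, which the plain xid comparison below only
approximates:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef uint32_t Xid;

    static Xid last_committed = 1500;   /* toy commit log, as above */
    static bool did_commit(Xid x) { return x <= last_committed; }

    /* Dead to serializable transaction X if inserted after X's
     * snapshot was taken (xmin > X), or if the delete committed
     * before X started (approximated here by a commit check plus
     * xmax < X). */
    static bool dead_to_serializable(Xid xmin, Xid xmax, Xid serial_xid)
    {
        if (xmin > serial_xid)
            return true;
        return did_commit(xmax) && xmax < serial_xid;
    }

    /* Reclaimable with respect to the serializable list only if the
     * tuple is dead to every transaction on it. */
    static bool dead_to_all_serializable(Xid xmin, Xid xmax,
                                         const Xid *serials, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (!dead_to_serializable(xmin, xmax, serials[i]))
                return false;
        return true;
    }

    int main(void)
    {
        Xid serials[] = { 1000 };   /* the pg_dump-style transaction */

        /* The example tuple: xmin 1200, xmax 1201. Dead to xid 1000
         * because it was inserted after that snapshot was taken. */
        printf("dead to all serializable? %d\n",
               dead_to_all_serializable(1200, 1201, serials, 1));
        return 0;
    }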
The read committed ones are a more difficult matter, but I think you
can treat a tuple as dead if it was inserted after the read committed
transaction started to run AND deleted before that transaction's
currently running command started. I suppose the major difficulty here
is that a transaction currently has no way of knowing when another
backend's command started to run?
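Assuming vacuum could somehow learn each backend's current command
snapshot (the open question above), the test might look like this
sketch. The struct and field names are invented, and the command
snapshot is simplified to "the newest xid whose commit the current
command can see":

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t Xid;

    static Xid last_committed = 1500;   /* toy commit log, as above */
    static bool did_commit(Xid x) { return x <= last_committed; }

    typedef struct {
        Xid xid;           /* the read committed transaction's xid */
        Xid cmd_snapshot;  /* newest xid visible as committed to the
                              currently running command */
    } ReadCommittedXact;

    /* The condition from the text: inserted after R started AND the
     * delete committed before R's current command took its snapshot. */
    static bool dead_to_read_committed(Xid xmin, Xid xmax,
                                       const ReadCommittedXact *r)
    {
        return xmin > r->xid
            && did_commit(xmax)
            && xmax <= r->cmd_snapshot;
    }

    int main(void)
    {
        /* The example above: read committed xid 1001 whose current
         * command started after xid 1201 committed. */
        ReadCommittedXact r = { 1001, 1300 };
        Xid xmin = 1200, xmax = 1201;

        printf("dead to xid 1001? %s\n",
               dead_to_read_committed(xmin, xmax, &r) ? "yes" : "no");
        return 0;
    }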
Is this too difficult to do or is it a good idea that no one has enough
'round tuits for?
Regards,
Paul Tillotson