The topic of improving vacuum for use in heavy-update environments seems to come up frequently on the list. Has anyone weighed the costs of allowing VACUUM to reclaim tuples that are not older than the oldest transaction but are nonetheless invisible to all running transactions? It seems that it's not that hard....

Currently, a tuple is not eligible to be reclaimed by VACUUM unless it was deleted by a transaction that committed before the oldest currently running transaction started. (That is, its xmax is known to have committed before the oldest currently running xid began to run.) Right?
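
To make the rule concrete, here's a rough sketch in C -- not actual PostgreSQL code; Xid, Tuple, and xid_committed_before() are made-up stand-ins for the real xid machinery and the clog lookup:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

typedef struct
{
    Xid xmin;   /* xid that inserted the tuple */
    Xid xmax;   /* xid that deleted it, or 0 if it is still live */
} Tuple;

/* hypothetical stand-in for the clog lookup: true if 'xid' is known
 * to have committed before transaction 'ref' began to run */
extern bool xid_committed_before(Xid xid, Xid ref);

/* current rule: reclaimable iff the tuple was deleted and the deleting
 * transaction committed before the oldest running xid started */
static bool
reclaimable_current(const Tuple *t, Xid oldest_running_xid)
{
    return t->xmax != 0
        && xid_committed_before(t->xmax, oldest_running_xid);
}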

However, under certain scenarios (heavy updates to small tables while a long-running transaction is in progress) there may be many tuples that are invisible to all transactions but still cannot be vacuumed under the current rule. Example: updating a single row over and over again while pg_dump is running.

Suppose that in the system, we have a serializable transaction with xid 1000 and a read committed transaction with xid 1001. Other than these two, the oldest running xid is 2000.

Suppose we consider a tuple with xmin 1200 and xmax 1201. We will assume that xid 1201 committed before xid 2000 began to run.

So:

(A) This tuple is invisible to the serializable transaction: it was inserted (xmin 1200) after that transaction's snapshot was taken, and a serializable snapshot can never advance.

(B) The read committed transaction might be able to see it. However, if its current command started AFTER xid 1201 committed, it can't.

Unless I'm missing something, when vacuuming you can leave serializable transactions (like pg_dump) out of the calculation of the "oldest running transaction," so long as you keep a list of them and check each tuple T against each serializable transaction X to make sure that either T's xmin is greater than X's xid, or T's xmax committed before X started to run (see the sketch below). Of course this is "a lot" of work, but it should mitigate the effect of long-running serializable transactions until processor power becomes your limiting factor.
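
Here's what I mean, continuing the made-up types from the sketch above; xid_follows() is a hypothetical wraparound-aware "a started after b" comparison, nothing more:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;
typedef struct { Xid xmin; Xid xmax; } Tuple;

/* hypothetical stand-ins, as in the sketch above */
extern bool xid_follows(Xid a, Xid b);              /* wraparound-aware a > b */
extern bool xid_committed_before(Xid xid, Xid ref);

/* T is invisible to serializable transaction X if T was inserted after
 * X's snapshot was taken (X can never see it), or T was deleted and the
 * deletion committed before X started to run. */
static bool
invisible_to_serializable(const Tuple *t, Xid x)
{
    if (xid_follows(t->xmin, x))
        return true;
    return t->xmax != 0 && xid_committed_before(t->xmax, x);
}

/* VACUUM would run this per tuple against the whole list: */
static bool
dead_for_all_serializable(const Tuple *t, const Xid *xids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!invisible_to_serializable(t, xids[i]))
            return false;
    return true;
}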

The read committed ones are a more difficult matter, but I think you can treat a tuple as dead if it was inserted after the read committed transaction started to run AND deleted before that transaction's currently running command started to run (sketched below). I suppose the major difficulty here is that a transaction currently has no way of knowing when another backend's current command started to run?
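
Something like the following, again using the hypothetical types and helpers from above. The command_start field is exactly the piece of information that doesn't exist today -- it stands for "when the backend's current command took its snapshot," which other backends can't currently see:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
typedef struct { Xid xmin; Xid xmax; } Tuple;

extern bool xid_follows(Xid a, Xid b);              /* hypothetical */
extern bool xid_committed_before(Xid xid, Xid ref); /* hypothetical */

typedef struct
{
    Xid xid;            /* the read committed transaction itself */
    Xid command_start;  /* hypothetical marker for the point at which
                         * its current command's snapshot was taken */
} ReadCommittedXact;

/* dead to R iff the tuple was inserted after R began AND deleted, with
 * the deletion committed before R's current command started to run */
static bool
dead_to_read_committed(const Tuple *t, const ReadCommittedXact *r)
{
    return xid_follows(t->xmin, r->xid)
        && t->xmax != 0
        && xid_committed_before(t->xmax, r->command_start);
}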

Is this too difficult to do or is it a good idea that no one has enough 'round tuits for?

Regards,

Paul Tillotson
