We all know that PostgreSQL suffers performance problems when rows are updated frequently prior to a vacuum. The most serious example can be seen by using PostgreSQL as a session handler for a busy web site. You may have thousands or millions of active sessions, each being updated on every page hit.
Each time the record is updated, a new version is created, lengthening the search for the "correct" version each time the row is accessed, until, of course, the next vacuum comes along and corrects the index to point to the latest version of the record. Is that a fair explanation?

If my assertion is fundamentally true, then PostgreSQL will always suffer performance penalties under a heavy modification load. Of course, tables with many inserts are not an issue; it is mainly updates. The problem is that there are classes of problems where updates are the primary operation.

I was thinking, just as a hypothetical: what if we reversed the problem, and always referenced the newest version of a row, scanning backwards across the versions to the first one with a lower transaction number?

One possible implementation: PostgreSQL could keep an indirection array of index-to-table refs for use by all the indexes on a table. The various indexes would return offsets into the array, not direct table refs. Because the table refs are separate from the index, they can be updated each time a transaction is committed. This way, the newest version of a row is always the first row found. Also, on a heavily updated site, the most used rows would always be at the end of the table, reducing the amount of disk reads or cache memory required to find the correct row version for each query.
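To make the idea a bit more concrete, here is a minimal sketch in plain C of what such an indirection array might look like. This is not PostgreSQL code, and the names (RowRef, IndirectionArray, resolve_slot, commit_update) are all invented for illustration; the point is only that indexes would store a stable slot number, and the slot would be repointed at commit so a lookup lands on the newest committed version first.

    /* Hypothetical sketch of the indirection idea -- not PostgreSQL code. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct RowRef {        /* physical location of a tuple version */
        unsigned int block;        /* heap block number */
        unsigned int offset;       /* line pointer within the block */
    } RowRef;

    typedef struct IndirectionArray {
        RowRef *slots;             /* slot number -> newest committed version */
        size_t  nslots;
    } IndirectionArray;

    /* Indexes hand back a slot number; resolving it yields the newest version. */
    static RowRef
    resolve_slot(const IndirectionArray *ind, size_t slot)
    {
        return ind->slots[slot];
    }

    /* At commit, an UPDATE repoints the slot at the new tuple version.  Older
     * versions stay on disk for old snapshots until vacuum reclaims them. */
    static void
    commit_update(IndirectionArray *ind, size_t slot, RowRef newest)
    {
        ind->slots[slot] = newest;
    }

    int
    main(void)
    {
        IndirectionArray ind;
        ind.nslots = 4;
        ind.slots = calloc(ind.nslots, sizeof(RowRef));

        /* A session row is first written at block 10, offset 3; index entries
         * for that row store slot 0, not the physical location. */
        commit_update(&ind, 0, (RowRef){10, 3});

        /* A page hit updates the session: the new version is written near the
         * end of the table and the slot is repointed at commit time. */
        commit_update(&ind, 0, (RowRef){97, 1});

        RowRef r = resolve_slot(&ind, 0);
        printf("slot 0 -> block %u, offset %u\n", r.block, r.offset);

        free(ind.slots);
        return 0;
    }

Under this sketch, an index scan never has to walk forward through dead versions to find the live tuple; the cost is an extra level of indirection on every index lookup and the bookkeeping to update the slots atomically at commit.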