OK, in general you have to pay for MVCC one way or another.  Many
databases make you pay as you go, so to speak, by storing all the MVCC
info in a log to be applied at some future date.  With other databases
you pay later, by storing all the MVCC info in the table itself.  Both
have similar costs, but the latter can punish you harshly if you let
the MVCC data stored in the tables get out of hand.

PostgreSQL 8.3 and above are much more aggressive about autovacuuming,
and on bigger hardware you can make it VERY aggressive and keep the
bloat out while keeping up good throughput.  On some servers I set up
4, 6, or 8 autovacuum workers to keep up; a rough sketch of the kind
of settings I mean is below.  If you were on another db you might be
adding more drives to make some other part faster.
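Something like the following postgresql.conf settings is what I mean
by aggressive.  The exact numbers are hypothetical and depend entirely
on your hardware and workload:

    # hypothetical settings for a box with plenty of spare I/O
    autovacuum = on
    autovacuum_max_workers = 6            # 8.3+: the worker count above
    autovacuum_naptime = 10s              # look for work more often
    autovacuum_vacuum_cost_delay = 0      # don't throttle the workers
    autovacuum_vacuum_scale_factor = 0.1  # kick in at ~10% dead rows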

For batch processing, storing all the MVCC data in the data store can
be problematic, but for more typical workloads, where you're changing
<1% of a table at a time, it can be very fast.
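If you want to see whether a table is drifting past that, a query
like this against pg_stat_user_tables (n_dead_tup is there in 8.3 and
up) shows where the dead rows are piling up:

    -- which tables are accumulating dead rows
    SELECT relname, n_live_tup, n_dead_tup
      FROM pg_stat_user_tables
     ORDER BY n_dead_tup DESC
     LIMIT 10;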

Some other databases will simply run out of space to store
transactions and roll back everything you've done.  PostgreSQL will
gladly let you shoot yourself in the foot by bloating the data store
with successive whole-table updates and no vacuums in between.
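To make that concrete, here's a sketch with a hypothetical table t of
what two back-to-back whole-table updates do:

    UPDATE t SET flag = true;    -- leaves a dead copy of every row
    UPDATE t SET flag = false;   -- heap is now ~3x its live size
    VACUUM t;                    -- too late to shrink the file; this
                                 -- only marks dead space reusable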

Bottom line: if your hardware can't keep up, it can't keep up.  If
vacuum capsizes your IO and still can't keep up, then you need more
disks and/or a better storage subsystem.  A 32-disk array with a
single controller goes for ~$7k to $10k, and you can sustain some
pretty amazing throughput on that kind of IO subsystem.

If you're doing batch processing you can get a lot of return just by
making sure you vacuum after each mass update.  Especially on a
single-use machine with no cost delays for vacuum, running a vacuum
on a freshly worked table should be pretty fast.
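Roughly, one pass of such a batch job would look like this (big_table
is a hypothetical name; note VACUUM has to run outside the
transaction block):

    BEGIN;
    UPDATE big_table SET amount = amount * 1.1;
    COMMIT;
    -- reclaim the dead rows before the next pass
    VACUUM ANALYZE big_table;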
