I have a fairly large database(approx. 1.5TB) that is backed up by a warm
standby database using log shipping(PITR). This setup had been running for a
couple of months when I ran into a problem on the primary DB and had to
failover to the standby DB. This worked as expected.
Shortly thereafter(Sometime over the long weekend of course), Postgres shutdown
the database to avoid XID wraparound data loss. I presume there were warnings
in the log about running out of XIDs, but nobody noticed in time and given what
transpired after that I don't think it would have mattered if they had.
As per the documentation, I started the DB in single user mode and attempted to
do a full database vacuum. After this ran for about 12hours the pg_xlog
directory ran out of disk space. I'm not sure I understand why anything is
written to pg_xlog as part of the vacuum process, perhaps someone can enlighten
me.
I next started looking at the age(refrozenxid) of the tables in my DB, and was
surprised to see that over 4000 of the 5000 tables in this DB had an age over
2Billion. So thats 4000 tables representing over a terabyte of data that need
to be vacuumed! I am now vacuuming those tables one at a time, which is taking
a long time(This is a scripted process). So there is no way I could have
vacuumed the tables quickly enough even given a warning of impending XID
wraparound.
Looking through the support mailing lists(Bugs) I see some discussion about the
frozenxid updates on the master not being propogated to the slave through the
WAL logs, and comments from Tom, Alvaro and Heikki suggesting that they were
looking into a solution for PG 8.3 and needed a way around the problem in PG
8.2.
I am currently running PG 8.2.4 on FreeBSD.
So my questions are:
1) What is the recommended way to either solve or get around this problem in PG
8.2.4?
2) Is this "problem" fixed in some more current version of Postgres? I didn't
see any mention of it in release notes up to PG 8.3.3?
3) Does this mean that if you are trying to use a warm standby DB with PITR,
you need to make a new base backup of your primary DB every 1.5billion
transactions, or there abouts, to avoid the problem. If so, I think this should
be documented in the "Caveats" section of "Continuous Archiving and
Point-in-time-recovery(PITR)" section of the manual.
Regards...
Mark Sherwood
_________________________________________________________________