Hi Tom,

On Mar 25, 2009, at 9:02 PM, Tom Lane wrote:

Tom Duffey <tduf...@techbydesign.com> writes:
One of our databases suffered a problem yesterday during a normal
update, something we have been doing for years.  Near the end of the
process a foreign key constraint is rebuilt on a table containing
several hundred million rows.  Rebuilding the constraint failed with
the following message:

ERROR:  could not access status of transaction 4294918145
DETAIL: Could not open file "pg_clog/0FFF": No such file or directory.

This looks like a garden-variety data corruption problem to me.
Trashed rows tend to yield this type of error because the "xmin"
transaction ID is the first field that the server can check with
any amount of finesse.  4294918145 is FFFF4001 in hex, saith my
calculator, so it looks like a bunch of bits went to ones --- or
perhaps more likely, the row offset in the page header got clobbered
and we're looking at some bytes that never were a transaction ID
at all.
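
The arithmetic above can be checked with a short sketch. It assumes a default build with 8 kB blocks, 2 status bits per transaction (4 transactions per clog byte), and 32 pages per clog segment file; the constant and function names below are illustrative, not taken from this thread.

```python
# Sketch: reproduce the hex value and the pg_clog segment name from the error.
# Assumptions (default PostgreSQL build, not stated in the thread):
#   - block size of 8192 bytes
#   - 2 status bits per transaction, i.e. 4 transactions per clog byte
#   - 32 pages per clog segment file
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32


def xid_to_hex(xid: int) -> str:
    """Return the 32-bit transaction ID as 8 uppercase hex digits."""
    return format(xid, "08X")


def clog_segment_name(xid: int, block_size: int = BLCKSZ) -> str:
    """Return the pg_clog file name that would hold this transaction's status."""
    xacts_per_segment = block_size * CLOG_XACTS_PER_BYTE * PAGES_PER_SEGMENT
    return format(xid // xacts_per_segment, "04X")


if __name__ == "__main__":
    xid = 4294918145  # value from the error message
    print(xid_to_hex(xid))         # FFFF4001
    print(clog_segment_name(xid))  # 0FFF
```

Under those assumptions each segment covers 1,048,576 transactions, so the reported ID falls in the last possible segment, which matches the "pg_clog/0FFF" file named in the DETAIL line.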

So I'd try looking around for flaky RAM, failing disks, loose cables,
that sort of thing ...

Are you aware of any issues like this related to VMware ESX? Our
PostgreSQL server runs in that environment. I asked the guys here to
review your email, and they thought this type of corruption might
happen when a virtual machine is moved from one physical server to
another, which we have done once or twice in the past few months.

Tom

--
Tom Duffey <tduf...@techbydesign.com>
Technology by Design :: http://techbydesign.com/
p: 414.431.0800
