[HACKERS] Suggestions for post-mortem...

Philip Warner Wed, 25 Jan 2006 04:34:22 -0800

We just had a DB die quite nastily, and have no clear idea why.

Looking in the system logs shows nothing out of the ordinary, and
looking in the db logs shows a few odd records:


2006-01-25 12:25:31 EST [mail,5017]: ERROR:  failed to fetch new tuple for AFTER trigger
2006-01-25 12:26:01 EST [mail,93689]: WARNING:  index "XXXX_pkey" contains 1416 row versions, but table contains 1410 row versions
2006-01-25 12:26:01 EST [mail,93689]: HINT:  Rebuild the index with REINDEX.
2006-01-25 12:26:01 EST [mail,93689]: WARNING:  index "YYYY" contains 1416 row versions, but table contains 1410 row versions

...repeated several times for several indexes of the same table.
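
For tallying these warnings across a longer log, a minimal Python sketch
follows. It assumes the log format quoted above and nothing more; the
function name find_index_mismatches is hypothetical, and the pattern
accepts a line wrap anywhere whitespace appears in the message.

```python
import re

# Matches the index/table row-version warning in the format quoted above.
# \s+ also matches newlines, so wrapped log lines are handled.
MISMATCH = re.compile(
    r'WARNING:\s+index "([^"]+)"\s+contains\s+(\d+)\s+row versions,'
    r'\s+but\s+table\s+contains\s+(\d+)\s+row versions'
)

def find_index_mismatches(log_text):
    """Return (index_name, index_rows, table_rows, surplus) per warning.

    Hypothetical helper: surplus is index_rows minus table_rows.
    """
    results = []
    for name, idx_rows, tbl_rows in MISMATCH.findall(log_text):
        idx_rows, tbl_rows = int(idx_rows), int(tbl_rows)
        results.append((name, idx_rows, tbl_rows, idx_rows - tbl_rows))
    return results

# Sample taken from the log excerpt above, including the mail line wraps.
sample = '''2006-01-25 12:26:01 EST [mail,93689]: WARNING:  index "XXXX_pkey"
contains 1416 row versions, but table contains 1410 row versions
2006-01-25 12:26:01 EST [mail,93689]: HINT:  Rebuild the index with REINDEX.
2006-01-25 12:26:01 EST [mail,93689]: WARNING:  index "YYYY" contains
1416 row versions, but table contains 1410 row versions
'''

for row in find_index_mismatches(sample):
    print(row)
```

Each index in the excerpt reports six more row versions than the table,
which is the figure the sketch computes as the surplus.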

These messages occurred almost immediately before we noticed the dead
state of the DB. Over an hour before these messages there was a
deadlock, but that's not too worrying -- the DB was still OK.

After the above messages, about 80 rows were missing from the table, and
a REINDEX did not restore them (not really surprising). The table in
question is small (about 1400 rows), but is updated 5 to 10 times per
second.

Thankfully, we had replication in place and just failed over, but we'd
like to try to understand what happened to the old DB.

Any suggestions where to start? Or what the first error might signify?
Or what to put in place to catch more details next time?

It's been running fine for several months (until now) using PG 8.0.3.







