I've expanded my searching a bit, to see if I can find any other 
correlations. One thing that seems to happen about 10 times a day 
is an error of this sort:

ERROR:  could not open relation with OID 1554847326

In this case, the OID in question always exists, and corresponds to 
one of a handful of particularly busy tables. Sometimes the query 
does not even touch the OID mentioned directly: in the above example, 
the SQL was an update to table A that had a FK to table B, and the 
OID above is for table B. The queries themselves vary: I've not found any 
common factor yet.

These errors have been happening a long time, and obviously don't cause the 
same database-hosed-must-restart issue the btree errors do, but it is still 
a little disconcerting. Although 10 times out of > 20 million transactions 
per day is at least an extremely rare event :) It is definitely NOT correlated 
to system table reindexing, but does seem to be roughly correlated to how busy 
things are in general. We've not been able to duplicate it on a non-prod test 
system yet either, which points to either hardware or (more likely) a failure 
to completely simulate the high activity level of prod.

No idea if this is related to the relatively recent btree errors, but figured 
I would get it out there. There is also an even rarer sprinkling of:

ERROR:  relation with OID 3924107573 does not exist

but I figured that was probably a variant of the first error.
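
For anyone wanting to check their own logs for the same thing, here is a 
minimal sketch of one way to tally both messages per OID. The message 
texts are the two quoted above; the function name, the sample lines, and 
the idea of feeding it plain log lines are my own assumptions, so adjust 
for whatever log_line_prefix is in use.

```python
import re
from collections import Counter

# Both message texts are copied from the two errors quoted above.
# Anything before "ERROR:" on the line (timestamps, pids) is ignored.
PATTERNS = [
    re.compile(r"ERROR:\s+could not open relation with OID (\d+)"),
    re.compile(r"ERROR:\s+relation with OID (\d+) does not exist"),
]


def tally_oid_errors(lines):
    """Count how often each OID shows up in either error message.

    Hypothetical helper: 'lines' is any iterable of log lines.
    """
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[int(match.group(1))] += 1
                break  # a line matches at most one of the two messages
    return counts


# Made-up sample input that reuses the two OIDs quoted above.
sample = [
    "ERROR:  could not open relation with OID 1554847326",
    "ERROR:  relation with OID 3924107573 does not exist",
    "ERROR:  could not open relation with OID 1554847326",
]

for oid, count in tally_oid_errors(sample).most_common():
    print(oid, count)
```

Each OID that turns up can then be looked up in pg_class to see which 
table it maps to.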

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
