I've expanded my searching a bit, to see if I can find any other correlations. One thing that seems to happen about 10 times a day is an error of this sort:
ERROR: could not open relation with OID 1554847326

In this case, the OID in question always exists, and corresponds to one of a handful of particularly busy tables. Sometimes the query does not even touch the OID mentioned directly: in the above example, the SQL was an update to table A that had a FK to table B, and the OID above is for table B. The queries themselves vary: I've not found any common factor yet.

These errors have been happening for a long time, and obviously don't cause the same database-hosed-must-restart issue the btree does, but it is still a little disconcerting. Although 10 times out of > 20 million transactions per day is at least an extremely rare event :)

It is definitely NOT correlated to system table reindexing, but does seem to be roughly correlated to how busy things are in general. We've not been able to duplicate it on a non-prod test system yet either, which points to either hardware or (more likely) a failure to completely simulate the high activity level of prod.

No idea if this is related to the relatively recent btree errors, but figured I would get it out there. There is also an even rarer sprinkling of:

ERROR: relation with OID 3924107573 does not exist

but I figured that was probably a variant of the first error.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8