On Sun, Aug 14, 2011 at 2:21 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > So, as the testing rolls on, I started to see some failures in various > ALTER-FOREIGN-thingy commands. The cause proved to be that numerous > places in foreigncmds.c do this: > > tuple = SearchSysCacheCopy(...); > > ... alter the tuple as needed ... > > rel = heap_open(target-catalog, RowExclusiveLock); > > simple_heap_update(rel, &tuple->t_self, tuple); > > heap_close(rel, RowExclusiveLock); > > rather than the more common pattern in which the catalog is opened > first.
Interesting. I vaguely recall flipping some of those around (to put the lock acquisition first) before committing the 9.1-era foreign table patch; it didn't seem like an entirely healthy thing to do. But I didn't really have any concrete notion of why it might be dangerous. > foreigncmds.c is not hard to fix, but the scary aspect of this is the > possibility that we've made the same mistake elsewhere, or might do so > again in future. Some desultory examination of simple_heap_update and > simple_heap_delete calls didn't find any other instances, but I am not > sure I didn't miss anything. And this seems like an easy trap to fall > into when refactoring (the current work to try to unify operations like > ALTER OWNER could easily get into this kind of problem, for instance). > > I tried to think of some practical way to mechanically test for this > type of error, but came up with nothing. Any ideas? Hmm. How about setting the TID to an illegal value of some kind when a catcache tuple is extracted without a table lock? Then any subsequent update or delete using that tuple would blow up. I think that'd be way too expensive to do in normal running but perhaps we could have a #define... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers