On 2013-06-27 00:12:07 -0700, Hitoshi Harada wrote: > > Two, until we get MVCC catalog scans, it's not safe to update any > > system catalog tuple without an AccessExclusiveLock on some locktag > > that will prevent concurrent catalog scans for that tuple. Under > > SnapshotNow semantics, concurrent readers can fail to see that the > > object is present at all, leading to mysterious failures - especially > > if some of the object's catalog scans are seen and others are missed. > > > > > > So what I'm saying above is take AccessExclusiveLock on swapping relfile > in catalog. This doesn't violate your statement, I suppose. I'm actually > still skeptical about MVCC catalog, because even if you can make catalog > lookup MVCC, relfile on the filesystem is not MVCC. If session 1 changes > relfilenode in pg_class and commit transaction, delete the old relfile from > the filesystem, but another concurrent session 2 that just took a snapshot > before 1 made such change keeps running and tries to open this relation, > grabbing the old relfile and open it from filesystem -- ERROR: relfile not > found.
We can play cute tricks akin to what CREATE INDEX CONCURRENTLY currently does, i.e. wait for all other relations that could have possibly seen the old relfilenode (they must have at least a share lock on the relation) before dropping the actual storage. The reason we cannot currently do that in most scenarios is that we cannot perform transactional/mvcc updates of non-exclusively locked objects due to the SnapshotNow problems of seeing multiple or no versions of a row during a single scan. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers