On Fri, Nov 26, 2010 at 10:53 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>>>> Incidentally, I haven't been able to wrap my head around why we need
>>>> to propagate AccessExclusiveLocks to the standby in the first place.
>>>> Can someone explain?
>>>
>>> To make the standby stop applying WAL when a local transaction on the
>>> standby uses an object.
>>> E.g. dropping a table on the master needs the standby to stop applying
>>> WAL (or kill the local client using the table).
>>> How would you want to protect against something like that otherwise?
>>
>> Hmm. But it seems like it would be enough to log any exclusive
>> locks held at commit time, rather than logging them as they're
>> acquired. By then, the XID will be assigned (if you need it - if you
>> don't then you probably don't need to XLOG it anyway) and you avoid
>> holding the lock for more than a moment on the standby.
>>
>> But it seems like an even better idea would be to actually XLOG the
>> operations that are problematic specifically. Because, for example,
>> if a user session on the master does LOCK TABLE ... IN ACCESS
>> EXCLUSIVE MODE, AFAICS there's no reason for the standby to care. Or
>> am I confused?
>
> Let's approach this from a different direction:
>
> If you have operation A in the master that currently acquires an
> AccessExclusiveLock on a table, do you think it's safe for another
> transaction to peek at the table at the same time?
Beep, time out. The notion of "at the same time" is extremely fuzzy here. The operations on the master and slave are not simultaneous, or anything close to it.

Let's go back to the case of a dropped table. Suppose that, on the master, someone begins a transaction, drops a table, and heads out to lunch. Upon returning, they commit the transaction. At what point does it become unsafe for readers on the standby to be looking at the table? Surely, the whole time the guy is out to lunch, readers on the standby are free to do whatever they want. Only at the point when we actually remove the file does it become a problem for somebody to be in the middle of using it.

In fact, you could apply the same logic to the master, if you were willing to defer the removal of the actual physical file until all transactions that were using it released their locks. The reason we don't do that - aside from complexity - is that it would result in an unpredictable and indefinite delay between issuing the DROP TABLE command and OS-level storage reclamation. But in the standby situation, there is *already* an unpredictable and indefinite delay. The standby can fall behind in applying WAL, lose connectivity, have replay paused, etc. You lose nothing by waiting until the last possible moment to kick everyone out. (In fact, you gain something: the standby is more usable.)

The problem here is not propagating operations from the master, but making sure that actions performed by the startup process on the standby are properly locked. In the case of dropping a relation, the problem is that the startup process only knows which relfilenode it needs to blow away, not which relation that relfilenode is associated with. If the AccessShareLock were taken on the relfilenode rather than on the relation itself, the startup process would have no problem at all generating a conflicting lock - it would simply lock each relfilenode before dropping it, without any additional XLOG information at all.
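A minimal sketch of the relfilenode-locking idea, assuming a toy in-memory lock table; every name here (`standby_lock_shared`, `startup_lock_exclusive`, `replay_drop_relfilenode`, `RelFileLock`) is hypothetical and is not PostgreSQL's lock manager API. Standby readers take a shared lock on the relfilenode they scan, and the startup process takes a conflicting exclusive lock on the same number, which the WAL record already carries, only at the moment it is about to remove the file.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy lock table keyed by relfilenode number, not relation OID.
 * Illustrative only; not PostgreSQL's lock manager. */
#define MAX_LOCKS 16

typedef unsigned int RelFileNumber;     /* stand-in for a relfilenode */

typedef struct
{
    RelFileNumber relfilenode;
    int         shared_holders;         /* standby backends reading the file */
    bool        exclusive;              /* startup process removing the file */
    bool        in_use;
} RelFileLock;

static RelFileLock lock_table[MAX_LOCKS];

/* Find the entry for rfn; optionally claim a free slot if none exists. */
static RelFileLock *
rfl_lookup(RelFileNumber rfn, bool create)
{
    RelFileLock *free_slot = NULL;

    for (int i = 0; i < MAX_LOCKS; i++)
    {
        if (lock_table[i].in_use && lock_table[i].relfilenode == rfn)
            return &lock_table[i];
        if (!lock_table[i].in_use && free_slot == NULL)
            free_slot = &lock_table[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    free_slot->in_use = true;
    free_slot->relfilenode = rfn;
    free_slot->shared_holders = 0;
    free_slot->exclusive = false;
    return free_slot;
}

/* Standby backend: lock the relfilenode before scanning it. */
static bool
standby_lock_shared(RelFileNumber rfn)
{
    RelFileLock *l = rfl_lookup(rfn, true);

    if (l == NULL || l->exclusive)
        return false;
    l->shared_holders++;
    return true;
}

static void
standby_unlock_shared(RelFileNumber rfn)
{
    RelFileLock *l = rfl_lookup(rfn, false);

    if (l == NULL || l->shared_holders == 0)
        return;
    if (--l->shared_holders == 0 && !l->exclusive)
        l->in_use = false;
}

/* Startup process: succeeds only if no standby reader holds the file. */
static bool
startup_lock_exclusive(RelFileNumber rfn)
{
    RelFileLock *l = rfl_lookup(rfn, true);

    if (l == NULL || l->shared_holders > 0 || l->exclusive)
        return false;
    l->exclusive = true;
    return true;
}

static void
startup_unlock_exclusive(RelFileNumber rfn)
{
    RelFileLock *l = rfl_lookup(rfn, false);

    if (l == NULL)
        return;
    l->exclusive = false;
    if (l->shared_holders == 0)
        l->in_use = false;
}

/* Replay of a record that removes storage: only the relfilenode is needed. */
static bool
replay_drop_relfilenode(RelFileNumber rfn)
{
    if (!startup_lock_exclusive(rfn))
        return false;           /* a reader is still using the file */
    printf("unlinking storage for relfilenode %u\n", rfn);
    startup_unlock_exclusive(rfn);
    return true;
}
```

Because the conflict is checked only at the moment of removal, readers on the standby are undisturbed for as long as the master transaction stays open. A real implementation would wait for, or cancel, the conflicting query instead of simply returning false.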
> As a concrete example, VACUUM acquires an AccessExclusiveLock when it wants
> to truncate the relation. A sequential scan running against the table in the
> standby will get upset if the startup process replays a truncation record
> on the table without warning.

This case is similar. xl_smgr_truncate has only a relfilenode number, not a relation OID, so there's no way for the startup process to generate a conflicting lock request itself. But if the standby backends locked the relfilenode, or if the xl_smgr_truncate WAL record included the relation OID, it would be simple.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company