On Fri, Nov 26, 2010 at 10:53 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
>>>> Incidentally, I haven't been able to wrap my head around why we need
>>>> to propagate AccessExclusiveLocks to the standby in the first place.
>>>> Can someone explain?
>>>
>>> To make the standby stop applying WAL when a local transaction on the
>>> standby uses an object.
>>> E.g. dropping a table on the master needs the standby to stop applying
>>> WAL (or kill the local client using the table).
>>> How would you want to protect against something like that otherwise?
>>
>> Hmm.  But it seems like it would be enough to log any exclusive
>> locks held at commit time, rather than logging them as they're
>> acquired.  By then, the XID will be assigned (if you need it - if you
>> don't then you probably don't need to XLOG it anyway) and you avoid
>> holding the lock for more than a moment on the standby.
>>
>> But it seems like an even better idea would be to actually XLOG the
>> operations that are problematic specifically.  Because, for example,
>> if a user session on the master does LOCK TABLE ... IN ACCESS
>> EXCLUSIVE MODE, AFAICS there's no reason for the standby to care.  Or
>> am I confused?
>
> Let's approach this from a different direction:
>
> If you have operation A in the master that currently acquires an
> AccessExclusiveLock on a table, do you think it's safe for another
> transaction to peek at the table at the same time?

Beep, time out.  The notion of "at the same time" is extremely fuzzy
here.  The operations on the master and slave are not simultaneous, or
anything close to it.  Let's go back to the case of a dropped table.
Suppose that, on the master, someone begins a transaction, drops a
table, and heads out to lunch.  Upon returning, they commit the
transaction.  At what point does it become unsafe for readers on the
standby to be looking at the table?  Surely, the whole time the guy is
out to lunch, readers on the standby are free to do whatever they
want.  Only at the point when we actually remove the file does it
become a problem for somebody to be in the middle of using it.

In fact, you could apply the same logic to the master, if you were
willing to defer the removal of the actual physical file until all
transactions that were using it released their locks.  The reason we
don't do that - aside from complexity - is that it would result in an
unpredictable and indefinite delay between issuing the DROP TABLE
command and OS-level storage reclamation.  But in the standby
situation, there is *already* an unpredictable and indefinite delay.
The standby can fall behind in applying WAL, lose connectivity, have
replay paused, etc.  You lose nothing by waiting until the last
possible moment to kick everyone out.  (In fact, you gain something:
the standby is more usable.)
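
The deferred-removal idea can be illustrated with a toy model.  This is
only a sketch in Python, not PostgreSQL code: the class and method names
(DeferredDropStore, open_for_read, drop) are made up for illustration,
and it assumes a simple per-file reader count stands in for the locks
held by transactions.

```python
class DeferredDropStore:
    """Toy model: a dropped file is unlinked only once no reader holds it."""

    def __init__(self, files):
        self.files = set(files)    # files that still exist on "disk"
        self.readers = {}          # file name -> number of active readers
        self.pending_drop = set()  # dropped logically, still in use

    def open_for_read(self, name):
        # New readers cannot see a file that is gone or already dropped.
        if name not in self.files or name in self.pending_drop:
            raise FileNotFoundError(name)
        self.readers[name] = self.readers.get(name, 0) + 1

    def close(self, name):
        self.readers[name] -= 1
        if self.readers[name] == 0:
            del self.readers[name]
            # The last reader is gone: reclaim storage if a drop was waiting.
            if name in self.pending_drop:
                self._unlink(name)

    def drop(self, name):
        if self.readers.get(name, 0) > 0:
            self.pending_drop.add(name)  # defer the physical removal
        else:
            self._unlink(name)

    def _unlink(self, name):
        self.pending_drop.discard(name)
        self.files.discard(name)
```

In this model the delay between drop() and the actual removal is
unbounded, since it depends on when the last reader calls close().  That
is the cost described above for the master.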

The problem here is not propagating operations from the master, but
making sure that actions performed by the startup process on the
standby are properly locked.  In the case of dropping a relation, the
problem is that the startup process only knows which relfilenode it
needs to blow away, not which relation that relfilenode is associated
with.  If the AccessShareLock taken by standby backends were against the
relfilenode rather than the relation itself, the startup process would
have no problem at all generating a conflicting lock - it would simply
lock each relfilenode before dropping it, without any additional XLOG
information at all.
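
As a rough illustration of locking by relfilenode, here is a toy Python
model.  It is a sketch under stated assumptions, not how PostgreSQL's
lock manager works: RelfilenodeLocks and replay_drop are hypothetical
names, backends are plain strings, and instead of blocking, the replay
function just reports which backends hold a conflicting share lock (the
ones the startup process would have to wait for or cancel).

```python
from collections import defaultdict


class RelfilenodeLocks:
    """Toy lock table keyed by relfilenode instead of relation OID."""

    def __init__(self):
        self._share = defaultdict(set)  # relfilenode -> backends holding a share lock

    def acquire_share(self, relfilenode, backend):
        self._share[relfilenode].add(backend)

    def release_share(self, relfilenode, backend):
        self._share[relfilenode].discard(backend)

    def share_holders(self, relfilenode):
        return set(self._share.get(relfilenode, ()))


def replay_drop(locks, relfilenodes):
    """Return {relfilenode: conflicting backends} for the files being dropped.

    The drop record only needs to carry relfilenodes; no relation OID or
    other extra information is consulted.
    """
    conflicts = {}
    for node in relfilenodes:
        holders = locks.share_holders(node)
        if holders:
            conflicts[node] = holders
    return conflicts
```

An empty result means the files can be removed immediately; a non-empty
one names exactly the sessions that are in the way at the moment of
removal.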

> As a concrete example, VACUUM acquires an AccessExclusiveLock when it wants
> to truncate the relation. A sequential scan running against the table in the
> standby will get upset, if the startup process replays a truncation record
> on the table without warning.

This case is similar.  xl_smgr_truncate has only a relfilenode number,
not a relation OID, so there's no way for the startup process to
generate a conflicting lock request itself.  But if the standby
backends locked the relfilenode, or if the xl_smgr_truncate WAL record
included the relation OID, it would be simple.
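
The second option can be sketched the same way.  The following toy
Python model is an illustration only: TruncateRecord and its field names
are hypothetical and do not reproduce the real xl_smgr_truncate layout,
and the lock table is assumed to be a plain dict from relation OID to
the set of backends holding a lock on that relation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class TruncateRecord:
    relfilenode: int
    new_block_count: int
    relation_oid: Optional[int] = None  # hypothetical extra field


def conflicting_backends(
    record: TruncateRecord,
    locks_by_relation_oid: Dict[int, Set[str]],
) -> Optional[Set[str]]:
    """Return the backends whose relation-level locks conflict with replay.

    Returns None when the record carries no relation OID, because then the
    replaying process cannot tell which relation-level locks are relevant.
    """
    if record.relation_oid is None:
        return None
    return set(locks_by_relation_oid.get(record.relation_oid, set()))
```

With only a relfilenode in the record the function has nothing to look
up; once the relation OID is present, finding the conflicting sessions
is a single dictionary lookup in this model.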

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
