Hi,

On 2018-06-19 17:43:42 +1200, Thomas Munro wrote:
> The problem is that StandbyReleaseLocks() does a linear search of all
> known AccessExclusiveLocks when a transaction ends.  Luckily, since
> v10 (commit 9b013dc2) that is skipped for transactions that haven't
> taken any AELs and aren't using 2PC, but that doesn't help all users.

> It's fine if the AEL list is short, but if you do something that takes
> a lot of AELs such as restoring a database with many tables or
> truncating a lot of partitions while other transactions are in flight
> then we start doing O(txrate * nlocks * nsubxacts) work and that can
> hurt.
> 
> The reproducer script I've attached creates one long-lived transaction
> that acquires 6,000 AELs and takes a nap, while 48 connections run
> trivial 2PC transactions (I was also able to reproduce the effect
> without 2PC by creating a throw-away temporary table in every
> transaction, but it was unreliable due to contention slowing
> everything down).  For me, the standby's startup process becomes 100%
> pegged, replay_lag begins to climb and perf says something like:
> 
> +   97.88%    96.96%  postgres  postgres            [.] StandbyReleaseLocks
> 
> The attached patch splits the AEL list into one list per xid and
> sticks them in a hash table.  That makes perf say something like:

> +    0.60%     0.00%  postgres  postgres            [.] StandbyReleaseLocks

Neato. Results aside, that seems like a better suited datastructure.

>  /*
>   * InitRecoveryTransactionEnvironment
> @@ -63,6 +73,19 @@ void
>  InitRecoveryTransactionEnvironment(void)
>  {
>       VirtualTransactionId vxid;
> +     HASHCTL                 hash_ctl;
> +
> +     /*
> +      * Initialize the hash table for tracking the list of locks held by each
> +      * transaction.
> +      */
> +     memset(&hash_ctl, 0, sizeof(hash_ctl));
> +     hash_ctl.keysize = sizeof(TransactionId);
> +     hash_ctl.entrysize = sizeof(RecoveryLockListsEntry);
> +     RecoveryLockLists = hash_create("RecoveryLockLists",
> +                                                                     4096,
> +                                                                     
> &hash_ctl,

Isn't that initial size needlessly big?


>  void
>  StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
>  {
> +     RecoveryLockListsEntry *entry;
>       xl_standby_lock *newlock;
>       LOCKTAG         locktag;
> +     bool            found;
>  
>       /* Already processed? */
>       if (!TransactionIdIsValid(xid) ||
> @@ -616,11 +641,19 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, 
> Oid dbOid, Oid relOid)
>       /* dbOid is InvalidOid when we are locking a shared relation. */
>       Assert(OidIsValid(relOid));
>  
> +     /* Create a new list for this xid, if we don't have one already. */
> +     entry = hash_search(RecoveryLockLists, &xid, HASH_ENTER, &found);
> +     if (!found)
> +     {
> +             entry->xid = xid;
> +             entry->locks = NIL;
> +     }
> +
>       newlock = palloc(sizeof(xl_standby_lock));
>       newlock->xid = xid;
>       newlock->dbOid = dbOid;
>       newlock->relOid = relOid;
> -     RecoveryLockList = lappend(RecoveryLockList, newlock);
> +     entry->locks = lappend(entry->locks, newlock);

Gotta be careful with lappend et al - it's really easy to leak memory
without explicit context usage. Have you checked that we don't here?

>       SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
>  
> @@ -628,46 +661,45 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, 
> Oid dbOid, Oid relOid)
>  }
>  
>  static void
> -StandbyReleaseLocks(TransactionId xid)
> +StandbyReleaseLockList(List *locks)
>  {
> -     ListCell   *cell,
> -                        *prev,
> -                        *next;
> -
> -     /*
> -      * Release all matching locks and remove them from list
> -      */
> -     prev = NULL;
> -     for (cell = list_head(RecoveryLockList); cell; cell = next)
> +     while (locks)
>       {
> -             xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
> +             xl_standby_lock *lock = (xl_standby_lock *) linitial(locks);
> +             LOCKTAG         locktag;
> +             elog(trace_recovery(DEBUG4),
> +                      "releasing recovery lock: xid %u db %u rel %u",
> +                      lock->xid, lock->dbOid, lock->relOid);
> +             SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
> +             if (!LockRelease(&locktag, AccessExclusiveLock, true))
> +                     elog(LOG,
> +                              "RecoveryLockLists contains entry for lock no 
> longer recorded by lock manager: xid %u database %u relation %u",
> +                              lock->xid, lock->dbOid, lock->relOid);

This should be a PANIC imo.


Greetings,

Andres Freund

Reply via email to