Hello hackers,

Andres Freund diagnosed a case of $SUBJECT in a customer's 9.6 system. I've written a minimal reproducer and a prototype patch to address the root cause.
The problem is that StandbyReleaseLocks() does a linear search of all known AccessExclusiveLocks when a transaction ends. Luckily, since v10 (commit 9b013dc2) that search is skipped for transactions that haven't taken any AELs and aren't using 2PC, but that doesn't help all users. It's fine if the AEL list is short, but if you do something that takes a lot of AELs, such as restoring a database with many tables or truncating a lot of partitions while other transactions are in flight, then we start doing O(txrate * nlocks * nsubxacts) work, and that can hurt.

The reproducer script I've attached creates one long-lived transaction that acquires 6,000 AELs and takes a nap, while 48 connections run trivial 2PC transactions. (I was also able to reproduce the effect without 2PC by creating a throw-away temporary table in every transaction, but it was unreliable because contention slowed everything down.) For me, the standby's startup process becomes 100% pegged, replay_lag begins to climb, and perf says something like:

+   97.88%    96.96%  postgres  postgres  [.] StandbyReleaseLocks

The attached patch splits the AEL list into one list per xid and sticks them in a hash table. That makes perf say something like:

+    0.60%     0.00%  postgres  postgres  [.] StandbyReleaseLocks

This seems like something we'd want to back-patch, because the problem affects all branches (the older releases more severely, because they lack the above-mentioned optimisation).

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com
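To illustrate the data-structure change, here is a rough Python analogy (not the actual C patch; the class and field names are made up for illustration): instead of one global list that every transaction-end must scan, locks are bucketed by xid, so release cost depends only on how many locks that one transaction holds.

```python
from collections import defaultdict

class RecoveryLockTable:
    """Sketch of per-xid lock tracking. Before the change, release meant
    scanning one global list of (xid, lock) entries for every ending
    transaction; with a hash table keyed by xid, release only touches
    that xid's own bucket."""

    def __init__(self):
        # xid -> list of (db_oid, rel_oid) AccessExclusiveLocks
        self.locks_by_xid = defaultdict(list)

    def add_lock(self, xid, db_oid, rel_oid):
        self.locks_by_xid[xid].append((db_oid, rel_oid))

    def release_locks(self, xid):
        # O(locks held by this xid), independent of how many locks
        # other transactions (e.g. the long-lived 6,000-AEL one) hold.
        return self.locks_by_xid.pop(xid, [])
```

With the single-list scheme, every one of the trivial 2PC commits would have walked past all 6,000 entries belonging to the sleeping transaction; here they pop an empty (or tiny) bucket instead.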
import threading
import time

import psycopg2

DSN = "dbname=postgres host=localhost"

def init():
    conn = psycopg2.connect(DSN)
    cursor = conn.cursor()
    cursor.execute("""DROP TABLE IF EXISTS foo""")
    cursor.execute("""CREATE TABLE foo (i serial primary key)""")
    conn.commit()

def big_slow():
    conn = psycopg2.connect(DSN)
    cursor = conn.cursor()
    # hold onto many AELs
    for i in range(6000):
        cursor.execute("create temp table foo%s (i int) on commit drop" % (i,))
    time.sleep(600)
    conn.rollback()

def many_small_xacts(thread_id):
    conn = psycopg2.connect(DSN)
    cursor = conn.cursor()
    for loops in range(100000):
        #cursor.execute("create temp table foo1 (i int) on commit drop")
        cursor.execute("begin")
        cursor.execute("insert into foo values (default)")
        cursor.execute("prepare transaction %s", [thread_id])
        cursor.execute("commit prepared %s", [thread_id])

init()
threading.Thread(target=big_slow).start()
for i in range(48):
    threading.Thread(target=many_small_xacts, args=['t' + str(i)]).start()
0001-Move-RecoveryLockList-into-a-hash-table.patch
Description: Binary data