On Tue, Jan 24, 2023 at 7:00 PM Bharath Rupireddy
<[email protected]> wrote:
>
> I'm attaching the v3 patch with the above review comments addressed.
> Hopefully, no memory ordering issues now. FWIW, I've added it to CF
> https://commitfest.postgresql.org/42/4141/.
>
> Test results with the v3 patch and insert workload are the same as
> that of the earlier run - TPS starts to scale at higher clients as
> expected after 512 clients and peaks at 2X with 2048 and 4096 clients.
>
> HEAD:
> 1 1380.411086
> 2 1358.378988
> 4 2701.974332
> 8 5925.380744
> 16 10956.501237
> 32 20877.513953
> 64 40838.046774
> 128 70251.744161
> 256 108114.321299
> 512 120478.988268
> 768 99140.425209
> 1024 93645.984364
> 2048 70111.159909
> 4096 55541.804826
>
> v3 PATCHED:
> 1 1493.800209
> 2 1569.414953
> 4 3154.186605
> 8 5965.578904
> 16 11912.587645
> 32 22720.964908
> 64 42001.094528
> 128 78361.158983
> 256 110457.926232
> 512 148941.378393
> 768 167256.590308
> 1024 155510.675372
> 2048 147499.376882
> 4096 119375.457779
I slightly modified the comments and attached the v4 patch for further
review. I also captured a perf report - there's a clear reduction in
time spent in the functions affected by the patch: LWLockWaitListLock,
WaitXLogInsertionsToFinish, LWLockWaitForVar and
LWLockConflictsWithVar. Note that I compiled the source code with
-ggdb to capture symbols for perf; even so, the benefit stands at > 2X
for higher client counts.
HEAD:
+ 16.87% 0.01% postgres [.] CommitTransactionCommand
+ 16.86% 0.00% postgres [.] finish_xact_command
+ 16.81% 0.01% postgres [.] CommitTransaction
+ 15.09% 0.20% postgres [.] LWLockWaitListLock
+ 14.53% 0.01% postgres [.] WaitXLogInsertionsToFinish
+ 14.51% 0.02% postgres [.] LWLockWaitForVar
+ 11.70% 11.63% postgres [.] pg_atomic_read_u32_impl
+ 11.66% 0.08% postgres [.] pg_atomic_read_u32
+ 9.96% 0.03% postgres [.] LWLockConflictsWithVar
+ 4.78% 0.00% postgres [.] LWLockQueueSelf
+ 1.91% 0.01% postgres [.] pg_atomic_fetch_or_u32
+ 1.91% 1.89% postgres [.] pg_atomic_fetch_or_u32_impl
+ 1.73% 0.00% postgres [.] XLogInsert
+ 1.69% 0.01% postgres [.] XLogInsertRecord
+ 1.41% 0.01% postgres [.] LWLockRelease
+ 1.37% 0.47% postgres [.] perform_spin_delay
+ 1.11% 1.11% postgres [.] spin_delay
+ 1.10% 0.03% postgres [.] exec_bind_message
+ 0.91% 0.00% postgres [.] WALInsertLockRelease
+ 0.91% 0.00% postgres [.] LWLockReleaseClearVar
+ 0.72% 0.02% postgres [.] LWLockAcquire
+ 0.60% 0.00% postgres [.] LWLockDequeueSelf
+ 0.58% 0.00% postgres [.] GetTransactionSnapshot
0.58% 0.49% postgres [.] GetSnapshotData
+ 0.58% 0.00% postgres [.] WALInsertLockAcquire
+ 0.55% 0.00% postgres [.] XactLogCommitRecord
TPS (compiled with -ggdb for capturing symbols for perf)
1 1392.512967
2 1435.899119
4 3104.091923
8 6159.305522
16 11477.641780
32 22701.000718
64 41662.425880
128 23743.426209
256 89837.651619
512 65164.221500
768 66015.733370
1024 56421.223080
2048 52909.018072
4096 40071.146985
PATCHED:
+ 2.19% 0.05% postgres [.] LWLockWaitListLock
+ 2.10% 0.01% postgres [.] LWLockQueueSelf
+ 1.73% 1.71% postgres [.] pg_atomic_read_u32_impl
+ 1.73% 0.02% postgres [.] pg_atomic_read_u32
+ 1.72% 0.02% postgres [.] LWLockRelease
+ 1.65% 0.04% postgres [.] exec_bind_message
+ 1.43% 0.00% postgres [.] XLogInsert
+ 1.42% 0.01% postgres [.] WaitXLogInsertionsToFinish
+ 1.40% 0.03% postgres [.] LWLockWaitForVar
+ 1.38% 0.02% postgres [.] XLogInsertRecord
+ 0.93% 0.03% postgres [.] LWLockAcquireOrWait
+ 0.91% 0.00% postgres [.] GetTransactionSnapshot
+ 0.91% 0.79% postgres [.] GetSnapshotData
+ 0.91% 0.00% postgres [.] WALInsertLockRelease
+ 0.91% 0.00% postgres [.] LWLockReleaseClearVar
+ 0.53% 0.02% postgres [.] ExecInitModifyTable
TPS (compiled with -ggdb for capturing symbols for perf)
1 1295.296611
2 1459.079162
4 2865.688987
8 5533.724983
16 10771.697842
32 20557.499312
64 39436.423783
128 42555.639048
256 73139.060227
512 124649.665196
768 131162.826976
1024 132185.160007
2048 117377.586644
4096 88240.336940
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 74c5bd8cc4f1497aa7f2fa02c6487039dc91e847 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Thu, 2 Feb 2023 03:42:27 +0000
Subject: [PATCH v4] Optimize WAL insertion lock acquisition and release
This commit optimizes WAL insertion lock acquisition and release
in the following ways:
1. The WAL insertion lock's variable insertingAt is currently read
and written under the lwlock's wait list lock to avoid torn
reads/writes. This wait list lock can become a point of contention
under highly concurrent write workloads. Therefore, make
insertingAt a 64-bit atomic, which inherently provides tear-free
reads/writes.
2. LWLockUpdateVar currently acquires the lwlock's wait list lock
even when there are no waiters at all. Add a fastpath exit to
LWLockUpdateVar when there are no waiters, avoiding the unnecessary
locking.
Note that an atomic exchange operation (which acts as a full memory
barrier) is used where necessary, instead of a plain atomic write,
to ensure the required memory ordering is preserved.
This commit also adds a note in WaitXLogInsertionsToFinish about how
the use of a spinlock there avoids the need for an explicit memory
barrier in some subsequently called functions.
Suggested-by: Andres Freund
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart
Discussion: https://www.postgresql.org/message-id/20221124184619.xit4sfi52bcz2tva%40awork3.anarazel.de
---
src/backend/access/transam/xlog.c | 14 +++++--
src/backend/storage/lmgr/lwlock.c | 66 +++++++++++++++++++------------
src/include/storage/lwlock.h | 6 +--
3 files changed, 55 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..95aed0e97f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -350,7 +350,8 @@ typedef struct XLogwrtResult
* wait for all currently in-progress insertions to finish, but the
* insertingAt indicator allows you to ignore insertions to later in the WAL,
* so that you only wait for the insertions that are modifying the buffers
- * you're about to write out.
+ * you're about to write out. Using an atomic variable for insertingAt avoids
+ * taking any explicit lock for reads and writes.
*
* This isn't just an optimization. If all the WAL buffers are dirty, an
* inserter that's holding a WAL insert lock might need to evict an old WAL
@@ -376,7 +377,7 @@ typedef struct XLogwrtResult
typedef struct
{
LWLock lock;
- XLogRecPtr insertingAt;
+ pg_atomic_uint64 insertingAt;
XLogRecPtr lastImportantAt;
} WALInsertLock;
@@ -1496,6 +1497,13 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
* calling LWLockUpdateVar. But if it has to sleep, it will
* advertise the insertion point with LWLockUpdateVar before
* sleeping.
+ *
+ * XXX: The spinlock taken at the beginning of this function to read
+ * the current insert position acts as a memory barrier. Hence the
+ * immediately following loads and stores to shared memory (for
+ * instance, in LWLockUpdateVar called via LWLockWaitForVar) don't
+ * need an explicit memory barrier as far as the current usage is
+ * concerned. But that might not be safe in general.
*/
if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
&WALInsertLocks[i].l.insertingAt,
@@ -4596,7 +4604,7 @@ XLOGShmemInit(void)
for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
{
LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
- WALInsertLocks[i].l.insertingAt = InvalidXLogRecPtr;
+ pg_atomic_init_u64(&WALInsertLocks[i].l.insertingAt, InvalidXLogRecPtr);
WALInsertLocks[i].l.lastImportantAt = InvalidXLogRecPtr;
}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index d2ec396045..27c3b63c68 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1547,9 +1547,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
* *result is set to true if the lock was free, and false otherwise.
*/
static bool
-LWLockConflictsWithVar(LWLock *lock,
- uint64 *valptr, uint64 oldval, uint64 *newval,
- bool *result)
+LWLockConflictsWithVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval,
+ uint64 *newval, bool *result)
{
bool mustwait;
uint64 value;
@@ -1572,13 +1571,11 @@ LWLockConflictsWithVar(LWLock *lock,
*result = false;
/*
- * Read value using the lwlock's wait list lock, as we can't generally
- * rely on atomic 64 bit reads/stores. TODO: On platforms with a way to
- * do atomic 64 bit reads/writes the spinlock should be optimized away.
+ * Read the value atomically without any explicit lock. The 64-bit
+ * atomics API transparently falls back to an emulation where needed,
+ * so the read is tear-free on all platforms.
*/
- LWLockWaitListLock(lock);
- value = *valptr;
- LWLockWaitListUnlock(lock);
+ value = pg_atomic_read_u64(valptr);
if (value != oldval)
{
@@ -1607,7 +1604,8 @@ LWLockConflictsWithVar(LWLock *lock,
* in shared mode, returns 'true'.
*/
bool
-LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
+LWLockWaitForVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval,
+ uint64 *newval)
{
PGPROC *proc = MyProc;
int extraWaits = 0;
@@ -1735,29 +1733,47 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
* LWLockUpdateVar - Update a variable and wake up waiters atomically
*
* Sets *valptr to 'val', and wakes up all processes waiting for us with
- * LWLockWaitForVar(). Setting the value and waking up the processes happen
- * atomically so that any process calling LWLockWaitForVar() on the same lock
- * is guaranteed to see the new value, and act accordingly.
+ * LWLockWaitForVar(). It first sets the value atomically and then wakes up
+ * the waiting processes so that any process calling LWLockWaitForVar() on the
+ * same lock is guaranteed to see the new value, and act accordingly.
*
* The caller must be holding the lock in exclusive mode.
*/
void
-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
{
proclist_head wakeup;
proclist_mutable_iter iter;
PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
+ /*
+ * Update the lock variable atomically first, without acquiring the
+ * wait list lock, so that any process waiting on the variable can
+ * observe the new value as soon as possible.
+ *
+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent atomic read of lock
+ * state to check if it has any waiters happens after we set the lock
+ * variable to new value here. Without a barrier, we could end up missing
+ * waiters that otherwise should have been woken up.
+ */
+ pg_atomic_exchange_u64(valptr, val);
+
+ /*
+ * Quick exit when there are no waiters. This avoids unnecessarily
+ * acquiring and releasing the lwlock's wait list lock.
+ */
+ if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
+ return;
+
proclist_init(&wakeup);
LWLockWaitListLock(lock);
Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);
- /* Update the lock's value */
- *valptr = val;
-
/*
* See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
* up. They are always in the front of the queue.
@@ -1873,17 +1889,17 @@ LWLockRelease(LWLock *lock)
* LWLockReleaseClearVar - release a previously acquired lock, reset variable
*/
void
-LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
{
- LWLockWaitListLock(lock);
-
/*
- * Set the variable's value before releasing the lock, that prevents race
- * a race condition wherein a new locker acquires the lock, but hasn't yet
- * set the variables value.
+ * Update the lock variable atomically first.
+ *
+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent shared memory
+ * reads/writes, if any, happen after we reset the lock variable.
*/
- *valptr = val;
- LWLockWaitListUnlock(lock);
+ pg_atomic_exchange_u64(valptr, val);
LWLockRelease(lock);
}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index d2c7afb8f4..f19bc49193 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -128,14 +128,14 @@ extern bool LWLockAcquire(LWLock *lock, LWLockMode mode);
extern bool LWLockConditionalAcquire(LWLock *lock, LWLockMode mode);
extern bool LWLockAcquireOrWait(LWLock *lock, LWLockMode mode);
extern void LWLockRelease(LWLock *lock);
-extern void LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val);
+extern void LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val);
extern void LWLockReleaseAll(void);
extern bool LWLockHeldByMe(LWLock *lock);
extern bool LWLockAnyHeldByMe(LWLock *lock, int nlocks, size_t stride);
extern bool LWLockHeldByMeInMode(LWLock *lock, LWLockMode mode);
-extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval);
-extern void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val);
+extern bool LWLockWaitForVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval, uint64 *newval);
+extern void LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val);
extern Size LWLockShmemSize(void);
extern void CreateLWLocks(void);
--
2.34.1