Hi all,

gokiburi has been failing only on REL_16_STABLE for the last few days,
in the tests of the module test_slru.  First failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gokiburi&dt=2026-05-13%2012%3A20%3A45

Set of changes associated with the first failure, which seem
completely innocent to me:
5f12d86dd76 Wed May 13 05:43:49 2026 UTC  Add more tests for corrupted data with pglz_decompress()
d140237dab8 Wed May 13 02:46:17 2026 UTC  Fix stale COPY progress during logical replication table sync

While the buildfarm runs don't show much, I have been able to
reproduce the failure on the buildfarm host, after using
-DEXEC_BACKEND.  Here is a backtrace, pointing out that something is
broken with LWLock initialization:
2026-05-18 05:20:50.186 UTC client backend[870830] pg_regress/test_slru STATEMENT:  SELECT test_slru_page_readonly(12377);
TRAP: failed Assert("LWLockHeldByMe(TestSLRULock)"), File: "test_slru.c", Line: 124, PID: 870830
postgres: popo contrib_regression [local] SELECT(ExceptionalCondition+0x16c) [0xaaaaabcf4d88]
/home/popo/lib/test_slru.so(test_slru_page_readonly+0xe4) [0xffffedf83060]
postgres: popo contrib_regression [local] SELECT(+0x885c40) [0xaaaaab325c40]
postgres: popo contrib_regression [local] SELECT(ExecInterpExprStillValid+0x84) [0xaaaaab329a4c]
postgres: popo contrib_regression [local] SELECT(+0x9405fc) [0xaaaaab3e05fc]
postgres: popo contrib_regression [local] SELECT(+0x9406d4) [0xaaaaab3e06d4]
postgres: popo contrib_regression [local] SELECT(+0x940b34) [0xaaaaab3e0b34]
postgres: popo contrib_regression [local] SELECT(+0x8b7ac0) [0xaaaaab357ac0]
postgres: popo contrib_regression [local] SELECT(+0x89de14) [0xaaaaab33de14]
postgres: popo contrib_regression [local] SELECT(+0x8a46c0) [0xaaaaab3446c0]
postgres: popo contrib_regression [local] SELECT(standard_ExecutorRun+0x2d0) [0xaaaaab33ec68]
postgres: popo contrib_regression [local] SELECT(ExecutorRun+0xb8) [0xaaaaab33e970]
postgres: popo contrib_regression [local] SELECT(+0xe550dc) [0xaaaaab8f50dc]
postgres: popo contrib_regression [local] SELECT(PortalRun+0x460) [0xaaaaab8f4958]
postgres: popo contrib_regression [local] SELECT(+0xe43150) [0xaaaaab8e3150]
postgres: popo contrib_regression [local] SELECT(PostgresMain+0x15e8) [0xaaaaab8f0560]
postgres: popo contrib_regression [local] SELECT(postmaster_forkexec+0x0) [0xaaaaab70f644]
postgres: popo contrib_regression [local] SELECT(SubPostmasterMain+0x6fc) [0xaaaaab7106d8]
postgres: popo contrib_regression [local] SELECT(main+0x6d0) [0xaaaaab463f6c]
/lib/aarch64-linux-gnu/libc.so.6(+0x2225c) [0xfffff725225c]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x9c) [0xfffff725233c]
postgres: popo contrib_regression [local] SELECT(_start+0x30) [0xaaaaaad3d4b0]

The server logs include the following, pointing to a broken state
(these two should not fail):
2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru ERROR:  lock <unassigned:0> is not held
2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru STATEMENT:  SELECT test_slru_page_write(12345, 'Test SLRU');

Note that the tests pass without -DEXEC_BACKEND.

While reading through the module, I think that the LWLock
initialization logic is borked: we end up calling LWLockInitialize()
more times than necessary, confusing the internal state.  Honestly, I
have no clue why the test has suddenly started failing, or why other
buildfarm members don't complain.  The host was upgraded a couple of
days ago to the latest Debian, but I also had a few clean runs in the
buildfarm before this began showing up.
What I do know is that the patch attached is able to make the tests of
the module pass for v16 on the problematic host with -DEXEC_BACKEND.
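
To illustrate the failure mode I suspect, here is a toy model in plain
C.  It is not PostgreSQL code: ToyLock, ToyShmem and the toy_*
functions are names invented for this sketch, and the "shared memory"
is just a struct passed by pointer.  It only shows why a startup
routine that runs once per attaching process (as I assume happens under
EXEC_BACKEND) should initialize the lock only when the shared struct
did not exist yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LWLock: a tranche ID and a "held" flag. */
typedef struct ToyLock
{
    int     tranche;
    bool    held;
} ToyLock;

/* Hypothetical stand-in for a shared-memory segment. */
typedef struct ToyShmem
{
    bool    allocated;      /* has the struct been handed out before? */
    int     next_tranche;   /* mimics a shared tranche ID counter */
    ToyLock lock;
} ToyShmem;

/*
 * Mimics the shape of a "find or create" shared-memory lookup: return
 * the struct and report through *found whether it already existed.
 */
ToyLock *
toy_shmem_init_struct(ToyShmem *shmem, bool *found)
{
    *found = shmem->allocated;
    shmem->allocated = true;
    return &shmem->lock;
}

/* Guarded startup: only the first caller initializes the lock. */
ToyLock *
toy_startup_guarded(ToyShmem *shmem)
{
    bool     found;
    ToyLock *lock = toy_shmem_init_struct(shmem, &found);

    if (!found)
    {
        /* First time through: assign a tranche and reset the state. */
        lock->tranche = shmem->next_tranche++;
        lock->held = false;
    }
    return lock;
}

/* Unguarded startup: every caller re-initializes the lock. */
ToyLock *
toy_startup_unguarded(ToyShmem *shmem)
{
    ToyLock *lock = &shmem->lock;

    /* Clobbers whatever state an earlier process had set up. */
    lock->tranche = shmem->next_tranche++;
    lock->held = false;
    return lock;
}
```

Calling toy_startup_guarded() twice on the same ToyShmem leaves the
tranche ID and the held flag untouched on the second call, while
toy_startup_unguarded() burns a new tranche ID and resets the flag each
time.  The real module has more moving parts than this, so take it as a
picture of the idea rather than a diagnosis.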

Comments or opinions?
--
Michael
From ad64c6a38603f8606674e16c9830084eff4b54d4 Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Mon, 18 May 2026 20:27:34 +0900
Subject: [PATCH] test_slru: Fix LWLock allocation logic

Only for REL_16_STABLE, per report from buildfarm member gokiburi.
---
 src/test/modules/test_slru/test_slru.c | 36 +++++++++++++++++++-------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c4763..72b4c66658b8 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,9 +40,16 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS               16
 
-/* SLRU control lock */
-LWLock         TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
+typedef struct TestSlruSharedState
+{
+       /* SLRU control lock */
+       LWLock          lock;
+} TestSlruSharedState;
+
+/* Pointer to shared-memory state. */
+static TestSlruSharedState *test_slru_state = NULL;
+
+#define TestSLRULock (&test_slru_state->lock)
 
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl                    (&TestSlruCtlData)
@@ -202,6 +209,7 @@ test_slru_shmem_request(void)
 
        /* reserve shared memory for the test SLRU */
        RequestAddinShmemSpace(SimpleLruShmemSize(NUM_TEST_BUFFERS, 0));
+       RequestAddinShmemSpace(MAXALIGN(sizeof(TestSlruSharedState)));
 }
 
 static bool
@@ -214,7 +222,7 @@ static void
 test_slru_shmem_startup(void)
 {
        const char      slru_dir_name[] = "pg_test_slru";
-       int                     test_tranche_id;
+       bool            found;
 
        if (prev_shmem_startup_hook)
                prev_shmem_startup_hook();
@@ -225,15 +233,25 @@ test_slru_shmem_startup(void)
         */
        (void) MakePGDirectory(slru_dir_name);
 
-       /* initialize the SLRU facility */
-       test_tranche_id = LWLockNewTrancheId();
-       LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-       LWLockInitialize(TestSLRULock, test_tranche_id);
+       LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+       test_slru_state = ShmemInitStruct("test_slru",
+                                         sizeof(TestSlruSharedState),
+                                         &found);
 
+       if (!found)
+       {
+               /* First time through ... */
+               LWLockInitialize(&test_slru_state->lock, LWLockNewTrancheId());
+               LWLockRegisterTranche(test_slru_state->lock.tranche, "test_slru");
+       }
+
+       LWLockRelease(AddinShmemInitLock);
+
+       /* initialize the SLRU facility */
        TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
        SimpleLruInit(TestSlruCtl, "TestSLRU",
                                  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-                                 test_tranche_id, SYNC_HANDLER_NONE);
+                                 test_slru_state->lock.tranche, SYNC_HANDLER_NONE);
 }
 
 void
-- 
2.54.0
