Hi all,

gokiburi has been failing only on REL_16_STABLE for the last few days,
in the tests of the module test_slru. First failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gokiburi&dt=2026-05-13%2012%3A20%3A45

Set of changes associated with the first failure, which seem
completely innocent to me:
5f12d86dd76 Wed May 13 05:43:49 2026 UTC Add more tests for corrupted data with pglz_decompress()
d140237dab8 Wed May 13 02:46:17 2026 UTC Fix stale COPY progress during logical replication table sync

While the buildfarm runs don't show much, I have been able to
reproduce the failure on the buildfarm host after building with
-DEXEC_BACKEND. Here is a backtrace, suggesting that something is
broken with the LWLock initialization:
2026-05-18 05:20:50.186 UTC client backend[870830] pg_regress/test_slru STATEMENT: SELECT test_slru_page_readonly(12377);
TRAP: failed Assert("LWLockHeldByMe(TestSLRULock)"), File: "test_slru.c", Line: 124, PID: 870830
postgres: popo contrib_regression [local] SELECT(ExceptionalCondition+0x16c) [0xaaaaabcf4d88]
/home/popo/lib/test_slru.so(test_slru_page_readonly+0xe4) [0xffffedf83060]
postgres: popo contrib_regression [local] SELECT(+0x885c40) [0xaaaaab325c40]
postgres: popo contrib_regression [local] SELECT(ExecInterpExprStillValid+0x84) [0xaaaaab329a4c]
postgres: popo contrib_regression [local] SELECT(+0x9405fc) [0xaaaaab3e05fc]
postgres: popo contrib_regression [local] SELECT(+0x9406d4) [0xaaaaab3e06d4]
postgres: popo contrib_regression [local] SELECT(+0x940b34) [0xaaaaab3e0b34]
postgres: popo contrib_regression [local] SELECT(+0x8b7ac0) [0xaaaaab357ac0]
postgres: popo contrib_regression [local] SELECT(+0x89de14) [0xaaaaab33de14]
postgres: popo contrib_regression [local] SELECT(+0x8a46c0) [0xaaaaab3446c0]
postgres: popo contrib_regression [local] SELECT(standard_ExecutorRun+0x2d0) [0xaaaaab33ec68]
postgres: popo contrib_regression [local] SELECT(ExecutorRun+0xb8) [0xaaaaab33e970]
postgres: popo contrib_regression [local] SELECT(+0xe550dc) [0xaaaaab8f50dc]
postgres: popo contrib_regression [local] SELECT(PortalRun+0x460) [0xaaaaab8f4958]
postgres: popo contrib_regression [local] SELECT(+0xe43150) [0xaaaaab8e3150]
postgres: popo contrib_regression [local] SELECT(PostgresMain+0x15e8) [0xaaaaab8f0560]
postgres: popo contrib_regression [local] SELECT(postmaster_forkexec+0x0) [0xaaaaab70f644]
postgres: popo contrib_regression [local] SELECT(SubPostmasterMain+0x6fc) [0xaaaaab7106d8]
postgres: popo contrib_regression [local] SELECT(main+0x6d0) [0xaaaaab463f6c]
/lib/aarch64-linux-gnu/libc.so.6(+0x2225c) [0xfffff725225c]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x9c) [0xfffff725233c]
postgres: popo contrib_regression [local] SELECT(_start+0x30) [0xaaaaaad3d4b0]

The server logs include the following, pointing to a broken state
(these two should not fail):
2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru ERROR: lock <unassigned:0> is not held
2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru STATEMENT: SELECT test_slru_page_write(12345, 'Test SLRU');

Note that the tests pass without -DEXEC_BACKEND.
While reading through the module, I think that the LWLock
initialization logic is borked: we call LWLockInitialize() more times
than necessary, which confuses the lock's internal state. Honestly, I
have no clue why the test has suddenly started failing, or why other
buildfarm members don't complain. The host was upgraded a couple of
days ago to the latest Debian, but I also had a few clean runs in the
buildfarm before this began showing up.

What I do know is that the attached patch makes the tests of the
module pass for v16 on the problematic host with -DEXEC_BACKEND.
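
To illustrate the "initialize only once" idea outside of PostgreSQL,
here is a toy model in plain C. Everything in it (DemoLock, DemoShmem,
demo_init_struct() and the two demo_startup_*() functions) is
hypothetical and only mimics the "found" flag convention of
ShmemInitStruct(). It is not the module's code and does not try to
reproduce the actual failure; it only shows why a startup routine that
runs once per process should guard its initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a lock living in shared memory. */
typedef struct DemoLock
{
	int			tranche_id;		/* identifier assigned at initialization */
	int			init_count;		/* number of times the lock was initialized */
} DemoLock;

/* Hypothetical stand-in for a shared memory segment. */
typedef struct DemoShmem
{
	bool		allocated;		/* has the shared struct been created? */
	int			next_tranche_id;	/* shared counter of tranche IDs */
	DemoLock	lock;
} DemoShmem;

/*
 * Find-or-create the shared struct, reporting through "found" whether it
 * already existed (same convention as ShmemInitStruct(), nothing more).
 */
static DemoLock *
demo_init_struct(DemoShmem *shmem, bool *found)
{
	assert(shmem != NULL && found != NULL);

	*found = shmem->allocated;
	if (!shmem->allocated)
	{
		memset(&shmem->lock, 0, sizeof(DemoLock));
		shmem->allocated = true;
	}
	return &shmem->lock;
}

/* Startup routine that re-initializes the lock on every call. */
static DemoLock *
demo_startup_unguarded(DemoShmem *shmem)
{
	bool		found;
	DemoLock   *lock = demo_init_struct(shmem, &found);

	/* "found" is ignored here: the state is overwritten each time */
	lock->tranche_id = shmem->next_tranche_id++;
	lock->init_count++;
	return lock;
}

/* Startup routine that initializes the lock only the first time. */
static DemoLock *
demo_startup_guarded(DemoShmem *shmem)
{
	bool		found;
	DemoLock   *lock = demo_init_struct(shmem, &found);

	if (!found)
	{
		/* First time through: assign a tranche ID exactly once */
		lock->tranche_id = shmem->next_tranche_id++;
		lock->init_count++;
	}
	return lock;
}
```

Calling demo_startup_guarded() twice on the same DemoShmem leaves the
tranche ID assigned by the first call untouched, while
demo_startup_unguarded() hands out a new ID and bumps init_count on
each call.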
Comments or opinions?
--
Michael
From ad64c6a38603f8606674e16c9830084eff4b54d4 Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Mon, 18 May 2026 20:27:34 +0900
Subject: [PATCH] test_slru: Fix LWLock allocation logic

Only for REL_16_STABLE, per report from buildfarm members gokiburi.
---
 src/test/modules/test_slru/test_slru.c | 36 +++++++++++++++++++-------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c4763..72b4c66658b8 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,9 +40,16 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS 16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
+typedef struct TestSlruSharedState
+{
+	/* SLRU control lock */
+	LWLock		lock;
+} TestSlruSharedState;
+
+/* Pointer to shared-memory state. */
+static TestSlruSharedState *test_slru_state = NULL;
+
+#define TestSLRULock (&test_slru_state->lock)
 
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl (&TestSlruCtlData)
@@ -202,6 +209,7 @@ test_slru_shmem_request(void)
 
 	/* reserve shared memory for the test SLRU */
 	RequestAddinShmemSpace(SimpleLruShmemSize(NUM_TEST_BUFFERS, 0));
+	RequestAddinShmemSpace(MAXALIGN(sizeof(TestSlruSharedState)));
 }
 
 static bool
@@ -214,7 +222,7 @@ static void
 test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
-	int			test_tranche_id;
+	bool		found;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -225,15 +233,25 @@ test_slru_shmem_startup(void)
 	 */
 	(void) MakePGDirectory(slru_dir_name);
 
-	/* initialize the SLRU facility */
-	test_tranche_id = LWLockNewTrancheId();
-	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	test_slru_state = ShmemInitStruct("test_slru",
+									  sizeof(TestSlruSharedState),
+									  &found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&test_slru_state->lock, LWLockNewTrancheId());
+		LWLockRegisterTranche(test_slru_state->lock.tranche, "test_slru");
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+
+	/* initialize the SLRU facility */
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
 				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE);
+				  test_slru_state->lock.tranche, SYNC_HANDLER_NONE);
 }
 
 void
-- 
2.54.0
