On Sat, May 7, 2022 at 4:52 PM Thomas Munro <thomas.mu...@gmail.com> wrote: > I think we'll probably also want to invent a way > to report which backend is holding up progress, since without that > it's practically impossible for an end user to understand why their > command is hanging.
Simple idea: how about logging the PID of processes that block progress for too long? In the attached, I arbitrarily picked 5 seconds as the wait time between LOG messages. Also, DEBUG1 messages let you see the processing speed on eg build farm animals. Thoughts? To test this, kill -STOP a random backend, and then try an ALTER DATABASE SET TABLESPACE in another backend. Example output: DEBUG: waiting for all backends to process ProcSignalBarrier generation 1 LOG: still waiting for pid 1651417 to accept ProcSignalBarrier STATEMENT: alter database mydb set tablespace ts1; LOG: still waiting for pid 1651417 to accept ProcSignalBarrier STATEMENT: alter database mydb set tablespace ts1; LOG: still waiting for pid 1651417 to accept ProcSignalBarrier STATEMENT: alter database mydb set tablespace ts1; LOG: still waiting for pid 1651417 to accept ProcSignalBarrier STATEMENT: alter database mydb set tablespace ts1; ... then kill -CONT: DEBUG: finished waiting for all backends to process ProcSignalBarrier generation 1 Another thought is that it might be nice to be able to test with a dummy PSB that doesn't actually do anything. You could use it to see how fast your system processes it, while doing various other things, and to find/debug problems in other code that fails to handle interrupts correctly. Here's an attempt at that. I guess it could go into a src/test/modules/something instead of core, but on the other hand the PSB itself has to be in core anyway, so maybe not. Thoughts? No documentation yet, just seeing if people think this is worth having... better names/ideas welcome. To test this, just SELECT pg_test_procsignal_barrier().
From 9fe475b557143cc324ce14de4f29ad8bef206fc8 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Mon, 9 May 2022 10:24:01 +1200 Subject: [PATCH 1/2] Add logging for ProcSignalBarrier mechanism. To help with diagnosis of systems that are not processing ProcSignalBarrier requests promptly, add a LOG message every 5 seconds if we seem to be wedged. Although you could already see this state via wait events in pg_stat_activity, the log message also shows the PID of the process that is preventing progress. Also add DEBUG1 logging around the whole loop. --- src/backend/storage/ipc/procsignal.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 00d66902d8..4fe6f087db 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -393,6 +393,11 @@ WaitForProcSignalBarrier(uint64 generation) { Assert(generation <= pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration)); + elog(DEBUG1, + "waiting for all backends to process ProcSignalBarrier generation " + UINT64_FORMAT, + generation); + for (int i = NumProcSignalSlots - 1; i >= 0; i--) { ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; @@ -407,13 +412,22 @@ WaitForProcSignalBarrier(uint64 generation) oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); while (oldval < generation) { - ConditionVariableSleep(&slot->pss_barrierCV, - WAIT_EVENT_PROC_SIGNAL_BARRIER); + if (ConditionVariableTimedSleep(&slot->pss_barrierCV, + 5000, + WAIT_EVENT_PROC_SIGNAL_BARRIER)) + elog(LOG, + "still waiting for pid %d to accept ProcSignalBarrier", + slot->pss_pid); oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); } ConditionVariableCancelSleep(); } + elog(DEBUG1, + "finished waiting for all backends to process ProcSignalBarrier generation " + UINT64_FORMAT, + generation); + /* * The caller is probably calling this function because it wants to read * the shared state or perform further writes to shared state once all -- 2.30.2
From 266528b553dcc128d2d38039ae821fe515ab1db1 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Mon, 9 May 2022 11:06:48 +1200 Subject: [PATCH 2/2] Add pg_test_procsignal_barrier(). A superuser-only test function to exercise the ProcSignalBarrier mechanism. It has no effect, other than to interrupt every process connected to shared memory and wait for them all to acknowledge the request. --- src/backend/storage/ipc/procsignal.c | 20 ++++++++++++++++++++ src/include/catalog/pg_proc.dat | 4 ++++ src/include/storage/procsignal.h | 1 + 3 files changed, 25 insertions(+) diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 4fe6f087db..93c31bc6b9 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -541,6 +541,10 @@ ProcessProcSignalBarrier(void) type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags); switch (type) { + case PROCSIGNAL_BARRIER_TEST: + processed = true; /* dummy test barrier */ + break; + case PROCSIGNAL_BARRIER_SMGRRELEASE: processed = ProcessBarrierSmgrRelease(); break; @@ -679,3 +683,19 @@ procsignal_sigusr1_handler(SIGNAL_ARGS) errno = save_errno; } + +/* + * A user-callable test of ProcSignalBarrier speed, that does nothing. + */ +Datum +pg_test_procsignal_barrier(PG_FUNCTION_ARGS) +{ + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser"))); + + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_TEST)); + + PG_RETURN_VOID(); +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 6d378ff785..434ee35364 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6398,6 +6398,10 @@ prorettype => 'text', proargtypes => 'text', prosrc => 'pg_current_logfile_1arg' }, +{ oid => '9777', descr => 'test ProcSignalBarrier', + proname => 'pg_test_procsignal_barrier', provolatile => 'v', prorettype => 'void', + proargtypes => '', prosrc => 'pg_test_procsignal_barrier' }, + { oid => '2623', descr => 'get information about file', proname => 'pg_stat_file', provolatile => 'v', prorettype => 'record', proargtypes => 'text', diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h index ee636900f3..b96d85d179 100644 --- a/src/include/storage/procsignal.h +++ b/src/include/storage/procsignal.h @@ -49,6 +49,7 @@ typedef enum typedef enum { + PROCSIGNAL_BARRIER_TEST, /* empty barrier for testing */ PROCSIGNAL_BARRIER_SMGRRELEASE /* ask smgr to close files */ } ProcSignalBarrierType; -- 2.30.2