On Sat, May 7, 2022 at 4:52 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> I think we'll probably also want to invent a way
> to report which backend is holding up progress, since without that
> it's practically impossible for an end user to understand why their
> command is hanging.

Simple idea: how about logging the PID of processes that block
progress for too long?  In the attached, I arbitrarily picked 5
seconds as the wait time between LOG messages.  Also, DEBUG1 messages
let you see the processing speed on eg build farm animals.  Thoughts?

To test this, kill -STOP a random backend, and then try an ALTER
DATABASE SET TABLESPACE in another backend.  Example output:

DEBUG:  waiting for all backends to process ProcSignalBarrier generation 1
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;

... then kill -CONT:

DEBUG:  finished waiting for all backends to process ProcSignalBarrier
generation 1

Another thought is that it might be nice to be able to test with a
dummy PSB that doesn't actually do anything.  You could use it to see
how fast your system processes it, while doing various other things,
and to find/debug problems in other code that fails to handle
interrupts correctly.  Here's an attempt at that.  I guess it could go
into a src/test/modules/something instead of core, but on the other
hand the PSB itself has to be in core anyway, so maybe not.  Thoughts?
 No documentation yet, just seeing if people think this is worth
having... better names/ideas welcome.

To test this, just SELECT pg_test_procsignal_barrier().
From 9fe475b557143cc324ce14de4f29ad8bef206fc8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 9 May 2022 10:24:01 +1200
Subject: [PATCH 1/2] Add logging for ProcSignalBarrier mechanism.

To help with diagnosis of systems that are not processing
ProcSignalBarrier requests promptly, add a LOG message every 5 seconds
if we seem to be wedged.  Although you could already see this state via
wait events in pg_stat_activity, the log message also shows the PID of
the process that is preventing progress.

Also add DEBUG1 logging around the whole loop.
---
 src/backend/storage/ipc/procsignal.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 00d66902d8..4fe6f087db 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -393,6 +393,11 @@ WaitForProcSignalBarrier(uint64 generation)
 {
 	Assert(generation <= pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration));
 
+	elog(DEBUG1,
+		 "waiting for all backends to process ProcSignalBarrier generation "
+		 UINT64_FORMAT,
+		 generation);
+
 	for (int i = NumProcSignalSlots - 1; i >= 0; i--)
 	{
 		ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
@@ -407,13 +412,22 @@ WaitForProcSignalBarrier(uint64 generation)
 		oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration);
 		while (oldval < generation)
 		{
-			ConditionVariableSleep(&slot->pss_barrierCV,
-								   WAIT_EVENT_PROC_SIGNAL_BARRIER);
+			if (ConditionVariableTimedSleep(&slot->pss_barrierCV,
+											5000,
+											WAIT_EVENT_PROC_SIGNAL_BARRIER))
+				elog(LOG,
+					 "still waiting for pid %d to accept ProcSignalBarrier",
+					 slot->pss_pid);
 			oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration);
 		}
 		ConditionVariableCancelSleep();
 	}
 
+	elog(DEBUG1,
+		 "finished waiting for all backends to process ProcSignalBarrier generation "
+		 UINT64_FORMAT,
+		 generation);
+
 	/*
 	 * The caller is probably calling this function because it wants to read
 	 * the shared state or perform further writes to shared state once all
-- 
2.30.2

From 266528b553dcc128d2d38039ae821fe515ab1db1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 9 May 2022 11:06:48 +1200
Subject: [PATCH 2/2] Add pg_test_procsignal_barrier().

A superuser-only test function to exercise the ProcSignalBarrier
mechanism.  It has no effect, other than to interrupt every process
connected to shared memory and wait for them all to acknowledge the
request.
---
 src/backend/storage/ipc/procsignal.c | 20 ++++++++++++++++++++
 src/include/catalog/pg_proc.dat      |  4 ++++
 src/include/storage/procsignal.h     |  1 +
 3 files changed, 25 insertions(+)

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4fe6f087db..93c31bc6b9 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -541,6 +541,10 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
+					case PROCSIGNAL_BARRIER_TEST:
+						processed = true;		/* dummy test barrier */
+						break;
+
 					case PROCSIGNAL_BARRIER_SMGRRELEASE:
 						processed = ProcessBarrierSmgrRelease();
 						break;
@@ -679,3 +683,19 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 
 	errno = save_errno;
 }
+
+/*
+ * A user-callable test of ProcSignalBarrier speed, that does nothing.
+ */
+Datum
+pg_test_procsignal_barrier(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser")));
+
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_TEST));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6d378ff785..434ee35364 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6398,6 +6398,10 @@
   prorettype => 'text', proargtypes => 'text',
   prosrc => 'pg_current_logfile_1arg' },
 
+{ oid => '9777', descr => 'test ProcSignalBarrier',
+  proname => 'pg_test_procsignal_barrier', provolatile => 'v', prorettype => 'void',
+  proargtypes => '', prosrc => 'pg_test_procsignal_barrier' },
+
 { oid => '2623', descr => 'get information about file',
   proname => 'pg_stat_file', provolatile => 'v', prorettype => 'record',
   proargtypes => 'text',
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index ee636900f3..b96d85d179 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -49,6 +49,7 @@ typedef enum
 
 typedef enum
 {
+	PROCSIGNAL_BARRIER_TEST,	/* empty barrier for testing */
 	PROCSIGNAL_BARRIER_SMGRRELEASE	/* ask smgr to close files */
 } ProcSignalBarrierType;
 
-- 
2.30.2

Reply via email to