Hello Michael,

21.03.2026 04:46, Michael Paquier wrote:
> So we are able to send the requests to the workers, and these can take
> a long time before being processed by the postmaster.  Querying
> "postgres" directly for the worker_spi_launch() and pg_stat_activity
> queries seems to have reduced the friction, with fewer requests to
> send.  However, I don't think that this is the end of the story: even
> after 79a5911fe65b I have spotted one case of RENAME TO where the
> requests were sent for a bit more than 4s, before the postmaster had
> the idea to catch up.  RENAME TO is the only one that can get slow
> (really no idea why), so I guess that we could always tweak things a
> bit more:
> 1) Extra injection point to increase the timeout (30s or 60s?) and
> give the postmaster more room to process the requests.
> 2) Remove this portion of the test, but it would be sad.
>
> I'll keep an eye out for more failures, even if the situation is looking
> slightly better.

Having reproduced this locally (running 3 tests in parallel with
ALTER DATABASE RENAME repeated 200 times, on a slow riscv64 machine), I
discovered that in the bad case the worker doesn't reach the main loop in
time (and CHECK_FOR_INTERRUPTS() inside it), because it doesn't get out of
initialize_worker_spi() -> CommitTransactionCommand().

With this modification:
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3752,3 +3752,3 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
         */
-       int                     ntries = 50;
+       int                     ntries = 500;

@@ -3798,3 +3798,6 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
                if (!found)
+{
+elog(LOG, "!!!CountOtherDBBackends| found no backends, try %d", tries);
                        return false;           /* no conflicting backends, so done */
+}

I can see the following:
... !!!CountOtherDBBackends| found no backends, try 1
# most of the calls (200 of 201) succeeded with try 1, but there are also:
... !!!CountOtherDBBackends| found no backends, try 7
... !!!CountOtherDBBackends| found no backends, try 51
... !!!CountOtherDBBackends| found no backends, try 74
... !!!CountOtherDBBackends| found no backends, try 84

So the backend is not completely stuck, but CommitTransactionCommand()
may take more than 5 seconds under some circumstances (it may be worth
investigating which ones exactly).
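
For reference, here is a rough sketch of how the try numbers above map to
elapsed time. It assumes that the retry loop in CountOtherDBBackends() sleeps
about 100 ms between attempts and has the shape
"for (tries = 0; tries < ntries; tries++)"; that is my reading of procarray.c,
not something the log itself shows. The names RETRY_SLEEP_MS, approx_wait_ms()
and fits_in_limit() are made up for illustration and do not exist in the tree.

```c
#include <stdbool.h>

/*
 * Assumption: the retry loop sleeps roughly 100 ms between attempts.
 * This constant is hypothetical and only used for this estimate.
 */
#define RETRY_SLEEP_MS 100

/*
 * Lower bound, in milliseconds, on the time already spent sleeping
 * when the attempt numbered "tries" is reached.
 */
static long
approx_wait_ms(int tries)
{
	return (long) tries * RETRY_SLEEP_MS;
}

/*
 * Would the attempt numbered "tries" still be executed by a loop of the
 * assumed shape "for (tries = 0; tries < ntries; tries++)"?
 */
static bool
fits_in_limit(int tries, int ntries)
{
	return tries < ntries;
}
```

Under these assumptions, approx_wait_ms(51) gives 5100 ms, and
fits_in_limit(51, 50) is false while fits_in_limit(84, 500) is true, so the
tries 51, 74 and 84 would not have been reached with the unmodified ntries = 50.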

Best regards,
Alexander

