On Sat, Sep 18, 2021 at 05:19:04PM -0300, Alvaro Herrera wrote:
> Hmm, sounds a possibly useful idea to explore, but I would only do so if
> the other ideas prove fruitless, because it sounds like it'd have more
> moving parts.  Can you please first test if the idea of sending the signal
> twice is enough?

This idea does not work.  I got one failure after 5 tries.

> If that doesn't work, let's try Horiguchi-san's idea
> of using some `ps` flags to find the process.

I tried this one as well, and saw the same failure.  I was just looking
at the state of the test while it was querying pg_replication_slots,
and this was the expected state after the WAL sender received SIGCONT:
USER    PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
toto  12663   0.0  0.0  5014468   3384   ??  Ss    8:30PM   0:00.00 postgres: primary3: walsender toto [local] streaming 0/720000
toto  12662   0.0  0.0  4753092   3936   ??  Ts    8:30PM   0:00.01 postgres: standby_3: walreceiver streaming 0/7000D8
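For reference, checking a process by its state amounts to asking `ps` for just the PID and STAT columns and looking for a leading "T" (stopped), as for the walreceiver above.  A minimal Python sketch, assuming a `ps` that accepts `-o stat` (procps on Linux and BSD/macOS `ps` do); the helper names are hypothetical and not taken from the TAP test, which is written in Perl:

```python
import subprocess
from typing import Dict, Iterable


def parse_pid_stat(output: str) -> Dict[int, str]:
    """Parse header-less lines of "<pid> <stat>" into {pid: stat}."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[int(fields[0])] = fields[1]
    return states


def query_states(pids: Iterable[int]) -> Dict[int, str]:
    """Ask ps for the STAT column of the given PIDs.

    "-o pid=" and "-o stat=" are passed separately: an empty header after "="
    suppresses the header line, and keeping them apart avoids the rest of the
    argument being read as header text.  PIDs that no longer exist are simply
    missing from the result (ps then exits non-zero, which is ignored here).
    """
    pid_list = ",".join(str(p) for p in pids)
    result = subprocess.run(
        ["ps", "-o", "pid=", "-o", "stat=", "-p", pid_list],
        capture_output=True,
        text=True,
    )
    return parse_pid_stat(result.stdout)


def is_stopped(stat: str) -> bool:
    """A leading "T" in STAT means stopped by a job-control signal such as SIGSTOP."""
    return stat.startswith("T")
```

With the processes above, `query_states([12663, 12662])` would be expected to report 12662 (the walreceiver) as stopped and 12663 (the walsender) as sleeping.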

The test gets the right PIDs, as the logs showed:
ok 17 - have walsender pid 12663
ok 18 - have walreceiver pid 12662

So this does not seem to be an issue with the signals.
Perhaps we'd better wait for a checkpoint to complete, for example by
scanning the logs, before running the query on pg_replication_slots, to
make sure that the slot has been invalidated?
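Scanning the logs could look roughly like the following Python sketch (the TAP test itself is in Perl; `wait_for_log` and its parameters are hypothetical names).  It assumes log_checkpoints is enabled, so that the server writes a "checkpoint complete" line, and that the caller records the log size before triggering the checkpoint, so that older checkpoint messages are not matched:

```python
import re
import time


def wait_for_log(path: str, pattern: str, offset: int = 0,
                 timeout: float = 180.0, interval: float = 0.1) -> int:
    """Poll the log file at `path` until `pattern` appears after byte `offset`.

    Returns the byte offset just past the data that was scanned, so a later
    call can resume from there.  Raises TimeoutError if nothing matches
    within `timeout` seconds.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        # Binary mode so that seeking to an arbitrary byte offset is well defined.
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
        if regex.search(data.decode("utf-8", errors="replace")):
            return offset + len(data)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{pattern!r} not found in {path} after {timeout}s")
        time.sleep(interval)
```

Polling with a deadline keeps a slow buildfarm animal from failing spuriously, while a genuinely missing checkpoint still surfaces as a clear timeout rather than as a wrong result from pg_replication_slots.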
--
Michael
