Re: Logical replication launcher did not automatically restart when got SIGKILL

Fujii Masao Tue, 15 Jul 2025 08:08:35 -0700



On 2025/07/15 19:34, shveta malik wrote:

On Tue, Jul 15, 2025 at 2:56 PM cca5507 wrote:


Hi, hackers

I ran into the issue described in $SUBJECT. The main cause is that
RegisteredBgWorker::rw_pid is not reset.

Attached is a patch to fix it.


Thanks for the report!

This issue appears to have been introduced by commit 28a520c0b77. As a result,
not only the logical replication launcher but also other background workers
(like autoprewarm) may fail to restart after a server crash.

Thank you for reporting this. The problem exists and the patch works
as expected.

In the patch, we are resetting the PID during shared memory
initialization. Is there a better place to handle PID reset in the
case of a SIGKILL, possibly within a cleanup flow? For example, during
a regular shutdown, we reset the launcher (background worker) PID in
CleanupBackend(). Or is this the only possibility?


From a quick look at the code, it seems that the second half of CleanupBackend()
is responsible for cleaning up background workers and resetting rw_pid to 0.
However, in the crash case, the function exits immediately after calling
HandleChildCrash(), skipping that cleanup:

        if (crashed)
        {
                HandleChildCrash(bp_pid, exitstatus, procname);
                return;
        }

Could this be the problem? Shouldn't the background worker cleanup still
happen even in the crash case?
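
To make the concern concrete, here is a toy model of that control flow. It is
only a sketch and not the postmaster code: ToyWorker, the toy_* functions and
crash_reports are hypothetical names made up for illustration, and the only
thing taken from the discussion above is the idea that rw_pid == 0 means the
worker is eligible to be started again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a registered background worker entry. */
typedef struct ToyWorker
{
	pid_t		rw_pid;		/* 0 means "not running", so it may be started */
} ToyWorker;

/* Stand-in for crash handling; in this toy it only counts invocations. */
static int	crash_reports = 0;

static void
toy_handle_child_crash(pid_t pid)
{
	(void) pid;
	crash_reports++;
}

/* Same shape as the snippet above: the crash path returns before the reset. */
static void
toy_cleanup_early_return(ToyWorker *w, bool crashed)
{
	assert(w != NULL);

	if (crashed)
	{
		toy_handle_child_crash(w->rw_pid);
		return;					/* rw_pid keeps the dead child's PID */
	}

	w->rw_pid = 0;
}

/* One possible alternative: forget the PID first, then handle the crash. */
static void
toy_cleanup_always_reset(ToyWorker *w, bool crashed)
{
	pid_t		pid;

	assert(w != NULL);

	pid = w->rw_pid;
	w->rw_pid = 0;

	if (crashed)
		toy_handle_child_crash(pid);
}

/* Assumed restart rule for this toy: only start a worker whose PID is 0. */
static bool
toy_can_restart(const ToyWorker *w)
{
	return w->rw_pid == 0;
}
```

With a worker whose rw_pid is 1234, calling toy_cleanup_early_return(&w, true)
leaves rw_pid at 1234, so toy_can_restart() keeps returning false. The
toy_cleanup_always_reset() variant leaves it at 0. Whether the real fix belongs
in the cleanup path or in shared memory initialization is a separate question
that this sketch does not answer.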

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
