Re: [HACKERS] kill -KILL: What happens?

Florian Pflug Thu, 13 Jan 2011 15:00:33 -0800

On Jan13, 2011, at 21:42 , Tom Lane wrote:
> Aidan Van Dyk <ai...@highrise.ca> writes:
>> If postmaster has a few fds to spare, what about having it open a pipe
>> to every child it spawns.  It never has to read/write to it, but
>> postmaster closing will signal the client's fd.  The client just has
>> to pop the fd into whatever normal poll/select event handling it uses
>> to notice when the "parent's pipe" is closed.
> 
> Hmm.  Or more generally: there's one FIFO.  The postmaster holds both
> sides open.  Backends hold the write side open.  (They can close the
> read side, but that would just be to free up a FD.)  Background children
> close the write side.  Now a background process can use EOF on the read
> side of the FIFO to tell it that postmaster and all backends have
> exited.  You still don't get a signal, but at least the condition you're
> testing for is the one we actually want and not an approximation.


I was thinking along similar lines, and put together a small test case to
prove that this actually works. The attached test program simulates the
interactions of a parent process (think postmaster), some utility processes
(think walwriter, bgwriter, ...) and some backends. It uses two pairs of
fds created with pipe(), called LifeSignParent and LifeSignParentBackends.

The writing end of the former is held open only in the parent process,
while the writing end of the latter is held open in the parent process and
all regular backend processes. Backend processes use select() to monitor
the reading end of the LifeSignParent fd pair. Since nothing is ever written
to the writing end, the fd becomes readable only when the parent exits,
because that is how select() signals EOF. Once that happens the backend
exits. The utility processes do the same, but monitor the reading end of
LifeSignParentBackends, and thus exit only after the parent and all regular
backends have died.
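
For readers without the attachment, here is a rough sketch of the
LifeSignParent half of the scheme. It is not the attached liveness.c; the
function names wait_for_life_sign_loss() and demo_backend_notices_parent_exit()
and the extra "report" pipe are made up for illustration. A stand-in parent
creates the pipe, forks one backend and exits; the backend select()s on the
reading end and reports once it sees EOF.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Block until the reading end of a life-sign pipe hits EOF, i.e. until
 * every process holding the writing end has exited or closed it.
 * Returns 0 on EOF, -1 on error. (Hypothetical helper.)
 */
static int
wait_for_life_sign_loss(int read_fd)
{
    assert(read_fd >= 0 && read_fd < FD_SETSIZE);   /* select() limit */
    for (;;)
    {
        fd_set  rfds;
        char    c;
        ssize_t n;

        FD_ZERO(&rfds);
        FD_SET(read_fd, &rfds);
        if (select(read_fd + 1, &rfds, NULL, NULL, NULL) < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        n = read(read_fd, &c, 1);
        if (n == 0)
            return 0;           /* EOF: all writers are gone */
        if (n < 0 && errno != EINTR)
            return -1;
        /* n > 0 would be stray data; nothing should ever write here */
    }
}

/*
 * Fork a stand-in parent that creates the life-sign pipe, forks one
 * backend and exits. Returns 1 if the backend reported seeing EOF,
 * 0 if it did not, -1 on setup failure. (Hypothetical demo.)
 */
static int
demo_backend_notices_parent_exit(void)
{
    int     report[2];
    pid_t   parent;
    char    c = 0;
    ssize_t n;

    if (pipe(report) < 0)
        return -1;
    parent = fork();
    if (parent < 0)
        return -1;
    if (parent == 0)
    {
        int     life[2];
        pid_t   backend;

        close(report[0]);
        if (pipe(life) < 0)
            _exit(1);
        backend = fork();
        if (backend < 0)
            _exit(1);
        if (backend == 0)
        {
            /* The backend must drop its copy of the writing end. */
            close(life[1]);
            if (wait_for_life_sign_loss(life[0]) == 0 &&
                write(report[1], "x", 1) != 1)
                _exit(1);
            _exit(0);
        }
        /* Stand-in parent: life[1] is closed implicitly on exit. */
        _exit(0);
    }
    close(report[1]);
    waitpid(parent, NULL, 0);
    n = read(report[0], &c, 1);     /* 1 byte on success, EOF otherwise */
    close(report[0]);
    return (n == 1 && c == 'x');
}
```

The LifeSignParentBackends pair would differ only in that backends keep
their copy of the writing end open, so a utility process selecting on the
reading end sees EOF only after the parent and every backend are gone.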

Since the life sign checking uses select(), any place that already uses
select() can easily check for vanishing life signs. CHECK_FOR_INTERRUPTS could
simply check the life sign once every few seconds.
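
A periodic check of that kind could be a select() with a zero timeout, so
it never blocks. The following is only a sketch; life_sign_present() is a
name invented here, not something in the PostgreSQL source or in the
attached program.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Non-blocking life-sign check. (Hypothetical helper.)
 * Returns 1 while some process still holds the writing end of the pipe,
 * 0 once the reading end reports EOF, and -1 on error.
 */
static int
life_sign_present(int read_fd)
{
    fd_set  rfds;
    char    c;
    ssize_t n;
    int     rc;

    assert(read_fd >= 0 && read_fd < FD_SETSIZE);   /* select() limit */

    for (;;)
    {
        /* Re-initialise both on every try; select() may modify them. */
        struct timeval tv = {0, 0};

        FD_ZERO(&rfds);
        FD_SET(read_fd, &rfds);
        rc = select(read_fd + 1, &rfds, NULL, NULL, &tv);
        if (rc >= 0 || errno != EINTR)
            break;
    }
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 1;           /* not readable: a writer is still alive */

    /* Readable although nothing is ever written: this should be EOF. */
    n = read(read_fd, &c, 1);
    if (n == 0)
        return 0;
    if (n < 0)
        return (errno == EINTR || errno == EAGAIN) ? 1 : -1;
    return 1;               /* stray data implies a writer existed */
}
```

A caller would remember when it last ran the check and skip it until a few
seconds have passed, so the cost per CHECK_FOR_INTERRUPTS stays negligible.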

If we want an absolutely reliable signal instead of checking in
CHECK_FOR_INTERRUPTS, every backend would need to launch a monitor subprocess
which monitors the life sign, and exits once it vanishes. The backend would
then get a SIGCHLD once the postmaster dies. Seems like overkill, though.

The whole thing won't work on Windows, since even if it's got a pipe() or
socketpair() call, with EXEC_BACKEND there's no way of transferring these
fds to the child processes. AFAIK, however, Windows has other means with 
which such life signs can be implemented. For example, I seem to remember
that WaitForMultipleObjects() can be used to wait for process-related events.
But Windows really isn't my area of expertise...

I have tested this on the latest Ubuntu LTS release (10.04.1) as well as
Mac OS X 10.6.6, and it seems to work correctly on both systems. I'd be
happy to hear from anyone who has access to other systems on whether this
works or not. The expected output is:

Launched utility 5095
Launched backend 5097
Launched utility 5096
Launched backend 5099
Launched backend 5098
Utility 5095 detected live parent or backend
Backend 5097 detected live parent
Utility 5096 detected live parent or backend
Backend 5099 detected live parent
Backend 5098 detected live parent
Parent exiting
Backend 5097 exiting after parent died
Backend 5098 exiting after parent died
Backend 5099 exiting after parent died
Utility 5096 exiting after parent and backends died
Utility 5095 exiting after parent and backends died

Everything after "Parent exiting" might be interleaved with a shell prompt,
of course.

best regards,
Florian Pflug

liveness.c
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
