On Sat, Mar 17, 2012 at 05:28:11PM -0700, Daniel Farina wrote:
Noah offered me these comments:
This patch still changes the policy for pg_terminate_backend(), and it does
not fix other SIGINT senders like processCancelRequest() and ProcSleep(). If
you're concerned about PID-reuse races, audit all backend signalling. Either
fix all such problems or propose a plan to get there eventually.
Is the postmaster signaling its children intrinsically vulnerable to
PID racing? Because it controls when it can call wait() or waitpid()
on child processes, it can unambiguously know that PIDs have not been
cycled for use.
Agreed, for Unix anyway.
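To make the wait()/waitpid() point concrete, here is a minimal sketch for a POSIX system. The function name probe_pid_reservation() is made up for illustration and is not from the patch or the PostgreSQL tree. It shows that an exited child's PID stays reserved (as a zombie) until the parent reaps it, so a parent that signals before reaping cannot hit a recycled PID.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Forks a child that exits at once, then probes the child's PID with
 * kill(pid, 0) before and after reaping it.  Returns 0 on success and
 * -1 if fork/waitid/waitpid fails.
 *
 *   *before: kill() result while the child is an unreaped zombie
 *   *after:  kill() result once waitpid() has released the PID
 */
int
probe_pid_reservation(int *before, int *after)
{
    siginfo_t info;
    pid_t     child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);               /* child: exit immediately */

    /* Block until the child has exited, but leave it unreaped (WNOWAIT). */
    if (waitid(P_PID, (id_t) child, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* The zombie still owns the PID, so the kernel cannot reuse it yet. */
    *before = kill(child, 0);

    /* Reaping is the point at which the PID becomes free for reuse. */
    if (waitpid(child, NULL, 0) != child)
        return -1;

    /* Normally fails with ESRCH now; from here on the PID may be recycled. */
    *after = kill(child, 0);
    return 0;
}
```

Only the parent gets this guarantee. Any other process signalling the same PID has no way to know whether the parent has already reaped it, which is the race under discussion.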
For this reason, a credible and entirely alternate
design might be to bounce IPC requests through the postmaster, but
since postmaster is so critical I had decided not to introduce nor
change mechanics there.
Good point, but I also agree with your decision there.
The postmaster, I think, keeps a private copy of cancellation keys that
are not in shared memory, if I read it properly (not 100% sure), and
uses that for cancellation requests. This has the useful property of
allowing cancellations even in the event that shared memory goes insane
(and since the postmaster is typically left as the last sane process of the
group, I thought it wise to not have it reuse a shared-memory based
approach).
Yes.
Currently, when pg_terminate_backend() follows a pg_cancel_backend() on which
the target has yet to act, the eventual outcome is a terminated process. With
this patch, the pg_terminate_backend() becomes a no-op with this warning:
!             ereport(WARNING,
!                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                      (errmsg("process is busy responding to administrative "
!                              "request")),
!                      (errhint("This is temporary, and may be retried."))));
That's less useful than the current behavior.
Yes. It could be fixed with dynamic allocation (holding more
administration requests), but since this was just to get a flavor of what a
solution might be like, I wanted to avoid additional dynamic
allocation (which would necessitate a similar condition in the form of
a much-less-likely OOM). At some point, though, I think this error condition
is inevitable in some form. I see it as akin to EAGAIN. Right now,
administrative requests are so short (copying and clearing a handful
of words out of PGPROC) that it's unlikely that this would be a
problem in practice.
I nominally agree that the new race would be rare, but not rarer than the race
this patch purposes to remove. You could also fix this by having the sender
wait until the target is ready to accept an admin request. For the particular
case of cancel/terminate, a terminate could overwrite a cancel; a cancel can
reduce to a no-op when either request is pending. I share your interest in
not tying a design to the narrow needs of cancel/terminate, though.
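
As a rough sketch of the cancel/terminate merge rule described above: assume a single pending-request slot per target. All names here (AdminRequest, AdminSlot, admin_slot_post, admin_slot_take) are hypothetical and not from the patch. Locking is deliberately left out; a real version would need a spinlock or atomic update around the slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request kinds, ordered by strength. */
typedef enum
{
    ADMIN_NONE = 0,
    ADMIN_CANCEL = 1,
    ADMIN_TERMINATE = 2
} AdminRequest;

/* Hypothetical one-slot mailbox; stands in for a field in PGPROC. */
typedef struct
{
    AdminRequest pending;
} AdminSlot;

/*
 * Post a request.  A stronger request overwrites a weaker one
 * (terminate replaces a pending cancel).  A request that is not
 * stronger than what is already pending is absorbed as a no-op.
 * Returns true if the slot changed, false if the request was absorbed.
 * The sender never needs to see a "busy, retry" outcome.
 */
bool
admin_slot_post(AdminSlot *slot, AdminRequest incoming)
{
    if (incoming > slot->pending)
    {
        slot->pending = incoming;
        return true;
    }
    return false;
}

/* Target side: fetch and clear whatever is pending. */
AdminRequest
admin_slot_take(AdminSlot *slot)
{
    AdminRequest req = slot->pending;

    slot->pending = ADMIN_NONE;
    return req;
}
```

With this rule, a cancel followed by a terminate leaves ADMIN_TERMINATE in the slot, which matches the current behavior where the target ends up terminated. It only works because these two requests have a natural ordering, so it does not generalize to arbitrary admin requests.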